Commit 495c8d35035e ("PM: hibernate: Add pm_hibernation_mode_is_suspend()")
that introduced pm_hibernation_mode_is_suspend() did not define it in
the case when CONFIG_HIBERNATION is unset, but CONFIG_SUSPEND is set.
Subsequent commit 0a6e9e098fcc ("drm/amd: Fix hybrid sleep") made the
amdgpu driver use that function which led to kernel build breakage in
the case mentioned above [1].
Address this by using appropriate #ifdeffery around the definition of
pm_hibernation_mode_is_suspend().
Fixes: 0a6e9e098fcc ("drm/amd: Fix hybrid sleep") Reported-by: KernelCI bot <bot@kernelci.org> Closes: https://groups.io/g/kernelci-results/topic/regression_pm_testing/115439919 [1] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An array of VM binds can potentially evict other buffer objects (BOs)
within the same VM under certain conditions, which may lead to NULL
pointer dereferences later in the bind pipeline. To prevent this, clear
the allow_res_evict flag in the xe_bo_validate call.
v2:
- Invert polarity of no_res_evict (Thomas)
- Add comment in code explaining issue (Thomas)
Cc: stable@vger.kernel.org Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6268 Fixes: 774b5fa509a9 ("drm/xe: Avoid evicting object of the same vm in none fault mode") Fixes: 77f2ef3f16f5 ("drm/xe: Lock all gpuva ops during VM bind IOCTL") Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251009110618.3481870-1-matthew.brost@intel.com
(cherry picked from commit 8b9ba8d6d95fe75fed6b0480bb03da4b321bea08) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
[ removed exec parameter from xe_bo_validate() calls ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There may be cases in which the BAR0 also needs to move to accommodate
the bigger BAR2. However if it's not released, the BAR2 resize fails.
During the vram probe it can't be released as it's already in use by
xe_mmio for early register access.
Add a new function in xe_vram and let xe_pci call it directly before
even early device probe. This allows the BAR2 to resize in cases BAR0
also needs to move, assuming there aren't other reasons to hold that
move:
For BMG there are additional fix needed in the PCI side, but this
helps getting it to a working resize.
All the rebar logic is more pci-specific than xe-specific and can be
done very early in the probe sequence. In future it would be good to
move it out of xe_vram.c, but this refactor is left for later.
Currently in the drivers we have defined VRAM regions per device and per
tile. Initialization of these regions is done in two completely different
ways. To simplify the logic of the code and make it easier to add new
regions in the future, let's unify the way we initialize VRAM regions.
v2:
- fix doc comments in struct xe_vram_region
- remove unnecessary includes (Jani)
v3:
- move code from xe_vram_init_regions_managers to xe_tile_init_noalloc
(Matthew)
- replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR
(Matthew)
- Replace the tile id parameter with vram region in the xe_pf_begin
function.
v4:
- remove tile back pointer from struct xe_vram_region
- add new back pointers: xe and migarte to xe_vram_region
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Stable-dep-of: d30203739be7 ("drm/xe: Move rebar to be done earlier") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Let's move the xe_vram_region structure to a new header dedicated to VRAM
to improve modularity and avoid unnecessary dependencies when only
VRAM-related structures are needed.
v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled
v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled
v4: Move helper to get tile dpagemap to xe_svm.c
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Stable-dep-of: d30203739be7 ("drm/xe: Move rebar to be done earlier") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In future platforms, we will need to represent the device and tile
VRAM regions in a more dynamic way, so let's abandon the static
allocation of these structures and start use a dynamic allocation.
v2:
- Add a helpers for accessing fields of the xe_vram_region structure
v3:
- Add missing EXPORT_SYMBOL_IF_KUNIT for
xe_vram_region_actual_physical_size
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Stable-dep-of: d30203739be7 ("drm/xe: Move rebar to be done earlier") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Let's replace the manual call to ioremap_wc function with devm_ioremap_wc
function, ensuring that VRAM mappings are automatically released when
the driver is detached.
Since devm_ioremap_wc registers the mapping with the device's managed
resources, the explicit iounmap call in vram_fini is no longer needed,
so let's remove it.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-2-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Stable-dep-of: d30203739be7 ("drm/xe: Move rebar to be done earlier") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do read-modify-write so that we re-use the characterized reset value as
specified in TRM [1] to program calibration wait time which defines number
of cycles to wait for after startup state machine is in bandgap enable
state.
This fixes PLL lock timeout error faced while using RPi DSI Panel on TI's
AM62L and J721E SoC since earlier calibration wait time was getting
overwritten to zero value thus failing the PLL to lockup and causing
timeout.
match_region_by_range() is not using the helper function that also takes
extended linear cache size into account when comparing regions. This
causes a x2 region to show up as 2 partial incomplete regions rather
than a single CXL region with extended linear cache support. Replace
the open coded compare logic with the proper helper function for
comparison. User visible impact is that when 'cxl list' is issued,
no activa CXL region(s) are shown. There may be multiple idle regions
present. No actual active CXL region is present in the kernel.
[dj: Fix stable address]
Fixes: 0ec9849b6333 ("acpi/hmat / cxl: Add extended linear cache support for CXL") Cc: stable@vger.kernel.org Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[ constify struct range ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
resctrl_arch_rmid_read() adjusts the value obtained from MSR_IA32_QM_CTR to
account for the overflow for MBM events and apply counter scaling for all the
events. This logic is common to both reading an RMID and reading a hardware
counter directly.
Refactor the hardware value adjustment logic into get_corrected_val() to
prepare for support of reading a hardware counter.
PLL lockup and O_CMN_READY assertion can only happen after common state
machine gets enabled by programming DPHY_CMN_SSM register, but driver was
polling them before the common state machine was enabled which is
incorrect. This is as per the DPHY initialization sequence as mentioned in
J721E TRM [1] at section "12.7.2.4.1.2.1 Start-up Sequence Timing Diagram".
It shows O_CMN_READY polling at the end after common configuration pin
setup where the common configuration pin setup step enables state machine
as referenced in "Table 12-1533. Common Configuration-Related Setup
mentions state machine"
To fix this :
- Add new function callbacks for polling on PLL lock and O_CMN_READY
assertion.
- As state machine and clocks get enabled in power_on callback only, move
the clock related programming part from configure callback to power_on
callback and poll for the PLL lockup and O_CMN_READY assertion after state
machine gets enabled.
- The configure callback only saves the PLL configuration received from the
client driver which will be applied later on in power_on callback.
- Add checks to ensure configure is called before power_on and state
machine is in disabled state before power_on callback is called.
- Disable state machine in power_off so that client driver can re-configure
the PLL by following up a power_off, configure, power_on sequence.
The DPHY driver does not return the actual hs_clk_rate, so the DSI
driver has no idea what clock was actually achieved. Set the realized
hs_clk_rate to the opts struct, so that the DSI driver gets it back.
The data type of loca_last_write_offset is newoffset4 and is switched
on a boolean value, no_newoffset, that indicates if a previous write
occurred or not. If no_newoffset is FALSE, an offset is not given.
This means that client does not try to update the file size. Thus,
server should not try to calculate new file size and check if it fits
into the segment range. See RFC 8881, section 12.5.4.2.
Sometimes the current incorrect logic may cause clients to hang when
trying to sync an inode. If layoutcommit fails, the client marks the
inode as dirty again.
Fixes: 9cf514ccfacb ("nfsd: implement pNFS operations") Cc: stable@vger.kernel.org Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When pNFS client in the block or scsi layout mode sends layoutcommit
to MDS, a variable length array of modified extents is supplied within
the request. This patch allows the server to accept such extent arrays
if they do not fit within single memory page.
The issue can be reproduced when writing to a 1GB file using FIO with
O_DIRECT, 4K block and large I/O depth without preallocation of the
file. In this case, the server returns NFSERR_BADXDR to the client.
Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae76a ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove dprintk in nfsd4_layoutcommit. These are not needed
in day to day usage, and the information is also available
in Wireshark when capturing NFS traffic.
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae76a ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Compilers may optimize the layout of C structures, so we should not rely
on sizeof struct and memcpy to encode and decode XDR structures. The byte
order of the fields should also be taken into account.
This patch adds the correct functions to handle the deviceid4 structure
and removes the pad field, which is currently not used by NFSD, from the
runtime state. The server's byte order is preserved because the deviceid4
blob on the wire is only used as a cookie by the client.
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Stable-dep-of: d68886bae76a ("NFSD: Fix last write offset handling in layoutcommit") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When mounting file systems with a log that was dirtied on i386 on
other architectures or vice versa, log recovery is unhappy:
[ 11.068052] XFS (vdb): Torn write (CRC failure) detected at log block 0x2. Truncating head block from 0xc.
This is because the CRCs generated by i386 and other architectures
always diff. The reason for that is that sizeof(struct xlog_rec_header)
returns different values for i386 vs the rest (324 vs 328), because the
struct is not sizeof(uint64_t) aligned, and i386 has odd struct size
alignment rules.
This issue goes back to commit 13cdc853c519 ("Add log versioning, and new
super block field for the log stripe") in the xfs-import tree, which
adds log v2 support and the h_size field that causes the unaligned size.
At that time it only mattered for the crude debug only log header
checksum, but with commit 0e446be44806 ("xfs: add CRC checks to the log")
it became a real issue for v5 file system, because now there is a proper
CRC, and regular builds actually expect it match.
Fix this by allowing checksums with and without the padding.
Fixes: 0e446be44806 ("xfs: add CRC checks to the log") Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
old_crc is a very misleading name. Rename it to expected_crc as that
described the usage much better.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
Stable-dep-of: e747883c7d73 ("xfs: fix log CRC mismatches between i386 and other architectures") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enable the workaround for Neoverse-V3AE, and document this.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
... in section A.6.1 ("MIDR_EL1, Main ID Register").
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We intend that EL0 exception handlers unmask all DAIF exceptions
before calling exit_to_user_mode().
When completing single-step of a suspended breakpoint, we do not call
local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(),
leaving all DAIF exceptions masked.
When pseudo-NMIs are not in use this is benign.
When pseudo-NMIs are in use, this is unsound. At this point interrupts
are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag
manipulation may not work correctly. For example, a subsequent
local_irq_enable() within exit_to_user_mode_loop() will only unmask
interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and
anything depending on interrupts being unmasked (e.g. delivery of
signals) will not work correctly.
This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING.
Move the call to `try_step_suspended_breakpoints()` outside of the check
so that interrupts can be unmasked even if we don't call the step handler.
Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry") Cc: <stable@vger.kernel.org> # 6.17 Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[ada.coupriediaz@arm.com: Fix conflict for v6.17 stable] Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The issue takes place if the length field of struct hfsplus_unistr
is bigger than HFSPLUS_MAX_STRLEN. The patch simply checks
the length of comparing strings. And if the strings' length
is bigger than HFSPLUS_MAX_STRLEN, then it is corrected
to this value.
v2
The string length correction has been added for hfsplus_strcmp().
Reported-by: Jiaming Zhang <r772577952@gmail.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
cc: syzkaller@googlegroups.com Link: https://lore.kernel.org/r/20250919191243.1370388-1-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With TLS enabled, records that are encrypted and appended to TLS TX
list can fail to see a retry if the underlying TCP socket is busy, for
example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known
to the NVMe TCP driver, as the TLS layer successfully generated a record.
Typically, the TLS write_space() callback would ensure such records are
retried, but in the NVMe TCP Host driver, write_space() invokes
nvme_tcp_write_space(). This causes a partially sent record in the TLS TX
list to timeout after not being retried.
This patch fixes the above by calling queue->write_space(), which calls
into the TLS layer to retry any pending records.
test_parse_test_list_file writes some data to
/tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read
the data back. However, after writing data to that file, we forget to
call fsync() and it's causing testing failure in my laptop. This patch
helps fix it by adding the missing fsync() call.
Fixes: 64276f01dce8 ("selftests/bpf: Test_progs can read test lists from file") Signed-off-by: Xing Guo <higuoxing@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
This assert can trigger here with non pin_map users that select
LATE_RESTORE, since the vmap is allowed to be NULL given that
save/restore can now use the blitter instead. The check here doesn't
seem to have much value anymore given that we no longer move pinned
memory, so any existing vmap is left well alone, and doesn't need to be
recreated upon restore, so just drop the assert here.
Fixes: 86f69c26113c ("drm/xe: use backup object for pinned save/restore") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com
(cherry picked from commit a10b4a69c7f8f596d2c5218fbe84430734fab3b2) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
devm_kasprintf() may return NULL on memory allocation failure,
but the debug message prints cpus->dai_name before checking it.
Move the dev_dbg() call after the NULL check to prevent potential
NULL pointer dereference.
Fixes: cb8ea62e64020 ("ASoC: amd/sdw_utils: add sof based soundwire generic machine driver") Signed-off-by: Li Qiang <liqiang01@kylinos.cn> Link: https://patch.msgid.link/20251015075530.146851-1-liqiang01@kylinos.cn Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
HID_DG_PEN devices should have a suffix of "Stylus", as pointed out by
commit c0ee1d571626 ("HID: hid-input: Add suffix also for HID_DG_PEN").
However, on multitouch devices, these suffixes may be overridden. Before
that commit, HID_DG_PEN devices would get the "Stylus" suffix, but after
that, multitouch would override them to have an "UNKNOWN" suffix. Just add
HID_DG_PEN to the list of non-overriden suffixes in multitouch.
Commit 581c4484769e ("HID: input: map digitizer battery usage") added
handling of battery events for digitizers (typically for batteries
presented in stylii). Digitizers typically report correct battery levels
only when stylus is actively touching the surface, and in other cases
they may report battery level of 0. To avoid confusing consumers of the
battery information the code was added to filer out reports with 0
battery levels.
However there exist other kinds of devices that may legitimately report
0 battery levels. Fix this by filtering out 0-level reports only for
digitizer usages, and continue reporting them for other kinds of devices
(Smart Batteries, etc).
Remove the acquisition and release of q->elevator_lock in the
blkg_conf_open_bdev_frozen() and blkg_conf_exit_frozen() functions. The
elevator lock is no longer needed in these code paths since commit 78c271344b6f ("block: move wbt_enable_default() out of queue freezing
from sched ->exit()") which introduces `disk->rqos_state_mutex` for
protecting wbt state change, and not necessary to abuse elevator_lock
for this purpose.
This change helps to solve the lockdep warning reported from Yu Kuai[1].
Pass blktests/throtl with lockdep enabled.
Links: https://lore.kernel.org/linux-block/e5e7ac3f-2063-473a-aafb-4d8d43e5576e@yukuai.org.cn/ [1] Fixes: commit 78c271344b6f ("block: move wbt_enable_default() out of queue freezing from sched ->exit()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 7f2799c546db ("blk-mq: cleanup shared tags case in
blk_mq_update_nr_requests()") moves blk_mq_tag_update_sched_shared_tags()
before q->nr_requests is updated, however, it's still using the old
q->nr_requests to resize tag depth.
Fix this problem by passing in expected new tag depth.
Fixes: 7f2799c546db ("blk-mq: cleanup shared tags case in blk_mq_update_nr_requests()") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-block/20251014130507.4187235-2-clm@meta.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
In try_to_register_card(), the return value of usb_ifnum_to_if() is
passed directly to usb_interface_claimed() without a NULL check, which
will lead to a NULL pointer dereference when creating an invalid
USB audio device. Fix this by adding a check to ensure the interface
pointer is valid before passing it to usb_interface_claimed().
One way we can crash there is if set.cnt zero, which is checked for with
ASSERT_EQ() above, but we proceed after this regardless of the outcome.
Instead of crashing, we should bail out with test failure early.
Similarly, if parse_test_list_file() fails, we shouldn't be even looking
at set, so bail even earlier if ASSERT_OK() fails.
Fixes: 64276f01dce8 ("selftests/bpf: Test_progs can read test lists from file") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/20251014202037.72922-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The sc_c field is currently not updated in the host response to the
controller challenge leading to failures while attempting secure
channel concatenation. Fix this by adding a new sc_c variable to the
dhchap queue context structure which is appropriately set during
negotiate and then used in the host response.
Fixes: e88a7595b57f ("nvme-tcp: request secure channel concatenation") Signed-off-by: Martin George <marting@netapp.com> Signed-off-by: Prashanth Adurthi <prashana@netapp.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Two threads of the same process can potential read and write parallelly to
head and tail pointers of the same DBC request queue. This could lead to a
race condition and corrupt the DBC request queue.
Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com>
[jhugo: Add fixes tag] Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007061837.206132-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, if find_and_map_user_pages() takes a DMA xfer request from the
user with a length field set to 0, or in a rare case, the host receives
QAIC_TRANS_DMA_XFER_CONT from the device where resources->xferred_dma_size
is equal to the requested transaction size, the function will return 0
before allocating an sgt or setting the fields of the dma_xfer struct.
In that case, encode_addr_size_pairs() will try to access the sgt which
will lead to a general protection fault.
Return an EINVAL in case the user provides a zero-sized ALP, or the device
requests continuation after all of the bytes have been transferred.
Fixes: 96d3c1cadedb ("accel/qaic: Clean up integer overflow checking in map_user_pages()") Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007122320.339654-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
As soon as we queue MHI buffers to receive the bootlog from the device,
we could be receiving data. Therefore all the resources needed to
process that data need to be setup prior to queuing the buffers.
We currently initialize some of the resources after queuing the buffers
which creates a race between the probe() and any data that comes back
from the device. If the uninitialized resources are accessed, we could
see page faults.
Fix the init ordering to close the race.
Fixes: 5f8df5c6def6 ("accel/qaic: Add bootlog debugfs") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Youssef Samir <youssef.abdulrahman@oss.qualcomm.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Reviewed-by: Carl Vanderlip <carl.vanderlip@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251007115750.332169-1-youssef.abdulrahman@oss.qualcomm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix spelling of CIP_NO_HEADER to prevent a kernel-doc warning.
Warning: amdtp-stream.h:57 Enum value 'CIP_NO_HEADER' not described in enum 'cip_flags'
Warning: amdtp-stream.h:57 Excess enum value '%CIP_NO_HEADERS' description in 'cip_flags'
The check for some lost idle pelt time should be always done when
pick_next_task_fair() fails to pick a task and not only when we call it
from the fair fast-path.
The case happens when the last running task on rq is a RT or DL task. When
the latter goes to sleep and the /Sum of util_sum of the rq is at the max
value, we don't account the lost of idle time whereas we should.
Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Git bisects to: commit 4ae8d9aa9f9d ("sched/deadline: Fix dl_server getting stuck")
This happens since:
- dl_server hrtimer gets enqueued close to cpu offline, when
kthread_park enqueues a fair task.
- CPU goes offline and drmgr removes it from cpu_present_mask.
- hrtimer fires and warning is hit.
Fix it by stopping the dl_server before CPU is marked dead.
The original implementation used level detection for the first interrupt
after device reset to avoid potential interrupt line noise and missed
interrupts during the initialization phase. However, this approach
introduced unintended side effects when tested with certain touch panels,
including:
- Delayed hardware interrupt response
- Multiple spurious interrupt triggers
Switching back to edge detection for the first interrupt resolves these
issues while maintaining reliable interrupt handling.
Extensive testing across multiple platforms with touch panels from
various vendors confirms this change introduces no regressions.
The vop2_plane_atomic_check() function incorrectly checks
drm_rect_width(dest) twice instead of verifying both width and height.
Fix the second condition to use drm_rect_height(dest) so that invalid
destination rectangles with height < 4 are correctly rejected.
The color parameter passed to drm_draw_fill24() was truncated to 16
bits, leading to an incorrect color drawn to the target iosys_map.
Fix this behavior, widening the parameter to 32 bits.
Fixes: 31fa2c1ca0b2 ("drm/panic: Move drawing functions to drm_draw") Signed-off-by: Francesco Valla <francesco@valla.it> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/r/20251003-drm_draw_fill24_fix-v1-1-8fb7c1c2a893@valla.it Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Stephen noted that it is possible to not have an smp_mb() between
the loaded_mm store and the tlb_gen load in switch_mm(), meaning the
ordering against flush_tlb_mm_range() goes out the window, and it
becomes possible for switch_mm() to not observe a recent tlb_gen
update and fail to flush the TLBs.
[ dhansen: merge conflict fixed by Ingo ]
Fixes: 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily") Reported-by: Stephen Dolan <sdolan@janestreet.com> Closes: https://lore.kernel.org/all/CAHDw0oGd0B4=uuv8NGqbUQ_ZVmSheU2bN70e4QhFXWvuAZdt2w@mail.gmail.com/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a check for cxlfs before dereferencing it and return -EOPNOTSUPP if
there is no cxlfs created due to no hardware support.
Fixes: eb5dfcb9e36d ("cxl: Add support to handle user feature commands for set feature") Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Where applicable, enable media sampler power gating. Also, add
it to the powergate_info debugfs.
v2: Remove the sampler powergate status since it is cleared quickly anyway.
v3: Use vcs mask (Rodrigo) and fix the version check for media
v4: Remove extra spaces
v5: Media samplers are independent of vcs mask,
use Media version 1255 (Matt Roper)
Fixes: 38e8c4184ea0 ("drm/xe: Enable Coarse Power Gating") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20251010011047.2047584-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 4cbc08649a54c3d533df9832342d52d409dfbbf0) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case
the shutdown temperature is hardcoded in smu7_init_dpm_defaults
and is already multiplied by 1000. The value was mistakenly
multiplied another time by smu7_get_thermal_temperature_range.
Fixes: 4ba082572a42 ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When we backup ring contents to reemit after a queue reset,
we don't backup ring contents from the bad context. When
we signal the fences, we should set an error on those
fences as well.
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Chips which use the IP discovery firmware loaded by the driver
reported incorrect harvesting information in the ip discovery
table in sysfs because the driver only uses the ip discovery
firmware for populating sysfs and not for direct parsing for the
driver itself as such, the fields that are used to print the
harvesting info in sysfs report incorrect data for some IPs. Populate
the relevant fields for this case as well.
Fixes: 514678da56da ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
intel_frontbuffer_get() is what locks out subsequent set_tiling
changes to the bo. Thus the fence vs. modifier check must be done
after intel_frontbuffer_get(), or else a concurrent set_tiling ioctl
might sneak in and change the fence after the check has been done.
Close the race again. See commit dd689287b977 ("drm/i915: Prevent
concurrent tiling/framebuffer modifications") for the previous
instance.
v2: Reorder intel_user_framebuffer_destroy() to match the unwind (Jani)
Currently xe's intel_frontbuffer implementation forgets to
hold a reference on the bo. This makes the entire thing
extremely fragile as the cleanup order now depends on bo
references held by other things
(namely intel_fb_bo_framebuffer_fini()).
Move the bo refcounting to intel_frontbuffer_{get,release}()
so that both i915 and xe do this the same way.
I first tried to fix this by having xe do the refcounting
from its intel_bo_set_frontbuffer() implementation
(which is what i915 does currently), but turns out xe's
drm_gem_object_free() can sleep and thus drm_gem_object_put()
isn't safe to call while we hold fb_tracking.lock.
GuC IRQ and tasklet handler receive just single G2H message, and let other
messages to be received from next tasklet. During this chained tasklet
process, if reset process started, communication will be disabled.
Skip warning for this condition.
Stress testing the audio jack hotplug handling on a few Steam Deck units
revealed that the debounce circuit is responsible for having a negative
impact on the detection reliability, e.g. in some cases the ejection
interrupt is not fired, while in other instances it goes into a kind of
invalid state and generates a flood of misleading interrupts.
Add new entries to the DMI table introduced via commit 1bc40efdaf4a
("ASoC: nau8821: Add DMI quirk mechanism for active-high jack-detect")
and extend the quirk logic to allow bypassing the debounce circuit used
for jack detection on Valve Steam Deck LCD and OLED models.
While at it, rename existing NAU8821_JD_ACTIVE_HIGH quirk bitfield to
NAU8821_QUIRK_JD_ACTIVE_HIGH. This should help improve code readability
by differentiating from similarly named register bits.
The interrupt handler attempts to perform some IRQ status clear
operations *after* rather than *before* unmasking and enabling
interrupts. This is a rather fragile approach since it may generally
lead to missing IRQ requests or causing spurious interrupts.
Make use of the nau8821_irq_status_clear() helper instead of
manipulating the related register directly and ensure any interrupt
clearing is performed *after* the target interrupts are disabled/masked
and *before* proceeding with additional interrupt unmasking/enablement
operations.
This also implicitly drops the redundant clear operation of the ejection
IRQ in the interrupt handler, since nau8821_eject_jack() has been
already responsible for clearing all active interrupts.
Fixes: aab1ad11d69f ("ASoC: nau8821: new driver") Fixes: 2551b6e89936 ("ASoC: nau8821: Add headset button detection") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patch.msgid.link/20251003-nau8821-jdet-fixes-v1-3-f7b0e2543f09@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Instead of adding yet another utility function for dealing with the
interrupt clearing register, generalize nau8821_int_status_clear_all()
by renaming it to nau8821_irq_status_clear(), whilst introducing a
second parameter to allow restricting the operation scope to a single
interrupt instead of the whole range of active IRQs.
While at it, also fix a spelling typo in the comment block.
Note this is mainly a prerequisite for subsequent patches aiming to
address some deficiencies in the implementation of the interrupt
handler. Thus the presence of the Fixes tag below is intentional, to
facilitate backporting.
The microphone detection work scheduled by a prior jack insertion
interrupt may still be in a pending state or under execution when a jack
ejection interrupt has been fired.
This might lead to a racing condition or nau8821_jdet_work() completing
after nau8821_eject_jack(), which will override the currently
disconnected state of the jack and incorrectly report the headphone or
the headset as being connected.
Cancel any pending jdet_work or wait for its execution to finish before
attempting to handle the ejection interrupt.
Proceed similarly before launching the eject handler as a consequence of
detecting an invalid insert interrupt.
The gain ranges specified in Renesas IDT821034 codec documentation
are [-3dB;+13dB] in the transmit path (ADC) and [-13dB;+3dB] in the
receive path (DAC). Allthough the registers allow programming values
outside those ranges, the signal S/N and distorsion are only
guaranteed in the specified ranges.
Fadump allocates memory to pass additional kernel command-line argument
to the fadump kernel. However, this allocation is not needed when fadump
is disabled. So avoid allocating memory for the additional parameter
area in such cases.
There is now a new LT9211 rev. U5, which reports chip ID 0x18 0x01 0xe4 .
The previous LT9211 reported chip ID 0x18 0x01 0xe3 , which is what the
driver checks for right now. Since there is a possibility there will be
yet another revision of the LT9211 in the future, drop the last version
nibble check to allow all future revisions of the chip to work with this
driver.
This fix makes LT9211 rev. U5 work with this driver.
When adding a kprobe such as "p:probe/tcp_sendmsg _text+15392192",
arch_check_kprobe would start iterating all instructions starting from
_text until the probed address. Not only is this very inefficient, but
literal values in there (e.g. left by function patching) are
misinterpreted in a way that causes a desync.
Fix this by doing it like x86: start the iteration at the closest
preceding symbol instead of the given starting point.
Fixes: 87f48c7ccc73 ("riscv: kprobe: Fixup kernel panic when probing an illegal position") Signed-off-by: Fabian Vogt <fvogt@suse.de> Signed-off-by: Marvin Friedrich <marvin.friedrich@suse.com> Acked-by: Guo Ren <guoren@kernel.org> Link: https://lore.kernel.org/r/6191817.lOV4Wx5bFT@fvogt-thinkpad Signed-off-by: Paul Walmsley <pjw@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
For queue-depth I/O policy, this patch fixes unbalanced I/Os across
nvme multipaths.
Issue Description:
The RETRY disposition incorrectly increments ns->ctrl->nr_active
counter and reinitializes iostat start-time. In such cases nr_active
counter never goes back to zero until that path disconnects and
reconnects.
Such a path is not chosen for new I/Os if multiple RETRY cases on a given
a path cause its queue-depth counter to be artificially higher compared
to other paths. This leads to unbalanced I/Os across paths.
The patch skips incrementing nr_active if NVME_MPATH_CNT_ACTIVE is already
set. And it skips restarting io stats if NVME_MPATH_IO_STATS is already set.
base-commit: e989a3da2d371a4b6597ee8dee5c72e407b4db7a Fixes: d4d957b53d91eeb ("nvme-multipath: support io stats on the mpath device") Signed-off-by: Amit Chaudhary <achaudhary@purestorage.com> Reviewed-by: Randy Jennings <randyj@purestorage.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the Panthor driver needs the GPU to be powered down
between suspend and resume. If this is not done, then the
MCU_CONTROL register will be preserved as AUTO, which again will
cause a premature FW boot on resume. The FW will go directly into
fatal state in this case.
This case needs to be handled as there is no guarantee that the
GPU will be powered down after the suspend callback on all platforms.
The fix is to call panthor_fw_stop() in "pre-reset" path to ensure
the MCU_CONTROL register is cleared (set DISABLE). This matches
well with the already existing call to panthor_fw_start() from the
"post-reset" path.
Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Fixes: 2718d91816ee ("drm/panthor: Add the FW logical block") Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20251008105112.4077015-1-ketil.johnsen@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Bringing a linked netdevsim device down and then up causes communication
failure because both interfaces lack carrier. Basically a ifdown/ifup on
the interface make the link broken.
Commit 3762ec05a9fbda ("netdevsim: add NAPI support") added supported
for NAPI, calling netif_carrier_off() in nsim_stop(). This patch
re-enables the carrier symmetrically on nsim_open(), in case the device
is linked and the peer is up.
Signed-off-by: Breno Leitao <leitao@debian.org> Fixes: 3762ec05a9fbda ("netdevsim: add NAPI support") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251014-netdevsim_fix-v2-1-53b40590dae1@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
With async crypto, we rely on tx_work to actually transmit records
once encryption completes. But while send() is running, both the
tx_lock and socket lock are held, so tx_work_handler cannot process
the queue of encrypted records, and simply reschedules itself. During
a large send(), this could last a long time, and use a lot of memory.
Transmit any pending encrypted records before restarting the main
loop of tls_sw_sendmsg_locked.
Async decryption calls tls_strp_msg_hold to create a clone of the
input skb to hold references to the memory it uses. If we fail to
allocate that clone, proceeding with async decryption can lead to
various issues (UAF on the skb, writing into userspace memory after
the recv() call has returned).
In this case, wait for all pending decryption requests.
When userspace wants to send a non-DATA record (via the
TLS_SET_RECORD_TYPE cmsg), we need to send any pending data from a
previous MSG_MORE send() as a separate DATA record. If that DATA record
is encrypted asynchronously, tls_handle_open_record will return
-EINPROGRESS. This is currently treated as an error by
tls_process_cmsg, and it will skip setting record_type to the correct
value, but the caller (tls_sw_sendmsg_locked) handles that return
value correctly and proceeds with sending the new message with an
incorrect record_type (DATA instead of whatever was requested in the
cmsg).
Always set record_type before handling the open record. If
tls_handle_open_record returns an error, record_type will be
ignored. If it succeeds, whether with synchronous crypto (returning 0)
or asynchronous (returning -EINPROGRESS), the caller will proceed
correctly.
If we hit an error during the main loop of tls_sw_sendmsg_locked (eg
failed allocation), we jump to send_end and immediately
return. Previous iterations may have queued async encryption requests
that are still pending. We should wait for those before returning, as
we could otherwise be reading from memory that userspace believes
we're not using anymore, which would be a sort of use-after-free.
This is similar to what tls_sw_recvmsg already does: failures during
the main loop jump to the "wait for async" code, not straight to the
unlock/return.
During tls_sw_sendmsg_locked, we pre-allocate the encrypted message
for the size we're expecting to send during the current iteration, but
we may end up sending less, for example when splicing: if we're
getting the data from small fragments of memory, we may fill up all
the slots in the skmsg with less data than expected.
In this case, we need to trim the encrypted message to only the length
we actually need, to avoid pushing uninitialized bytes down the
underlying TCP socket.
Kernel side:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:0 state:D stack:0 pid:5021 tgid:5021 ppid:2 flags:0x00200000
Workqueue: ksmbd-io handle_ksmbd_work
Call trace:
__schedule from schedule+0x3c/0x58
schedule from schedule_preempt_disabled+0xc/0x10
schedule_preempt_disabled from rwsem_down_read_slowpath+0x1b0/0x1d8
rwsem_down_read_slowpath from down_read+0x28/0x30
down_read from ksmbd_session_rpc_method+0x18/0x3c
ksmbd_session_rpc_method from ksmbd_rpc_open+0x34/0x68
ksmbd_rpc_open from ksmbd_session_rpc_open+0x194/0x228
ksmbd_session_rpc_open from create_smb2_pipe+0x8c/0x2c8
create_smb2_pipe from smb2_open+0x10c/0x27ac
smb2_open from handle_ksmbd_work+0x238/0x3dc
handle_ksmbd_work from process_scheduled_works+0x160/0x25c
process_scheduled_works from worker_thread+0x16c/0x1e8
worker_thread from kthread+0xa8/0xb8
kthread from ret_from_fork+0x14/0x38
Exception stack(0x8529ffb0 to 0x8529fff8)
The task deadlocks because the lock is already held:
ksmbd_session_rpc_open
down_write(&sess->rpc_lock)
ksmbd_rpc_open
ksmbd_session_rpc_method
down_read(&sess->rpc_lock) <-- deadlock
Adjust ksmbd_session_rpc_method() callers to take the lock when necessary.
Fixes: 305853cce3794 ("ksmbd: Fix race condition in RPC handle list access") Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since blamed commit, unregister_netdevice_many_notify() takes the netdev
mutex if the device needs it.
If the device list is too long, this will lock more device mutexes than
lockdep can handle:
unshare -n \
bash -c 'for i in $(seq 1 100);do ip link add foo$i type dummy;done'
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by kworker/u16:1/69:
#0: ..148 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work
#1: ..d40 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work
#2: ..bd0 (pernet_ops_rwsem){++++}-{4:4}, at: cleanup_net
#3: ..aa8 (rtnl_mutex){+.+.}-{4:4}, at: default_device_exit_batch
#4: ..cb0 (&dev_instance_lock_key#3){+.+.}-{4:4}, at: unregister_netdevice_many_notify
[..]
Add a helper to close and then unlock a list of net_devices.
Devices that are not up have to be skipped - netif_close_many always
removes them from the list without any other actions taken, so they'd
remain in locked state.
Close devices whenever we've used up half of the tracking slots or we
processed entire list without hitting the limit.
Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20251013185052.14021-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The jq command is used in vlan_bridge_binding.sh, if it is not supported,
the test will spam the following log.
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# ./vlan_bridge_binding.sh: line 51: jq: command not found
# TEST: Test bridge_binding on->off when lower down [FAIL]
# Got operstate of , expected 0
The rtnetlink.sh has the same problem. It makes sense to check if jq is
installed before running these tests. After this patch, the
vlan_bridge_binding.sh skipped if jq is not supported:
# timeout set to 3600
# selftests: net: vlan_bridge_binding.sh
# TEST: jq not installed [SKIP]
Fixes: dca12e9ab760 ("selftests: net: Add a VLAN bridge binding selftest") Fixes: 6a414fd77f61 ("selftests: rtnetlink: Add an address proto test") Signed-off-by: Wang Liang <wangliang74@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20251013080039.3035898-1-wangliang74@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Completion napi can free out-of-order tx descriptors if hw QoS is
enabled and packets with different priority are queued to same DMA ring.
Take into account possible out-of-order reports checking if the tx queue
is full using circular buffer head/tail pointer instead of the number of
queued packets.
Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251012-airoha-tx-busy-queue-v2-1-a600b08bab2d@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Neal reported that using neper tcp_stream with TCP_TX_DELAY
set to 50ms would often lead to flows stuck in a small cwnd mode,
regardless of the congestion control.
While tcp_stream sets TCP_TX_DELAY too late after the connect(),
it highlighted two kernel bugs.
The following heuristic in tcp_tso_should_defer() seems wrong
for large RTT:
delta = tp->tcp_clock_cache - head->tstamp;
/* If next ACK is likely to come too late (half srtt), do not defer */
if ((s64)(delta - (u64)NSEC_PER_USEC * (tp->srtt_us >> 4)) < 0)
goto send_now;
If next ACK is expected to come in more than 1 ms, we should
not defer because we prefer a smooth ACK clocking.
While blamed commit was a step in the good direction, it was not
generic enough.
Another patch fixing TCP_TX_DELAY for established flows
will be proposed when net-next reopens.
For historical and portability reasons, the netif_rx() is usually
run in the softirq or interrupt context, this commit therefore add
local_bh_disable/enable() protection in the usbnet_resume_rx().
During interface toggle operations (ifdown/ifup), the driver currently
resets the local helper variable 'phy_link' to -1. This causes the link
state machine to incorrectly interpret the state as a link change event,
resulting in spurious "Link is down" messages being logged when the
interface is brought back up.
Preserve the phy_link state across interface toggles to avoid treating
the -1 sentinel value as a legitimate link state transition.
Fixes: 88131a812b16 ("amd-xgbe: Perform phy connect/disconnect at dev open/stop") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Link: https://patch.msgid.link/20251010065142.1189310-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver is currently checking for PHYCR2 register presence in
rtl8211f_config_init(), but it does so after accessing PHYCR2 to
disable EEE. This was introduced in commit bfc17c165835 ("net:
phy: realtek: disable PHY-mode EEE"). Move the PHYCR2 presence
test before the EEE disablement and simplify the code.
Fixes: bfc17c165835 ("net: phy: realtek: disable PHY-mode EEE") Signed-off-by: Marek Vasut <marek.vasut@mailbox.org> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20251011110309.12664-1-marek.vasut@mailbox.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since ixgbe_adapter is embedded in devlink, calling devlink_free()
prematurely in the ixgbe_remove() path can lead to UAF. Move devlink_free()
to the end.
When the driver requests Tx timestamp value, one of the first steps is
to clone SKB using skb_get. It increases the reference counter for that
SKB to prevent unexpected freeing by another component.
However, there may be a case where the index is requested, SKB is
assigned and never consumed by PTP flows - for example due to reset during
running PTP apps.
Add a check in release timestamping function to verify if the SKB
assigned to Tx timestamp latch was freed, and release remaining SKBs.
Fixes: 4901e83a94ef ("idpf: add Tx timestamp capabilities negotiation") Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-1-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Similarly to ipv4 tunnel, ipv6 version updates dev->needed_headroom, too.
While ipv4 tunnel headroom adjustment growth was limited in
commit 5ae1e9922bbd ("net: ip_tunnel: prevent perpetual headroom growth"),
ipv6 tunnel yet increases the headroom without any ceiling.
Reflect ipv4 tunnel headroom adjustment limit on ipv6 version.
Credits to Francesco Ruggeri, who was originally debugging this issue
and wrote local Arista-specific patch and a reproducer.
The Broadcom bcm54811 is hardware-strapped to select among RGMII and
GMII/MII/MII-Lite modes. However, the corresponding bit, RGMII Enable
in Miscellaneous Control Register must be also set to select desired
RGMII or MII(-lite)/GMII mode.
After resume from S4 (hibernate), RTL8168H/RTL8111H truncates incoming
packets. Packet captures show messages like "IP truncated-ip - 146 bytes
missing!".
The issue is caused by RxConfig not being properly re-initialized after
resume. Re-initializing the RxConfig register before the chip
re-initialization sequence avoids the truncation and restores correct
packet reception.
This follows the same pattern as commit ef9da46ddef0 ("r8169: fix data
corruption issue on RTL8402").
Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E") Signed-off-by: Linmao Li <lilinmao@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251009122549.3955845-1-lilinmao@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the internal flash contains missing or corrupted configuration,
basic communication over the bus still functions, but the device
is not capable of normal operation (for example, using mailboxes).
This condition is indicated in the info register by the ready bit.
If this bit is cleared, the probe procedure times out while fetching
the device state.
Handle this case by checking the ready bit value in zl3073x_dev_start()
and skipping DPLL device and pin registration if it is cleared.
Do not report this condition as an error, allowing the devlink device
to be registered and enabling the user to flash the correct configuration.
Prior this patch:
[ 31.112299] zl3073x-i2c 1-0070: Failed to fetch input state: -ETIMEDOUT
[ 31.116332] zl3073x-i2c 1-0070: error -ETIMEDOUT: Failed to start device
[ 31.136881] zl3073x-i2c 1-0070: probe with driver zl3073x-i2c failed with error -110
After this patch:
[ 41.011438] zl3073x-i2c 1-0070: FW not fully ready - missing or corrupted config
Fixes: 75a71ecc24125 ("dpll: zl3073x: Register DPLL devices and pins") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251008141445.841113-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently NETDEV_UNREGISTER event handler is not calling
j1939_cancel_active_session() and j1939_sk_queue_drop_all().
This will result in these calls being skipped when j1939_sk_release() is
called. And I guess that the reason syzbot is still reporting
unregister_netdevice: waiting for vcan0 to become free. Usage count = 2
is caused by lack of these calls.
Calling j1939_cancel_active_session(priv, sk) from j1939_sk_release() can
be covered by calling j1939_cancel_active_session(priv, NULL) from
j1939_netdev_notify().
Calling j1939_sk_queue_drop_all() from j1939_sk_release() can be covered
by calling j1939_sk_netdev_event_netdown() from j1939_netdev_notify().
Therefore, we can reuse j1939_cancel_active_session(priv, NULL) and
j1939_sk_netdev_event_netdown(priv) for NETDEV_UNREGISTER event handler.
This sysctl is not per interface; it's global per netns.
Fixes: 292ecd9f5a94 ("doc: move seg6_flowlabel to seg6-sysctl.rst") Reported-by: Philippe Guibert <philippe.guibert@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Limit tx/rx buffer address to 32-bit address space for board with more
than 4GB DRAM.
Fixes: 804775dfc2885 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Fixes: 6757d345dd7db ("net: ethernet: mtk_wed: introduce hw_rro support for MT7988") Tested-by: Daniel Pawlik <pawlik.dan@gmail.com> Tested-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Rex Lu <rex.lu@mediatek.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
A suspend/resume cycle on a down interface results in the interface
coming up in Error Active state. A suspend/resume cycle on an Up
interface will always result in Error Active state, regardless of the
actual CAN state.
During suspend, only set running interfaces to CAN_STATE_SLEEPING.
During resume only touch the CAN state of running interfaces. For
wakeup sources, set the CAN state depending on the Protocol Status
Regitser (PSR), for non wakeup source interfaces m_can_start() will do
the same.