We should count the terminating NUL byte as part of the ctx_len.
Otherwise, UBSAN logs a warning:
UBSAN: array-index-out-of-bounds in security/selinux/xfrm.c:99:14
index 60 is out of range for type 'char [*]'
The allocation itself is correct so there is no actual out of bounds
indexing, just a warning.
Use common wrappers operating directly on the struct sg_table objects to
fix incorrect use of scatterlists sync calls. dma_sync_sg_for_*()
functions have to be called with the number of elements originally passed
to dma_map_sg_*() function, not the one returned in sgtable's nents.
Fixes: 1ffe09590121 ("udmabuf: fix dma-buf cpu access") CC: stable@vger.kernel.org Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250507160913.2084079-3-m.szyprowski@samsung.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Improve the usability of the unit_add sysfs attribute by ensuring that
the associated FCP LUN scan processing is completed synchronously. This
enables configuration tooling to consistently determine the end of the
scan process to allow for serialization of follow-on actions.
While the scan process associated with unit_add typically completes
synchronously, it is deferred to an asynchronous background process if
unit_add is used before initial remote port scanning has completed. This
occurs when unit_add is used immediately after setting the associated FCP
device online.
To ensure synchronous unit_add processing, wait for remote port scanning
to complete before initiating the FCP LUN scan.
Cc: stable@vger.kernel.org Reviewed-by: M Nikhil <nikh1092@linux.ibm.com> Reviewed-by: Nihar Panda <niharp@linux.ibm.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Nihar Panda <niharp@linux.ibm.com> Link: https://lore.kernel.org/r/20250603182252.2287285-2-niharp@linux.ibm.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently storvsc_timeout is only used in storvsc_sdev_configure(), and
5s and 10s are used elsewhere. It turns out that rarely the 5s is not
enough on Azure, so let's use storvsc_timeout everywhere.
In case a timeout happens and storvsc_channel_init() returns an error,
close the VMBus channel so that any host-to-guest messages in the
channel's ringbuffer, which might come late, can be safely ignored.
Add a "const" to storvsc_timeout.
Cc: stable@kernel.org Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fuzzing hit another invalid pointer dereference due to the lack of
checking whether jffs2_prealloc_raw_node_refs() completed successfully.
Subsequent logic implies that the node refs have been allocated.
Handle that. The code is ready for propagating the error upwards.
Syzkaller detected a kernel bug in jffs2_link_node_ref, caused by fault
injection in jffs2_prealloc_raw_node_refs. jffs2_sum_write_sumnode doesn't
check return value of jffs2_prealloc_raw_node_refs and simply lets any
error propagate into jffs2_sum_write_data, which eventually calls
jffs2_link_node_ref in order to link the summary to an expectedly allocated
node.
cm_chan_msg_send() checks that userspace didn't send too much data but
riocm_ch_send() failed to check that userspace sent sufficient data. The
result is that riocm_ch_send() can write to fields in the rio_ch_chan_hdr
which were outside the bounds of the space which cm_chan_msg_send()
allocated.
Address this by teaching riocm_ch_send() to check that the entire
rio_ch_chan_hdr was copied in from userspace.
commit 7adb96687ce8 ("x86/bugs: Make spectre user default depend on
MITIGATION_SPECTRE_V2") depends on commit 72c70f480a70 ("x86/bugs: Add
a separate config for Spectre V2"), which introduced
MITIGATION_SPECTRE_V2.
commit 72c70f480a70 ("x86/bugs: Add a separate config for Spectre V2")
never landed in stable tree, thus, stable tree doesn't have
MITIGATION_SPECTRE_V2, that said, commit 7adb96687ce8 ("x86/bugs: Make
spectre user default depend on MITIGATION_SPECTRE_V2") has no value if
the dependecy was not applied.
Revert commit 7adb96687ce8 ("x86/bugs: Make spectre user default
depend on MITIGATION_SPECTRE_V2") in stable kernel which landed in in
5.4.294, 5.10.238, 5.15.185, 6.1.141 and 6.6.93 stable versions.
VFIO EEH recovery for PCI passthrough devices fails on PowerNV and pseries
platforms due to missing host-side PE bridge reconfiguration. In the
current implementation, eeh_pe_configure() only performs RTAS or OPAL-based
bridge reconfiguration for native host devices, but skips it entirely for
PEs managed through VFIO in guest passthrough scenarios.
This leads to incomplete EEH recovery when a PCI error affects a
passthrough device assigned to a QEMU/KVM guest. Although VFIO triggers the
EEH recovery flow through VFIO_EEH_PE_ENABLE ioctl, the platform-specific
bridge reconfiguration step is silently bypassed. As a result, the PE's
config space is not fully restored, causing subsequent config space access
failures or EEH freeze-on-access errors inside the guest.
This patch fixes the issue by ensuring that eeh_pe_configure() always
invokes the platform's configure_bridge() callback (e.g.,
pseries_eeh_phb_configure_bridge) even for VFIO-managed PEs. This ensures
that RTAS or OPAL calls to reconfigure the PE bridge are correctly issued
on the host side, restoring the PE's configuration space after an EEH
event.
This fix is essential for reliable EEH recovery in QEMU/KVM guests using
VFIO PCI passthrough on PowerNV and pseries systems.
Tested with:
- QEMU/KVM guest using VFIO passthrough (IBM Power9,(lpar)Power11 host)
- Injected EEH errors with pseries EEH errinjct tool on host, recovery
verified on qemu guest.
- Verified successful config space access and CAP_EXP DevCtl restoration
after recovery
The dell_rbu driver will use memset() to clear the data held by each
packet when it is no longer needed (when the driver is unloaded, the
packet size is changed, etc).
The amount of memory that is cleared (before this patch) is the normal
packet size. However, the last packet in the list may be smaller.
Fix this to only clear the memory actually used by each packet, to prevent
it from writing past the end of data buffer.
Because the packet data buffers are allocated with __get_free_pages() (in
page-sized increments), this bug could only result in a buffer being
overwritten when a packet size larger than one page is used. The only user
of the dell_rbu module should be the Dell BIOS update program, which uses
a packet size of 4096, so no issues should be seen without the patch, it
just blocks the possiblity.
Fixes: 6c54c28e69f2 ("[PATCH] dell_rbu: new Dell BIOS update driver") Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Link: https://lore.kernel.org/r/20250609184659.7210-5-stuart.w.hayes@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pass the correct list head to list_for_each_entry*() when looping through
the packet list.
Without this patch, reading the packet data via sysfs will show the data
incorrectly (because it starts at the wrong packet), and clearing the
packet list will result in a NULL pointer dereference.
Fixes: d19f359fbdc6 ("platform/x86: dell_rbu: don't open code list_for_each_entry*()") Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com> Link: https://lore.kernel.org/r/20250609184659.7210-3-stuart.w.hayes@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
It may make sense to split the Microsoft Surface hardware platform
drivers out to a separate subdirectory, since some of it may be shared
between ARM and x86 in the future (regarding devices like the Surface
Pro X).
Further, newer Surface devices will require additional platform drivers
for fundamental support (mostly regarding their embedded controller),
which may also warrant this split from a size perspective.
This commit introduces a new platform/surface subdirectory for the
Surface device family, with subsequent commits moving existing Surface
drivers over from platform/x86.
A new MAINTAINERS entry is added for this directory. Patches to files in
this directory will be taken up by the platform-drivers-x86 team (i.e.
Hans de Goede and Mark Gross) after they have been reviewed by
Maximilian Luz.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20201009141128.683254-2-luzmaximilian@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Stable-dep-of: 61ce04601e0d ("platform/x86: dell_rbu: Fix list usage") Signed-off-by: Sasha Levin <sashal@kernel.org>
It breaks target-module@2b300050 ("ti,sysc-omap2") probe on AM62x in a case
when minimally-configured system tries to network-boot:
[ 6.888776] probe of 2b300050.target-module returned 517 after 258 usecs
[ 17.129637] probe of 2b300050.target-module returned 517 after 708 usecs
[ 17.137397] platform 2b300050.target-module: deferred probe pending: (reason unknown)
[ 26.878471] Waiting up to 100 more seconds for network.
There are minimal configurations possible when the deferred device is not
being probed any more (because everything else has been successfully
probed) and deferral lists are not processed any more.
Stable mmc enumeration can be achieved by filling /aliases node properly
(4700a00755fb commit's rationale).
The current code around TEE_IOCTL_PARAM_SIZE() is a bit wrong on
32-bit kernels: Multiplying a user-provided 32-bit value with the
size of a structure can wrap around on such platforms.
Fix it by using saturating arithmetic for the size calculation.
This has no security consequences because, in all users of
TEE_IOCTL_PARAM_SIZE(), the subsequent kcalloc() implicitly checks
for wrapping.
Don't put the l4ls clk domain to sleep in case of standby.
Since CM3 PM FW[1](ti-v4.1.y) doesn't wake-up/enable the l4ls clk domain
upon wake-up, CM3 PM FW fails to wake-up the MPU.
In case the MC firmware runs in debug mode with extensive prints pushed
to the console, the current timeout of 500ms is not enough.
Increase the timeout value so that we don't have any chance of wrongly
assuming that the firmware is not responding when it's just taking more
time.
We have to wait at least the minimium time for the watchdog window
(TWDMIN) before writings to the wdt register after the
watchdog is activated.
Otherwise the chip will assert TWD_ERROR and power down to reset mode.
The strlcat() with FORTIFY support is triggering a panic because it
thinks the target buffer will overflow although the correct target
buffer size is passed in.
Anyway, instead of memset() with 0 followed by a strlcat(), just use
memcpy() and ensure that the resulting buffer is NULL terminated.
BIOSVersion is only used for the lpfc_printf_log() which expects a
properly terminated string.
software_node_get_reference_args() wants to get @index-th element, so
the property value requires at least '(index + 1) * sizeof(*ref)' bytes
but that can not be guaranteed by current OOB check, and may cause OOB
for malformed property.
Fix by using as OOB check '((index + 1) * sizeof(*ref) > prop->length)'.
FDB entries are allocated in an atomic context as they can be added from
the data path when learning is enabled.
After converting the FDB hash table to rhashtable, the insertion rate
will be much higher (*) which will entail a much higher rate of per-CPU
allocations via dst_cache_init().
When adding a large number of entries (e.g., 256k) in a batch, a small
percentage (< 0.02%) of these per-CPU allocations will fail [1]. This
does not happen with the current code since the insertion rate is low
enough to give the per-CPU allocator a chance to asynchronously create
new chunks of per-CPU memory.
Given that:
a. Only a small percentage of these per-CPU allocations fail.
b. The scenario where this happens might not be the most realistic one.
c. The driver can work correctly without dst caches. The dst_cache_*()
APIs first check that the dst cache was properly initialized.
d. The dst caches are not always used (e.g., 'tos inherit').
It seems reasonable to not treat these allocation failures as fatal.
Therefore, do not bail when dst_cache_init() fails and suppress warnings
by specifying '__GFP_NOWARN'.
[1] percpu: allocation failed, size=40 align=8 atomic=1, atomic alloc failed, no space left
(*) 97% reduction in average latency of vxlan_fdb_update() when adding
256k entries in a batch.
Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250415121143.345227-14-idosch@nvidia.com Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In lpfc_check_sli_ndlp(), the get_job_els_rsp64_did remote_id assignment
does not apply for GEN_REQUEST64 commands as it only has meaning for a
ELS_REQUEST64 command. So, if (iocb->ndlp == ndlp) is false, we could
erroneously return the wrong value. Fix by replacing the fallthrough
statement with a break statement before the remote_id check.
When processing a PREQ the code would always check whether we have a
mesh path locally and reply accordingly. However, when forwarding is
disabled then we should not reply with this information as we will not
forward data packets down that path.
Move the check for dot11MeshForwarding up in the function and skip the
mesh path lookup in that case. In the else block, set forward to false
so that the rest of the function becomes a no-op and the
dot11MeshForwarding check does not need to be duplicated.
This explains an effect observed in the Freifunk community where mesh
forwarding is disabled. In that case a mesh with three STAs and only bad
links in between them, individual STAs would occionally have indirect
mpath entries. This should not have happened.
The statistics are incremented with raw_cpu_inc() assuming it always
happens with bottom half disabled. Without per-CPU locking in
local_bh_disable() on PREEMPT_RT this is no longer true.
Use this_cpu_inc() on PREEMPT_RT for the increment to not worry about
preemption.
tcp_rcv_rtt_update() goal is to maintain an estimation of the RTT
in tp->rcv_rtt_est.rtt_us, used by tcp_rcv_space_adjust()
When TCP TS are enabled, tcp_rcv_rtt_update() is using
EWMA to smooth the samples.
Change this to immediately latch the incoming value if it
is lower than tp->rcv_rtt_est.rtt_us, so that tcp_rcv_space_adjust()
does not overshoot tp->rcvq_space.space and sk->sk_rcvbuf.
This patch synchronizes code that accesses from both user-space
and IRQ contexts. The `get_stats()` function can be called from both
context.
`dev->stats.tx_errors` and `dev->stats.collisions` are also updated
in the `tx_errors()` function. Therefore, these fields must also be
protected by synchronized.
There is no code that accessses `dev->stats.tx_errors` between the
previous and updated lines, so the updating point can be moved.
During init of the bus, the module checks that the bus is idle.
If one of the lines are stuck try to recover them first before failing.
Sometimes SDA and SCL are low if improper reset occurs (e.g., reboot).
Signed-off-by: Tali Perry <tali.perry1@gmail.com> Signed-off-by: Mohammed Elbadry <mohammed.0.elbadry@gmail.com> Reviewed-by: Mukesh Kumar Savaliya <quic_msavaliy@quicinc.com> Link: https://lore.kernel.org/r/20250328193252.1570811-1-mohammed.0.elbadry@gmail.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Function __sctp_write_space() doesn't set poll key, which leads to
ep_poll_callback() waking up all waiters, not only these waiting
for the socket being writable. Set the key properly using
wake_up_interruptible_poll(), which is preferred over the sync
variant, as writers are not woken up before at least half of the
queue is available. Also, TCP does the same.
This fixes the 5G connectibity issue on LiteOn WN4519R module
see https://github.com/openwrt/mt76/issues/971
And may also fix the 5G issues on the XBox One Wireless Adapter
see https://github.com/openwrt/mt76/issues/200
I have looked at the FCC info related to the MT7632U chip as mentioned in here:
https://github.com/openwrt/mt76/issues/459
These confirm the chipset does not support 'ac' mode and hence VHT should be turned of.
Logic here always sets hdr->version to 2 if it is not a BE3 or Lancer chip,
even if it is BE2. Use 'else if' to prevent multiple assignments, setting
version 0 for BE2, version 1 for BE3 and Lancer, and version 2 for others.
Fixes potential incorrect version setting when BE2_chip and
BE3_chip/lancer_chip checks could both be true.
Replaced pm_runtime_put() with pm_runtime_put_sync_suspend() to ensure
the runtime suspend is invoked immediately when unregistering a slave.
This prevents a race condition where suspend was skipped when
unregistering and registering slave in quick succession.
For example, consider the rapid sequence of
`delete_device -> new_device -> delete_device -> new_device`.
In this sequence, it is observed that the dw_i2c_plat_runtime_suspend()
might not be invoked after `delete_device` operation.
This is because after `delete_device` operation, when the
pm_runtime_put() is about to trigger suspend, the following `new_device`
operation might race and cancel the suspend.
If that happens, during the `new_device` operation,
dw_i2c_plat_runtime_resume() is skipped (since there was no suspend), which
means `i_dev->init()`, i.e. i2c_dw_init_slave(), is skipped.
Since i2c_dw_init_slave() is skipped, i2c_dw_configure_fifo_slave() is
skipped too, which leaves `DW_IC_INTR_MASK` unconfigured. If we inspect
the interrupt mask register using devmem, it will show as zero.
Example shell script to reproduce the issue:
```
#!/bin/sh
The tipc_aead_free() function currently uses kfree() to release the aead
structure. However, this structure contains sensitive information, such
as key's SALT value, which should be securely erased from memory to
prevent potential leakage.
To enhance security, replace kfree() with kfree_sensitive() when freeing
the aead structure. This change ensures that sensitive data is explicitly
cleared before memory deallocation, aligning with the approach used in
tipc_aead_init() and adhering to best practices for handling confidential
information.
If the global boost flag is enabled and policy boost flag is disabled, a
call to `cpufreq_boost_trigger_state(true)` must enable the policy's
boost state.
The current code misses that because of an optimization. Fix it.
TSENS v2.0+ leverage features not available to prior versions such as
updated interrupts init routine, masked interrupts, and watchdog.
Currently, the checks in place evaluate whether the IP version is greater
than v1 which invalidates when updates to v1 or v1 minor versions are
implemented. As such, update the conditional statements to strictly
evaluate whether the version is greater than or equal to v2 (inclusive).
NIOS2 uses a software-managed TLB for virtual address translation. To
flush a cache line, the original mapping is replaced by one to physical
address 0x0 with no permissions (rwx mapped to 0) set. This can lead to
TLB-permission--related traps when such a nominally flushed entry is
encountered as a mapping for an otherwise valid virtual address within a
process (e.g. due to an MMU-PID-namespace rollover that previously
flushed the complete TLB including entries of existing, running
processes).
The default ptep_set_access_flags implementation from mm/pgtable-generic.c
only forces a TLB-update when the page-table entry has changed within the
page table:
/*
* [...] We return whether the PTE actually changed, which in turn
* instructs the caller to do things like update__mmu_cache. [...]
*/
int ptep_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep,
pte_t entry, int dirty)
{
int changed = !pte_same(*ptep, entry);
if (changed) {
set_pte_at(vma->vm_mm, address, ptep, entry);
flush_tlb_fix_spurious_fault(vma, address);
}
return changed;
}
However, no cross-referencing with the TLB-state occurs, so the
flushing-induced pseudo entries that are responsible for the pagefault
in the first place are never pre-empted from TLB on this code path.
This commit fixes this behaviour by always requesting a TLB-update in
this part of the pagefault handling, fixing spurious page-faults on the
way. The handling is a straightforward port of the logic from the MIPS
architecture via an arch-specific ptep_set_access_flags function ported
from arch/mips/include/asm/pgtable.h.
Signed-off-by: Simon Schuster <schuster.simon@siemens-energy.com> Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In fimc_is_hw_change_mode(), the function changes camera modes without
waiting for hardware completion, risking corrupted data or system hangs
if subsequent operations proceed before the hardware is ready.
Add fimc_is_hw_wait_intmsr0_intmsd0() after mode configuration, ensuring
hardware state synchronization and stable interrupt handling.
If the HPD is low (happens if there is no EDID or the
EDID is being updated), then return -ENOLINK in
tc358743_get_detected_timings() instead of detecting video.
This avoids userspace thinking that it can start streaming when
the HPD is low.
When submitting MQD to CP, set SDMA_RLCx_IB_CNTL/SWITCH_INSIDE_IB bit so
it'll allow SDMA preemption if there is a massive command buffer of
long-running SDMA commands.
Signed-off-by: Amber Lin <Amber.Lin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We believe that we have found a concurrency bug in the `fs/jfs` module
that results in a null pointer dereference. There is a closely related
issue which has been fixed:
... but, unfortunately, the accepted patch appears to still be
susceptible to a null pointer dereference under some interleavings.
To trigger the bug, we think that `JFS_SBI(ipbmap->i_sb)->bmap` is set
to NULL in `dbFreeBits` and then dereferenced in `jfs_ioc_trim`. This
bug manifests quite rarely under normal circumstances, but is
triggereable from a syz-program.
Reported-and-tested-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Reported-and-tested-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Signed-off-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The EXT4_IOC_GET_ES_CACHE and EXT4_IOC_PRECACHE_EXTENTS currently
invokes ext4_ext_precache() to preload the extent cache without holding
the inode's i_rwsem. This can result in stale extent cache entries when
competing with operations such as ext4_collapse_range() which calls
ext4_ext_remove_space() or ext4_ext_shift_extents().
The problem arises when ext4_ext_remove_space() temporarily releases
i_data_sem due to insufficient journal credits. During this interval, a
concurrent EXT4_IOC_GET_ES_CACHE or EXT4_IOC_PRECACHE_EXTENTS may cache
extent entries that are about to be deleted. As a result, these cached
entries become stale and inconsistent with the actual extents.
Loading the extents cache without holding the inode's i_rwsem or the
mapping's invalidate_lock is not permitted besides during the writeback.
Fix this by holding the i_rwsem during EXT4_IOC_GET_ES_CACHE and
EXT4_IOC_PRECACHE_EXTENTS.
When cache cleanup runs concurrently with cache entry removal, a race
condition can occur that leads to incorrect nextcheck times. This can
delay cache cleanup for the cache_detail by up to 1800 seconds:
1. cache_clean() sets nextcheck to current time plus 1800 seconds
2. While scanning a non-empty bucket, concurrent cache entry removal can
empty that bucket
3. cache_clean() finds no cache entries in the now-empty bucket to update
the nextcheck time
4. This maybe delays the next scan of the cache_detail by up to 1800
seconds even when it should be scanned earlier based on remaining
entries
Fix this by moving the hash_lock acquisition earlier in cache_clean().
This ensures bucket emptiness checks and nextcheck updates happen
atomically, preventing the race between cleanup and entry removal.
Signed-off-by: Long Li <leo.lilong@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Setting up the control handler calls into .s_ctrl ops. While validating
the controls the ops may need to access some of the context state, which
could lead to a crash if not properly initialized.
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
stbl is s8 but it must contain offsets into slot which can go from 0 to
127.
Added a bound check for that error and return -EIO if the check fails.
Also make jfs_readdir return with error if add_missing_indices returns
with an error.
When removing space, we should use EXT4_EX_NOCACHE because we don't
need to cache extents, and we should also use EXT4_EX_NOFAIL to prevent
metadata inconsistencies that may arise from memory allocation failures.
While ext4_ext_remove_space() already uses these two flags in most
places, they are missing in ext4_ext_search_right() and
read_extent_tree_block() calls. Unify the flags to ensure consistent
behavior throughout the extent removal process.
Explicitly compare a buffer type only with valid buffer types,
to avoid matching a buffer type outside of the valid buffer type set.
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com> Reviewed-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Sasha Levin <sashal@kernel.org>
When ACD feature is enabled, it triggers some internal calibrations
which result in a pretty long delay during the first HFI perf vote.
So, increase the HFI response timeout to match the downstream driver.
Signed-off-by: Akhil P Oommen <quic_akhilpo@quicinc.com> Tested-by: Maya Matuszczyk <maccraft123mc@gmail.com> Tested-by: Anthony Ruhier <aruhier@mailbox.org>
Patchwork: https://patchwork.freedesktop.org/patch/649344/ Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This commit updates the dm_force_atomic_commit function to replace the
usage of PTR_ERR_OR_ZERO with IS_ERR for checking error states after
retrieving the Connector (drm_atomic_get_connector_state), CRTC
(drm_atomic_get_crtc_state), and Plane (drm_atomic_get_plane_state)
states.
The function utilized PTR_ERR_OR_ZERO for error checking. However, this
approach is inappropriate in this context because the respective
functions do not return NULL; they return pointers that encode errors.
This change ensures that error pointers are properly checked using
IS_ERR before attempting to dereference.
Cc: Harry Wentland <harry.wentland@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The IRQF_NO_AUTOEN can be used for the drivers that don't want
interrupts to be enabled automatically via devm_request_threaded_irq().
Using this flag can provide be more robust compared to the way of
calling disable_irq() after devm_request_threaded_irq() without the
IRQF_NO_AUTOEN flag.
Suggested-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Damon Ding <damon.ding@rock-chips.com> Link: https://lore.kernel.org/r/20250310104114.2608063-2-damon.ding@rock-chips.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cache_detail structure uses a "nextcheck" field to control hash table
scanning intervals. When a table scan begins, nextcheck is set to current
time plus 1800 seconds. During scanning, if cache_detail is not empty and
a cache entry's expiry time is earlier than the current nextcheck, the
nextcheck is updated to that expiry time.
This mechanism ensures that:
1) Empty cache_details are scanned every 1800 seconds to avoid unnecessary
scans
2) Non-empty cache_details are scanned based on the earliest expiry time
found
However, when adding a new cache entry to an empty cache_detail, the
nextcheck time was not being updated, remaining at 1800 seconds. This
could delay cache cleanup for up to 1800 seconds, potentially blocking
threads(such as nfsd) that are waiting for cache cleanup.
Fix this by updating the nextcheck time whenever a new cache entry is
added.
Signed-off-by: Long Li <leo.lilong@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ACPI specification requires that battery rate is always positive,
but the kernel ABI for POWER_SUPPLY_PROP_CURRENT_NOW
(Documentation/ABI/testing/sysfs-class-power) specifies that it should
be negative when a battery is discharging. When reporting CURRENT_NOW,
massage the value to match the documented ABI.
This only changes the sign of `current_now` and not `power_now` because
documentation doesn't describe any particular meaning for `power_now` so
leaving `power_now` unchanged is less likely to confuse userspace
unnecessarily, whereas becoming consistent with the documented ABI is
worth potentially confusing clients that read `current_now`.
pm_runtime_put_autosuspend() schedules a hrtimer to expire
at "dev->power.timer_expires". If the hrtimer's callback,
pm_suspend_timer_fn(), observes that the current time equals
"dev->power.timer_expires", it unexpectedly bails out instead of
proceeding with runtime suspend.
Additionally, as ->timer_expires is not cleared, all the future auto
suspend requests will not schedule hrtimer to perform auto suspend.
rpm_suspend():
if ((rpmflags & RPM_AUTO) &&...) {
if (!(dev->power.timer_expires && ...) { <-- this will fail.
hrtimer_start_range_ns(&dev->power.suspend_timer,...);
}
}
Fix this by as well checking if current time reaches the set expiration.
Co-developed-by: Patrick Daly <quic_pdaly@quicinc.com> Signed-off-by: Patrick Daly <quic_pdaly@quicinc.com> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com> Link: https://patch.msgid.link/20250515064125.1211561-1-quic_charante@quicinc.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Multiple applications may access the battery gauge at the same time, so
the gauge may be busy and EBUSY will be returned. The driver will set a
flag to record the EBUSY state, and this flag will be kept until the next
periodic update. When this flag is set, bq27xxx_battery_get_property()
will just return ENODEV until the flag is updated.
Even if the gauge was busy during the last accessing attempt, returning
ENODEV is not ideal, and can cause confusion in the applications layer.
Instead, retry accessing the I2C to update the flag is as expected, for
the gauge typically recovers from busy state within a few milliseconds.
If still failed to access the gauge, the real error code would be returned
instead of ENODEV (as suggested by Pali Rohár).
I analyzed this memory leak in detail. I found that “Acpi-State” cache and
“Acpi-Parse” cache were merged because the size of cache objects was same
slab cache size.
I finally found “Acpi-Parse” cache and “Acpi-parse_ext” cache were leaked
using SLAB_NEVER_MERGE flag in kmem_cache_create() function.
When early abort is occurred due to invalid ACPI information, Linux kernel
terminates ACPI by calling acpi_terminate() function. The function calls
acpi_ut_delete_caches() function to delete local caches (acpi_gbl_namespace_
cache, state_cache, operand_cache, ps_node_cache, ps_node_ext_cache).
But the deletion codes in acpi_ut_delete_caches() function only delete
slab caches using kmem_cache_destroy() function, therefore the cache
objects should be flushed before acpi_ut_delete_caches() function.
"Acpi-Parse" cache and "Acpi-ParseExt" cache are used in an AML parse
function, acpi_ps_parse_loop(). The function should complete all ops
using acpi_ps_complete_final_op() when an error occurs due to invalid
AML codes.
However, the current implementation of acpi_ps_complete_final_op() does not
complete all ops when it meets some errors and this cause cache leak.
This cache leak has a security threat because an old kernel (<= 4.9) shows
memory locations of kernel functions in stack dump. Some malicious users
could use this information to neutralize kernel ASLR.
To fix ACPI cache leak for enhancing security, I made a patch to complete all
ops unconditionally for acpi_ps_complete_final_op() function.
I hope that this patch improves the security of Linux kernel.
The ISENSE/VSENSE blocks are only powered up when the amplifier
transitions from shutdown to active. This means that if those controls
are flipped on while the amplifier is already playing back audio, they
will have no effect.
Fix this by forcing a power cycle around transitions in those controls.
Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: James Calligeros <jcalligeros99@gmail.com> Link: https://patch.msgid.link/20250406-apple-codec-changes-v5-1-50a00ec850a3@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
ap_get_table_length() checks if tables are valid by
calling ap_is_valid_header(). The latter then calls
ACPI_VALIDATE_RSDP_SIG(Table->Signature).
ap_is_valid_header() accepts struct acpi_table_header as an argument, so
the signature size is always fixed to 4 bytes.
The problem is when the string comparison is between ACPI-defined table
signature and ACPI_SIG_RSDP. Common ACPI table header specifies the
Signature field to be 4 bytes long[1], with the exception of the RSDP
structure whose signature is 8 bytes long "RSD PTR " (including the
trailing blank character)[2]. Calling strncmp(sig, rsdp_sig, 8) would
then result in a sequence overread[3] as sig would be smaller (4 bytes)
than the specified bound (8 bytes).
As a workaround, pass the bound conditionally based on the size of the
signature being passed.
Right now, if the clocksource watchdog detects a clocksource skew, it might
perform a per CPU check, for example in the TSC case on x86. In other
words: supposing TSC is detected as unstable by the clocksource watchdog
running at CPU1, as part of marking TSC unstable the kernel will also run a
check of TSC readings on some CPUs to be sure it is synced between them
all.
But that check happens only on some CPUs, not all of them; this choice is
based on the parameter "verify_n_cpus" and in some random cpumask
calculation. So, the watchdog runs such per CPU checks on up to
"verify_n_cpus" random CPUs among all online CPUs, with the risk of
repeating CPUs (that aren't double checked) in the cpumask random
calculation.
But if "verify_n_cpus" > num_online_cpus(), it should skip the random
calculation and just go ahead and check the clocksource sync between
all online CPUs, without the risk of skipping some CPUs due to
duplicity in the random cpumask calculation.
Tests in a 4 CPU laptop with TSC skew detected led to some cases of the per
CPU verification skipping some CPU even with verify_n_cpus=8, due to the
duplicity on random cpumask generation. Skipping the randomization when the
number of online CPUs is smaller than verify_n_cpus, solves that.
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/all/20250323173857.372390-1-gpiccoli@igalia.com Signed-off-by: Sasha Levin <sashal@kernel.org>
I found an ACPI cache leak in ACPI early termination and boot continuing case.
When early termination occurs due to malicious ACPI table, Linux kernel
terminates ACPI function and continues to boot process. While kernel terminates
ACPI function, kmem_cache_destroy() reports Acpi-Operand cache leak.
I analyzed this memory leak in detail and found acpi_ds_obj_stack_pop_and_
delete() function miscalculated the top of the stack. acpi_ds_obj_stack_push()
function uses walk_state->operand_index for start position of the top, but
acpi_ds_obj_stack_pop_and_delete() function considers index 0 for it.
Therefore, this causes acpi operand memory leak.
This cache leak causes a security threat because an old kernel (<= 4.9) shows
memory locations of kernel functions in stack dump. Some malicious users
could use this information to neutralize kernel ASLR.
Fix incorrect value mask for register write. Register values are 8-bit,
not 9. If this function was called with a value > 0xFF and an even addr,
it would cause writing to the next register.
Fixes: f2a22e1e172f ("iio: adc: ad7606: Add support for software mode for ad7616") Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Angelo Dureghello <adureghello@baylibre.com> Link: https://patch.msgid.link/20250428-iio-adc-ad7606_spi-fix-write-value-mask-v1-1-a2d5e85a809f@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>From the documentation:
"offset to be added to <type>[Y]_raw prior toscaling by <type>[Y]_scale"
Offset should be applied before multiplying scale, so divide offset by
scale to make this correct.
Fixes: bc3eb0207fb5 ("iio: imu: inv_icm42600: add temperature sensor support") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Acked-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com> Link: https://patch.msgid.link/20250502-imu-v1-1-129b8391a4e3@geanix.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
made the lock function to call depend on dev->subordinate but left
pci_slot_unlock() unmodified creating locking asymmetry compared with
pci_slot_lock().
Because of the asymmetric lock handling, the same bridge device is unlocked
twice. First pci_bus_unlock() unlocks bus->self and then pci_slot_unlock()
will unconditionally unlock the same bridge device.
Move pci_dev_unlock() inside an else branch to match the logic in
pci_slot_lock().
Loongson PCIe Root Ports don't advertise an ACS capability, but they do not
allow peer-to-peer transactions between Root Ports. Add an ACS quirk so
each Root Port can be in a separate IOMMU group.
Interrupt and monitor pages should be in Hyper-V page size (4k bytes).
This can be different from the system page size.
This size is read and used by the user-mode program to determine the
mapped data region. An example of such user-mode program is the VMBus
driver in DPDK.
Cc: stable@vger.kernel.org Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus") Signed-off-by: Long Li <longli@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1746492997-4599-3-git-send-email-longli@linuxonhyperv.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1746492997-4599-3-git-send-email-longli@linuxonhyperv.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function max14577_reg_get_current_limit() calls the function
max14577_read_reg(), but does not check its return value. A proper
implementation can be found in max14577_get_online().
Add a error check for the max14577_read_reg() and return error code
if the function fails.
Fixes: b0902bbeb768 ("regulator: max14577: Add regulator driver for Maxim 14577") Cc: stable@vger.kernel.org # v3.14 Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20250526025627.407-1-vulab@iscas.ac.cn Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GCC 15 changed the default C standard dialect from gnu17 to gnu23,
which should not have impacted the kernel because it explicitly requests
the gnu11 standard in the main Makefile. However, mips/vdso code uses
its own CFLAGS without a '-std=' value, which break with this dialect
change because of the kernel's own definitions of bool, false, and true
conflicting with the C23 reserved keywords.
include/linux/stddef.h:11:9: error: cannot use keyword 'false' as enumeration constant
11 | false = 0,
| ^~~~~
include/linux/stddef.h:11:9: note: 'false' is a keyword with '-std=c23' onwards
include/linux/types.h:35:33: error: 'bool' cannot be defined via 'typedef'
35 | typedef _Bool bool;
| ^~~~
include/linux/types.h:35:33: note: 'bool' is a keyword with '-std=c23' onwards
Add -std as specified in KBUILD_CFLAGS to the decompressor and purgatory
CFLAGS to eliminate these errors and make the C standard version of these
areas match the rest of the kernel.
In mii_nway_restart() the code attempts to call
mii->mdio_read which is ch9200_mdio_read(). ch9200_mdio_read()
utilises a local buffer called "buff", which is initialised
with control_read(). However "buff" is conditionally
initialised inside control_read():
if (err == size) {
memcpy(data, buf, size);
}
If the condition of "err == size" is not met, then
"buff" remains uninitialised. Once this happens the
uninitialised "buff" is accessed and returned during
ch9200_mdio_read():
return (buff[0] | buff[1] << 8);
The problem stems from the fact that ch9200_mdio_read()
ignores the return value of control_read(), leading to
uinit-access of "buff".
To fix this we should check the return value of
control_read() and return early on error.
The above issue may happen as follows:
(1) Add kprobe tracepoint;
(2) insmod test.ko;
(3) Module triggers ftrace disabled;
(4) rmmod test.ko;
(5) cat /proc/kallsyms; --> Will trigger UAF as test.ko already removed;
ftrace_mod_get_kallsym()
...
strscpy(module_name, mod_map->mod->name, MODULE_NAME_LEN);
...
The problem is when a module triggers an issue with ftrace and
sets ftrace_disable. The ftrace_disable is set when an anomaly is
discovered and to prevent any more damage, ftrace stops all text
modification. The issue that happened was that the ftrace_disable stops
more than just the text modification.
When a module is loaded, its init functions can also be traced. Because
kallsyms deletes the init functions after a module has loaded, ftrace
saves them when the module is loaded and function tracing is enabled. This
allows the output of the function trace to show the init function names
instead of just their raw memory addresses.
When a module is removed, ftrace_release_mod() is called, and if
ftrace_disable is set, it just returns without doing anything more. The
problem here is that it leaves the mod_list still around and if kallsyms
is called, it will call into this code and access the module memory that
has already been freed as it will return:
There's a tiny race condition in dm-mirror. The functions queue_bio and
write_callback grab a spinlock, add a bio to the list, drop the spinlock
and wake up the mirrord thread that processes bios in the list.
It may be possible that the mirrord thread processes the bio just after
spin_unlock_irqrestore is called, before wakeup_mirrord. This spurious
wake-up is normally harmless, however if the device mapper device is
unloaded just after the bio was processed, it may be possible that
wakeup_mirrord(ms) uses invalid "ms" pointer.
Fix this bug by moving wakeup_mirrord inside the spinlock.
In sunxi_nfc_hw_ecc_read_chunk(), the sunxi_nfc_randomizer_enable() is
called without the config of randomizer. A proper implementation can be
found in sunxi_nfc_hw_ecc_read_chunks_dma().
Add sunxi_nfc_randomizer_config() before the start of randomization.
The function sunxi_nfc_hw_ecc_write_chunk() calls the
sunxi_nfc_hw_ecc_write_chunk(), but does not call the configuration
function sunxi_nfc_randomizer_config(). Consequently, the randomization
might not conduct correctly, which will affect the lifespan of NAND flash.
A proper implementation can be found in sunxi_nfc_hw_ecc_write_page_dma().
Add the sunxi_nfc_randomizer_config() to config randomizer.
In dirty_ratio_handler(), vm_dirty_bytes must be set to zero before
calling writeback_set_ratelimit(), as global_dirty_limits() always
prioritizes the value of vm_dirty_bytes.
It's domain_dirty_limits() that's relevant here, not node_dirty_ok:
This causes ratelimit_pages to still use the value calculated based on
vm_dirty_bytes, which is wrong now.
The impact visible to userspace is difficult to capture directly because
there is no procfs/sysfs interface exported to user space. However, it
will have a real impact on the balance of dirty pages.
For example:
1. On default, we have vm_dirty_ratio=40, vm_dirty_bytes=0
2. echo 8192 > dirty_bytes, then vm_dirty_bytes=8192,
vm_dirty_ratio=0, and ratelimit_pages is calculated based on
vm_dirty_bytes now.
3. echo 20 > dirty_ratio, then since vm_dirty_bytes is not reset to
zero when writeback_set_ratelimit() -> global_dirty_limits() ->
domain_dirty_limits() is called, reallimit_pages is still calculated
based on vm_dirty_bytes instead of vm_dirty_ratio. This does not
conform to the actual intent of the user.
Link: https://lkml.kernel.org/r/20250415090232.7544-1-alexjlzheng@tencent.com Fixes: 9d823e8f6b1b ("writeback: per task dirty rate limit") Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: MengEn Sun <mengensun@tencent.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Fenggaung Wu <fengguang.wu@intel.com> Cc: Jinliang Zheng <alexjlzheng@tencent.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
idr_for_each() is protected by rwsem, but this is not enough. If it is
not protected by RCU read-critical region, when idr_for_each() calls
radix_tree_node_free() through call_rcu() to free the radix_tree_node
structure, the node will be freed immediately, and when reading the next
node in radix_tree_for_each_slot(), the already freed memory may be read.
Therefore, we need to add code to make sure that idr_for_each() is
protected within the RCU read-critical region when we call it in
shm_destroy_orphaned().
Link: https://lkml.kernel.org/r/20250424143322.18830-1-aha310510@gmail.com Fixes: b34a6b1da371 ("ipc: introduce shm_rmid_forced sysctl") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reported-by: syzbot+a2b84e569d06ca3a949c@syzkaller.appspotmail.com Cc: Jeongjun Park <aha310510@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SPICC is missing fclk_div2, which means fclk_div5 and fclk_div7 indexes
are wrong on this clock. This causes the spicc module to output sclk at
2.5x the expected rate when clock index 3 is picked.
Adding the missing fclk_div2 resolves this.
[jbrunet: amended commit description] Fixes: a18c8e0b7697 ("clk: meson: g12a: add support for the SPICC SCLK Source clocks") Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Da Xue <da@libre.computer> Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Link: https://lore.kernel.org/r/20250512142617.2175291-1-da@libre.computer Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The decompressor is built with the default C dialect, which is now gnu23
on gcc-15, and this clashes with the kernel's bool type definition:
In file included from include/uapi/linux/posix_types.h:5,
from arch/parisc/boot/compressed/misc.c:7:
include/linux/stddef.h:11:9: error: cannot use keyword 'false' as enumeration constant
11 | false = 0,
Add the -std=gnu11 argument here, as we do for all other architectures.
Second to last potentially related work creation:
kasan_save_stack+0x20/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0x94/0xa0 mm/kasan/generic.c:492
__call_rcu_common.constprop.0+0xc3/0xa10 kernel/rcu/tree.c:2713
netlink_release+0x620/0xc20 net/netlink/af_netlink.c:802
__sock_release+0xb5/0x270 net/socket.c:663
sock_close+0x1e/0x30 net/socket.c:1425
__fput+0x408/0xab0 fs/file_table.c:384
task_work_run+0x154/0x240 kernel/task_work.c:239
exit_task_work include/linux/task_work.h:45 [inline]
do_exit+0x8e5/0x1320 kernel/exit.c:874
do_group_exit+0xcd/0x280 kernel/exit.c:1023
get_signal+0x1675/0x1850 kernel/signal.c:2905
arch_do_signal_or_restart+0x80/0x3b0 arch/x86/kernel/signal.c:310
exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x1b3/0x1e0 kernel/entry/common.c:218
do_syscall_64+0x66/0x110 arch/x86/entry/common.c:87
entry_SYSCALL_64_after_hwframe+0x78/0xe2
The buggy address belongs to the object at ffff88800f5be000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 2656 bytes to the right of
allocated 1280-byte region [ffff88800f5be000, ffff88800f5be500)
...
Memory state around the buggy address: ffff88800f5bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800f5bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88800f5bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88800f5bef80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88800f5bf000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
By analyzing the vmcore, we found that vc->vc_origin was somehow placed
one line prior to vc->vc_screenbuf when vc was in KD_TEXT mode, and
further writings to /dev/vcs caused out-of-bounds reads (and writes
right after) in vcs_write_buf_noattr().
Our further experiments show that in most cases, vc->vc_origin equals to
vga_vram_base when the console is in KD_TEXT mode, and it's around
vc->vc_screenbuf for the KD_GRAPHICS mode. But via triggerring a
TIOCL_SETVESABLANK ioctl beforehand, we can make vc->vc_origin be around
vc->vc_screenbuf while the console is in KD_TEXT mode, and then by
writing the special 'ESC M' control sequence to the tty certain times
(depends on the value of `vc->state.y - vc->vc_top`), we can eventually
move vc->vc_origin prior to vc->vc_screenbuf. Here's the PoC, tested on
QEMU:
```
int main() {
const int RI_NUM = 10; // should be greater than `vc->state.y - vc->vc_top`
int tty_fd, vcs_fd;
const char *tty_path = "/dev/tty0";
const char *vcs_path = "/dev/vcs";
const char escape_seq[] = "\x1bM"; // ESC + M
const char trigger_seq[] = "Let's trigger an OOB write.";
struct vt_sizes vt_size = { 70, 2 };
int blank = TIOCL_BLANKSCREEN;
If fb_add_videomode() in fb_set_var() fails to allocate memory for
fb_videomode, later it may lead to a null-ptr dereference in
fb_videomode_to_var(), as the fb_info is registered while not having the
mode in modelist that is expected to be there, i.e. the one that is
described in fb_info->var.
The reason is that fb_info->var is being modified in fb_set_var(), and
then fb_videomode_to_var() is called. If it fails to add the mode to
fb_info->modelist, fb_set_var() returns error, but does not restore the
old value of fb_info->var. Restore fb_info->var on failure the same way
it is done earlier in the function.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.