Commit 4ce6e2db00de ("virtio-blk: Ensure no requests in virtqueues before
deleting vqs.") replaces queue quiesce with queue freeze in virtio-blk's
PM callbacks. And the motivation is to drain inflight IOs before suspending.
block layer's queue freeze looks very handy, but it is also easy to cause
deadlock, such as, any attempt to call into bio_queue_enter() may run into
deadlock if the queue is frozen in current context. There are all kinds
of ->suspend() called in suspend context, so keeping queue frozen in the
whole suspend context isn't one good idea. And Marek reported lockdep
warning[1] caused by virtio-blk's freeze queue in virtblk_freeze().
Given the motivation is to drain in-flight IOs, it can be done by calling
freeze & unfreeze, meantime restore to previous behavior by keeping queue
quiesced during suspend.
Cc: Yi Sun <yi.sun@unisoc.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: virtualization@lists.linux.dev Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20241112125821.1475793-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
While receiving an MST up request message from one thread in
drm_dp_mst_handle_up_req(), the MST topology could be removed from
another thread via drm_dp_mst_topology_mgr_set_mst(false), freeing
mst_primary and setting drm_dp_mst_topology_mgr::mst_primary to NULL.
This could lead to a NULL deref/use-after-free of mst_primary in
drm_dp_mst_handle_up_req().
Avoid the above by holding a reference for mst_primary in
drm_dp_mst_handle_up_req() while it's used.
v2: Fix kfreeing the request if getting an mst_primary reference fails.
HiSilicon HIP09A platforms using the same SMMU PMCG with HIP09
and thus suffers the same erratum. List them in the PMCG platform
information list without introducing a new SMMU PMCG Model.
This partially reverts commit 812fe6420a6e ("scsi: storvsc: Handle
additional SRB status values").
HyperV does not support MAINTENANCE_IN resulting in FC passthrough
returning the SRB_STATUS_DATA_OVERRUN value. Now that
SRB_STATUS_DATA_OVERRUN is treated as an error, multipath ALUA paths go
into a faulty state as multipath ALUA submits RTPG commands via
MAINTENANCE_IN.
Before retrying initialization, check and abort if the fault code
indicates insufficient power. Also mark the controller as unrecoverable
instead of issuing reset in the watch dog timer if the fault code
indicates insufficient power.
Signed-off-by: Prayas Patel <prayas.patel@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-5-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver, through the SAS transport, exposes a sysfs interface to
enable/disable PHYs in a controller/expander setup. When multiple PHYs
are disabled and enabled in rapid succession, the persistent and current
config pages related to SAS IO unit/SAS Expander pages could get
corrupted.
Use separate memory for each config request.
Signed-off-by: Prayas Patel <prayas.patel@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-3-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver serializes ioctls through a mutex lock but access to the
ioctl data buffer is not guarded by the mutex. This results in multiple
user threads being able to write to the driver's ioctl buffer
simultaneously.
Protect the ioctl buffer with the ioctl mutex.
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com> Link: https://lore.kernel.org/r/20241110194405.10108-2-ranjan.kumar@broadcom.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Panther Lake SPI controllers are compatible with the Cannon Lake
controllers. Add support for following SPI controller device IDs:
- H-series: 0xe323
- P-series: 0xe423
- U-series: 0xe423
Andrii spotted that process_dynptr_func's rejection of incorrect
argument register type will print an error string where argument numbers
are not zero-indexed, unlike elsewhere in the verifier. Fix this by
subtracting 1 from regno. The same scenario exists for iterator
messages. Fix selftest error strings that match on the exact argument
number while we're at it to ensure clean bisection.
The branch instructions beq, bne, blt, bge, bltu, bgeu and jirl belong
to the format reg2i16, but the sequence of oprand is different for the
instruction jirl. So adjust the parameter order of emit_jirl() to make
it more readable correspond with the Instruction Set Architecture manual.
Since screen_info.lfb_base is a __u32 type, an above-4G address need an
ext_lfb_base to present its higher 32bits. In init_screen_info() we can
use __screen_info_lfb_base() to handle this case for reserving screen
info memory.
GCC 15 introduces -Werror=unterminated-string-initialization by default,
this results in the following build error
fs/smb/server/smb_common.c:21:35: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-ini
tialization]
21 | static const char basechars[43] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_-!@#$%";
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
To this we are replacing char basechars[43] with a character pointer
and then using strlen to get the length.
Signed-off-by: Brahmajit Das <brahmajit.xyz@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The SH DAC audio driver uses the kmalloc'ed buffer as the main PCM
buffer, and the data is transferred via hrtimer callbacks manually
from there to the hardware. Meanwhile, some of its code are written
as if the buffer is on iomem and use the special helpers for the iomem
(e.g. copy_from_iter_toio() or memset_io()). Those are rather useless
and the standard helpers should be used.
Similarly, the PCM mmap callback is set to a special one with
snd_pcm_lib_mmap_iomem, but this is also nonsense, because SH
architecture doesn't support this function, hence it leads just to
NULL -- the fallback to the standard helper.
This patch replaces those special setups with the standard ones.
When Z60MR100 startup, speaker will output a pop. To fix this issue,
we mute codec by init verbs in bios when system startup, and set GPIO
to low to unmute codec in codec driver when it loaded .
[ white space fixes and compile warning fix by tiwai ]
Since the legacy rawmidi has no proper way to know the inactive group,
indicate it in the rawmidi substream names with "[Inactive]" suffix
when the corresponding UMP group is inactive.
When a UMP Group is inactive, we shouldn't allow users to access it
via the legacy MIDI access. Add the group active flag check and
return -ENODEV if it's inactive.
During rename, we are updating link counts of various inodes either when
rename deletes target or when moving directory across directories.
Verify involved link counts are sane so that we don't trip warnings in
VFS.
Reported-by: syzbot+3ff7365dc04a6bcafa66@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the parent directory link count is too low (likely directory inode
corruption), just skip updating its link count as if it goes to 0 too
early it can cause unexpected issues.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the hardware revision numbering for Qlogic ISP1020/1040 boards. HWMASK
suggests that the revision number only needs four bits, this is consistent
with how NetBSD does things in their ISP driver. Verified on a IPS1040B
which is seen as rev 5 not as BIT_4.
Signed-off-by: Magnus Lindholm <linmag7@gmail.com> Link: https://lore.kernel.org/r/20241113225636.2276-1-linmag7@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Now that we can register a SoC specific regmap with syscon using
of_syscon_register_regmap() api we can switch back to using
syscon_regmap_lookup_by_phandle() in the client drivers.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Reviewed-by: Sam Protsenko <semen.protsenko@linaro.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20241029191131.2329414-1-peter.griffin@linaro.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On RZ/G3S the watchdog can be part of a software-controlled PM domain. In
this case, the watchdog device need to be powered on in
struct watchdog_ops::restart API. This can be done though
pm_runtime_resume_and_get() API if the watchdog PM domain and watchdog
device are marked as IRQ safe. We mark the watchdog PM domain as IRQ safe
with GENPD_FLAG_IRQ_SAFE when the watchdog PM domain is registered and the
watchdog device though pm_runtime_irq_safe().
Before commit e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait
context'") pm_runtime_get_sync() was used in watchdog restart handler
(which is similar to pm_runtime_resume_and_get() except the later one
handles the runtime resume errors).
Commit e4cf89596c1f ("watchdog: rzg2l_wdt: Fix 'BUG: Invalid wait
context'") dropped the pm_runtime_get_sync() and replaced it with
clk_prepare_enable() to avoid invalid wait context due to genpd_lock()
in genpd_runtime_resume() being called from atomic context. But
clk_prepare_enable() doesn't fit for this either (as reported by
Ulf Hansson) as clk_prepare() can also sleep (it just not throw invalid
wait context warning as it is not written for this).
Because the watchdog device is marked now as IRQ safe (though this patch)
the irq_safe_dev_in_sleep_domain() call from genpd_runtime_resume() returns
1 for devices not registering an IRQ safe PM domain for watchdog (as the
watchdog device is IRQ safe, PM domain is not and watchdog PM domain is
always-on), this being the case for RZ/G3S with old device trees and
the rest of the SoCs that use this driver, we can now drop also the
clk_prepare_enable() calls in restart handler and rely on
pm_runtime_resume_and_get().
Thus, drop clk_prepare_enable() and use pm_runtime_resume_and_get() in
watchdog restart handler.
For the watchdog timer to work properly on the QCML04 board we need to
set PWRGD enable in the Environment Controller Configuration Registers
Special Configuration Register 1 when it is not already set, this may
be the case when the watchdog is not enabled from within the BIOS.
Make sure the trace_kprobe's module notifer callback function is called
after jump_label's callback is called. Since the trace_kprobe's callback
eventually checks jump_label address during registering new kprobe on
the loading module, jump_label must be updated before this registration
happens.
For the most part of the C++ history, it couldn't have type
declarations inside anonymous unions for different reasons. At the
same time, __struct_group() relies on the latters, so when the @TAG
argument is not empty, C++ code doesn't want to build (even under
`extern "C"`):
../linux/include/uapi/linux/pkt_cls.h:25:24: error:
'struct tc_u32_sel::<unnamed union>::tc_u32_sel_hdr,' invalid;
an anonymous union may only have public non-static data members
[-fpermissive]
The safest way to fix this without trying to switch standards (which
is impossible in UAPI anyway) etc., is to disable tag declaration
for that language. This won't break anything since for now it's not
buildable at all.
Use a separate definition for __struct_group() when __cplusplus is
defined to mitigate the error, including the version from tools/.
Fixes: 50d7bd38c3aa ("stddef: Introduce struct_group() helper macro") Reported-by: Christopher Ferris <cferris@google.com> Closes: https://lore.kernel.org/linux-hardening/Z1HZpe3WE5As8UAz@google.com Suggested-by: Kees Cook <kees@kernel.org> # __struct_group_tag() Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20241219135734.2130002-1-aleksander.lobakin@intel.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
1. When writing the OTG_CONFIG bit it uses POC_CHG_CONFIG_SHIFT which
should be POC_OTG_CONFIG_SHIFT.
2. When turning the regulator off it never turns charging back on. Note
this must be done through bq24190_charger_set_charge_type(), to ensure
that the charge_type property value of none/trickle/fast is honored.
Resolve both issues to fix BQ24296 Vbus regulator support not working.
Fixes: b150a703b56f ("power: supply: bq24190_charger: Add support for BQ24296") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20241116203648.169100-2-hdegoede@redhat.com Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The mapping VMA address is saved in VAS window struct when the
paste address is mapped. This VMA address is used during migration
to unmap the paste address if the window is active. The paste
address mapping will be removed when the window is closed or with
the munmap(). But the VMA address in the VAS window is not updated
with munmap() which is causing invalid access during migration.
The KASAN report shows:
[16386.254991] BUG: KASAN: slab-use-after-free in reconfig_close_windows+0x1a0/0x4e8
[16386.255043] Read of size 8 at addr c00000014a819670 by task drmgr/696928
[16386.256136] Allocated by task 696554 on cpu 31 at 16377.277618s:
[16386.256149] kasan_save_stack+0x34/0x68
[16386.256163] kasan_save_track+0x34/0x80
[16386.256175] kasan_save_alloc_info+0x58/0x74
[16386.256196] __kasan_slab_alloc+0xb8/0xdc
[16386.256209] kmem_cache_alloc_noprof+0x200/0x3d0
[16386.256225] vm_area_alloc+0x44/0x150
[16386.256245] mmap_region+0x214/0x10c4
[16386.256265] do_mmap+0x5fc/0x750
[16386.256277] vm_mmap_pgoff+0x14c/0x24c
[16386.256292] ksys_mmap_pgoff+0x20c/0x348
[16386.256303] sys_mmap+0xd0/0x160
...
[16386.256350] Freed by task 0 on cpu 31 at 16386.204848s:
[16386.256363] kasan_save_stack+0x34/0x68
[16386.256374] kasan_save_track+0x34/0x80
[16386.256384] kasan_save_free_info+0x64/0x10c
[16386.256396] __kasan_slab_free+0x120/0x204
[16386.256415] kmem_cache_free+0x128/0x450
[16386.256428] vm_area_free_rcu_cb+0xa8/0xd8
[16386.256441] rcu_do_batch+0x2c8/0xcf0
[16386.256458] rcu_core+0x378/0x3c4
[16386.256473] handle_softirqs+0x20c/0x60c
[16386.256495] do_softirq_own_stack+0x6c/0x88
[16386.256509] do_softirq_own_stack+0x58/0x88
[16386.256521] __irq_exit_rcu+0x1a4/0x20c
[16386.256533] irq_exit+0x20/0x38
[16386.256544] interrupt_async_exit_prepare.constprop.0+0x18/0x2c
...
[16386.256717] Last potentially related work creation:
[16386.256729] kasan_save_stack+0x34/0x68
[16386.256741] __kasan_record_aux_stack+0xcc/0x12c
[16386.256753] __call_rcu_common.constprop.0+0x94/0xd04
[16386.256766] vm_area_free+0x28/0x3c
[16386.256778] remove_vma+0xf4/0x114
[16386.256797] do_vmi_align_munmap.constprop.0+0x684/0x870
[16386.256811] __vm_munmap+0xe0/0x1f8
[16386.256821] sys_munmap+0x54/0x6c
[16386.256830] system_call_exception+0x1a0/0x4a0
[16386.256841] system_call_vectored_common+0x15c/0x2ec
[16386.256868] The buggy address belongs to the object at c00000014a819670
which belongs to the cache vm_area_struct of size 168
[16386.256887] The buggy address is located 0 bytes inside of
freed 168-byte region [c00000014a819670, c00000014a819718)
This patch adds close() callback in vas_vm_ops vm_operations_struct
which will be executed during munmap() before freeing VMA. The VMA
address in the VAS window is set to NULL after holding the window
mmap_mutex.
Update the DMI match for a Lenovo laptop to the new DMI identifier.
This laptop ships with a different DMI identifier to what was expected,
and now has two identifiers.
Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: 83c062ae81e8 ("ASoC: Intel: sof_sdw: Add quirks for some new Lenovo laptops") Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20241216140821.153670-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The "user" pointer was converted from being allocated with kzalloc() to
being allocated by devm_kzalloc(). Calling kfree(user) will lead to a
double free.
The linkDMA should not be released on stop trigger since a stream re-start
might happen without closing of the stream. This leaves a short time for
other streams to 'steal' the linkDMA since it has been released.
This issue is not easy to reproduce under normal conditions as usually
after stop the stream is closed, or the same stream is restarted, but if
another stream got in between the stop and start, like this:
aplay -Dhw:0,3 -c2 -r48000 -fS32_LE /dev/zero -d 120
CTRL+z
aplay -Dhw:0,0 -c2 -r48000 -fS32_LE /dev/zero -d 120
then the link DMA channels will be mixed up, resulting firmware error or
crash.
Fixes: ab5593793e90 ("ASoC: SOF: Intel: hda: Always clean up link DMA during stop") Cc: stable@vger.kernel.org Closes: https://github.com/thesofproject/sof/issues/9695 Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20241217091019.31798-1-peter.ujfalusi@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The at_xdmac_memset_create_desc may return NULL, which will lead to a
null pointer dereference. For example, the len input is error, or the
atchan->free_descs_list is empty and memory is exhausted. Therefore, add
check to avoid this.
The ADMAC attached to the AOP has complex power sequencing, and is
power gated when the probe callback runs. Move the register reads
to other functions, where we can guarantee that the hardware is
switched on.
Current implementation of fsl_edma3_attach_pd() does not provide a
cleanup path, resulting in a memory leak. For example,
dev_pm_domain_detach() is not called after dev_pm_domain_attach_by_id(),
and the device link created with the DL_FLAG_STATELESS is not released
explicitly.
Therefore, provide a cleanup function fsl_edma3_detach_pd() and call it
upon failure. Also add a devm_add_action_or_reset() call with this
function after a successful fsl_edma3_attach_pd().
The get_dma_ops and set_dma_ops APIs were never for driver to use. Remove
these calls from QDMA driver. Instead, pass the DMA device pointer from the
qdma_platdata structure.
Currently, the driver does not return the correct DMA status when a DMA
pause is issued by the client drivers. This causes GPCDMA users to
assume that DMA is still running, while in reality, the DMA is paused.
Return DMA_PAUSED for tx_status() if the channel is paused in the middle
of a transfer.
The recently submitted fix-commit revealed a problem in the iDMA 32-bit
platform code. Even though the controller supported only a single master
the dw_dma_acpi_filter() method hard-coded two master interfaces with IDs
0 and 1. As a result the sanity check implemented in the commit b336268dde75 ("dmaengine: dw: Add peripheral bus width verification")
got incorrect interface data width and thus prevented the client drivers
from configuring the DMA-channel with the EINVAL error returned. E.g.,
the next error was printed for the PXA2xx SPI controller driver trying
to configure the requested channels:
> [ 164.525604] pxa2xx_spi_pci 0000:00:07.1: DMA slave config failed
> [ 164.536105] pxa2xx_spi_pci 0000:00:07.1: failed to get DMA TX descriptor
> [ 164.543213] spidev spi-SPT0001:00: SPI transfer failed: -16
The problem would have been spotted much earlier if the iDMA 32-bit
controller supported more than one master interfaces. But since it
supports just a single master and the iDMA 32-bit specific code just
ignores the master IDs in the CTLLO preparation method, the issue has
been gone unnoticed so far.
Fix the problem by specifying the default master ID for both memory
and peripheral devices in the driver data. Thus the issue noticed for
the iDMA 32-bit controllers will be eliminated and the ACPI-probed
DW DMA controllers will be configured with the correct master ID by
default.
With CONFIG_DMA_API_DEBUG enabled, the following warning is observed:
DMA-API: snd_hda_intel 0000:03:00.1: device driver failed to check map error[device address=0x00000000ffff0000] [size=20480 bytes] [mapped as single]
WARNING: CPU: 28 PID: 2255 at kernel/dma/debug.c:1036 check_unmap+0x1408/0x2430
CPU: 28 UID: 42 PID: 2255 Comm: wireplumber Tainted: G W L 6.12.0-10-133577cad6bf48e5a7848c4338124081393bfe8a+ #759
debug_dma_unmap_page+0xe9/0xf0
snd_dma_wc_free+0x85/0x130 [snd_pcm]
snd_pcm_lib_free_pages+0x1e3/0x440 [snd_pcm]
snd_pcm_common_ioctl+0x1c9a/0x2960 [snd_pcm]
snd_pcm_ioctl+0x6a/0xc0 [snd_pcm]
...
Check for returned DMA addresses using specialized dma_mapping_error()
helper which is generally recommended for this purpose by
Documentation/core-api/dma-api.rst.
Fixes: c880a5146642 ("ALSA: memalloc: Use proper DMA mapping API for x86 WC buffer allocations") Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Closes: https://lore.kernel.org/r/CABXGCsNB3RsMGvCucOy3byTEOxoc-Ys+zB_HQ=Opb_GhX1ioDA@mail.gmail.com/ Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Link: https://patch.msgid.link/20241219203345.195898-1-pchelkin@ispras.ru Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the USB port via combophy on the RK3528/RK3588 SoC is broken.
usb usb8-port1: Cannot enable. Maybe the USB cable is bad?
This is due to the combphy of RK3528/RK3588 SoC has multiple resets, but
only "phy resets" need assert and deassert, "apb resets" don't need.
So change the driver to only match the phy resets, which is also what
the vendor kernel does.
In some cases, rk_hdptx_phy_runtime_resume() may be invoked before
platform_set_drvdata() is executed in ->probe(), leading to a NULL
pointer dereference when using the return of dev_get_drvdata().
Ensure platform_set_drvdata() is called before devm_pm_runtime_enable().
When bringing up the PHY, it might be in a bad state if left powered.
One case is we lose the PLL lock if the PLL is gated while the PHY
is powered. Toggle the PHY power so we can start from a known state.
Fixes: 4e5b9c9a73b3 ("phy: usb: Add support for new Synopsys USB controller on the 7216") Signed-off-by: Justin Chen <justin.chen@broadcom.com> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20241024213540.1059412-1-justin.chen@broadcom.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For devm_phy_destroy(), its comment says it needs to invoke phy_destroy()
to destroy the phy, but it will not actually invoke the function since
devres_destroy() does not call devm_phy_consume(), and the missing
phy_destroy() call will cause that the phy fails to be destroyed.
Fortunately, the faulty API has not been used by current kernel tree.
Fix by using devres_release() instead of devres_destroy() within the API.
For devm_of_phy_provider_unregister(), its comment says it needs to invoke
of_phy_provider_unregister() to unregister the phy provider, but it will
not actually invoke the function since devres_destroy() does not call
devm_phy_provider_release(), and the missing of_phy_provider_unregister()
call will cause:
- The phy provider fails to be unregistered.
- Leak both memory and the OF node refcount.
Fortunately, the faulty API has not been used by current kernel tree.
Fix by using devres_release() instead of devres_destroy() within the API.
For devm_phy_put(), its comment says it needs to invoke phy_put() to
release the phy, but it will not actually invoke the function since
devres_destroy() does not call devm_phy_release(), and the missing
phy_put() call will cause:
- The phy fails to be released.
- devm_phy_put() can not fully undo what API devm_phy_get() does.
- Leak refcount of both the module and device for below typical usage:
devm_phy_get(); // or its variant
...
err = do_something();
if (err)
goto err_out;
...
err_out:
devm_phy_put(); // leak refcount here
The file(s) affected by this issue are shown below since they have such
typical usage.
drivers/pci/controller/cadence/pcie-cadence.c
drivers/net/ethernet/ti/am65-cpsw-nuss.c
Fix by using devres_release() instead of devres_destroy() within the API.
Fixes: ff764963479a ("drivers: phy: add generic PHY framework") Cc: stable@vger.kernel.org Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Krzysztof Wilczyński <kw@linux.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-1-40ae28f5015a@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For macro for_each_child_of_node(parent, child), refcount of @child has
been increased before entering its loop body, so normally needs to call
of_node_put(@child) before returning from the loop body to avoid refcount
leakage.
of_phy_provider_lookup() has such usage but does not call of_node_put()
before returning, so cause leakage of the OF node refcount.
Fix by simply calling of_node_put() before returning from the loop body.
The APIs affected by this issue are shown below since they indirectly
invoke problematic of_phy_provider_lookup().
phy_get()
of_phy_get()
devm_phy_get()
devm_of_phy_get()
devm_of_phy_get_by_index()
Fixes: 2a4c37016ca9 ("phy: core: Fix of_phy_provider_lookup to return PHY provider for sub node") Cc: stable@vger.kernel.org Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-5-40ae28f5015a@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
_of_phy_get() will directly return when suffers of_device_is_compatible()
error, but it forgets to decrease refcount of OF node @args.np before error
return, the refcount was increased by previous of_parse_phandle_with_args()
so causes the OF node's refcount leakage.
Fix by decreasing the refcount via of_node_put() before the error return.
Fixes: b7563e2796f8 ("phy: work around 'phys' references to usb-nop-xceiv devices") Cc: stable@vger.kernel.org Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241213-phy_core_fix-v6-4-40ae28f5015a@quicinc.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In RX Lane configuration sequence of SC8280XP, the register
V5_RX_UCDR_FO_GAIN is incorrectly spelled as RX_UCDR_SO_GAIN and
hence the programming sequence is wrong. Fix the register sequence
accordingly to avoid any compliance failures. This has been tested
on SA8775P by checking device mode enumeration in SuperSpeed.
The NAND chip-selects are registered for the Arasan driver during
initialization but are not de-registered when the driver is unloaded. As a
result, if the driver is loaded again, the chip-selects remain registered
and busy, making them unavailable for use.
When two chip-selects are configured in the device tree, and the second is
a non-native GPIO, both the GPIO-based chip-select and the first native
chip-select may be asserted simultaneously. This double assertion causes
incorrect read and write operations.
The issue occurs because when nfc->ncs <= 2, nfc->spare_cs is always
initialized to 0 due to static initialization. Consequently, when the
second chip-select (GPIO-based) is selected in anfc_assert_cs(), it is
detected by anfc_is_gpio_cs(), and nfc->native_cs is assigned the value 0.
This results in both the GPIO-based chip-select being asserted and the
NAND controller register receiving 0, erroneously selecting the native
chip-select.
This patch resolves the issue, as confirmed by oscilloscope testing with
configurations involving two or more chip-selects in the device tree.
There may be a potential integer overflow issue in inftl_partscan().
parts[0].size is defined as "uint64_t" while mtd->erasesize and
ip->firstUnit are defined as 32-bit unsigned integer. The result of
the calculation will be limited to 32 bits without correct casting.
Before this commit, svc_export_put or expkey_put will call path_put with
sync mode. After this commit, path_put will be called with async mode.
And this can lead the unexpected results show as follow.
The touch will finally call nfsd_cross_mnt, add refcount to mount, and
then add cache_head. Before this commit, exportfs -r will call
cache_flush to cleanup all cache_head, and path_put in
svc_export_put/expkey_put will be finished with sync mode. So, the
latter umount will always success. However, after this commit, path_put
will be called with async mode, the latter umount may failed, and if
we add some delay, umount will success too. Personally I think this bug
and should be fixed. We first revert before bugfix patch, and then fix
the original bug with a different way.
Fixes: f8c989a0c89a ("nfsd: release svc_expkey/svc_export with rcu_work") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
skb_network_offset() and skb_transport_offset() can be negative when
they are called after we pull the transport header, for example, when
we use eBPF sockmap at the point of ->sk_data_ready().
__bpf_skb_min_len() uses an unsigned int to get these offsets, this
leads to a very large number which then causes bpf_skb_change_tail()
failed unexpectedly.
Fix this by using a signed int to get these offsets and ensure the
minimum is at least zero.
Fixes: 5293efe62df8 ("bpf: add bpf_skb_change_tail helper") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-2-xiyou.wangcong@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will
be created out of the skb, and the rmem accounting of the sk_msg will be
handled by the skb.
For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting
to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to
the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result,
except for the global memory limit, the rmem of sk_redir is nearly
unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer.
Since the function sk_msg_recvmsg and __sk_psock_purge_ingress_msg are
used in these two paths. We use "msg->skb" to test whether the sk_msg is
skb backed up. If it's not, we shall do the memory accounting explicitly.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-3-zijianzhang@bytedance.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When bpf_tcp_ingress() is called, the skmsg is being redirected to the
ingress of the destination socket. Therefore, we should charge its
receive socket buffer, instead of sending socket buffer.
Because sk_rmem_schedule() tests pfmemalloc of skb, we need to
introduce a wrapper and call it for skmsg.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com Signed-off-by: Sasha Levin <sashal@kernel.org>
With recent netfs apis changes, the bytes written
value was not getting updated in /proc/fs/cifs/Stats.
Fix this by updating tcon->bytes in write operations.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In get_uprobe_offset(), the call to procmap_query() use the constant
PROCMAP_QUERY_VMA_EXECUTABLE, even if PROCMAP_QUERY is not defined.
Define PROCMAP_QUERY_VMA_EXECUTABLE when PROCMAP_QUERY isn't.
Fixes: 4e9e07603ecd ("selftests/bpf: make use of PROCMAP_QUERY ioctl if available") Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20241218175724.578884-1-jmarchan@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
If dup_mmap() encounters an issue, currently uprobe is able to access the
relevant mm via the reverse mapping (in build_map_info()), and if we are
very unlucky with a race window, observe invalid XA_ZERO_ENTRY state which
we establish as part of the fork error path.
This occurs because uprobe_write_opcode() invokes anon_vma_prepare() which
in turn invokes find_mergeable_anon_vma() that uses a VMA iterator,
invoking vma_iter_load() which uses the advanced maple tree API and thus
is able to observe XA_ZERO_ENTRY entries added to dup_mmap() in commit d24062914837 ("fork: use __mt_dup() to duplicate maple tree in
dup_mmap()").
This change was made on the assumption that only process tear-down code
would actually observe (and make use of) these values. However this very
unlikely but still possible edge case with uprobes exists and
unfortunately does make these observable.
The uprobe operation prevents races against the dup_mmap() operation via
the dup_mmap_sem semaphore, which is acquired via uprobe_start_dup_mmap()
and dropped via uprobe_end_dup_mmap(), and held across
register_for_each_vma() prior to invoking build_map_info() which does the
reverse mapping lookup.
Currently these are acquired and dropped within dup_mmap(), which exposes
the race window prior to error handling in the invoking dup_mm() which
tears down the mm.
We can avoid all this by just moving the invocation of
uprobe_start_dup_mmap() and uprobe_end_dup_mmap() up a level to dup_mm()
and only release this lock once the dup_mmap() operation succeeds or clean
up is done.
This means that the uprobe code can never observe an incompletely
constructed mm and resolves the issue in this case.
Link: https://lkml.kernel.org/r/20241210172412.52995-1-lorenzo.stoakes@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by: syzbot+2d788f4f7cb660dac4b7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Set the cache-line-size parameter of the L2 cache for each core to the
correct value of 64 bytes.
Previously, the L2 cache line size was incorrectly set to 128 bytes
for the Broadcom BCM2712. This causes validation tests for the
Performance Application Programming Interface (PAPI) tool to fail as
they depend on sysfs accurately reporting cache line sizes.
The correct value of 64 bytes is stated in the official documentation of
the ARM Cortex A-72, which is linked in the comments of
arm64/boot/dts/broadcom/bcm2712.dtsi as the source for cache-line-size.
Fixes: faa3381267d0 ("arm64: dts: broadcom: Add minimal support for Raspberry Pi 5") Signed-off-by: Willow Cunningham <willow.e.cunningham@maine.edu> Link: https://lore.kernel.org/r/20241007212954.214724-1-willow.e.cunningham@maine.edu Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Now, the epoll only use wake_up() interface to wake up task.
However, sometimes, there are epoll users which want to use
the synchronous wakeup flag to hint the scheduler, such as
Android binder driver.
So add a wake_up_sync() define, and use the wake_up_sync()
when the sync is true in ep_poll_callback().
Co-developed-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Jing Xia <jing.xia@unisoc.com> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Link: https://lore.kernel.org/r/20240426080548.8203-1-xuewen.yan@unisoc.com Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: Brian Geffon <bgeffon@google.com> Reported-by: Benoit Lize <lizeb@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Other page flags in the 2nd page, like PG_hwpoison and PG_anon_exclusive
can get modified concurrently. Changes to other page flags might be lost
if they are happening at the same time as non-atomic partially_mapped
operations. Hence, make partially_mapped operations atomic.
Link: https://lkml.kernel.org/r/20241212183351.1345389-1-usamaarif642@gmail.com Fixes: 8422acdc97ed ("mm: introduce a pageflag for partially mapped folios") Reported-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/all/e53b04ad-1827-43a2-a1ab-864c7efecf6e@redhat.com/ Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Barry Song <baohua@kernel.org> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Rik van Riel <riel@surriel.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
/proc/meminfo ShmemHugePages has been showing overlarge amounts (more than
Shmem) after swapping out THPs: we forgot to update NR_SHMEM_THPS.
Add shmem_update_stats(), to avoid repetition, and risk of making that
mistake again: the call from shmem_delete_from_page_cache() is the bugfix;
the call from shmem_replace_folio() is reassuring, but not really a bugfix
(replace corrects misplaced swapin readahead, but huge swapin readahead
would be a mistake).
Link: https://lkml.kernel.org/r/5ba477c8-a569-70b5-923e-09ab221af45b@google.com Fixes: 809bc86517cc ("mm: shmem: support large folio swap out") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In current kernel, hugetlb_wp() calls copy_user_large_folio() with the
fault address. Where the fault address may be not aligned with the huge
page size. Then, copy_user_large_folio() may call
copy_user_gigantic_page() with the address, while
copy_user_gigantic_page() requires the address to be huge page size
aligned. So, this may cause memory corruption or information leak,
addtional, use more obvious naming 'addr_hint' instead of 'addr' for
copy_user_gigantic_page().
Link: https://lkml.kernel.org/r/20241028145656.932941-2-wangkefeng.wang@huawei.com Fixes: 530dd9926dc1 ("mm: memory: improve copy_user_large_folio()") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In current kernel, hugetlb_no_page() calls folio_zero_user() with the
fault address. Where the fault address may be not aligned with the huge
page size. Then, folio_zero_user() may call clear_gigantic_page() with
the address, while clear_gigantic_page() requires the address to be huge
page size aligned. So, this may cause memory corruption or information
leak, addtional, use more obvious naming 'addr_hint' instead of 'addr' for
clear_gigantic_page().
Link: https://lkml.kernel.org/r/20241028145656.932941-1-wangkefeng.wang@huawei.com Fixes: 78fefd04c123 ("mm: memory: convert clear_huge_page() to folio_zero_user()") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bvecs array which is allocated in iter_get_bvecs_alloc() is leaked
and pages remain pinned if ceph_alloc_sparse_ext_map() fails.
There is no need to delay the allocation of sparse_ext map until after
the bvecs array is set up, so fix this by moving sparse_ext allocation
a bit earlier. Also, make a similar adjustment in __ceph_sync_read()
for consistency (a leak of the same kind in __ceph_sync_read() has been
addressed differently).
Cc: stable@vger.kernel.org Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In two `break` statements, the call to ceph_release_page_vector() was
missing, leaking the allocation from ceph_alloc_page_vector().
Instead of adding the missing ceph_release_page_vector() calls, the
Ceph maintainers preferred to transfer page ownership to the
`ceph_osd_request` by passing `own_pages=true` to
osd_req_op_extent_osd_data_pages(). This requires postponing the
ceph_osdc_put_request() call until after the block that accesses the
`pages`.
Cc: stable@vger.kernel.org Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") Fixes: f0fe1e54cfcf ("ceph: plumb in decryption during reads") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch refines the read logic in __ceph_sync_read() to ensure more
predictable and efficient behavior in various edge cases.
- Return early if the requested read length is zero or if the file size
(`i_size`) is zero.
- Initialize the index variable (`idx`) where needed and reorder some
code to ensure it is always set before use.
- Improve error handling by checking for negative return values earlier.
- Remove redundant encrypted file checks after failures. Only attempt
filesystem-level decryption if the read succeeded.
- Simplify leftover calculations to correctly handle cases where the
read extends beyond the end of the file or stops short. This can be
hit by continuously reading a file while, on another client, we keep
truncating and writing new data into it.
- This resolves multiple issues caused by integer and consequent buffer
overflow (`pages` array being accessed beyond `num_pages`):
- https://tracker.ceph.com/issues/67524
- https://tracker.ceph.com/issues/68980
- https://tracker.ceph.com/issues/68981
Cc: stable@vger.kernel.org Fixes: 1065da21e5df ("ceph: stop copying to iter at EOF on sync reads") Reported-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Signed-off-by: Alex Markuze <amarkuze@redhat.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It becomes a path component, so it shouldn't exceed NAME_MAX
characters. This was hardened in commit c152737be22b ("ceph: Use
strscpy() instead of strcpy() in __get_snap_name()"), but no actual
check was put in place.
If the full path to be built by ceph_mdsc_build_path() happens to be
longer than PATH_MAX, then this function will enter an endless (retry)
loop, effectively blocking the whole task. Most of the machine
becomes unusable, making this a very simple and effective DoS
vulnerability.
I cannot imagine why this retry was ever implemented, but it seems
rather useless and harmful to me. Let's remove it and fail with
ENAMETOOLONG instead.
__of_get_dma_parent() returns OF device node @args.np, but the node's
refcount is increased twice, by both of_parse_phandle_with_args() and
of_node_get(), so causes refcount leakage for the node.
Fix by directly returning the node got by of_parse_phandle_with_args().
Fixes: f83a6e5dea6c ("of: address: Add support for the parent DMA bus") Cc: stable@vger.kernel.org Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241206-of_core_fix-v1-4-dc28ed56bec3@quicinc.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current code uses some 'goto put;' to cancel the parsing operation
and can lead to a return code value of 0 even on error cases.
Indeed, some goto calls are done from a loop without setting the ret
value explicitly before the goto call and so the ret value can be set to
0 due to operation done in previous loop iteration. For instance match
can be set to 0 in the previous loop iteration (leading to a new
iteration) but ret can also be set to 0 it the of_property_read_u32()
call succeed. In that case if no match are found or if an error is
detected the new iteration, the return value can be wrongly 0.
Avoid those cases setting the ret value explicitly before the goto
calls.
Fixes: bd6f2fd5a1d5 ("of: Support parsing phandle argument lists through a nexus node") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/20241202165819.158681-1-herve.codina@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A missing or empty dma-ranges in a DT node implies a 1:1 mapping for dma
translations. In this specific case, the current behaviour is to zero out
the entire specifier so that the translation could be carried on as an
offset from zero. This includes address specifier that has flags (e.g.
PCI ranges).
Once the flags portion has been zeroed, the translation chain is broken
since the mapping functions will check the upcoming address specifier
against mismatching flags, always failing the 1:1 mapping and its entire
purpose of always succeeding.
Set to zero only the address portion while passing the flags through.
Fixes: dbbdee94734b ("of/address: Merge all of the bus translation code") Cc: stable@vger.kernel.org Signed-off-by: Andrea della Porta <andrea.porta@suse.com> Tested-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/e51ae57874e58a9b349c35e2e877425ebc075d7a.1732441813.git.andrea.porta@suse.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f00be96f125 ("of: property: Add device link support for
interrupt-parent, dmas and -gpio(s)") started adding device links for
the interrupt-parent property. commit 4104ca776ba3 ("of: property: Add
fw_devlink support for interrupts") and commit f265f06af194 ("of:
property: Fix fw_devlink handling of interrupts/interrupts-extended")
later added full support for parsing the interrupts and
interrupts-extended properties, which includes looking up the node of
the parent domain. This made the handler for the interrupt-parent
property redundant.
In fact, creating device links based solely on interrupt-parent is
problematic, because it can create spurious cycles. A node may have
this property without itself being an interrupt controller or consumer.
For example, this property is often present in the root node or a /soc
bus node to set the default interrupt parent for child nodes. However,
it is incorrect for the bus to depend on the interrupt controller, as
some of the bus's children may not be interrupt consumers at all or may
have a different interrupt parent.
Resolving these spurious dependency cycles can cause an incorrect probe
order for interrupt controller drivers. This was observed on a RISC-V
system with both an APLIC and IMSIC under /soc, where interrupt-parent
in /soc points to the APLIC, and the APLIC msi-parent points to the
IMSIC. fw_devlink found three dependency cycles and attempted to probe
the APLIC before the IMSIC. After applying this patch, there were no
dependency cycles and the probe order was correct.
Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Fixes: 4104ca776ba3 ("of: property: Add fw_devlink support for interrupts") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20241120233124.3649382-1-samuel.holland@sifive.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When F_SEAL_FUTURE_WRITE was introduced, it was overlooked that udmabuf
must reject memfds with this flag, just like ones with F_SEAL_WRITE.
Fix it by adding F_SEAL_FUTURE_WRITE to SEALS_DENIED.
Fixes: ab3948f58ff8 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Cc: stable@vger.kernel.org Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241204-udmabuf-fixes-v2-2-23887289de1c@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current check_memfd_seals() is racy: Since we first do
check_memfd_seals() and then udmabuf_pin_folios() without holding any
relevant lock across both, F_SEAL_WRITE can be set in between.
This is problematic because we can end up holding pins to pages in a
write-sealed memfd.
Fix it using the inode lock, that's probably the easiest way.
In the future, we might want to consider moving this logic into memfd,
especially if anyone else wants to use memfd_pin_folios().
Because the inode bitmap is corrupted, an inode with an inode number that
should exist as a ".nilfs" file was reassigned by nilfs_mkdir for "file0",
causing an inode duplication during execution. And this causes an
underflow of i_nlink in rmdir operations.
The inode is used twice by the same task to unmount and remove directories
".nilfs" and "file0", it trigger warning in nilfs_rmdir.
Avoid to this issue, check i_nlink in nilfs_iget(), if it is 0, it means
that this inode has been deleted, and iput is executed to reclaim it.
When block_invalidatepage was converted to block_invalidate_folio, the
fallback to block_invalidatepage in folio_invalidate() if the
address_space_operations method invalidatepage (currently
invalidate_folio) was not set, was removed.
Unfortunately, some pseudo-inodes in nilfs2 use empty_aops set by
inode_init_always_gfp() as is, or explicitly set it to
address_space_operations. Therefore, with this change,
block_invalidatepage() is no longer called from folio_invalidate(), and as
a result, the buffer_head structures attached to these pages/folios are no
longer freed via try_to_free_buffers().
Thus, these buffer heads are now leaked by truncate_inode_pages(), which
cleans up the page cache from inode evict(), etc.
Three types of caches use empty_aops: gc inode caches and the DAT shadow
inode used by GC, and b-tree node caches. Of these, b-tree node caches
explicitly call invalidate_mapping_pages() during cleanup, which involves
calling try_to_free_buffers(), so the leak was not visible during normal
operation but worsened when GC was performed.
Fix this issue by using address_space_operations with invalidate_folio set
to block_invalidate_folio instead of empty_aops, which will ensure the
same behavior as before.
Commit 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()")
introduced an issue, the ocfs2_sync_local_to_main() ignores the last
contiguous free bits, which causes an OCFS2 volume to lose the last free
clusters of LA window during the release routine.
Please note, because commit dfe6c5692fb5 ("ocfs2: fix the la space leak
when unmounting an ocfs2 volume") was reverted, this commit is a
replacement fix for commit dfe6c5692fb5.
Link: https://lkml.kernel.org/r/20241205104835.18223-3-heming.zhao@suse.com Fixes: 30dd3478c3cd ("ocfs2: correctly use ocfs2_find_next_zero_bit()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On a malformed interrupt-map property which is shorter than expected by
1 cell, we may read bogus data past the end of the property instead of
returning an error in of_irq_parse_imap_parent().
Decrement the remaining length when skipping over the interrupt parent
phandle cell.
Fixes: 935df1bd40d4 ("of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw()") Cc: stable@vger.kernel.org Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241209-of_irq_fix-v1-1-782f1419c8a1@quicinc.com
[rh: reword commit msg] Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drop KVM's arbitrary behavior of making DE_CFG.LFENCE_SERIALIZE read-only
for the guest, as rejecting writes can lead to guest crashes, e.g. Windows
in particular doesn't gracefully handle unexpected #GPs on the WRMSR, and
nothing in the AMD manuals suggests that LFENCE_SERIALIZE is read-only _if
it exists_.
KVM only allows LFENCE_SERIALIZE to be set, by the guest or host, if the
underlying CPU has X86_FEATURE_LFENCE_RDTSC, i.e. if LFENCE is guaranteed
to be serializing. So if the guest sets LFENCE_SERIALIZE, KVM will provide
the desired/correct behavior without any additional action (the guest's
value is never stuffed into hardware). And having LFENCE be serializing
even when it's not _required_ to be is a-ok from a functional perspective.
Fixes: 74a0e79df68a ("KVM: SVM: Disallow guest from changing userspace's MSR_AMD64_DE_CFG value") Fixes: d1d93fa90f1a ("KVM: SVM: Add MSR-based feature support for serializing LFENCE") Reported-by: Simon Pilkington <simonp.git@mailbox.org> Closes: https://lore.kernel.org/all/52914da7-a97b-45ad-86a0-affdf8266c61@mailbox.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/20241211172952.1477605-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the server is recalling a layout, we should ignore the count of
outstanding layoutget calls, since the server is expected to return
either NFS4ERR_RECALLCONFLICT or NFS4ERR_RETURNCONFLICT for as long as
the recall is outstanding.
Currently, we may end up livelocking, causing the layout to eventually
be forcibly revoked.
Fixes: bf0291dd2267 ("pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>