==================================================================
BUG: KASAN: slab-use-after-free in rtsx_usb_ms_poll_card+0x159/0x200 [rtsx_usb_ms]
Read of size 8 at addr ffff888136335380 by task kworker/6:0/140241
Last potentially related work creation:
kasan_save_stack+0x20/0x40
kasan_record_aux_stack+0x85/0x90
insert_work+0x29/0x100
__queue_work+0x34a/0x540
call_timer_fn+0x2a/0x160
expire_timers+0x5f/0x1f0
__run_timer_base.part.0+0x1b6/0x1e0
run_timer_softirq+0x8b/0xe0
handle_softirqs+0xf9/0x360
__irq_exit_rcu+0x114/0x130
sysvec_apic_timer_interrupt+0x72/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
Second to last potentially related work creation:
kasan_save_stack+0x20/0x40
kasan_record_aux_stack+0x85/0x90
insert_work+0x29/0x100
__queue_work+0x34a/0x540
call_timer_fn+0x2a/0x160
expire_timers+0x5f/0x1f0
__run_timer_base.part.0+0x1b6/0x1e0
run_timer_softirq+0x8b/0xe0
handle_softirqs+0xf9/0x360
__irq_exit_rcu+0x114/0x130
sysvec_apic_timer_interrupt+0x72/0x90
asm_sysvec_apic_timer_interrupt+0x16/0x20
The buggy address belongs to the object at ffff888136335000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 896 bytes inside of
freed 2048-byte region [ffff888136335000, ffff888136335800)
commit 8a7d12d674ac ("net: usb: usbnet: fix name regression") assumed
that local addresses always came from the kernel, but some devices hand
out local mac addresses so we ended up with point-to-point devices with
a mac set by the driver, renaming to eth%d when they used to be named
usb%d.
Userspace should not rely on device name, but for the sake of stability
restore the local mac address check portion of the naming exception:
point to point devices which either have no mac set by the driver or
have a local mac handed out by the driver will keep the usb%d name.
(some USB LTE modems are known to hand out a stable mac from the locally
administered range; that mac appears to be random (different for
mulitple devices) and can be reset with device-specific commands, so
while such devices would benefit from getting a OUI reserved, we have
to deal with these and might as well preserve the existing behavior
to avoid breaking fragile openwrt configurations and such on upgrade.)
According to the LPUART reference manual, TXRTSE and TXRTSPOL of MODIR
register only can be changed when the transmitter is disabled.
So disable the transmitter before changing RS485 related registers and
re-enable it after the change is done.
Fixes: 67b01837861c ("tty: serial: lpuart: Add RS485 support for 32-bit uart flavour") Cc: stable <stable@kernel.org> Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250312022503.1342990-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hardware initialize of the timer counter channel does not occur on probe
thus leaving the Count in an undefined state until the first
function_write() callback is executed. Fix this by performing the proper
hardware initialization during probe.
In case the stm32_lptim_set_enable_state() fails to update CMP and ARR,
a timeout error is raised, by regmap_read_poll_timeout. It may happen,
when the lptimer runs on a slow clock, and the clock is gated only
few times during the polling.
Badly, when this happen, STM32_LPTIM_ENABLE in CR register has been set.
So the 'enable' state in sysfs wrongly lies on the counter being
correctly enabled, due to CR is read as one in stm32_lptim_is_enabled().
To fix both issues:
- enable the clock before writing CMP, ARR and polling ISR bits. It will
avoid the possible timeout error.
- clear the ENABLE bit in CR and disable the clock in the error path.
Fixes: d8958824cf07 ("iio: counter: Add support for STM32 LPTimer") Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/20250224170657.3368236-1-fabrice.gasnier@foss.st.com Signed-off-by: William Breathitt Gray <wbg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nf_sk_lookup_slow_v4 does the conntrack lookup for IPv4 packets to
restore the original 5-tuple in case of SNAT, to be able to find the
right socket (if any). Then socket_match() can correctly check whether
the socket was transparent.
However, the IPv6 counterpart (nf_sk_lookup_slow_v6) lacks this
conntrack lookup, making xt_socket fail to match on the socket when the
packet was SNATed. Add the same logic to nf_sk_lookup_slow_v6.
IPv6 SNAT is used in Kubernetes clusters for pod-to-world packets, as
pods' addresses are in the fd00::/8 ULA subnet and need to be replaced
with the node's external address. Cilium leverages Envoy to enforce L7
policies, and Envoy uses transparent sockets. Cilium inserts an iptables
prerouting rule that matches on `-m socket --transparent` and redirects
the packets to localhost, but it fails to match SNATed IPv6 packets due
to that missing conntrack lookup.
Closes: https://github.com/cilium/cilium/issues/37932 Fixes: eb31628e37a0 ("netfilter: nf_tables: Add support for IPv6 NAT") Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
data->asserted will be NULL on JH7110 SoC since commit 82327b127d41
("reset: starfive: Add StarFive JH7110 reset driver") was added. Add
the judgment condition to avoid errors when calling reset_control_status
on JH7110 SoC.
Fixes: 82327b127d41 ("reset: starfive: Add StarFive JH7110 reset driver") Signed-off-by: Changhuang Liang <changhuang.liang@starfivetech.com> Acked-by: Hal Feng <hal.feng@starfivetech.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Link: https://lore.kernel.org/r/20240925112442.1732416-1-changhuang.liang@starfivetech.com Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Watermarks are initialized during the postcore initcall. Until then, all
watermarks are set to zero. This causes cond_accept_memory() to
incorrectly skip memory acceptance because a watermark of 0 is always met.
This can lead to a premature OOM on boot.
To ensure progress, accept one MAX_ORDER page if the watermark is zero.
Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.intel.com Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Reported-by: Farrah Chen <farrah.chen@intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: <stable@vger.kernel.org> [6.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
Observe after suspend/resme, we can't light up mst monitors under specific
mst hub. The reason is that driver still writes DPCD DP_MSTM_CTRL after LT.
It's forbidden even we write the same value for that dpcd register.
[How]
We already resume the mst branch device dpcd settings during
resume_mst_branch_status(). Leverage drm_dp_mst_topology_queue_probe() to
only probe the topology, not calling drm_dp_mst_topology_mgr_resume() which
will set DP_MSTM_CTRL as well.
Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[cascardo: adjust context in local declarations] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[WHAT & HOW]
A denominator cannot be 0, and is checked before used.
This fixes 2 DIVIDE_BY_ZERO issues reported by Coverity.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Cliff Liu <donghua.liu@windriver.com> Signed-off-by: He Zhe <Zhe.He@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 169f9102f9198b ("ARM: 9350/1: fault: Implement
copy_from_kernel_nofault_allowed()") added the function to check address
before use. However, for devices without MMU, addr > TASK_SIZE will
always fail. This patch move this function after the #ifdef CONFIG_MMU
statement.
Under PAN emulation when dumping backtraces from things like the
LKDTM EXEC_USERSPACE test[1], a double fault (which would hang a CPU)
would happen because of dump_instr() attempting to read a userspace
address. Make sure copy_from_kernel_nofault() does not attempt this
any more.
Closes: https://lava.sirena.org.uk/scheduler/job/497571 Link: https://lore.kernel.org/all/202401181125.D48DCB4C@keescook/ Reported-by: Mark Brown <broonie@kernel.org> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Cc: Wang Kefeng <wangkefeng.wang@huawei.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When MPOA_cache_impos_rcvd() receives the msg, it can trigger
Null Pointer Dereference Vulnerability if both entry and
holding_time are NULL. Because there is only for the situation
where entry is NULL and holding_time exists, it can be passed
when both entry and holding_time are NULL. If these are NULL,
the entry will be passd to eg_cache_put() as parameter and
it is referenced by entry->use code in it.
Remove PLT_QUIRK_DOUBLE_VOLUME_KEYS quirk and made it generic.
The quirk logic did not keep track of the actual previous key
so any key event occurring in less than or equal to 5ms was ignored.
Remove PLT_QUIRK_FOLLOWED_OPPOSITE_VOLUME_KEYS quirk.
It had the same logic issue as the double key quirk and was actually
masking the as designed behavior of most of the headsets.
It's occurrence should be minimized with the ALSA control naming
quirk that is part of the patch set.
Many Poly/Plantronics headset families name the feature, input,
and/or output units in a such a way to produce control names
that are not recognized by user space. As such, the volume and
mute events do not get routed to the headset's audio controls.
As an example from a product family:
The microphone mute control is named
Headset Microphone Capture Switch
and the headset volume control is named
Headset Earphone Playback Volume
The quirk fixes these to become
Headset Capture Switch
Headset Playback Volume
The nft_counter uses two s64 counters for statistics. Those two are
protected by a seqcount to ensure that the 64bit variable is always
properly seen during updates even on 32bit architectures where the store
is performed by two writes. A side effect is that the two counter (bytes
and packet) are written and read together in the same window.
This can be replaced with u64_stats_t. write_seqcount_begin()/ end() is
replaced with u64_stats_update_begin()/ end() and behaves the same way
as with seqcount_t on 32bit architectures. Additionally there is a
preempt_disable on PREEMPT_RT to ensure that a reader does not preempt a
writer.
On 64bit architectures the macros are removed and the reads happen
without any retries. This also means that the reader can observe one
counter (bytes) from before the update and the other counter (packets)
but that is okay since there is no requirement to have both counter from
the same update window.
Convert the statistic to u64_stats_t. There is one optimisation:
nft_counter_do_init() and nft_counter_clone() allocate a new per-CPU
counter and assign a value to it. During this assignment preemption is
disabled which is not needed because the counter is not yet exposed to
the system so there can not be another writer or reader. Therefore
disabling preemption is omitted and raw_cpu_ptr() is used to obtain a
pointer to a counter for the assignment.
Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The resume code path assumes that the TX queue for the offloading TID
has been configured. At resume time it then tries to sync the write
pointer as it may have been updated by the firmware.
In the unusual event that no packets have been send on TID 0, the queue
will not have been allocated and this causes a crash. Fix this by
ensuring the queue exist at suspend time.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240218194912.6632e6dc7b35.Ie6e6a7488c9c7d4529f13d48f752b5439d8ac3c4@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jianqi Ren <jianqi.ren.cn@windriver.com> Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6b3e87cc0ca5 ("iwlwifi: Add support for LARI_CONFIG_CHANGE_CMD
cmd v9")
added a few bits to iwl_lari_config_change_cmd::oem_unii4_allow_bitmap
if the FW has LARI version >= 9.
But we also need to send those bits for version 8 if the FW is capable
of this feature (indicated with capability bits)
Add the FW capability bit, and set the additional bits in the cmd when
the version is 8 and the FW capability bit is set.
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler. This is done in bnxt_set_rx_skb_mode().
If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode(). This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:
To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.
We previously would call btrfs_check_leaf() if we had the check
integrity code enabled, which meant that we could only run the extended
leaf checks if we had WRITTEN set on the header flags.
This leaves a gap in our checking, because we could end up with
corruption on disk where WRITTEN isn't set on the leaf, and then the
extended leaf checks don't get run which we rely on to validate all of
the item pointers to make sure we don't access memory outside of the
extent buffer.
However, since 732fab95abe2 ("btrfs: check-integrity: remove
CONFIG_BTRFS_FS_CHECK_INTEGRITY option") we no longer call
btrfs_check_leaf() from btrfs_mark_buffer_dirty(), which means we only
ever call it on blocks that are being written out, and thus have WRITTEN
set, or that are being read in, which should have WRITTEN set.
Add checks to make sure we have WRITTEN set appropriately, and then make
sure __btrfs_check_leaf() always does the item checking. This will
protect us from file systems that have been corrupted and no longer have
WRITTEN set on some of the blocks.
This was hit on a crafted image tweaking the WRITTEN bit and reported by
KASAN as out-of-bound access in the eb accessors. The example is a dir
item at the end of an eb.
Hazem reported a 30% drop in UnixBench spawn test with commit eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM
(aarch64) (single level MC sched domain):
There is an early bail from sched_move_task() if p->sched_task_group is
equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
(Ubuntu '22.04.5 LTS').
So in:
do_exit()
sched_autogroup_exit_task()
sched_move_task()
if sched_get_task_group(p) == p->sched_task_group
return
/* p is enqueued */
dequeue_task() \
sched_change_group() |
task_change_group_fair() |
detach_task_cfs_rq() | (1)
set_task_rq() |
attach_task_cfs_rq() |
enqueue_task() /
(1) isn't called for p anymore.
Turns out that the regression is related to sgs->group_util in
group_is_overloaded() and group_has_capacity(). If (1) isn't called for
all the 'spawn' tasks then sgs->group_util is ~900 and
sgs->group_capacity = 1024 (single CPU sched domain) and this leads to
group_is_overloaded() returning true (2) and group_has_capacity() false
(3) much more often compared to the case when (1) is called.
I.e. there are much more cases of 'group_is_overloaded' and
'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which
then returns much more often a CPU != smp_processor_id() (5).
This isn't good for these extremely short running tasks (FORK + EXIT)
and also involves calling sched_balance_find_dst_group_cpu() unnecessary
(single CPU sched domain).
Instead if (1) is called for 'p->flags & PF_EXITING' then the path
(4),(6) is taken much more often.
The u2phy1_host should always have the same status as usb_host1_ehci
and usb_host1_ohci, otherwise the EHCI and OHCI drivers may be
initialized for a disabled usb port.
Per the NanoPi R4S schematic, the phy-supply for u2phy1_host is set to
the vdd_5v regulator.
In non-protected KVM modes, while the guest FPSIMD/SVE/SME state is live on the
CPU, the host's active SVE VL may differ from the guest's maximum SVE VL:
* For VHE hosts, when a VM uses NV, ZCR_EL2 contains a value constrained
by the guest hypervisor, which may be less than or equal to that
guest's maximum VL.
Note: in this case the value of ZCR_EL1 is immaterial due to E2H.
* For nVHE/hVHE hosts, ZCR_EL1 contains a value written by the guest,
which may be less than or greater than the guest's maximum VL.
Note: in this case hyp code traps host SVE usage and lazily restores
ZCR_EL2 to the host's maximum VL, which may be greater than the
guest's maximum VL.
This can be the case between exiting a guest and kvm_arch_vcpu_put_fp().
If a softirq is taken during this period and the softirq handler tries
to use kernel-mode NEON, then the kernel will fail to save the guest's
FPSIMD/SVE state, and will pend a SIGKILL for the current thread.
This happens because kvm_arch_vcpu_ctxsync_fp() binds the guest's live
FPSIMD/SVE state with the guest's maximum SVE VL, and
fpsimd_save_user_state() verifies that the live SVE VL is as expected
before attempting to save the register state:
Fix this and make this a bit easier to reason about by always eagerly
switching ZCR_EL{1,2} at hyp during guest<->host transitions. With this
happening, there's no need to trap host SVE usage, and the nVHE/nVHE
__deactivate_cptr_traps() logic can be simplified to enable host access
to all present FPSIMD/SVE/SME features.
In protected nVHE/hVHE modes, the host's state is always saved/restored
by hyp, and the guest's state is saved prior to exit to the host, so
from the host's PoV the guest never has live FPSIMD/SVE/SME state, and
the host's ZCR_EL1 is never clobbered by hyp.
Fixes: 8c8010d69c132273 ("KVM: arm64: Save/restore SVE state for nVHE") Fixes: 2e3cf82063a00ea0 ("KVM: arm64: nv: Ensure correct VL is loaded before saving SVE state") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-9-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
[ v6.6 lacks pKVM saving of host SVE state, pull in discovery of maximum
host VL separately -- broonie ] Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The shared hyp switch header has a number of static functions which
might not be used by all files that include the header, and when unused
they will provoke compiler warnings, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8:
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:703:13: warning: 'kvm_hyp_handle_dabt_low' defined but not used [-Wunused-function]
| 703 | static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
| | ^~~~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:682:13: warning: 'kvm_hyp_handle_cp15_32' defined but not used [-Wunused-function]
| 682 | static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code)
| | ^~~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:662:13: warning: 'kvm_hyp_handle_sysreg' defined but not used [-Wunused-function]
| 662 | static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
| | ^~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:458:13: warning: 'kvm_hyp_handle_fpsimd' defined but not used [-Wunused-function]
| 458 | static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
| | ^~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:329:13: warning: 'kvm_hyp_handle_mops' defined but not used [-Wunused-function]
| 329 | static bool kvm_hyp_handle_mops(struct kvm_vcpu *vcpu, u64 *exit_code)
| | ^~~~~~~~~~~~~~~~~~~
Mark these functions as 'inline' to suppress this warning. This
shouldn't result in any functional change.
At the same time, avoid the use of __alias() in the header and alias
kvm_hyp_handle_iabt_low() and kvm_hyp_handle_watchpt_low() to
kvm_hyp_handle_memory_fault() using CPP, matching the style in the rest
of the kernel. For consistency, kvm_hyp_handle_memory_fault() is also
marked as 'inline'.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-8-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hyp exit handling logic is largely shared between VHE and nVHE/hVHE,
with common logic in arch/arm64/kvm/hyp/include/hyp/switch.h. The code
in the header depends on function definitions provided by
arch/arm64/kvm/hyp/vhe/switch.c and arch/arm64/kvm/hyp/nvhe/switch.c
when they include the header.
This is an unusual header dependency, and prevents the use of
arch/arm64/kvm/hyp/include/hyp/switch.h in other files as this would
result in compiler warnings regarding missing definitions, e.g.
| In file included from arch/arm64/kvm/hyp/nvhe/hyp-main.c:8:
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:733:31: warning: 'kvm_get_exit_handler_array' used but never defined
| 733 | static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu);
| | ^~~~~~~~~~~~~~~~~~~~~~~~~~
| ./arch/arm64/kvm/hyp/include/hyp/switch.h:735:13: warning: 'early_exit_filter' used but never defined
| 735 | static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code);
| | ^~~~~~~~~~~~~~~~~
Refactor the logic such that the header doesn't depend on anything from
the C files. There should be no functional change as a result of this
patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-7-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When KVM is in VHE mode, the host kernel tries to save and restore the
configuration of CPACR_EL1.SMEN (i.e. CPTR_EL2.SMEN when HCR_EL2.E2H=1)
across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the
configuration may be clobbered by hyp when running a vCPU. This logic
has historically been broken, and is currently redundant.
This logic was originally introduced in commit:
861262ab86270206 ("KVM: arm64: Handle SME host state when running guests")
At the time, the VHE hyp code would reset CPTR_EL2.SMEN to 0b00 when
returning to the host, trapping host access to SME state. Unfortunately,
this was unsafe as the host could take a softirq before calling
kvm_arch_vcpu_put_fp(), and if a softirq handler were to use kernel mode
NEON the resulting attempt to save the live FPSIMD/SVE/SME state would
result in a fatal trap.
That issue was limited to VHE mode. For nVHE/hVHE modes, KVM always
saved/restored the host kernel's CPACR_EL1 value, and configured
CPTR_EL2.TSM to 0b0, ensuring that host usage of SME would not be
trapped.
The issue above was incidentally fixed by commit:
375110ab51dec5dc ("KVM: arm64: Fix resetting SME trap values on reset for (h)VHE")
That commit changed the VHE hyp code to configure CPTR_EL2.SMEN to 0b01
when returning to the host, permitting host kernel usage of SME,
avoiding the issue described above. At the time, this was not identified
as a fix for commit 861262ab86270206.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME
state, there's no need to save/restore the state of the EL0 SME trap.
The kernel can safely save/restore state without trapping, as described
above, and will restore userspace state (including trap controls) before
returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-5-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
[Update for rework of flags storage -- broonie] Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When KVM is in VHE mode, the host kernel tries to save and restore the
configuration of CPACR_EL1.ZEN (i.e. CPTR_EL2.ZEN when HCR_EL2.E2H=1)
across kvm_arch_vcpu_load_fp() and kvm_arch_vcpu_put_fp(), since the
configuration may be clobbered by hyp when running a vCPU. This logic is
currently redundant.
The VHE hyp code unconditionally configures CPTR_EL2.ZEN to 0b01 when
returning to the host, permitting host kernel usage of SVE.
Now that the host eagerly saves and unbinds its own FPSIMD/SVE/SME
state, there's no need to save/restore the state of the EL0 SVE trap.
The kernel can safely save/restore state without trapping, as described
above, and will restore userspace state (including trap controls) before
returning to userspace.
Remove the redundant logic.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-4-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
[Rework for refactoring of where the flags are stored -- broonie] Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the host eagerly saves its own FPSIMD/SVE/SME state,
non-protected KVM never needs to save the host FPSIMD/SVE/SME state,
and the code to do this is never used. Protected KVM still needs to
save/restore the host FPSIMD/SVE state to avoid leaking guest state to
the host (and to avoid revealing to the host whether the guest used
FPSIMD/SVE/SME), and that code needs to be retained.
Remove the unused code and data structures.
To avoid the need for a stub copy of kvm_hyp_save_fpsimd_host() in the
VHE hyp code, the nVHE/hVHE version is moved into the shared switch
header, where it is only invoked when KVM is in protected mode.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are several problems with the way hyp code lazily saves the host's
FPSIMD/SVE state, including:
* Host SVE being discarded unexpectedly due to inconsistent
configuration of TIF_SVE and CPACR_ELx.ZEN. This has been seen to
result in QEMU crashes where SVE is used by memmove(), as reported by
Eric Auger:
https://issues.redhat.com/browse/RHEL-68997
* Host SVE state is discarded *after* modification by ptrace, which was an
unintentional ptrace ABI change introduced with lazy discarding of SVE state.
* The host FPMR value can be discarded when running a non-protected VM,
where FPMR support is not exposed to a VM, and that VM uses
FPSIMD/SVE. In these cases the hyp code does not save the host's FPMR
before unbinding the host's FPSIMD/SVE/SME state, leaving a stale
value in memory.
Avoid these by eagerly saving and "flushing" the host's FPSIMD/SVE/SME
state when loading a vCPU such that KVM does not need to save any of the
host's FPSIMD/SVE/SME state. For clarity, fpsimd_kvm_prepare() is
removed and the necessary call to fpsimd_save_and_flush_cpu_state() is
placed in kvm_arch_vcpu_load_fp(). As 'fpsimd_state' and 'fpmr_ptr'
should not be used, they are set to NULL; all uses of these will be
removed in subsequent patches.
Historical problems go back at least as far as v5.17, e.g. erroneous
assumptions about TIF_SVE being clear in commit:
8383741ab2e773a9 ("KVM: arm64: Get rid of host SVE tracking/saving")
... and so this eager save+flush probably needs to be backported to ALL
stable trees.
Fixes: 93ae6b01bafee8fa ("KVM: arm64: Discard any SVE state when entering KVM guests") Fixes: 8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch") Fixes: ef3be86021c3bdf3 ("KVM: arm64: Add save/restore support for FPMR") Reported-by: Eric Auger <eauger@redhat.com> Reported-by: Wilco Dijkstra <wilco.dijkstra@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: Eric Auger <eric.auger@redhat.com> Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Fuad Tabba <tabba@google.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250210195226.1215254-2-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
[ Mark: Handle vcpu/host flag conflict ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to VHE, calculate the value of cptr_el2 from scratch on
activate traps. This removes the need to store cptr_el2 in every
vcpu structure. Moreover, some traps, such as whether the guest
owns the fp registers, need to be set on every vcpu run.
Reported-by: James Clark <james.clark@linaro.org> Fixes: 5294afdbf45a ("KVM: arm64: Exclude FP ownership from kvm_vcpu_arch") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20241216105057.579031-13-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because of the size restriction in the TCP options space, the MPTCP
ADD_ADDR option is exclusive and cannot be sent with other MPTCP ones.
For this reason, in the linked mptcp_out_options structure, group of
fields linked to different options are part of the same union.
There is a case where the mptcp_pm_add_addr_signal() function can modify
opts->addr, but not ended up sending an ADD_ADDR. Later on, back in
mptcp_established_options, other options will be sent, but with
unexpected data written in other fields due to the union, e.g. in
opts->ext_copy. This could lead to a data stream corruption in the next
packet.
Using an intermediate variable, prevents from corrupting previously
established DSS option. The assignment of the ADD_ADDR option
parameters is now done once we are sure this ADD_ADDR option can be set
in the packet, e.g. after having dropped other suboptions.
Fixes: 1bff1e43a30e ("mptcp: optimize out option generation") Cc: stable@vger.kernel.org Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Arthur Mongodin <amongodin@randorisec.fr> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
[ Matt: the commit message has been updated: long lines splits and some
clarifications. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-1-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
parse_dcal() validate num_aces to allocate posix_ace_state_array.
if (num_aces > ULONG_MAX / sizeof(struct smb_ace *))
It is an incorrect validation that we can create an array of size ULONG_MAX.
smb_acl has ->size field to calculate actual number of aces in request buffer
size. Use this to check invalid num_aces.
Reported-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Tested-by: Igor Leite Ladessa <igor-ladessa@hotmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[WHY]
DMUB locking is important to make sure that registers aren't accessed
while in PSR. Previously it was enabled but caused a deadlock in
situations with multiple eDP panels.
[HOW]
Detect if multiple eDP panels are in use to decide whether to use
lock. Refactor the function so that the first check is for PSR-SU
and then replay is in use to prevent having to look up number
of eDP panels for those configurations.
Fixes: f245b400a223 ("Revert "drm/amd/display: Use HW lock mgr for PSR1"") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3965 Reviewed-by: ChiaHsuan Chung <chiahsuan.chung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ed569e1279a3045d6b974226c814e071fa0193a6) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
Without acquiring DMCUB hw lock, a race condition is caused with
Panel Replay feature, which will trigger a hang. Indicate that a
lock is necessary to prevent this when replay feature is enabled.
[How]
To allow dmub hw lock on Replay.
Reviewed-by: Robin Chen <robin.chen@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Martin Tsai <martin.tsai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the off chance that command stream passed from userspace via
ioctl() call to radeon_vce_cs_parse() is weirdly crafted and
first command to execute is to encode (case 0x03000001), the function
in question will attempt to call radeon_vce_cs_reloc() with size
argument that has not been properly initialized. Specifically, 'size'
will point to 'tmp' variable before the latter had a chance to be
assigned any value.
Play it safe and init 'tmp' with 0, thus ensuring that
radeon_vce_cs_reloc() will catch an early error in cases like these.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 2fc5703abda2 ("drm/radeon: check VCE relocation buffer range v3") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2d52de55f9ee7aaee0e09ac443f77855989c6b68) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When some client process A call pdr_add_lookup() to add the look up for
the service and does schedule locator work, later a process B got a new
server packet indicating locator is up and call pdr_locator_new_server()
which eventually sets pdr->locator_init_complete to true which process A
sees and takes list lock and queries domain list but it will timeout due
to deadlock as the response will queued to the same qmi->wq and it is
ordered workqueue and process B is not able to complete new server
request work due to deadlock on list lock.
Fix it by removing the unnecessary list iteration as the list iteration
is already being done inside locator work, so avoid it here and just
call schedule_work() here.
pdr_get_domain_list()
pr_err("PDR: %s get domain list
txn wait failed: %d\n",
req->service_name,
ret);
Timeout error log due to deadlock:
"
PDR: tms/servreg get domain list txn wait failed: -110
PDR: service lookup for msm/adsp/sensor_pd:tms/servreg failed: -110
"
Thanks to Bjorn and Johan for letting me know that this commit also fixes
an audio regression when using the in-kernel pd-mapper as that makes it
easier to hit this race. [1]
An OGMv1 and OGMv2 packet receive processing were not only limited by the
number of bytes in the received packet but also by the nodes maximum
aggregation packet size limit. But this limit is relevant for TX and not
for RX. It must not be enforced by batadv_(i)v_ogm_aggr_packet to avoid
loss of information in case of a different limit for sender and receiver.
This has a minor side effect for B.A.T.M.A.N. IV because the
batadv_iv_ogm_aggr_packet is also used for the preprocessing for the TX.
But since the aggregation code itself will not allow more than
BATADV_MAX_AGGREGATION_BYTES bytes, this check was never triggering (in
this context) prior of removing it.
Since the i and pool->chunk_size variables are of type 'u32',
their product can wrap around and then be cast to 'u64'.
This can lead to two different XDP buffers pointing to the same
memory area.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 94033cd8e73b ("xsk: Optimize for aligned case") Cc: stable@vger.kernel.org Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov@infotecs.ru> Link: https://patch.msgid.link/20250313085007.3116044-1-Ilia.Gavrilov@infotecs.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben reports spurious EFI zboot failures on a system where physical RAM
starts at 0x0. When doing random memory allocation from the EFI stub on
such a platform, a random seed of 0x0 (which means no entropy source is
available) will result in the allocation to be placed at address 0x0 if
sufficient space is available.
When this allocation is subsequently passed on to the decompression
code, the 0x0 address is mistaken for NULL and the code complains and
gives up.
So avoid address 0x0 when doing random allocation, and set the minimum
address to the minimum alignment.
When the addresses of the shmobile_smp_mpidr, shmobile_smp_fn, and
shmobile_smp_arg variables are not multiples of 4 bytes, secondary CPU
bring-up fails:
smp: Bringing up secondary CPUs ...
CPU1: failed to come online
CPU2: failed to come online
CPU3: failed to come online
smp: Brought up 1 node, 1 CPU
Fix this by adding the missing alignment directive.
The current solution for powering off the Apalis iMX6 is not functioning
as intended. To resolve this, it is necessary to power off the
vgen2_reg, which will also set the POWER_ENABLE_MOCI signal to a low
state. This ensures the carrier board is properly informed to initiate
its power-off sequence.
The new solution uses the regulator-poweroff driver, which will power
off the regulator during a system shutdown.
Cc: <stable@vger.kernel.org> Fixes: 4eb56e26f92e ("ARM: dts: imx6q-apalis: Command pmic to standby for poweroff") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix race between rmmod and /proc/XXX's inode instantiation.
The bug is that pde->proc_ops don't belong to /proc, it belongs to a
module, therefore dereferencing it after /proc entry has been registered
is a bug unless use_pde/unuse_pde() pair has been used.
use_pde/unuse_pde can be avoided (2 atomic ops!) because pde->proc_ops
never changes so information necessary for inode instantiation can be
saved _before_ proc_register() in PDE itself and used later, avoiding
pde->proc_ops->... dereference.
[adobriyan@gmail.com: don't do 2 atomic ops on the common path] Link: https://lkml.kernel.org/r/3d25ded0-1739-447e-812b-e34da7990dcf@p183 Fixes: 778f3dd5a13c ("Fix procfs compat_ioctl regression") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A shmem folio can be either in page cache or in swap cache, but not at the
same time. Namely, once it is in swap cache, folio->mapping should be
NULL, and the folio is no longer in a shmem mapping.
In __folio_migrate_mapping(), to determine the number of xarray entries to
update, folio_test_swapbacked() is used, but that conflates shmem in page
cache case and shmem in swap cache case. It leads to xarray multi-index
entry corruption, since it turns a sibling entry to a normal entry during
xas_store() (see [1] for a userspace reproduction). Fix it by only using
folio_test_swapcache() to determine whether xarray is storing swap cache
entries or not to choose the right number of xarray entries to update.
Note:
In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
used to get swap_cache address space, but that ignores the shmem folio in
swap cache case. It could lead to NULL pointer dereferencing when a
in-swap-cache shmem folio is split at __xa_store(), since
!folio_test_anon() is true and folio->mapping is NULL. But fortunately,
its caller split_huge_page_to_list_to_order() bails out early with EBUSY
when folio->mapping is NULL. So no need to take care of it here.
Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com Fixes: fc346d0a70a1 ("mm: migrate high-order folios in swap cache correctly") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Liu Shixin <liushixin2@huawei.com> Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/ Suggested-by: Hugh Dickins <hughd@google.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
original report:
https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=B-Dw@mail.gmail.com/T/
When doing buffered writes with FGP_NOWAIT, under memory pressure, the
system returned ENOMEM despite there being plenty of available memory, to
be reclaimed from page cache. The user space used io_uring interface,
which in turn submits I/O with FGP_NOWAIT (the fast path).
This is likely a regression caused by 66dabbb65d67 ("mm: return an ERR_PTR
from __filemap_get_folio"), which moved error handling from
io_map_get_folio() to __filemap_get_folio(), but broke FGP_NOWAIT
handling, so ENOMEM is being escaped to user space. Had it correctly
returned -EAGAIN with NOWAIT, either io_uring or user space itself would
be able to retry the request.
It's not enough to patch io_uring since the iomap interface is the one
responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
return the proper error too.
The patch was tested with scylladb test suite (its original reproducer),
and the tests all pass now when memory is pressured.
Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com Fixes: 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio") Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Matthew Wilcow (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error path when atmci_configure_dma() set dma fails in atmci driver
does not correctly disable the clock.
Add the missing clk_disable_unprepare() to the error path for pair with
clk_prepare_enable().
Fixes: 467e081d23e6 ("mmc: atmel-mci: use probe deferring if dma controller is not ready yet") Signed-off-by: Gu Bowen <gubowen5@huawei.com> Acked-by: Aubin Constans <aubin.constans@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250225022856.3452240-1-gubowen5@huawei.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
UART0 pinmux by default configures GPIO0_B5 in its UART RTS function for
UART0. However, by default on Haikou, it is used as GPIO as UART RTS for
UART5.
Therefore, let's update UART0 pinmux to not configure the pin in that
mode, a later commit will make UART5 request the GPIO pinmux.
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
Fixes: 6a57f224f734 ("arm64: dts: freescale: add initial support for verdin imx8m mini") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
These are u64 variables that come from the user via
qaic_attach_slice_bo_ioctl(). Use check_add_overflow() to ensure that
the math doesn't have an integer wrapping bug.
Cc: stable@vger.kernel.org Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patchwork.freedesktop.org/patch/msgid/176388fa-40fe-4cb4-9aeb-2c91c22130bd@stanley.mountain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable@vger.kernel.org Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14") Signed-off-by: Christian Eggers <ceggers@arri.de> Link: https://patch.msgid.link/20250311091803.31026-1-ceggers@arri.de Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The V3D driver still relies on `drm_sched_increase_karma()` and
`drm_sched_resubmit_jobs()` for resubmissions when a timeout occurs.
The function `drm_sched_increase_karma()` marks the job as guilty, while
`drm_sched_resubmit_jobs()` sets an error (-ECANCELED) in the DMA fence of
that guilty job.
Because of this, we must check whether the job’s DMA fence has been
flagged with an error before executing the job. Otherwise, the same guilty
job may be resubmitted indefinitely, causing repeated GPU resets.
This patch adds a check for an error on the job's fence to prevent running
a guilty job that was previously flagged when the GPU timed out.
Note that the CPU and CACHE_CLEAN queues do not require this check, as
their jobs are executed synchronously once the DRM scheduler starts them.
Cc: stable@vger.kernel.org Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Fixes: 1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU.") Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Maíra Canal <mcanal@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250313-v3d-gpu-reset-fixes-v4-1-c1e780d8e096@igalia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After a suspend/resume cycle on a down interface, it will come up as
ERROR-ACTIVE.
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state STOPPED (berr-counter tx 0 rx 0) restart-ms 1000
$ sudo systemctl suspend
$ ip -details -s -s a s dev flexcan0
3: flexcan0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN group default qlen 10
link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
And only set CAN state to CAN_STATE_ERROR_ACTIVE when resume process
has no issue, otherwise keep in CAN_STATE_SLEEPING as suspend did.
Commit 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()")
unintentionally introduced a one byte out of bound read on strscpy()'s
source argument (which is kind of ironic knowing that strscpy() is meant
to be a more secure alternative :)).
Let's consider below buffers:
dest[len + 1]; /* will be NUL terminated */
src[len]; /* may not be NUL terminated */
When doing:
strncpy(dest, src, len);
dest[len] = '\0';
strncpy() will read up to len bytes from src.
On the other hand:
strscpy(dest, src, len + 1);
will read up to len + 1 bytes from src, that is to say, an out of bound
read of one byte will occur on src if it is not NUL terminated. Note
that the src[len] byte is never copied, but strscpy() still needs to
read it to check whether a truncation occurred or not.
This exact pattern happened in ucan.
The root cause is that the source is not NUL terminated. Instead of
doing a copy in a local buffer, directly NUL terminate it as soon as
usb_control_msg() returns. With this, the local firmware_str[] variable
can be removed.
On top of this do a couple refactors:
- ucan_ctl_payload->raw is only used for the firmware string, so
rename it to ucan_ctl_payload->fw_str and change its type from u8 to
char.
- ucan_device_request_in() is only used to retrieve the firmware
string, so rename it to ucan_get_fw_str() and refactor it to make it
directly handle all the string termination logic.
Reported-by: syzbot+d7d8c418e8317899e88c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-can/67b323a4.050a0220.173698.002b.GAE@google.com/ Fixes: 7fdaf8966aae ("can: ucan: use strscpy() to instead of strncpy()") Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250218143515.627682-2-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are a total of 96 AFL pages and each page has 16 entries with
registers CFDGAFLIDr, CFDGAFLMr, CFDGAFLP0r, CFDGAFLP1r holding
the rule entries (r = 0..15).
Currently, RCANFD_GAFL* macros use a start variable to find AFL entries,
which is incorrect as the testing on RZ/G3E shows ch1 and ch4
gets a start value of 0 and the register contents are overwritten.
Fix this issue by using rule_entry corresponding to the channel
to find the page entries in the AFL list.
Fixes: dd3bd23eb438 ("can: rcar_canfd: Add Renesas R-Car CAN FD driver") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20250307170330.173425-3-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On the GTA04A5 writing a reset command to the gyroscope causes IRQ
storms because NACK IRQs are enabled and therefore triggered but not
acked.
Sending a reset command to the gyroscope by
i2cset 1 0x69 0x14 0xb6
with an additional debug print in the ISR (not the thread) itself
causes
[ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
[ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
[ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
[ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
repeating till infinity
[...]
(0x2 = NACK, 0x100 = Bus free, which is not enabled)
Apparently no other IRQ bit gets set, so this stalls.
Do not ignore enabled interrupts and make sure they are acked.
If the NACK IRQ is not needed, it should simply not enabled, but
according to the above log, caring about it is necessary unless
the Bus free IRQ is enabled and handled. The assumption that is
will always come with a ARDY IRQ, which was the idea behind
ignoring it, proves wrong.
It is true for simple reads from an unused address.
To still avoid the i2cdetect trouble which is the reason for
commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings"),
avoid doing much about NACK in omap_i2c_xfer_data() which is used
by both IRQ mode and polling mode, so also the false detection fix
is extended to polling usage and IRQ storms are avoided.
By changing this, the hardirq handler is not needed anymore to filter
stuff.
The mentioned gyro reset now just causes a -ETIMEDOUT instead of
hanging the system.
This patch broke net/forwarding/ip6gre_custom_multipath_hash.sh in some
circumstances (https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/).
Let's revert it while the problem is being investigated.
Previous commit 8b5c171bb3dc ("neigh: new unresolved queue limits")
introduces new netlink attribute NDTPA_QUEUE_LENBYTES to represent
approximative value for deprecated QUEUE_LEN. However, it forgot to add
the associated nla_policy in nl_ntbl_parm_policy array. Fix it with one
simple NLA_U32 type policy.
Fixes: 8b5c171bb3dc ("neigh: new unresolved queue limits") Signed-off-by: Lin Ma <linma@zju.edu.cn> Link: https://patch.msgid.link/20250315165113.37600-1-linma@zju.edu.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch acts as a parachute, catch all solution, by detecting
recursion loops in lwtunnel users and taking care of them (e.g., a loop
between routes, a loop within the same route, etc). In general, such
loops are the consequence of pathological configurations. Each lwtunnel
user is still free to catch such loops early and do whatever they want
with them. It will be the case in a separate patch for, e.g., seg6 and
seg6_local, in order to provide drop reasons and update statistics.
Another example of a lwtunnel user taking care of loops is ioam6, which
has valid use cases that include loops (e.g., inline mode), and which is
addressed by the next patch in this series. Overall, this patch acts as
a last resort to catch loops and drop packets, since we don't want to
leak something unintentionally because of a pathological configuration
in lwtunnels.
The solution in this patch reuses dev_xmit_recursion(),
dev_xmit_recursion_inc(), and dev_xmit_recursion_dec(), which seems fine
considering the context.
Closes: https://lore.kernel.org/netdev/2bc9e2079e864a9290561894d2a602d6@akamai.com/ Closes: https://lore.kernel.org/netdev/Z7NKYMY7fJT5cYWu@shredder/ Fixes: ffce41962ef6 ("lwtunnel: support dst output redirect function") Fixes: 2536862311d2 ("lwt: Add support to redirect dst.input") Fixes: 14972cbd34ff ("net: lwtunnel: Handle fragmentation") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Link: https://patch.msgid.link/20250314120048.12569-2-justin.iurman@uliege.be Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While creating a new IPv6, we could get a weird -ENOMEM when
RTA_NH_ID is set and either of the conditions below is true:
1) CONFIG_IPV6_SUBTREES is enabled and rtm_src_len is specified
2) nexthop_get() fails
e.g.)
# strace ip -6 route add fe80::dead:beef:dead:beef nhid 1 from ::
recvmsg(3, {msg_iov=[{iov_base=[...[
{error=-ENOMEM, msg=[... [...]]},
[{nla_len=49, nla_type=NLMSGERR_ATTR_MSG}, "Nexthops can not be used with so"...]
]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
Let's set err explicitly after ip_fib_metrics_init() in
ip6_route_info_create().
Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250312013854.61125-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
fib_check_nh_v6_gw() expects that fib6_nh_init() cleans up everything
when it fails.
Commit 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
moved fib_nh_common_init() before alloc_percpu_gfp() within fib6_nh_init()
but forgot to add cleanup for fib6_nh->nh_common.nhc_pcpu_rth_output in
case it fails to allocate fib6_nh->rt6i_pcpu, resulting in memleak.
Let's call fib_nh_common_release() and clear nhc_pcpu_rth_output in the
error path.
Note that we can remove the fib6_nh_release() call in nh_create_ipv6()
later in net-next.git.
Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250312010333.56001-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When slicing a BO, we need to iterate through the BO's sgt to find the
right pieces to construct the slice. Some of the data types chosen for
this process are incorrectly too small, and can overflow. This can
result in the incorrect slice construction, which can lead to data
corruption in workload execution.
The device can only handle 32-bit sized transfers, and the scatterlist
struct only supports 32-bit buffer sizes, so our upper limit for an
individual transfer is an unsigned int. Using an int is incorrect due to
the reservation of the sign bit. Upgrade the length of a scatterlist
entry and the offsets into a scatterlist entry to unsigned int for a
correct representation.
While each transfer may be limited to 32-bits, the overall BO may exceed
that size. For counting the total length of the BO, we need a type that
can represent the largest allocation possible on the system. That is the
definition of size_t, so use it.
Due to a typo during defining HCI errors it is not possible to connect
LE-capable device with BR/EDR only adapter. The connection is terminated
by the LE adapter because the invalid LL params error code is treated
as unsupported remote feature.
Fixes: 79c0868ad65a ("Bluetooth: hci_event: Use HCI error defines instead of magic values") Signed-off-by: Arkadiusz Bokowy <arkadiusz.bokowy@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When ib_copy_to_udata() fails in hns_roce_create_qp_common(),
hns_roce_qp_remove() should be called in the error path to
clean up resources in hns_roce_qp_store().
Fixes: 0f00571f9433 ("RDMA/hns: Use new SQ doorbell register for HIP09") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-6-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the condition of unmapping sdb in error path is not exactly
the same as the condition of mapping in alloc_user_qp_db(). This may
cause a problem of unmapping an unmapped db in some case, such as
when the QP is XRC TGT. Unified the two conditions.
Driver runs a for-loop when allocating bt pages and mapping them with
buffer pages. When a large buffer (e.g. MR over 100GB) is being allocated,
it may require a considerable loop count. This will lead to soft lockup:
Add a cond_resched() to fix soft lockup during these loops. In order not
to affect the allocation performance of normal-size buffer, set the loop
count of a 100GB MR as the threshold to call cond_resched().
Fixes: 38389eaa4db1 ("RDMA/hns: Add mtr support for mixed multihop addressing") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250311084857.3803665-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Driver is always clearing the mask that sets the VLAN ID/Service Level
in the adapter. Recent change for supporting multiple traffic class
exposed this issue.
Allow setting SL and VLAN_ID while QP is moved from INIT to RTR state.
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Fixes: c64b16a37b6d ("RDMA/bnxt_re: Support different traffic class") Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1741670196-2919-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
During upstream process of Raspberry Pi 4 back in 2019 the ARMv7 stubs
didn't configured the ARM architectural timer. This firmware issue has
been fixed in 2020, which gave users enough time to update their system.
So drop this property to allow the use of the vDSO version of
clock_gettime.
Link: https://github.com/raspberrypi/tools/pull/113 Fixes: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250222094113.48198-1-wahrenst@gmx.net Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When GENERIC_IRQ_CHIP is disabled, OMAP1 kernels fail to link:
arm-linux-gnueabi-ld: arch/arm/mach-omap1/irq.o: in function `omap1_init_irq':
irq.c:(.init.text+0x1e8): undefined reference to `irq_alloc_generic_chip'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x228): undefined reference to `irq_setup_generic_chip'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2a8): undefined reference to `irq_gc_set_wake'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2b0): undefined reference to `irq_gc_mask_set_bit'
arm-linux-gnueabi-ld: irq.c:(.init.text+0x2b4): undefined reference to `irq_gc_mask_clr_bit'
This has apparently been the case for many years, but I never caught it
in randconfig builds until now, as there are dozens of other drivers
that also 'select GENERIC_IRQ_CHIP' and statistically there is almost
always one of them enabled.
In function create_ib_ah() the following line attempts
to left shift the return value of mlx5r_ib_rate() by 4
and store it in the stat_rate_sl member of av:
However the code overlooks the fact that mlx5r_ib_rate()
may return -EINVAL if the rate passed to it is less than
IB_RATE_2_5_GBPS or greater than IB_RATE_800_GBPS.
Because of this, the code may invoke undefined behaviour when
shifting a signed negative value when doing "-EINVAL << 4".
To fix this check for errors before assigning stat_rate_sl and
propagate any error value to the callers.
Property "supports-sd" isn't documented anywhere and is unnecessary for
mainline driver to function. It seems a property used by downstream
kernel was brought into mainline.
This should be reported by dtbs_check, but mmc-controller-common.yaml
defaults additionalProperties to true thus allows it. Remove the
property to clean the devicetree up and avoid possible confusion.
The ARM PL011 UART instances in BCM2711 are r1p5 spec, which means they
have 32-entry FIFOs. The correct periphid value for this is 0x00341011.
Thanks to N Buchwitz for pointing this out.
Signed-off-by: Phil Elwell <phil@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20250223125614.3592-2-wahrenst@gmx.net Fixes: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Unregister the cpufreq device and soc device when resource unwinding,
otherwise there will be warning when do removing test:
sysfs: cannot create duplicate filename '/devices/platform/imx-cpufreq-dt'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc1-next-20241204
Hardware name: NXP i.MX8MPlus EVK board (DT)
Fixes: 9cc832d37799 ("soc: imx8m: Probe the SoC driver as platform driver") Cc: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The static global soc_uid is only ever used as kasprintf() parameter in
imx8m_soc_probe(). Pass pointer to local u64 variable to .soc_revision()
callback instead and let the .soc_revision() callback fill in the content.
Remove the unnecessary static global variable.
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Stable-dep-of: cf7139aac463 ("soc: imx8m: Unregister cpufreq and soc dev in cleanup path") Signed-off-by: Sasha Levin <sashal@kernel.org>
The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.
This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb->inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.
Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
inner network header.
- then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
network headers.
- later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
network header with the one set in ip_tunnel_xmit before adding the
second outer header.
- __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
needs to happen based on dev features. In the original bug, the hw
couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
tries to use the wrong inner network header, expecting the one set in
iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
wrong inner network header.
With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>