This fixes a regression introduced a few weeks ago in stable kernels
6.12.14 and 6.13.3. The internal microphone on ASUS Vivobook N705UD /
X705UD laptops is broken: the microphone appears in userspace (e.g.
Gnome settings) but no sound is detected.
I bisected it to commit 3b4309546b48 ("ALSA: hda: Fix headset detection
failure due to unstable sort").
I figured out the cause:
1. The initial pins enabled for the ALC256 driver are:
cfg->inputs == {
{ pin=0x19, type=AUTO_PIN_MIC,
is_headset_mic=1, is_headphone_mic=0, has_boost_on_pin=1 },
{ pin=0x1a, type=AUTO_PIN_MIC,
is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 } }
2. Since 2017 and commits c1732ede5e8 ("ALSA: hda/realtek - Fix headset
and mic on several ASUS laptops with ALC256") and 28e8af8a163 ("ALSA:
hda/realtek: Fix mic and headset jack sense on ASUS X705UD"), the
quirk ALC256_FIXUP_ASUS_MIC is also applied to ASUS X705UD / N705UD
laptops.
This added another internal microphone on pin 0x13:
cfg->inputs == {
{ pin=0x13, type=AUTO_PIN_MIC,
is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 },
{ pin=0x19, type=AUTO_PIN_MIC,
is_headset_mic=1, is_headphone_mic=0, has_boost_on_pin=1 },
{ pin=0x1a, type=AUTO_PIN_MIC,
is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 } }
I don't know what this pin 0x13 corresponds to. To the best of my
knowledge, these laptops have only one internal microphone.
3. Before 2025 and commit 3b4309546b48 ("ALSA: hda: Fix headset
detection failure due to unstable sort"), the sort function would let
the microphone of pin 0x1a (the working one) *before* the microphone
of pin 0x13 (the phantom one).
4. After this commit 3b4309546b48, the fixed sort function puts the
working microphone (pin 0x1a) *after* the phantom one (pin 0x13). As
a result, no sound is detected anymore.
It looks like the quirk ALC256_FIXUP_ASUS_MIC is not needed anymore for
ASUS Vivobook X705UD / N705UD laptops. Without it, everything works
fine:
- the internal microphone is detected and records actual sound,
- plugging in a jack headset is detected and can record actual sound
with it,
- unplugging the jack headset makes the system go back to internal
microphone and can record actual sound.
Check whether denominator expression x * (x - 1) * 1000 mod {2^32, 2^64}
produce zero and skip stddev computation in that case.
For now don't care about rec->counter * rec->counter overflow because
rec->time * rec->time overflow will likely happen earlier.
Cc: stable@vger.kernel.org Cc: Wen Yang <wenyang@linux.alibaba.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250206090156.1561783-1-kniv@yandex-team.ru Fixes: e31f7939c1c27 ("ftrace: Avoid potential division by zero in function profiler") Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ret = create_actions(); (return -EINVAL)
if (ret)
goto out_unreg;
[..]
ret = hist_trigger_enable(data, ...) {
list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!)
[..]
out_unreg:
event_hist_unregister(.., data) {
cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] {
list_for_each_entry(iter, &file->triggers, list) {
if (!hist_trigger_match(data, iter, named_data, false)) <- never matches
continue;
[..]
test = iter;
}
if (test && test->ops->free) <<<-- test is NULL
test->ops->free(test) [event_hist_trigger_free()] {
[..]
if (data->name)
del_named_trigger(data) {
list_del(&data->named_list); <<<<-- NEVER gets removed!
}
}
}
}
[..]
kfree(data); <<<-- frees item but it is still on list
The next time a hist with name is registered, it causes an u-a-f bug and
the kernel can crash.
Move the code around such that if event_trigger_register() succeeds, the
next thing called is hist_trigger_enable() which adds it to the list.
A bunch of actions is called if get_named_trigger_data() returns false.
But that doesn't need to be called after event_trigger_register(), so it
can be moved up, allowing event_trigger_register() to be called just
before hist_trigger_enable() keeping them together and allowing the
file->triggers to be properly populated.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250227163944.1c37f85f@gandalf.local.home Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers") Reported-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: Tom Zanussi <zanussi@kernel.org> Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When an invalid function ID of an SBI extension is used we should
return not-supported, not invalid-param. Also, when we see that at
least one hartid constructed from the base and mask parameters is
invalid, then we should return invalid-param. Finally, rather than
relying on overflowing a left shift to result in zero and then using
that zero in a condition which [correctly] skips sending an IPI (but
loops unnecessarily), explicitly check for overflow and exit the loop
immediately.
Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com Signed-off-by: Anup Patel <anup@brainfault.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The spec says suspend_type is 32 bits wide and "In case the data is
defined as 32bit wide, higher privilege software must ensure that it
only uses 32 bit data." Mask off upper bits of suspend_type before
using it.
"Not stopped" means started or suspended so we need to check for
a single state in order to have a chance to check for each state.
Also, we need to use target_vcpu when checking for the suspend
state.
A previous patch ensured that USB Type C connector support is enabled,
but it is still possible to build the phy driver without enabling
CONFIG_USB (host support) or CONFIG_USB_GADGET (device support), and
in that case the common helper functions are unavailable:
aarch64-linux-ld: drivers/phy/rockchip/phy-rockchip-usbdp.o: in function `rk_udphy_probe':
phy-rockchip-usbdp.c:(.text+0xe74): undefined reference to `usb_get_maximum_speed'
Select CONFIG_USB_COMMON directly here, like we do in some other phy
drivers, to make sure this is available even when actual USB support
is disabled or in a loadable module that cannot be reached from a
built-in phy driver.
I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time. I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.
A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:
[ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:
[ 0.081615] Working around Cyrix MediaGX virtual DMA bugs.
[ 0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
[ 0.100000] Enable Memory access reorder on Cyrix/NSC processor.
[ 0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
[...]
In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).
Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.
So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.
Presumably, the warm boot hang is due to some register not getting
cleared in the same way that a loss of power does. That is
approximately as much as I understand about the problem.
More poking/prodding and coaching from Jonas Gorski, it looks
like this test patch fixes the problem on my board: Tested against
v6.6.67 and v4.14.113.
Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors") Debugged-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Russell Senior <russell@personaltelco.net> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
A C jump table (such as the one used by the BPF interpreter) is a const
global array of absolute code addresses, and this means that the actual
values in the table may not be known until the kernel is booted (e.g.,
when using KASLR or when the kernel VA space is sized dynamically).
When using PIE codegen, the compiler will default to placing such const
global objects in .data.rel.ro (which is annotated as writable), rather
than .rodata (which is annotated as read-only). As C jump tables are
explicitly emitted into .rodata, this used to result in warnings for
LoongArch builds (which uses PIE codegen for the entire kernel) like
Warning: setting incorrect section attributes for .rodata..c_jump_table
due to the fact that the explicitly specified .rodata section inherited
the read-write annotation that the compiler uses for such objects when
using PIE codegen.
This warning was suppressed by explicitly adding the read-only
annotation to the __attribute__((section(""))) string, by commit
c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table")
Unfortunately, this hack does not work on Clang's integrated assembler,
which happily interprets the appended section type and permission
specifiers as part of the section name, which therefore no longer
matches the hard-coded pattern '.rodata..c_jump_table' that objtool
expects, causing it to emit a warning
kernel/bpf/core.o: warning: objtool: ___bpf_prog_run+0x20: sibling call from callable instruction with modified stack frame
Work around this, by emitting C jump tables into .data.rel.ro instead,
which is treated as .rodata by the linker script for all builds, not
just PIE based ones.
Fixes: c5b1184decc8 ("compiler.h: specify correct attribute for .rodata..c_jump_table") Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn> # on LoongArch Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250221135704.431269-6-ardb+git@google.com Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
params->total_weight is not initialized during bind and not updated when
the bound cdev changes. The cooling device weight will not be used due
to the uninitialized total_weight, until an update via sysfs is
triggered.
The bound cdevs are updated during thermal zone registration, where each
cooling device will be bound to the thermal zone one by one, but
power_allocator_bind() can be called without an additional cdev update
when manually changing the policy of a thermal zone via sysfs.
Add a new function to handle weight update logic, including updating
total_weight, and call it when bind, weight changes, and cdev updates to
ensure total_weight is always correct.
Fixes: a3cd6db4cc2e ("thermal: gov_power_allocator: Support new update callback of weights") Signed-off-by: Yu-Che Cheng <giver@chromium.org> Link: https://patch.msgid.link/20250222-fix-power-allocator-weight-v2-1-a94de86b685a@chromium.org
[ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In almost all places where a thermal zone's list of thermal instances
is walked, there is a check to match a specific trip point and it is
walked in vain whenever there are no cooling devices associated with
the given trip.
To address this, store the lists of thermal instances in trip point
descriptors instead of storing them in thermal zones and adjust all
code using those lists accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/5522726.Sb9uPGUboI@rjwysocki.net Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Stable-dep-of: 0cde378a10c1 ("thermal: gov_power_allocator: Update total_weight on bind and cdev updates") Signed-off-by: Sasha Levin <sashal@kernel.org>
Since thermal_of_should_bind() terminates the loop after processing
the first child found in cooling-maps, it will never match more than
one cdev to a given trip point which is incorrect, as there may be
cooling-maps associating one trip point with multiple cooling devices.
Address this by letting the loop continue until either all
children have been processed or a matching one has been found.
To avoid adding conditionals or goto statements, put the loop in
question into a separate function and make that function return
right away after finding a matching cooling-maps entry.
The cause is that zero pfn is set to the PTE without increasing the RSS
count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
not increase accordingly. Then, the operation on the same pfn is performed
in uprobe_write_opcode()->__replace_page() to unconditional decrease the
RSS count and old_folio's refcount.
Therefore, two bugs are introduced:
1. The RSS count is incorrect, when process exit, the check_mm() report
error "Bad rss-count".
2. The reserved folio (zero folio) is freed when folio->refcount is zero,
then free_pages_prepare->free_page_is_bad() report error
"Bad page state".
There is more, the following warning could also theoretically be triggered:
Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
perf_event_pmu_context, but not in the same order.
The problem is that the order of pmu_ctx_list for the parent is impacted by
the time when an event/PMU is added. While the order for a child is
impacted by the event order in the pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may be different.
To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.
The follow testcase can trigger above warning:
# perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
# perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
pid = fork();
if (pid == -1) {
printf("fork error\n");
return;
}
if (pid == 0) {
while (1) {
count++;
}
} else {
while (1) {
count++;
}
}
}
The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then open tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
insert the perf_event_pmu_context in another order and the warning will
trigger.
divvy_up_power() should use weighted_req_power instead of req_power to
calculate granted_power. Otherwise, granted_power may be unexpected as
the denominator total_req_power is a weighted sum.
This is a mistake made during the previous refactor.
Replace req_power with weighted_req_power in divvy_up_power()
calculation.
Fixes: 912e97c67cc3 ("thermal: gov_power_allocator: Move memory allocation out of throttle()") Signed-off-by: Yu-Che Cheng <giver@chromium.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20250219-fix-power-allocator-calc-v1-1-48b860291919@chromium.org
[ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
IEP driver supports both perout and pps signal generation
but perout feature is faulty with half-cooked support
due to some missing configuration. Remove perout
support from the driver and reject perout requests with
"not supported" error code.
Change the test to check if 'combined-count' is a key in the dictionary
first and if not assume that this means the driver has separate RX and
TX queues.
With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').
Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API") Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250226181957.212189-1-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In certain cases, napi_get_frags() returns an skb that points to an old
received fragment, This skb may have its skb->ip_summed, csum, and other
fields set from previous fragment handling.
Some network drivers set skb->ip_summed to either CHECKSUM_COMPLETE or
CHECKSUM_UNNECESSARY when getting skb from napi_get_frags(), while
others only set skb->ip_summed when RX checksum offload is enabled on
the device, and do not set any value for skb->ip_summed when hardware
checksum offload is disabled, assuming that the skb->ip_summed
initiated to zero by napi_reuse_skb, ionic driver for example will
ignore/unset any value for the ip_summed filed if HW checksum offload is
disabled, and if we have a situation where the user disables the
checksum offload during a traffic that could lead to the following
errors shown in the kernel logs:
<IRQ>
dump_stack_lvl+0x34/0x48
__skb_gro_checksum_complete+0x7e/0x90
tcp6_gro_receive+0xc6/0x190
ipv6_gro_receive+0x1ec/0x430
dev_gro_receive+0x188/0x360
? ionic_rx_clean+0x25a/0x460 [ionic]
napi_gro_frags+0x13c/0x300
? __pfx_ionic_rx_service+0x10/0x10 [ionic]
ionic_rx_service+0x67/0x80 [ionic]
ionic_cq_service+0x58/0x90 [ionic]
ionic_txrx_napi+0x64/0x1b0 [ionic]
__napi_poll+0x27/0x170
net_rx_action+0x29c/0x370
handle_softirqs+0xce/0x270
__irq_exit_rcu+0xa3/0xc0
common_interrupt+0x80/0xa0
</IRQ>
This inconsistency sometimes leads to checksum validation issues in the
upper layers of the network stack.
To resolve this, this patch clears the skb->ip_summed value for each
reused skb in by napi_reuse_skb(), ensuring that the caller is responsible
for setting the correct checksum status. This eliminates potential
checksum validation issues caused by improper handling of
skb->ip_summed.
Fixes: 76620aafd66f ("gro: New frags interface to avoid copying shinfo") Signed-off-by: Mohammad Heib <mheib@redhat.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250225112852.2507709-1-mheib@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recently a bug was discovered where the server had entered TCP_ESTABLISHED
state, but the upper layers were not notified.
The same 5-tuple packet may be processed by different CPUSs, so two
CPUs may receive different ack packets at the same time when the
state is TCP_NEW_SYN_RECV.
In that case, req->ts_recent in tcp_check_req may be changed concurrently,
which will probably cause the newsk's ts_recent to be incorrectly large.
So that tcp_validate_incoming will fail. At this point, newsk will not be
able to enter the TCP_ESTABLISHED.
The cpu2's skb or a newly received skb will call tcp_v4_do_rcv to get
the newsk into the TCP_ESTABLISHED state, but at this point it is no
longer possible to notify the upper layer application. A notification
mechanism could be added here, but the fix is more complex, so the
current fix is used.
In tcp_check_req, req->ts_recent is used to assign a value to
tcp_sk(child)->rx_opt.ts_recent, so removing the change in req->ts_recent
and changing tcp_sk(child)->rx_opt.ts_recent directly after owning the
req fixes this bug.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
As part of switchdev environment setup, uplink VSI is configured as
default for both Tx and Rx. Default Rx VSI is also used by promiscuous
mode. If promisc mode is enabled and an attempt to enter switchdev mode
is made, the setup will fail because Rx VSI is already configured as
default (rule exists).
Reproducer:
devlink dev eswitch set $PF1_PCI mode switchdev
ip l s $PF1 up
ip l s $PF1 promisc on
echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
In switchdev setup, use ice_set_dflt_vsi() instead of plain
ice_cfg_dflt_vsi(), which avoids repeating setting default VSI for Rx if
it's already configured.
Fixes: 50d62022f455 ("ice: default Tx rule instead of to queue") Reported-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Closes: https://lore.kernel.org/intel-wired-lan/PH0PR11MB50138B635F2E5CEB7075325D961F2@PH0PR11MB5013.namprd11.prod.outlook.com Reviewed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250224190647.3601930-3-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If ice_ena_vfs() fails after calling ice_create_vf_entries(), it frees
all VFs without removing them from snapshot PF-VF mailbox list, leading
to list corruption.
Reproducer:
devlink dev eswitch set $PF1_PCI mode switchdev
ip l s $PF1 up
ip l s $PF1 promisc on
sleep 1
echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
sleep 1
echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
Trace (minimized):
list_add corruption. next->prev should be prev (ffff8882e241c6f0), but was 0000000000000000. (next=ffff888455da1330).
kernel BUG at lib/list_debug.c:29!
RIP: 0010:__list_add_valid_or_report+0xa6/0x100
ice_mbx_init_vf_info+0xa7/0x180 [ice]
ice_initialize_vf_entry+0x1fa/0x250 [ice]
ice_sriov_configure+0x8d7/0x1520 [ice]
? __percpu_ref_switch_mode+0x1b1/0x5d0
? __pfx_ice_sriov_configure+0x10/0x10 [ice]
Sometimes a KASAN report can be seen instead with a similar stack trace:
BUG: KASAN: use-after-free in __list_add_valid_or_report+0xf1/0x100
VFs are added to this list in ice_mbx_init_vf_info(), but only removed
in ice_free_vfs(). Move the removing to ice_free_vf_entries(), which is
also being called in other places where VFs are being removed (including
ice_free_vfs() itself).
Fixes: 8cd8a6b17d27 ("ice: move VF overflow message count into struct ice_mbx_vf_info") Reported-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Closes: https://lore.kernel.org/intel-wired-lan/PH0PR11MB50138B635F2E5CEB7075325D961F2@PH0PR11MB5013.namprd11.prod.outlook.com Reviewed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250224190647.3601930-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
E830 adds hardware support to prevent the VF from overflowing the PF
mailbox with VIRTCHNL messages. E830 will use the hardware feature
(ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf().
To prevent a VF from overflowing the PF, the PF sets the number of
messages per VF that can be in the PF's mailbox queue
(ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF,
the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG
register.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 79990cf5e7ad ("ice: Fix deinitializing VF in error path") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, we report -ETOOSMALL (err) only on the first iteration
(!sent). When we get put_cmsg error after a bunch of successful
put_cmsg calls, we don't signal the error at all. This might be
confusing on the userspace side which will see truncated CMSGs
but no MSG_CTRUNC signal.
Consider the following case:
- sizeof(struct cmsghdr) = 16
- sizeof(struct dmabuf_cmsg) = 24
- total cmsg size (CMSG_LEN) = 40 (16+24)
When calling recvmsg with msg_controllen=60, the userspace
will receive two(!) dmabuf_cmsg(s), the first one will
be a valid one and the second one will be silently truncated. There is no
easy way to discover the truncation besides doing something like
"cm->cmsg_len != CMSG_LEN(sizeof(dmabuf_cmsg))".
Introduce new put_devmem_cmsg wrapper that reports an error instead
of doing the truncation. Mina suggests that it's the intended way
this API should work.
Note that we might now report MSG_CTRUNC when the users (incorrectly)
call us with msg_control == NULL.
Fixes: 8f0b3cc9a4c1 ("tcp: RX path for devmem TCP") Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250224174401.3582695-1-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
am65-cpsw uses page_pool_dev_alloc_pages(), thus needs PAGE_POOL
selected to avoid linker errors. This is missing since the driver
started to use page_pool helpers in 8acacc40f733 ("net: ethernet:
ti: am65-cpsw: Add minimal XDP support")
ASUS VivoBook 15 with SSID 1043:1460 took an incorrect quirk via the
pin pattern matching for ASUS (ALC256_FIXUP_ASUS_MIC), resulting in
the two built-in mic pins (0x13 and 0x1b). This had worked without
problems casually in the past because the right pin (0x1b) was picked
up as the primary device. But since we fixed the pin enumeration for
other bugs, the bogus one (0x13) is picked up as the primary device,
hence the bug surfaced now.
For addressing the regression, this patch explicitly specifies the
quirk entry with ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, which sets up only
the headset mic pin.
When SPI is used for control, the driver must hold the SPI bus lock
while issuing the sequence of writes to perform a soft reset.
>From the time the driver writes the SYSTEM_RESET command until the
driver does a write to terminate the reset, there must not be any
activity on the SPI bus lines. If there is any SPI activity during the
soft-reset, another soft-reset will be triggered. The state of the SPI
chip select is irrelevant.
A repeated soft-reset does not in itself cause any problems, and it is
not an infinite loop. The problem is a race between these resets and
the driver polling for boot completion. There is a time window between
soft resets where the driver could read HALO_STATE as 2 (fully booted)
while the chip is actually soft-resetting. Although this window is
small, it is long enough that it is possible to hit it in normal
operation.
To prevent this race and ensure the chip really is fully booted, the
driver calls spi_bus_lock() to prevent other activity while resetting.
It then issues the SYSTEM_RESET mailbox command. After allowing
sufficient time for reset to take effect, the driver issues a PING
mailbox command, which will force completion of the full soft-reset
sequence. The SPI bus lock can then be released. The mailbox is
checked for any boot or wakeup response from the firmware, before the
value in HALO_STATE will be trusted.
This does not affect SoundWire or I2C control.
Fixes: 8a731fd37f8b ("ASoC: cs35l56: Move utility functions to shared file") Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20250225131843.113752-3-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change calls to async regmap write functions to use the normal
blocking writes so that the cs35l56 driver can use spi_bus_lock() to
gain exclusive access to the SPI bus.
As this is part of a fix, it makes only the minimal change to swap the
functions to the blocking equivalents. There's no need to risk
reworking the buffer allocation logic that is now partially redundant.
The async writes are a 12-year-old workaround for inefficiency of
synchronous writes in the SPI subsystem. The SPI subsystem has since
been changed to avoid the overheads, so this workaround should not be
necessary.
The cs35l56 driver needs to use spi_bus_lock() prevent bus activity
while it is soft-resetting the cs35l56. But spi_bus_lock() is
incompatible with spi_async() calls, which will fail with -EBUSY.
Fixes: 8a731fd37f8b ("ASoC: cs35l56: Move utility functions to shared file") Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Link: https://patch.msgid.link/20250225131843.113752-2-rf@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
OA exponent value of 0 is a valid value for periodic reports. Allow user
to pass 0 for the OA sampling interval since it gets converted to 2 gt
clock ticks.
v2: Update the check in xe_oa_stream_init as well (Ashutosh)
v3: Fix mi-rpc failure by setting default exponent to -1 (CI)
v4: Add the Fixes tag
Whereas all properties can be specified during OA stream open, when the OA
stream is reconfigured only the config_id and syncs can be specified.
v2: Use separate function table for reconfig case (Jonathan)
Change bool function args to enum (Matt B)
v3: s/xe_oa_set_property_funcs/xe_oa_set_property_funcs_open/ (Jonathan)
In addition to stream open, add xe_sync support to the OA config ioctl,
where it is even more useful. This allows e.g. Mesa to replay a workload
repeatedly on the GPU, each time with a different OA configuration, while
precisely controlling (at batch buffer granularity) the workload segment
for which a particular OA configuration is active, without introducing
stalls in the userspace pipeline.
v2: Emit OA config even when config id is same as previous, to ensure
consistent sync behavior (Jose)
No code changes, only code movement so that functions used during stream
open can be reused for the stream reconfiguration
ioctl (DRM_XE_OBSERVATION_IOCTL_CONFIG).
Introduce 'struct xe_oa_fence' which includes the dma_fence used to signal
output fences in the xe_sync array. The fences are signaled
asynchronously. When there are no output fences to signal, the OA
configuration wait is synchronously re-introduced into the ioctl.
v2: Don't wait in the work, use callback + delayed work (Matt B)
Use a single, not a per-fence spinlock (Matt Brost)
v3: Move ofence alloc before job submission (Matt)
Assert, don't fail, from dma_fence_add_callback (Matt)
Additional dma_fence_get for dma_fence_wait (Matt)
Change dma_fence_wait to non-interruptible (Matt)
v4: Introduce last_fence to prevent uaf if stream is closed with
pending OA config jobs
v5: Remove oa_fence_lock, move spinlock back into xe_oa_fence to
prevent uaf
Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-5-ashutosh.dixit@intel.com
Stable-dep-of: 5bd566703e16 ("drm/xe/oa: Allow oa_exponent value of 0") Signed-off-by: Sasha Levin <sashal@kernel.org>
We found an issue when using bpf_redirect with ipvs NAT mode after
commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
the same name space"). Particularly, we use bpf_redirect to return
the skb directly back to the netif it comes from, i.e., xnet is
false in skb_scrub_packet(), and then ipvs_property is preserved
and SNAT is skipped in the rx path.
ipvs_property has been already cleared when netns is changed in
commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
SKB net namespace changed"). This patch just clears it in spite of
netns.
Fixes: 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Julian Anastasov <ja@ssi.bg> Link: https://patch.msgid.link/20250222033518.126087-1-lulie@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use different stream names to avoid such warnings.
DAI names in AUDMIX are also updated accordingly.
Fixes: 15c958390460 ("ASoC: fsl_sai: Add separate DAI for transmitter and receiver") Signed-off-by: Chancel Liu <chancel.liu@nxp.com> Acked-by: Shengjiu Wang <shengjiu.wang@gmail.com> Link: https://patch.msgid.link/20250217010437.258621-1-chancel.liu@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ES8328 codec driver, which is also used for the ES8388 chip that
appears to have an identical register map, claims that the output can
either take the route from DAC->Mixer->Output or through DAC->Output
directly. To the best of what I could find, this is not true, and
creates problems.
Without DACCONTROL17 bit index 7 set for the left channel, as well as
DACCONTROL20 bit index 7 set for the right channel, I cannot get any
analog audio out on Left Out 2 and Right Out 2 respectively, despite the
DAPM routes claiming that this should be possible. Furthermore, the same
is the case for Left Out 1 and Right Out 1, showing that those two don't
have a direct route from DAC to output bypassing the mixer either.
Those control bits toggle whether the DACs are fed (stale bread?) into
their respective mixers. If one "unmutes" the mixer controls in
alsamixer, then sure, the audio output works, but if it doesn't work
without the mixer being fed the DAC input then evidently it's not a
direct output from the DAC.
ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a
separate driver for what appears to be a very similar register map,
simply flips those two bits on in its probe function, and then pretends
there is no power management whatsoever for the individual controls.
Fair enough.
My theory as to why nobody has noticed this up to this point is that
everyone just assumes it's their fault when they had to unmute an
additional control in ALSA.
Fix this in the es8328 driver by removing the erroneous direct route,
then get rid of the playback switch controls and have those bits tied to
the mixer's widget instead, which until now had no register to play
with.
When the kernel is compiled without LED framework support the
rtl8366rb fails to build like this:
rtl8366rb.o: in function `rtl8366rb_setup_led':
rtl8366rb.c:953:(.text.unlikely.rtl8366rb_setup_led+0xe8):
undefined reference to `led_init_default_state_get'
rtl8366rb.c:980:(.text.unlikely.rtl8366rb_setup_led+0x240):
undefined reference to `devm_led_classdev_register_ext'
As this is constantly coming up in different randconfig builds,
bite the bullet and create a separate file for the offending
code, split out a header with all stuff needed both in the
core driver and the leds code.
Add a new bool Kconfig option for the LED compile target, such
that it depends on LEDS_CLASS=y || LEDS_CLASS=RTL8366RB
which make LED support always available when LEDS_CLASS is
compiled into the kernel and enforce that if the LEDS_CLASS
is a module, then the RTL8366RB driver needs to be a module
as well so that modprobe can resolve the dependencies.
Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502070525.xMUImayb-lkp@intel.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Stats calculations involve a RMW to add the stat update to the existing
value. This is currently not protected by any synchronization mechanism,
so data races are possible. Add a spinlock to protect the update. The
reader side could be protected using u64_stats, but we would still need
a spinlock for the update side anyway. And we always do an update
immediately before reading the stats anyway.
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20250220162950.95941-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to
prevent accidental setting of ECN bits in ->flowi4_tos.
Callers of ip_route_input() to consider are:
* input_action_end_dx4_finish() and input_action_end_dt4() in
net/ipv6/seg6_local.c. These functions set the tos parameter to 0,
which is already a valid dscp_t value, so they don't need to be
adjusted for the new prototype.
* icmp_route_lookup(), which already has a dscp_t variable to pass as
parameter. We just need to remove the inet_dscp_to_dsfield()
conversion.
* br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(),
which get the DSCP directly from IPv4 headers. Define a helper to
read the .tos field of struct iphdr as dscp_t, so that these
function don't have to do the conversion manually.
While there, declare *iph as const in br_nf_pre_routing_finish(),
declare its local variables in reverse-christmas-tree order and move
the "err = ip_route_input()" assignment out of the conditional to avoid
checkpatch warning.
Pass a dscp_t variable to icmp_route_lookup(), instead of a plain u8,
to prevent accidental setting of ECN bits in ->flowi4_tos. Rename that
variable ("tos" -> "dscp") to make the intent clear.
While there, reorganise the function parameters to fill up horizontal
space.
Commit 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs
to enable softirq tuning") added a possibility to set
net_hotdata.netdev_budget_usecs, but added no lower bound checking.
Commit a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies")
made the *initial* value HZ-dependent, so the initial value is at least
2 jiffies even for lower HZ values (2 ms for 1000 Hz, 8ms for 250 Hz, 20
ms for 100 Hz).
But a user still can set improper values by a sysctl. Set .extra1
(the lower bound) for net_hotdata.netdev_budget_usecs to the same value
as in the latter commit. That is to 2 jiffies.
Fixes: a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies") Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Link: https://patch.msgid.link/20250220110752.137639-1-jirislaby@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
IPv4 neighbors can be constructed on the blackhole net device, but they
are constructed with an output function (neigh_direct_output()) that
simply calls dev_queue_xmit(). The latter will transmit packets via
'skb->dev' which might not be the blackhole net device if dst_dev_put()
switched 'dst->dev' to the blackhole net device while another CPU was
using the dst entry in ip_output(), but after it already initialized
'skb->dev' from 'dst->dev'.
Fix by making sure that neighbors are constructed on top of the
blackhole net device with an output function that simply consumes the
packets, in a similar fashion to dst_discard_out() and
blackhole_netdev_xmit().
Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries") Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev") Reported-by: Florian Meister <fmei@sfs.com> Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250220072559.782296-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Give an afs_server object a ref on the afs_cell object it points to so that
the cell doesn't get deleted before the server record.
Whilst this is circular (cell -> vol -> server_list -> server -> cell), the
ref only pins the memory, not the lifetime as that's controlled by the
activity counter. When the volume's activity counter reaches 0, it
detaches from the cell and discards its server list; when a cell's activity
counter reaches 0, it discards its root volume. At that point, the
circularity is cut.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-6-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When allocating and building an afs_server_list struct object from a VLDB
record, we look up each server address to get the server record for it -
but a server may have more than one entry in the record and we discard the
duplicate pointers. Currently, however, when we discard, we only put a
server record, not unuse it - but the lookup got as an active-user count.
The active-user count on an afs_server_list object determines its lifetime
whereas the refcount keeps the memory backing it around. Failing to reduce
the active-user counter prevents the record from being cleaned up and can
lead to multiple copied being seen - and pointing to deleted afs_cell
objects and other such things.
Fix this by switching the incorrect 'put' to an 'unuse' instead.
Without this, occasionally, a dead server record can be seen in
/proc/net/afs/servers and list corruption may be observed:
The rxperf RPCs seem to have a magic cookie at the end of the request that
was failing to be taken account of by the unmarshalling of the request.
Fix the rxperf code to expect this.
Fixes: 75bfdbf2fca3 ("rxrpc: Implement an in-kernel rxperf server for testing purposes") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Simon Horman <horms@kernel.org>
cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250218192250.296870-2-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
L2CAP_ECRED_CONN_RSP needs to respond DCID in the same order received as
SCID but the order is reversed due to use of list_add which actually
prepend channels to the list so the response is reversed:
Also make sure the response don't include channels that are not on
BT_CONNECT2 since the chan->ident can be set to the same value as in the
following trace:
< ACL Data TX: Handle 16 flags 0x00 dlen 12
LE L2CAP: LE Flow Control Credit (0x16) ident 6 len 4
Source CID: 64
Credits: 1
...
> ACL Data RX: Handle 16 flags 0x02 dlen 18
LE L2CAP: Enhanced Credit Connection Request (0x17) ident 6 len 10
PSM: 39 (0x0027)
MTU: 517
MPS: 251
Credits: 255
Source CID: 70
< ACL Data TX: Handle 16 flags 0x00 dlen 20
LE L2CAP: Enhanced Credit Connection Response (0x18) ident 6 len 12
MTU: 517
MPS: 247
Credits: 3
Result: Connection successful (0x0000)
Destination CID: 64
Destination CID: 68
Closes: https://github.com/bluez/bluez/issues/1094 Fixes: 9aa9d9473f15 ("Bluetooth: L2CAP: Fix responding with wrong PDU type") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We fixed the UAF issue in USB MIDI code by canceling the pending work
at closing each MIDI output device in the commit below. However, this
assumed that it's the only one that is tied with the endpoint, and it
resulted in unexpected data truncations when multiple devices are
assigned to a single endpoint and opened simultaneously.
For addressing the unexpected MIDI message drops, simply replace
cancel_work_sync() with flush_work(). The drain callback should have
been already invoked before the close callback, hence the port->active
flag must be already cleared. So this just assures that the pending
work is finished before freeing the resources.
Fixes: 0125de38122f ("ALSA: usb-audio: Cancel pending work at closing a MIDI substream") Reported-and-tested-by: John Keeping <jkeeping@inmusicbrands.com> Closes: https://lore.kernel.org/20250217111647.3368132-1-jkeeping@inmusicbrands.com Link: https://patch.msgid.link/20250218114024.23125-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a warning about unused variables when building with W=1 and no procfs:
net/sunrpc/cache.c:1660:30: error: 'cache_flush_proc_ops' defined but not used [-Werror=unused-const-variable=]
1660 | static const struct proc_ops cache_flush_proc_ops = {
| ^~~~~~~~~~~~~~~~~~~~
net/sunrpc/cache.c:1622:30: error: 'content_proc_ops' defined but not used [-Werror=unused-const-variable=]
1622 | static const struct proc_ops content_proc_ops = {
| ^~~~~~~~~~~~~~~~
net/sunrpc/cache.c:1598:30: error: 'cache_channel_proc_ops' defined but not used [-Werror=unused-const-variable=]
1598 | static const struct proc_ops cache_channel_proc_ops = {
| ^~~~~~~~~~~~~~~~~~~~~~
These are used inside of an #ifdef, so replacing that with an
IS_ENABLED() check lets the compiler see how they are used while
still dropping them during dead code elimination.
Fixes: dbf847ecb631 ("knfsd: allow cache_register to return error on failure") Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When there is a failure during bind QP, the cleanup flow destroys the
counter regardless if it is the one that created it or not, which is
problematic since if it isn't the one that created it, that counter could
still be in use.
Fix that by destroying the counter only if it was created during this call.
Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit bb9850704c04 ("scsi: ufs: core: Honor runtime/system PM levels if
set by host controller drivers") introduced the check for setting default
PM levels only if the levels are uninitialized by the host controller
drivers. But it missed the fact that the levels could be initialized to 0
(UFS_PM_LVL_0) on purpose by the controller drivers. Even though none of
the drivers are doing so now, the logic should be fixed irrespectively.
So set the default levels unconditionally before calling ufshcd_hba_init()
API which initializes the controller drivers. It ensures that the
controller drivers could override the default levels if required.
Fixes: bb9850704c04 ("scsi: ufs: core: Honor runtime/system PM levels if set by host controller drivers") Reported-by: Bao D. Nguyen <quic_nguyenb@quicinc.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20250219105047.49932-1-manivannan.sadhasivam@linaro.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each
MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that
explicitly zeroed driver-private command data.
In combination with commit 464a00c9e0ad ("scsi: core: Kill DRIVER_SENSE"),
after virtio_scsi performs a capacity expansion, the first request will
return a unit attention to indicate that the capacity has changed. And then
the original command is retried. As driver-private command data was not
cleared, the request would return UA again and eventually time out and fail.
Zero driver-private command data when a request is retried.
Fixes: f7de50da1479 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data") Fixes: c2bb87318baa ("scsi: virtio_scsi: Remove code that zeroes driver-private command data") Fixes: c3006a926468 ("scsi: snic: Remove code that zeroes driver-private command data") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the TLS handshake attempt returns -ETIMEDOUT, we currently translate
that error into -EACCES. This becomes problematic for cases where the RPC
layer is attempting to re-connect in paths that don't resonably handle
-EACCES, for example: writeback. The RPC layer can handle -ETIMEDOUT quite
well, however - so if the handshake returns this error let's just pass it
along.
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the file is sillyrenamed, and slated for delete on close, it is
possible for a server reboot to triggeer an open reclaim, with can again
race with the application call to close(). When that happens, the call
to put_nfs_open_context() can trigger a synchronous delegreturn call
which deadlocks because it is not marked as privileged.
Instead, ensure that the call to nfs4_inode_return_delegation_on_close()
catches the delegreturn, and schedules it asynchronously.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Fixes: adb4b42d19ae ("Return the delegation when deleting sillyrenamed files") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If rpc_signal_task() is called while a task is in an rpc_call_done()
callback function, and the latter calls rpc_restart_call(), the task can
end up looping due to the RPC_TASK_SIGNALLED flag being set without the
tk_rpc_status being set.
Removing the redundant mechanism for signalling the task fixes the
looping behaviour.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While it is uncommon for delegations to be held while O_DIRECT writes
are in progress, it is possible. The xfstests generic/647 and
generic/729 both end up triggering that state, and end up failing due to
the fact that the file size is not adjusted.
Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219738 Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stable-dep-of: 88025c67fe3c ("NFS: Adjust delegated timestamps for O_DIRECT reads and writes") Signed-off-by: Sasha Levin <sashal@kernel.org>
ufshcd_is_ufs_dev_busy(), ufshcd_print_host_state() and
ufshcd_eh_timed_out() are used in both modes (legacy mode and MCQ mode).
hba->outstanding_reqs only represents the outstanding requests in legacy
mode. Hence, change hba->outstanding_reqs into scsi_host_busy(hba->host) in
these functions.
Fixes: eacb139b77ff ("scsi: ufs: core: mcq: Enable multi-circular queue") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250214224352.3025151-1-bvanassche@acm.org Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use sk_is_tcp() to check if socket is TCP in bind(2) and connect(2)
hooks.
SMC, MPTCP, SCTP protocols are currently restricted by TCP access
rights. The purpose of TCP access rights is to provide control over
ports that can be used by userland to establish a TCP connection.
Therefore, it is incorrect to deny bind(2) and connect(2) requests for a
socket of another protocol.
However, SMC, MPTCP and RDS implementations use TCP internal sockets to
establish communication or even to exchange packets over a TCP
connection [1]. Landlock rules that configure bind(2) and connect(2)
usage for TCP sockets should not cover requests for sockets of such
protocols. These protocols have different set of security issues and
security properties, therefore, it is necessary to provide the userland
with the ability to distinguish between them (eg. [2]).
Control over TCP connection used by other protocols can be achieved with
upcoming support of socket creation control [3].
In order to optimize the size of driver private structure,
the memory for dev_attr is allocated dynamically during the
chip context initialization. In order to make certain runtime
decisions, store dev_attr in the qplib_res structure.
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1736446693-6692-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: 8238c7bd8420 ("RDMA/bnxt_re: Fix the statistics for Gen P7 VF") Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a possibility that ulp_irq_stop and ulp_irq_start
callbacks will be called when the device is in detached state.
This can cause a crash due to NULL pointer dereference as
the rdev is already freed.
Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected") Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/1738657285-23968-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
L2 driver allocates the vectors for RoCE and pass it through the
en_dev structure to RoCE. During probe, cache the MSIx related
info to a local structure.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/1731577748-1804-5-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
Stable-dep-of: f0df225d12fc ("RDMA/bnxt_re: Add sanity checks on rdev validity") Signed-off-by: Sasha Levin <sashal@kernel.org>
Move NQ related data structures from rdev to a new structure
named "struct bnxt_re_nq_record" by keeping a pointer to in
the rdev structure. Allocate the memory for it dynamically.
This change is needed for subsequent patches in the series.
Also, removed the nq_task variable from rdev structure as it
is redundant and no longer used.
This change would help to reduce the size of the driver private
structure as well.
L2 driver allocates and populates the MSI-x vector details for RoCE
in the en_dev structure. RoCE driver requires minimum 2 MSIx vectors.
Hence during probe, driver has to check and bail out if there are not
enough MSI-x vectors reserved for it before proceeding further
initialization.
If a QP is modified to error state and a flush CQE process is triggered,
the subsequent QP destruction mbox can still be successfully posted but
will be blocked in HW until the flush CQE process finishes. This causes
further mbox posting timeouts in driver. The blocking time is related
to QP depth. Considering an extreme case where SQ depth and RQ depth
are both 32K, the blocking time can reach about 135ms.
This patch adds a retry mechanism for mbox posting. For each try, FW
waits 15ms for HW to complete the previous mbox, otherwise return a
timeout error code to driver. Counting other time consumption in FW,
set 8 tries for mbox posting and a 5ms time gap before each retry to
increase to a sufficient timeout limit.
Fixes: 0425e3e6e0c7 ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20250208105930.522796-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Allocate a PAGE aligned doorbell index to ensure each process gets a
separate PAGE sized doorbell area space remapped to it in mana_ib_mmap
Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/1738751405-15041-1-git-send-email-kotaranov@linux.microsoft.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Memory regions (MR) of type DM (device memory) do not have an associated
umem.
In the __mlx5_ib_dereg_mr() -> mlx5_free_priv_descs() flow, the code
incorrectly takes the wrong branch, attempting to call
dma_unmap_single() on a DMA address that is not mapped.
This results in a WARN [1], as shown below.
The issue is resolved by properly accounting for the DM type and
ensuring the correct branch is selected in mlx5_free_priv_descs().
This patch addresses a potential race condition for a DMABUF MR that can
result in a CQE with an error on the UMR QP.
During the __mlx5_ib_dereg_mr() flow, the following sequence of calls
occurs:
mlx5_revoke_mr()
mlx5r_umr_revoke_mr()
mlx5r_umr_post_send_wait()
At this point, the lkey is freed from the hardware's perspective.
However, concurrently, mlx5_ib_dmabuf_invalidate_cb() might be triggered
by another task attempting to invalidate the MR having that freed lkey.
Since the lkey has already been freed, this can lead to a CQE error,
causing the UMR QP to enter an error state.
To resolve this race condition, the dma_resv_lock() which was hold as
part of the mlx5_ib_dmabuf_invalidate_cb() is now also acquired as part
of the mlx5_revoke_mr() scope.
Upon a successful revoke, we set umem_dmabuf->private which points to
that MR to NULL, preventing any further invalidation attempts on its
lkey.
When a DCT QP is created on an active lag, it's dctc.port is assigned
in a round-robin way, which is from 1 to dev->lag_port. In this case
when querying this QP, we may get qp_attr.port_num > 2.
Fix this by setting qp->port when modifying a DCT QP, and read port_num
from qp->port instead of dctc.port when querying it.
This patch addresses an issue in the recovery flow of the UMR QP,
ensuring tasks do not get stuck, as highlighted by the call trace [1].
During recovery, before transitioning the QP to the RESET state, the
software must wait for all outstanding WRs to complete.
Failing to do so can cause the firmware to skip sending some flushed
CQEs with errors and simply discard them upon the RESET, as per the IB
specification.
This race condition can result in lost CQEs and tasks becoming stuck.
To resolve this, the patch sends a final WR which serves only as a
barrier before moving the QP state to RESET.
Once a CQE is received for that final WR, it guarantees that no
outstanding WRs remain, making it safe to transition the QP to RESET and
subsequently back to RTS, restoring proper functionality.
Note:
For the barrier WR, we simply reuse the failed and ready WR.
Since the QP is in an error state, it will only receive
IB_WC_WR_FLUSH_ERR. However, as it serves only as a barrier we don't
care about its status.
When mesa started using compute queues more often
we started seeing additional hangs with compute queues.
Disabling gfxoff seems to mitigate that. Manually
control gfxoff and gfx pg with command submissions to avoid
any issues related to gfxoff. KFD already does the same
thing for these chips.
v2: limit to compute
v3: limit to APUs
v4: limit to Raven/PCO
v5: only update the compute ring_funcs
v6: Disable GFX PG
v7: adjust order
In general the delay should be added by the PHY instead of the MAC,
and this improves network stability on some boards which seem to
need different delay.
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky@arm.com: silence unused-result warnings] Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joey Gouly <joey.gouly@arm.com> Cc: Keith Lucas <keith.lucas@oracle.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit. ] Signed-off-by: Yifei Liu <yifei.l.liu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While fixing migration disabled task handling, 32966821574c ("sched_ext: Fix
migration disabled handling in targeted dispatches") assumed that a
migration disabled task's ->cpus_ptr would only have the pinned CPU. While
this is eventually true for migration disabled tasks that are switched out,
->cpus_ptr update is performed by migrate_disable_switch() which is called
right before context_switch() in __scheduler(). However, the task is
enqueued earlier during pick_next_task() via put_prev_task_scx(), so there
is a race window where another CPU can see the task on a DSQ.
If the CPU tries to dispatch the migration disabled task while in that
window, task_allowed_on_cpu() will succeed and task_can_run_on_remote_rq()
will subsequently trigger SCHED_WARN(is_migration_disabled()).
Fix it by converting the SCHED_WARN() back to a regular failure path. Also,
perform the migration disabled test before task_allowed_on_cpu() test so
that BPF schedulers which fail to handle migration disabled tasks can be
noticed easily.
While at it, adjust scx_ops_error() message for !task_allowed_on_cpu() case
for brevity and consistency.
Fix a deadlock in pse_pi_get_current_limit and pse_pi_set_current_limit
caused by consecutive mutex_lock calls. One in the function itself and
another in pse_pi_get_voltage.
Resolve the issue by using the unlocked version of pse_pi_get_voltage
instead.
Fixes: e0a5e2bba38a ("net: pse-pd: Use power limit at driver side instead of current limit") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250212151751.1515008-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the function tracing_set_tracer() switched over to using the guard()
infrastructure, it did not need to save the 'ret' variable and would just
return the value when an error arised, instead of setting ret and jumping
to an out label.
When CONFIG_TRACER_SNAPSHOT is enabled, it had code that expected the
"ret" variable to be initialized to zero and had set 'ret' while holding
an arch_spin_lock() (not used by guard), and then upon releasing the lock
it would check 'ret' and exit if set. But because ret was only set when an
error occurred while holding the locks, 'ret' would be used uninitialized
if there was no error. The code in the CONFIG_TRACER_SNAPSHOT block should
be self contain. Make sure 'ret' is also set when no error occurred.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250106111143.2f90ff65@gandalf.local.home Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202412271654.nJVBuwmF-lkp@intel.com/ Fixes: d33b10c0c73ad ("tracing: Switch trace.c code over to use guard()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Check if a function is already in the manager ops of a subops. A manager
ops contains multiple subops, and if two or more subops are tracing the
same function, the manager ops only needs a single entry in its hash.
Function graph uses a subops and manager ops mechanism to attach to
ftrace. The manager ops connects to ftrace and the functions it connects
to is defined by a list of subops that it manages.
The function hash that defines what the above ops attaches to limits the
functions to attach if the hash has any content. If the hash is empty, it
means to trace all functions.
The creation of the manager ops hash is done by iterating over all the
subops hashes. If any of the subops hashes is empty, it means that the
manager ops hash must trace all functions as well.
The issue is in the creation of the manager ops. When a second subops is
attached, a new hash is created by starting it as NULL and adding the
subops one at a time. But the NULL ops is mistaken as an empty hash, and
once an empty hash is found, it stops the loop of subops and just enables
all functions.
Fix this by initializing the new hash to NULL and if the hash is NULL do
not treat it as an empty hash but instead allocate by copying the content
of the first sub ops. Then on subsequent iterations, the new hash will not
be NULL, but the content of the previous subops. If that first subops
attached to all functions, then new hash may assume that the manager ops
also needs to attach to all functions.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.060300046@goodmis.org Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function tracer should record the preemption level at the point when
the function is invoked. If the tracing subsystem decrement the
preemption counter it needs to correct this before feeding the data into
the trace buffer. This was broken in the commit cited below while
shifting the preempt-disabled section.
Use tracing_gen_ctx_dec() which properly subtracts one from the
preemption counter on a preemptible kernel.
Cc: stable@vger.kernel.org Cc: Wander Lairson Costa <wander@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/20250220140749.pfw8qoNZ@linutronix.de Fixes: ce5e48036c9e7 ("ftrace: disable preemption when recursion locked") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The previous implementation incorrectly configured the cmn_interrupt_2_enable
register for interrupt handling. Using cmn_interrupt_2_enable to configure
Tag, Data RAM ECC interrupts would lead to issues like double handling of the
interrupts (EL1 and EL3) as cmn_interrupt_2_enable is meant to be configured
for interrupts which needs to be handled by EL3.
EL1 LLCC EDAC driver needs to use cmn_interrupt_0_enable register to configure
Tag, Data RAM ECC interrupts instead of cmn_interrupt_2_enable.