In sun8i_ce_cipher_unprepare(), dma_unmap_sg() is incorrectly called with
the number of entries returned by dma_map_sg(), rather than using the
original number of entries passed when mapping the scatterlist.
To fix this, stash the original number of entries passed to dma_map_sg()
in the request context.
Commit bc4d25fdfadf ("clk: renesas: rzv2h: Add support for dynamic
switching divider clocks") missed setting the `CLK_SET_RATE_PARENT`
flag when registering ddiv clocks.
Without this flag, rate changes to the divider clock do not propagate
to its parent, potentially resulting in incorrect clock configurations.
Fix this by setting `CLK_SET_RATE_PARENT` in the clock init data.
When FORTIFY_SOURCE reports about a run-time buffer overread, the wrong
buffer size was being shown in the error message. (The bounds checking
was correct.)
When gmin_get_config_var() calls efi.get_variable() and the EFI variable
is larger than the expected buffer size, two behaviors combine to create
a stack buffer overflow:
1. gmin_get_config_var() does not return the proper error code when
efi.get_variable() fails. It returns the stale 'ret' value from
earlier operations instead of indicating the EFI failure.
2. When efi.get_variable() returns EFI_BUFFER_TOO_SMALL, it updates
*out_len to the required buffer size but writes no data to the output
buffer. However, due to bug #1, gmin_get_var_int() believes the call
succeeded.
The caller gmin_get_var_int() then performs:
- Allocates val[CFG_VAR_NAME_MAX + 1] (65 bytes) on stack
- Calls gmin_get_config_var(dev, is_gmin, var, val, &len) with len=64
- If EFI variable is >64 bytes, efi.get_variable() sets len=required_size
- Due to bug #1, thinks call succeeded with len=required_size
- Executes val[len] = 0, writing past end of 65-byte stack buffer
This creates a stack buffer overflow when EFI variables are larger than
64 bytes. Since EFI variables can be controlled by firmware or system
configuration, this could potentially be exploited for code execution.
Fix the bug by returning proper error codes from gmin_get_config_var()
based on EFI status instead of stale 'ret' value.
The gmin_get_var_int() function is called during device initialization
for camera sensor configuration on Intel Bay Trail and Cherry Trail
platforms using the atomisp camera stack.
In the ARM64 BPF JIT when prog->aux->exception_boundary is set for a BPF
program, find_used_callee_regs() is not called because for a program
acting as exception boundary, all callee saved registers are saved.
find_used_callee_regs() sets `ctx->fp_used = true;` when it sees FP
being used in any of the instructions.
For programs acting as exception boundary, ctx->fp_used remains false
even if frame pointer is used by the program and therefore, FP is not
set-up for such programs in the prologue. This can cause the kernel to
crash due to a pagefault.
Fix it by setting ctx->fp_used = true for exception boundary programs as
fp is always saved in such programs.
rt->fib6_nsiblings can be read locklessly, add corresponding
READ_ONCE() and WRITE_ONCE() annotations.
Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
fib6_info_uses_dev() seems to rely on RCU without an explicit
protection.
Like the prior fix in rt6_nlmsg_size(),
we need to make sure fib6_del_route() or fib6_add_rt2node()
have not removed the anchor from the list, or we risk an infinite loop.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is because fib6_del_route() and fib6_add_rt2node()
uses list_del_rcu(), which can confuse rcu readers,
because they might no longer see the head of the list.
Restart the loop if f6i->fib6_nsiblings is zero.
Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250725140725.3626540-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The esp4_offload module, loaded during IPsec offload tests, should
be reset to its default settings after testing.
Otherwise, leaving it enabled could unintentionally affect subsequence
test cases by keeping offload active.
A negative overflow can happen when the budget number of descs are
consumed. as long as the budget is decreased to zero, it will again go
into while (budget-- > 0) statement and get decreased by one, so the
overflow issue can happen. It will lead to returning true whereas the
expected value should be false.
In this case where all the budget is used up, it means zc function
should return false to let the poll run again because normally we
might have more data to process. Without this patch, zc function would
return true instead.
Fixes: 132c32ee5bc0 ("net: stmmac: Add TX via XDP zero-copy socket") Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Link: https://patch.msgid.link/20250723142327.85187-2-kerneljasonxing@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When KSZ8863 support was first added to KSZ driver the RX drop MIB
counter was somehow defined as 0x105. The TX drop MIB counter
starts at 0x100 for port 1, 0x101 for port 2, and 0x102 for port 3, so
the RX drop MIB counter should start at 0x103 for port 1, 0x104 for
port 2, and 0x105 for port 3.
There are 5 ports for KSZ8895, so its RX drop MIB counter starts at
0x105.
Fixes: 4b20a07e103f ("net: dsa: microchip: ksz8795: add support for ksz88xx chips") Signed-off-by: Tristram Ha <tristram.ha@microchip.com> Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20250723030403.56878-1-Tristram.Ha@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Hardware returns a unique identifier for a decrypted packet's xfrm
state, this state is looked up in an xarray. However, the state might
have been freed by the time of this lookup.
Currently, if the state is not found, only a counter is incremented.
The secpath (sp) extension on the skb is not removed, resulting in
sp->len becoming 0.
Subsequently, functions like __xfrm_policy_check() attempt to access
fields such as xfrm_input_state(skb)->xso.type (which dereferences
sp->xvec[sp->len - 1]) without first validating sp->len. This leads to
a crash when dereferencing an invalid state pointer.
This patch prevents the crash by explicitly removing the secpath
extension from the skb if the xfrm state is not found after hardware
decryption. This ensures downstream functions do not operate on a
zero-length secpath.
When updating the PBMC register, we read its current value,
modify desired fields, then write it back.
The port_buffer_size field within PBMC is Read-Only (RO).
If this RO field contains a non-zero value when read,
attempting to write it back will cause the entire PBMC
register update to fail.
This commit ensures port_buffer_size is explicitly cleared
to zero after reading the PBMC register but before writing
back the modified value.
This allows updates to other fields in the PBMC register to succeed.
Assign netdev.dev_port based on the device channel index, to indicate the
port number of the network device.
While this driver already uses netdev.dev_id for that purpose, dev_port is
more appropriate. However, retain dev_id to avoid potential regressions.
Fixes: 3e66d0138c05 ("can: populate netdev::dev_id for udev discrimination") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The latest firmware versions of USB CAN FD interfaces export the EP numbers
to be used to dialog with the device via the "type" field of a response to
a vendor request structure, particularly when its value is greater than or
equal to 2.
Correct the driver's test of this field.
Fixes: 4f232482467a ("can: peak_usb: include support for a new MCU") Signed-off-by: Stephane Grosjean <stephane.grosjean@hms-networks.com> Link: https://patch.msgid.link/20250724081550.11694-1-stephane.grosjean@free.fr Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
[mkl: rephrase commit message] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the userspace RV tool skips trace events triggered by the RV
tool itself, this can be changed by passing the parameter -s, which sets
the variable config_my_pid to 0 (instead of the tool's PID).
This has the side effect of skipping events generated by idle (PID 0).
Set config_my_pid to -1 (an invalid pid) to avoid skipping idle.
Cc: Nam Cao <namcao@linutronix.de> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250723161240.194860-2-gmonaco@redhat.com Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface") Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Event_Type field in an LE Extended Advertising Report uses bits 5
and 6 for data status (e.g. truncation or fragmentation), not the PDU
type itself.
The ext_evt_type_to_legacy() function fails to mask these status bits
before evaluation. This causes valid advertisements with status bits set
(e.g. a truncated non-connectable advertisement, which ends up showing
as PDU type 0x40) to be misclassified as unknown and subsequently
dropped. This is okay for most checks which use bitwise AND on the
relevant event type bits, but it doesn't work for non-connectable types,
which are checked with '== LE_EXT_ADV_NON_CONN_IND' (that is, zero).
In terms of behaviour, first the device sends a truncated report:
> HCI Event: LE Meta Event (0x3e) plen 26
LE Extended Advertising Report (0x0d)
Entry 0
Event type: 0x0040
Data status: Incomplete, data truncated, no more to come
Address type: Random (0x01)
Address: 1D:12:46:FA:F8:6E (Non-Resolvable)
SID: 0x03
RSSI: -98 dBm (0x9e)
Data length: 0x00
Then, a few seconds later, it sends the subsequent complete report:
> HCI Event: LE Meta Event (0x3e) plen 122
LE Extended Advertising Report (0x0d)
Entry 0
Event type: 0x0000
Data status: Complete
Address type: Random (0x01)
Address: 1D:12:46:FA:F8:6E (Non-Resolvable)
SID: 0x03
RSSI: -97 dBm (0x9f)
Data length: 0x60
Service Data: Google (0xfef3)
Data[92]: ...
These devices often send multiple truncated reports per second.
This patch introduces a PDU type mask to ensure only the relevant bits
are evaluated, allowing for the correct translation of all valid
extended advertising packets.
Fixes: b2cc9761f144 ("Bluetooth: Handle extended ADV PDU types") Signed-off-by: Chris Down <chris@chrisdown.name> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 4ffca5a96678 (mm: support only one page_type per page) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20250611155916.2579160-11-willy@infradead.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
kernel/kcsan/kcsan_test.c:591:41: error: variable 'dummy' is
uninitialized when passed as a const pointer argument here
[-Werror,-Wuninitialized-const-pointer]
591 | KCSAN_EXPECT_READ_BARRIER(atomic_read(&dummy), false);
| ^~~~~
1 error generated.
Although this particular test does not care about the value stored in
the dummy atomic variable, let's silence the warning.
Link: https://lkml.kernel.org/r/CA+G9fYu8JY=k-r0hnBRSkQQrFJ1Bz+ShdXNwC1TNeMt0eXaxeA@mail.gmail.com Fixes: 8bc32b348178 ("kcsan: test: Add test cases for memory barrier instrumentation") Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reviewed-by: Alexander Potapenko <glider@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the ring buffer was first introduced, reading the non-consuming
"trace" file required disabling the writing of the ring buffer. To make
sure the writing was fully disabled before iterating the buffer with a
non-consuming read, it would set the disable flag of the buffer and then
call an RCU synchronization to make sure all the buffers were
synchronized.
The function ring_buffer_read_start() originally would initialize the
iterator and call an RCU synchronization, but this was for each individual
per CPU buffer where this would get called many times on a machine with
many CPUs before the trace file could be read. The commit 72c9ddfd4c5bf
("ring-buffer: Make non-consuming read less expensive with lots of cpus.")
separated ring_buffer_read_start into ring_buffer_read_prepare(),
ring_buffer_read_sync() and then ring_buffer_read_start() to allow each of
the per CPU buffers to be prepared, call the read_buffer_read_sync() once,
and then the ring_buffer_read_start() for each of the CPUs which made
things much faster.
The commit 1039221cc278 ("ring-buffer: Do not disable recording when there
is an iterator") removed the requirement of disabling the recording of the
ring buffer in order to iterate it, but it did not remove the
synchronization that was happening that was required to wait for all the
buffers to have no more writers. It's now OK for the buffers to have
writers and no synchronization is needed.
Remove the synchronization and put back the interface for the ring buffer
iterator back before commit 72c9ddfd4c5bf was applied.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250630180440.3eabb514@batman.local.home Reported-by: David Howells <dhowells@redhat.com> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The processing of the struct cfg80211_sar_specs::sub_specs flexible
array requires its counter, num_sub_specs, to be assigned before the
loop in nl80211_set_sar_specs(). Leave the final assignment after the
loop in place in case fewer ended up in the array.
Fixes: aa4ec06c455d ("wifi: cfg80211: use __counted_by where appropriate") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20250721183125.work.183-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While I caught the need for setting cnt early in nl80211_parse_rnr_elems()
in the original annotation of struct cfg80211_rnr_elems with __counted_by,
I missed a similar pattern in ieee80211_copy_rnr_beacon(). Fix this by
moving the cnt assignment to before the loop.
Fixes: 7b6d7087031b ("wifi: cfg80211: Annotate struct cfg80211_rnr_elems with __counted_by") Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20250721182521.work.540-kees@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit bd99a3013bdc ("brcmfmac: move configuration of probe request
IEs"), the probe request MGMT IE addition operation brcmf_vif_set_mgmt_ie()
got moved from the brcmf_p2p_scan_prep() to the brcmf_cfg80211_scan().
Because of this, as part of the scan request handler for the P2P Discovery,
vif struct used for adding the Probe Request P2P IE in firmware got changed
from the P2PAPI_BSSCFG_DEVICE vif to P2PAPI_BSSCFG_PRIMARY vif incorrectly.
So the firmware stopped adding P2P IE to the outgoing P2P Discovery probe
requests frames and the other P2P peers were unable to discover this device
causing a regression on the P2P feature.
To fix this, while setting the P2P IE in firmware, properly use the vif of
the P2P discovery wdev on which the driver received the P2P scan request.
This is done by not changing the vif pointer, until brcmf_vif_set_mgmt_ie()
is completed.
Fixes: bd99a3013bdc ("brcmfmac: move configuration of probe request IEs") Signed-off-by: Gokul Sivakumar <gokulkumar.sivakumar@infineon.com> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com> Link: https://patch.msgid.link/20250626050706.7271-1-gokulkumar.sivakumar@infineon.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently there is no endian conversion in ath12k_wmi_tlv_services_parser()
so the service bit parsing will be incorrect on a big endian platform and
to fix this by using appropriate endian conversion.
Fixes: 342527f35338 ("wifi: ath12k: Add support to parse new WMI event for 6 GHz regulatory") Signed-off-by: Tamizh Chelvam Raja <tamizh.raja@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20250717173539.2523396-2-tamizh.raja@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
With 802.11 encapsulation offloading, ieee80211_tx_h_select_key() is
called on 802.3 frames. In that case do not try to use skb data as
valid 802.11 headers.
Ignore TXQs with the flag IEEE80211_TXQ_STOP when scheduling a queue.
The flag is only set after all fragments have been dequeued and won't
allow dequeueing other frames as long as the flag is set.
For drivers using ieee80211_txq_schedule_start() this prevents an
loop trying to push the queued frames while IEEE80211_TXQ_STOP is set:
After setting IEEE80211_TXQ_STOP the driver will call
ieee80211_return_txq(). Which calls __ieee80211_schedule_txq(), detects
that there sill are frames in the queue and immediately restarts the
stopped TXQ. Which can't dequeue any frame and thus starts over the loop.
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Fixes: ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation") Link: https://patch.msgid.link/20250717162547.94582-2-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If probe fails before ieee80211_register_hw() is successfully done,
ieee80211_unregister_hw() will be called anyway. This may lead to various
bugs as the implementation of ieee80211_unregister_hw() assumes that
ieee80211_register_hw() has been called.
Divide error handling section into relevant subsections, so that
ieee80211_unregister_hw() is called only when it is appropriate. Correct
the order of the calls: ieee80211_unregister_hw() should go before
plfxlc_mac_release(). Also move ieee80211_free_hw() to plfxlc_mac_release()
as it supposed to be the opposite to plfxlc_mac_alloc_hw() that calls
ieee80211_alloc_hw().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 68d57a07bfe5 ("wireless: add plfxlc driver for pureLiFi X, XL, XC devices") Signed-off-by: Murad Masimov <m.masimov@mt-integration.ru> Link: https://patch.msgid.link/20250321185226.71-3-m.masimov@mt-integration.ru Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
syzbot triggered a WARN in ieee80211_tdls_oper() by sending
NL80211_TDLS_ENABLE_LINK immediately after NL80211_CMD_CONNECT,
before association completed and without prior TDLS setup.
This left internal state like sdata->u.mgd.tdls_peer uninitialized,
leading to a WARN_ON() in code paths that assumed it was valid.
Reject the operation early if not in station mode or not associated.
Reported-by: syzbot+f73f203f8c9b19037380@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f73f203f8c9b19037380 Fixes: 81dd2b882241 ("mac80211: move TDLS data to mgd private part") Tested-by: syzbot+f73f203f8c9b19037380@syzkaller.appspotmail.com Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Link: https://patch.msgid.link/20250715230904.661092-2-moonhee.lee.ca@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We observed a regression in our customer’s environment after enabling
CONFIG_LAZY_RCU. In the Android Update Engine scenario, where ioctl() is
used heavily, we found that callbacks queued via call_rcu_hurry (such as
percpu_ref_switch_to_atomic_rcu) can sometimes be delayed by up to 5
seconds before execution. This occurs because the new grace period does
not start immediately after the previous one completes.
The root cause is that the wake_nocb_gp_defer() function now checks
"rdp->nocb_defer_wakeup" instead of "rdp_gp->nocb_defer_wakeup". On CPUs
that are not rcuog, "rdp->nocb_defer_wakeup" may always be
RCU_NOCB_WAKE_NOT. This can cause "rdp_gp->nocb_defer_wakeup" to be
downgraded and the "rdp_gp->nocb_timer" to be postponed by up to 10
seconds, delaying the execution of hurry RCU callbacks.
The trace log of one scenario we encountered is as follow:
// previous GP ends at this point
rcu_preempt [000] d..1. 137.240210: rcu_grace_period: rcu_preempt 8369 end
rcu_preempt [000] ..... 137.240212: rcu_grace_period: rcu_preempt 8372 reqwait
// call_rcu_hurry enqueues "percpu_ref_switch_to_atomic_rcu", the callback waited on by UpdateEngine
update_engine [002] d..1. 137.301593: __call_rcu_common: wyy: unlikely p_ref = 00000000********. lazy = 0
// FirstQ on cpu 2 rdp_gp->nocb_timer is set to fire after 1 jiffy (4ms)
// and the rdp_gp->nocb_defer_wakeup is set to RCU_NOCB_WAKE
update_engine [002] d..2. 137.301595: rcu_nocb_wake: rcu_preempt 2 FirstQ on cpu2 with rdp_gp (cpu0).
// FirstBQ event on cpu2 during the 1 jiffy, make the timer postpond 10 seconds later.
// also, the rdp_gp->nocb_defer_wakeup is overwrite to RCU_NOCB_WAKE_LAZY
update_engine [002] d..1. 137.301601: rcu_nocb_wake: rcu_preempt 2 WakeEmptyIsDeferred
...
...
...
// before the 10 seconds timeout, cpu0 received another call_rcu_hurry
// reset the timer to jiffies+1 and set the waketype = RCU_NOCB_WAKE.
kworker/u32:0 [000] d..2. 142.557564: rcu_nocb_wake: rcu_preempt 0 FirstQ
kworker/u32:0 [000] d..1. 142.557576: rcu_nocb_wake: rcu_preempt 0 WakeEmptyIsDeferred
kworker/u32:0 [000] d..1. 142.558296: rcu_nocb_wake: rcu_preempt 0 WakeNot
kworker/u32:0 [000] d..1. 142.558562: rcu_nocb_wake: rcu_preempt 0 WakeNot
// idle(do_nocb_deferred_wakeup) wake rcuog due to waketype == RCU_NOCB_WAKE
<idle> [000] d..1. 142.558786: rcu_nocb_wake: rcu_preempt 0 DoWake
<idle> [000] dN.1. 142.558839: rcu_nocb_wake: rcu_preempt 0 DeferredWake
rcuog/0 [000] ..... 142.558871: rcu_nocb_wake: rcu_preempt 0 EndSleep
rcuog/0 [000] ..... 142.558877: rcu_nocb_wake: rcu_preempt 0 Check
// finally rcuog request a new GP at this point (5 seconds after the FirstQ event)
rcuog/0 [000] d..2. 142.558886: rcu_grace_period: rcu_preempt 8372 newreq
rcu_preempt [001] d..1. 142.559458: rcu_grace_period: rcu_preempt 8373 start
...
rcu_preempt [000] d..1. 142.564258: rcu_grace_period: rcu_preempt 8373 end
rcuop/2 [000] D..1. 142.566337: rcu_batch_start: rcu_preempt CBs=219 bl=10
// the hurry CB is invoked at this point
rcuop/2 [000] b.... 142.566352: blk_queue_usage_counter_release: wyy: wakeup. p_ref = 00000000********.
This patch changes the condition to check "rdp_gp->nocb_defer_wakeup" in
the lazy path. This prevents an already scheduled "rdp_gp->nocb_timer"
from being postponed and avoids overwriting "rdp_gp->nocb_defer_wakeup"
when it is not RCU_NOCB_WAKE_NOT.
Fixes: 3cb278e73be5 ("rcu: Make call_rcu() lazy to save power") Co-developed-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Cheng-jui Wang <cheng-jui.wang@mediatek.com> Co-developed-by: Lorry.Luo@mediatek.com Signed-off-by: Lorry.Luo@mediatek.com Tested-by: weiyangyang@vivo.com Signed-off-by: weiyangyang@vivo.com Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The AMD IOMMU documentation seems pretty clear that the V2 table follows
the normal CPU expectation of sign extension. This is shown in
Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
would have three regions in the page table: lower, non-canonical, upper.
The manual describes that the V1 table does not sign extend in section
2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
Further, Vasant has checked this and indicates the HW has an addtional
behavior that the manual does not yet describe. The AMDv2 table does not
have the sign extended behavior when attached to PASID 0, which may
explain why this has gone unnoticed.
The iommu domain geometry does not directly support sign extended page
tables. The driver should report only one of the lower/upper spaces. Solve
this by removing the top VA bit from the geometry to use only the lower
space.
This will also make the iommu_domain work consistently on all PASID 0 and
PASID != 1.
Adjust dma_max_address() to remove the top VA bit. It now returns:
5 Level:
Before 0x1ffffffffffffff
After 0x0ffffffffffffff
4 Level:
Before 0xffffffffffff
After 0x7fffffffffff
Fixes: 097af47d3cfb ("drm/amdgpu/gfx10: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 4c953e53cc34 ("drm/amdgpu/gfx_9.4.3: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: fdbd69486b46 ("drm/amdgpu/gfx9: wait for reset done before remap") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ath11k_mac_disable_peer_fixed_rate() is passed as the iterator to
ieee80211_iterate_stations_atomic(). Note in this case the iterator is
required to be atomic, however ath11k_mac_disable_peer_fixed_rate() does
not follow it as it might sleep. Consequently below warning is seen:
BUG: sleeping function called from invalid context at wmi.c:304
Call Trace:
<TASK>
dump_stack_lvl
__might_resched.cold
ath11k_wmi_cmd_send
ath11k_wmi_set_peer_param
ath11k_mac_disable_peer_fixed_rate
ieee80211_iterate_stations_atomic
ath11k_mac_op_set_bitrate_mask.cold
Change to ieee80211_iterate_stations_mtx() to fix this issue.
The DMA map functions can fail and should be tested for errors.
If the mapping fails, unmap and return an error.
Fixes: 788838ebe8a4 ("mwl8k: use pci_unmap_addr{,set}() to keep track of unmap addresses on rx") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20250709111339.25360-2-fourier.thomas@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When working in station mode, TDLS peers are assigned macid 0, even
though 0 was already assigned to the AP. This causes the connection
with the AP to stop working after the TDLS connection is torn down.
Assign the next available macid to TDLS peers, same as client stations
in AP mode.
Fixes: 902cb7b11f9a ("wifi: rtw88: assign mac_id for vif/sta and update to TX desc") Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/58648c09-8553-4bcc-a977-9dc9afd63780@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for
phystats") increased the skb size when aggregation is enabled but decreased
it for the aggregation disabled case.
As a result, if a frame near the maximum size is received,
rtl8xxxu_rx_complete() is called with status -EOVERFLOW and then the
driver starts to malfunction and no further communication is possible.
Restore the skb size in the aggregation disabled case.
Fixes: 1e5b3b3fe9e0 ("rtl8xxxu: Adjust RX skb size to include space for phystats") Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20250709121522.1992366-1-martin.kaistra@linutronix.de Signed-off-by: Sasha Levin <sashal@kernel.org>
tcp_measure_rcv_mss() is used to update icsk->icsk_ack.rcv_mss
(tcpi_rcv_mss in tcp_info) and tp->scaling_ratio.
Calling it from tcp_data_queue_ofo() makes sure these
fields are updated, and permits a better tuning
of sk->sk_rcvbuf, in the case a new flow receives many ooo
packets.
Fixes: dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250711114006.480026-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When compiling the kernel with LLVM, the following warning was issued:
drivers/xen/gntdev.c:991: warning: stack frame size (1160) exceeds
limit (1024) in function 'gntdev_ioctl'
The main reason is struct gntdev_copy_batch which is located on the
stack and has a size of nearly 1kb.
For performance reasons it shouldn't by just dynamically allocated
instead, so allocate a new instance when needed and instead of freeing
it put it into a list of free structs anchored in struct gntdev_priv.
[dma_buf_fd() fixes; no preferences regarding the tree it goes through -
up to xen folks]
As soon as we'd inserted a file reference into descriptor table, another
thread could close it. That's fine for the case when all we are doing is
returning that descriptor to userland (it's a race, but it's a userland
race and there's nothing the kernel can do about it). However, if we
follow fd_install() with any kind of access to objects that would be
destroyed on close (be it the struct file itself or anything destroyed
by its ->release()), we have a UAF.
dma_buf_fd() is a combination of reserving a descriptor and fd_install().
gntdev dmabuf_exp_from_pages() calls it and then proceeds to access the
objects destroyed on close - starting with gntdev_dmabuf itself.
Fix that by doing reserving descriptor before anything else and do
fd_install() only when everything had been set up.
When changing the page size on an mkey, the driver needs to set the
appropriate bits in the mkey mask to indicate which fields are being
modified.
The 6th bit of a page size in mlx5 driver is considered an extension,
and this bit has a dedicated capability and mask bits.
Previously, the driver was not setting this mask in the mkey mask when
performing page size changes, regardless of its hardware support,
potentially leading to an incorrect page size updates.
This fixes the issue by setting the relevant bit in the mkey mask when
performing page size changes on an mkey and the 6th bit of this field is
supported by the hardware.
Commit 21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
missed that stats_dscp_set, stats_dscp_error and stats_cpmark_set
might be written (and read) locklessly.
Use atomic64_t for these three fields, I doubt act_ctinfo is used
heavily on big SMP hosts anyway.
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pedro Tammela <pctammela@mojatatu.com> Link: https://patch.msgid.link/20250709090204.797558-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
netem_enqueue's duplication prevention logic breaks when a netem
resides in a qdisc tree with other netems - this can lead to a
soft lockup and OOM loop in netem_dequeue, as seen in [1].
Ensure that a duplicating netem cannot exist in a tree with other
netems.
Previous approaches suggested in discussions in chronological order:
1) Track duplication status or ttl in the sk_buff struct. Considered
too specific a use case to extend such a struct, though this would
be a resilient fix and address other previous and potential future
DOS bugs like the one described in loopy fun [2].
2) Restrict netem_enqueue recursion depth like in act_mirred with a
per cpu variable. However, netem_dequeue can call enqueue on its
child, and the depth restriction could be bypassed if the child is a
netem.
3) Use the same approach as in 2, but add metadata in netem_skb_cb
to handle the netem_dequeue case and track a packet's involvement
in duplication. This is an overly complex approach, and Jamal
notes that the skb cb can be overwritten to circumvent this
safeguard.
4) Prevent the addition of a netem to a qdisc tree if its ancestral
path contains a netem. However, filters and actions can cause a
packet to change paths when re-enqueued to the root from netem
duplication, leading us to the current solution: prevent a
duplicating netem from inhabiting the same tree as other netems.
Per the PCIe spec, behavior of the PASID capability is undefined if the
value of the PASID Enable bit changes while the Enable bit of the
function's ATS control register is Set. Unfortunately,
pdev_enable_caps() does exactly that by ordering enabling ATS for the
device before enabling PASID.
Remove the declaration of 'err' inside the 'if (timetravel)' block,
as it would otherwise be unavailable outside that block, potentially
leading to uml_rtc_start() returning an uninitialized value.
It's needed to check the return value of lockdep_commit_lock_is_held(),
otherwise there's no point in this assertion as it doesn't print any
debug information on itself.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: b04df3da1b5c ("netfilter: nf_tables: do not defer rule destruction via call_rcu") Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This practically reverts commit 28339b21a365 ("netfilter: nf_tables: do
not send complete notification of deletions"): The feature was never
effective, due to prior modification of 'event' variable the conditional
early return never happened.
User space also relies upon the current behaviour, so better reintroduce
the shortened deletion notifications once it is fixed.
Fixes: 28339b21a365 ("netfilter: nf_tables: do not send complete notification of deletions") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Dietmar reported that commit 3840cbe24cf0 ("sched: psi: fix bogus
pressure spikes from aggregation race") caused a regression for him on
a high context switch rate benchmark (schbench) due to the now
repeating cpu_clock() calls.
In particular the problem is that get_recent_times() will extrapolate
the current state to 'now'. But if an update uses a timestamp from
before the start of the update, it is possible to get two reads
with inconsistent results. It is effectively back-dating an update.
(note that this all hard-relies on the clock being synchronized across
CPUs -- if this is not the case, all bets are off).
Combine this problem with the fact that there are per-group-per-cpu
seqcounts, the commit in question pushed the clock read into the group
iteration, causing tree-depth cpu_clock() calls. On architectures
where cpu_clock() has appreciable overhead, this hurts.
Instead move to a per-cpu seqcount, which allows us to have a single
clock read for all group updates, increasing internal consistency and
lowering update overhead. This comes at the cost of a longer update
side (proportional to the tree depth) which can cause the read side to
retry more often.
The nreaders and loops variables are exposed as module parameters, which,
in certain combinations, can lead to multiplication overflow.
Besides, loops parameter is defined as long, while through the code is
used as int, which can cause truncation on 64-bit kernels and possible
zeroes where they shouldn't appear.
Since code uses result of multiplication as int anyway, it only makes sense
to replace loops with int. Multiplication overflow check is also added
due to possible multiplication between two very big numbers.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 653ed64b01dc ("refperf: Add a test to measure performance of read-side synchronization") Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru> Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When MACH_IS_MVME147, the boot console calls mvme147_scc_write() to
generate console output. That will continue to work even after
debug_cons_nputs() becomes unavailable so there's no need to
unregister the boot console.
Take the opportunity to remove a repeated MACH_IS_* test. Use the
actual .write method (instead of a wrapper) and test that pointer
instead. This means adding an unused parameter to debug_cons_nputs() for
consistency with the struct console API.
early_printk.c is only built when CONFIG_EARLY_PRINTK=y. As of late,
head.S is only built when CONFIG_MMU_MOTOROLA=y. So let the former symbol
depend on the latter, to obviate some ifdef conditionals.
Cc: Daniel Palmer <daniel@0x0f.com> Fixes: 077b33b9e283 ("m68k: mvme147: Reinstate early console") Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/d1d4328e5aa9a87bd8352529ce62b767731c0530.1743467205.git.fthain@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a warning to ensure RCU lock is held around tree lookup, and then
fix one of the invocations in bpf_stack_walker. The program has an
active stack frame and won't disappear. Use the opportunity to remove
unneeded invocation of is_bpf_text_address.
The check that the new vector length we set was the expected one was typoed
to an assignment statement which for some reason the compilers didn't spot,
most likely due to the macros involved.
Fixes: a1d7111257cd ("selftests: arm64: More comprehensively test the SVE ptrace interface") Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250609-kselftest-arm64-ssve-fixups-v2-1-998fcfa6f240@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the new coming segment covers more than one skbs in the ofo queue,
and which seq is equal to rcv_nxt, then the sequence range
that is duplicated will be sent as DUP SACK, the detail as below,
in step6, the {501,2001} range is clearly including too much
DUP SACK range, in violation of RFC 2883 rules.
In a number of cases we see kernel panics on resume due
to ath11k kernel page fault, which happens under the
following circumstances:
1) First ath11k_hal_dump_srng_stats() call
Last interrupt received for each group:
ath11k_pci 0000:01:00.0: group_id 0 22511ms before
ath11k_pci 0000:01:00.0: group_id 1 14440788ms before
[..]
ath11k_pci 0000:01:00.0: failed to receive control response completion, polling..
ath11k_pci 0000:01:00.0: Service connect timeout
ath11k_pci 0000:01:00.0: failed to connect to HTT: -110
ath11k_pci 0000:01:00.0: failed to start core: -110
ath11k_pci 0000:01:00.0: firmware crashed: MHI_CB_EE_RDDM
ath11k_pci 0000:01:00.0: already resetting count 2
ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
[..]
2) At this point reconfiguration fails (we have 2 resets) and
ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit()
which destroys srng lists. However, it does not reset per-list
->initialized flag.
3) Second ath11k_hal_dump_srng_stats() call sees stale ->initialized
flag and attempts to dump srng stats:
Last interrupt received for each group:
ath11k_pci 0000:01:00.0: group_id 0 66785ms before
ath11k_pci 0000:01:00.0: group_id 1 14485062ms before
ath11k_pci 0000:01:00.0: group_id 2 14485062ms before
ath11k_pci 0000:01:00.0: group_id 3 14485062ms before
ath11k_pci 0000:01:00.0: group_id 4 14780845ms before
ath11k_pci 0000:01:00.0: group_id 5 14780845ms before
ath11k_pci 0000:01:00.0: group_id 6 14485062ms before
ath11k_pci 0000:01:00.0: group_id 7 66814ms before
ath11k_pci 0000:01:00.0: group_id 8 68997ms before
ath11k_pci 0000:01:00.0: group_id 9 67588ms before
ath11k_pci 0000:01:00.0: group_id 10 69511ms before
BUG: unable to handle page fault for address: ffffa007404eb010
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 100000067 P4D 100000067 PUD 10022d067 PMD 100b01067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k]
Call Trace:
<TASK>
? __die_body+0xae/0xb0
? page_fault_oops+0x381/0x3e0
? exc_page_fault+0x69/0xa0
? asm_exc_page_fault+0x22/0x30
? ath11k_hal_dump_srng_stats+0x2b4/0x3b0 [ath11k (HASH:6cea 4)]
ath11k_qmi_driver_event_work+0xbd/0x1050 [ath11k (HASH:6cea 4)]
worker_thread+0x389/0x930
kthread+0x149/0x170
Clear per-list ->initialized flag in ath11k_hal_srng_deinit().
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Baochen Qiang <quic_bqiang@quicinc.com> Fixes: 5118935b1bc2 ("ath11k: dump SRNG stats during FW assert") Link: https://patch.msgid.link/20250612084551.702803-1-senozhatsky@chromium.org Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In rtl8187_stop() move the call of usb_kill_anchored_urbs() before clearing
b_tx_status.queue. This change prevents callbacks from using already freed
skb due to anchor was not killed before freeing such skb.
With a quite rare chance, RX report might be problematic to make SW think
a packet is received on 6 GHz band even if the chip does not support 6 GHz
band actually. Since SW won't initialize stuffs for unsupported bands, NULL
dereference will happen then in the sequence, rtw89_vif_rx_stats_iter() ->
rtw89_core_cancel_6ghz_probe_tx(). So, add a check to avoid it.
I tried to fix the stack usage in this function a couple of years ago,
but there is still a problem with the latest gcc versions in some
configurations:
net/caif/cfctrl.c:553:1: error: the frame size of 1296 bytes is larger than 1280 bytes [-Werror=frame-larger-than=]
Reduce this once again, with a separate cfctrl_link_setup() function that
holds the bulk of all the local variables. It also turns out that the
param[] array that takes up a large portion of the stack is write-only
and can be left out here.
IO hotplug add event is handled in the user space with drmgr tool.
After the device is enabled, the user space uses /sys/kernel/dlpar
interface with “dt add index <drc_index>” to update the device tree.
The kernel interface (dlpar_hp_dt_add()) finds the parent node for
the specified ‘drc_index’ from ibm,drc-info property. The recent FW
provides this property from 2017 onwards. But KVM guest code in
some releases is still using the older SLOF firmware which has
ibm,drc-indexes property instead of ibm,drc-info.
If the ibm,drc-info is not available, this patch adds changes to
search ‘drc_index’ from the indexes array in ibm,drc-indexes
property to support old FW.
Fixes: 02b98ff44a57 ("powerpc/pseries/dlpar: Add device tree nodes for DLPAR IO add") Reported-by: Kowshik Jois <kowsjois@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Tested-by: Amit Machhiwal <amachhiw@linux.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250531235002.239213-1-haren@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In function dump_xx_nlmsg(), when realloc() fails to allocate memory,
the original pointer to the buffer is overwritten with NULL. This causes
a memory leak because the previously allocated buffer becomes unreachable
without being freed.
Running 3D applications with SVGA_FORCE_HOST_BACKED=1 or using an
ancient version of mesa was broken because the buffer was pinned in
VMW_BO_DOMAIN_SYS and could not be moved to VMW_BO_DOMAIN_MOB during
validation.
The compat_shader buffer should not pinned.
Fixes: 668b206601c5 ("drm/vmwgfx: Stop using raw ttm_buffer_object's") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Maaz Mombasawala <maaz.mombasawala@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://lore.kernel.org/r/20250429203427.1742331-1-ian.forbes@broadcom.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The netfilter hook is invoked with skb->dev for input netdevice, and
vif_dev for output netdevice. However at the point of invocation, skb->dev
is already set to vif_dev, and MR-forwarded packets are reported with
in=out:
When xsend() returns -1 (error), the check 'n < sizeof(buf)' incorrectly
treats it as success due to unsigned promotion. Explicitly check for -1
first.
Fixes: a4b7193d8efd ("selftests/bpf: Add sockmap test for redirecting partial skb data") Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Link: https://lore.kernel.org/r/20250612084208.27722-1-wangfushuai@baidu.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When sending plaintext data, we initially calculated the corresponding
ciphertext length. However, if we later reduced the plaintext data length
via socket policy, we failed to recalculate the ciphertext length.
This results in transmitting buffers containing uninitialized data during
ciphertext transmission.
This causes uninitialized bytes to be appended after a complete
"Application Data" packet, leading to errors on the receiving end when
parsing TLS record.
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20250609020910.397930-2-jiayuan.chen@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
We observed an issue from the latest selftest: sockmap_redir where
sk_psock(psock->sk) != psock in the backlog. The root cause is the special
behavior in sockmap_redir - it frequently performs map_update() and
map_delete() on the same socket. During map_update(), we create a new
psock and during map_delete(), we eventually free the psock via rcu_work
in sk_psock_drop(). However, pending workqueues might still exist and not
be processed yet. If users immediately perform another map_update(), a new
psock will be allocated for the same sk, resulting in two psocks pointing
to the same sk.
When the pending workqueue is later triggered, it uses the old psock to
access sk for I/O operations, which is incorrect.
Timing Diagram:
cpu0 cpu1
map_update(sk):
sk->psock = psock1
psock1->sk = sk
map_delete(sk):
rcu_work_free(psock1)
map_update(sk):
sk->psock = psock2
psock2->sk = sk
workqueue:
wakeup with psock1, but the sk of psock1
doesn't belong to psock1
rcu_handler:
clean psock1
free(psock1)
Previously, we used reference counting to address the concurrency issue
between backlog and sock_map_close(). This logic remains necessary as it
prevents the sk from being freed while processing the backlog. But this
patch prevents pending backlogs from using a psock after it has been
stopped.
Note: We cannot call cancel_delayed_work_sync() in map_delete() since this
might be invoked in BPF context by BPF helper, and the function may sleep.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20250609025908.79331-1-jiayuan.chen@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
drm_panthor_gpu_info::shader_present is currently automatically offset
by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those
constraints don't stand on 32-bit x86 and cause a mismatch when running
an x86 binary in a user emulated environment like FEX. It's also
generally agreed that uAPIs should explicitly pad their struct fields,
which we originally intended to do, but a mistake slipped through during
the submission process, leading drm_panthor_gpu_info::shader_present to
be misaligned.
This uAPI change doesn't break any of the existing users of panthor
which are either arm32 or arm64 where the 64-bit alignment of
u64 fields is already enforced a the compiler level.
Changes in v2:
- Rename the garbage field into pad0 and adjust the comment accordingly
- Add Liviu's A-b
The subsystem event test enables all "sched" events and makes sure there's
at least 3 different events in the output. It used to cat the entire trace
file to | wc -l, but on slow machines, that could last a very long time.
To solve that, it was changed to just read the first 100 lines of the
trace file. This can cause false failures as some events repeat so often,
that the 100 lines that are examined could possibly be of only one event.
Instead, create an awk script that looks for 3 different events and will
exit out after it finds them. This will find the 3 events the test looks
for (eventually if it works), and still exit out after the test is
satisfied and not cause slower machines to run forever.
The battery manufacturer string was incorrectly null terminated using
bat_model instead of bat_manu. This could result in an unintended
write to the wrong field and potentially incorrect behavior.
fixe the issue by correctly null terminating the bat_manu string.
Fixes: 32890b983086 ("Staging: initial version of the nvec driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/20250719080755.3954373-1-alok.a.tiwari@oracle.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
gbphy_dev_match_id() should be taking a const pointer, as the pointer
passed to it from the container_of() call was const to start with (it
was accidentally cast away with the call.) Fix this all up by correctly
marking the pointer types.
Cc: Alex Elder <elder@kernel.org> Cc: greybus-dev@lists.linaro.org Fixes: d69d80484598 ("driver core: have match() callback in struct bus_type take a const *") Reviewed-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/2025070115-reoccupy-showy-e2ad@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In cpufreq_policy_put_kobj(), policy->rwsem is used. But in
cpufreq_policy_alloc(), if freq_qos_add_notifier() returns an error, error
path via err_kobj_remove or err_min_qos_notifier will be reached and
cpufreq_policy_put_kobj() will be called before policy->rwsem is
initialized. Thus, the calling of init_rwsem() should be moved to where
before these two error paths can be reached.
Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework") Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250709104145.2348017-3-zhenglifeng1@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The cpufreq-based invariance is enabled in cpufreq_register_driver(),
but never disabled after registration fails. Move the invariance
initialization to where all other initializations have been successfully
done to solve this problem.
Fixes: 874f63531064 ("cpufreq: report whether cpufreq supports Frequency Invariance (FI)") Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250709104145.2348017-2-zhenglifeng1@huawei.com
[ rjw: New subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the passive mode, intel_cpufreq_update_pstate() sets HWP_MIN_PERF in
accordance with the target frequency to ensure delivering adequate
performance, but it sets HWP_DESIRED_PERF to 0, so the processor has no
indication that the desired performance level is actually equal to the
floor one. This may cause it to choose a performance point way above
the desired level.
Moreover, this is inconsistent with intel_cpufreq_adjust_perf() which
actually sets HWP_DESIRED_PERF in accordance with the target performance
value.
Address this by adjusting intel_cpufreq_update_pstate() to pass
target_pstate as both the minimum and the desired performance levels
to intel_cpufreq_hwp_update().
Fixes: a365ab6b9dfb ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Shashank Balaji <shashank.mahadasyam@sony.com> Link: https://patch.msgid.link/6173276.lOV4Wx5bFT@rjwysocki.net Signed-off-by: Sasha Levin <sashal@kernel.org>