'stream_enc_regs' array is an array of dcn10_stream_enc_registers
structures. The array is initialized with four elements, corresponding
to the four calls to stream_enc_regs() in the array initializer. This
means that valid indices for this array are 0, 1, 2, and 3.
The error message 'stream_enc_regs' 4 <= 5 below, is indicating that
there is an attempt to access this array with an index of 5, which is
out of bounds. This could lead to undefined behavior
Here, eng_id is used as an index to access the stream_enc_regs array. If
eng_id is 5, this would result in an out-of-bounds access on the
stream_enc_regs array.
Thus fixing Buffer overflow error in dcn301_stream_encoder_create
reported by Smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/resource/dcn301/dcn301_resource.c:1011 dcn301_stream_encoder_create() error: buffer overflow 'stream_enc_regs' 4 <= 5
Fixes: 3a83e4e64bb1 ("drm/amd/display: Add dcn3.01 support to DC (v2)") Cc: Roman Li <Roman.Li@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
'panel_cntl' structure used to control the display panel could be null,
dereferencing it could lead to a null pointer access.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn21/dcn21_hwseq.c:269 dcn21_set_backlight_level() error: we previously assumed 'panel_cntl' could be null (see line 250)
Fixes: 474ac4a875ca ("drm/amd/display: Implement some asic specific abm call backs.") Cc: Yongqiang Sun <yongqiang.sun@amd.com> Cc: Anthony Koo <Anthony.Koo@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.
NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in
nla_policy is s16, see 3e48be05f3c7 ("netlink: add attribute range
validation to policy").
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
cmsg_ipv6 test requests tcpdump to capture 4 packets,
and sends until tcpdump quits. Only the first packet
is "real", however, and the rest are basic UDP packets.
So if tcpdump doesn't start in time it will miss
the real packet and only capture the UDP ones.
This makes the test fail on slow machine (no KVM or with
debug enabled) 100% of the time, while it passes in fast
environments.
Repeat the "real" / expected packet.
Fixes: 9657ad09e1fa ("selftests: net: test IPV6_TCLASS") Fixes: 05ae83d5a4a2 ("selftests: net: test IPV6_HOPLIMIT") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This forms a self-cyclic reference that GC should finally untangle
but does not due to lack of MSG_OOB handling, resulting in memory
leak.
Recently, commit 11498715f266 ("af_unix: Remove io_uring code for
GC.") removed io_uring's dead code in GC and revealed the problem.
The code was executed at the final stage of GC and unconditionally
moved all GC candidates from gc_candidates to gc_inflight_list.
That papered over the reported problem by always making the following
WARN_ON_ONCE(!list_empty(&gc_candidates)) false.
The problem has been there since commit 2aab4b969002 ("af_unix: fix
struct pid leaks in OOB support") added full scm support for MSG_OOB
while fixing another bug.
To fix this problem, we must call kfree_skb() for unix_sk(sk)->oob_skb
if the socket still exists in gc_candidates after purging collected skb.
Then, we need to set NULL to oob_skb before calling kfree_skb() because
it calls last fput() and triggers unix_release_sock(), where we call
duplicate kfree_skb(u->oob_skb) if not NULL.
Note that the leaked socket remained being linked to a global list, so
kmemleak also could not detect it. We need to check /proc/net/protocol
to notice the unfreed socket.
The cause of this issue is that when tipc_nl_bearer_add() is called with
the TIPC_NLA_BEARER_UDP_OPTS attribute, tipc_udp_nl_bearer_add() is called
even if the bearer is not UDP.
tipc_udp_is_known_peer() called by tipc_udp_nl_bearer_add() assumes that
the media_ptr field of the tipc_bearer has an udp_bearer type object, so
the function goes crazy for non-UDP bearers.
This patch fixes the issue by checking the bearer type before calling
tipc_udp_nl_bearer_add() in tipc_nl_bearer_add().
In very slow environments, most big TCP cases including
segmentation and reassembly of big TCP packets have a good
chance to fail: by default the TCP client uses write size
well below 64K. If the host is low enough autocorking is
unable to build real big TCP packets.
Address the issue using much larger write operations.
Note that is hard to observe the issue without an extremely
slow and/or overloaded environment; reduce the TCP transfer
time to allow for much easier/faster reproducibility.
Fixes: 6bb382bcf742 ("selftests: add a selftest for big tcp") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the counting of new acks and nacks when parsing a packet - something
that is used in congestion control.
As the code stands, it merely notes if there are any nacks whereas what we
really should do is compare the previous SACK table to the new one,
assuming we get two successive ACK packets with nacks in them. However, we
really don't want to do that if we can avoid it as the tables might not
correspond directly as one may be shifted from the other - something that
will only get harder to deal with once extended ACK tables come into full
use (with a capacity of up to 8192).
Instead, count the number of nacks shifted out of the old SACK, the number
of nacks retained in the portion still active and the number of new acks
and nacks in the new table then calculate what we need.
Note this ends up a bit of an estimate as the Rx protocol allows acks to be
withdrawn by the receiver and packets requested to be retransmitted.
Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Stop rxrpc from sending a DUP ACK in response to a PING RESPONSE ACK on a
dead call. We may have initiated the ping but the call may have beaten the
response to completion.
Fixes: 18bfeba50dfd ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the construction of delayed ACKs to not set the reference serial number
as they can't be used as an RTT reference.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the Rx protocol, every packet generated is marked with a per-connection
monotonically increasing serial number. This number can be referenced in
an ACK packet generated in response to an incoming packet - thereby
allowing the sender to use this for RTT determination, amongst other
things.
However, if the reference field in the ACK is zero, it doesn't refer to any
incoming packet (it could be a ping to find out if a packet got lost, for
example) - so we shouldn't generate zero serial numbers.
Fix the generation of serial numbers to retry if it comes up with a zero.
Furthermore, since the serial numbers are only ever allocated within the
I/O thread this connection is bound to, there's no need for atomics so
remove that too.
Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This code prints the wrong variable in the warning message. It should
print "i" instead of "info->offset". On the first iteration "info" is
uninitialized leading to a crash and on subsequent iterations it prints
the previous offset instead of the current one.
Fixes: e0f74ed4634d ("i915/gvt: Separate the MMIO tracking table from GVT-g") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/11957c20-b178-4027-9b0a-e32e9591dd7c@moroto.mountain Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
inet_recv_error() is called without holding the socket lock.
IPv6 socket could mutate to IPv4 with IPV6_ADDRFORM
socket option and trigger a KCSAN warning.
Fixes: f4713a3dfad0 ("net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Before commit 7108b80a542b ("hwmon/coretemp: Handle large core ID
value"), there is a fixed mapping between
1. cpu_core_id
2. the index in pdata->core_data[] array
3. the sysfs attr name, aka "tempX_"
The later two always equal cpu_core_id + 2.
After the commit, pdata->core_data[] index is got from ida so that it
can handle sparse core ids and support more cores within a package.
However, the commit erroneously maps the sysfs attr name to
pdata->core_data[] index instead of cpu_core_id + 2.
As a result, the code is not aligned with the comments, and brings user
visible changes in hwmon sysfs on systems with sparse core id.
For example, before commit 7108b80a542b ("hwmon/coretemp: Handle large
core ID value"),
/sys/class/hwmon/hwmon2/temp2_label:Core 0
/sys/class/hwmon/hwmon2/temp3_label:Core 1
/sys/class/hwmon/hwmon2/temp4_label:Core 2
/sys/class/hwmon/hwmon2/temp5_label:Core 3
/sys/class/hwmon/hwmon2/temp6_label:Core 4
/sys/class/hwmon/hwmon3/temp10_label:Core 8
/sys/class/hwmon/hwmon3/temp11_label:Core 9
after commit,
/sys/class/hwmon/hwmon2/temp2_label:Core 0
/sys/class/hwmon/hwmon2/temp3_label:Core 1
/sys/class/hwmon/hwmon2/temp4_label:Core 2
/sys/class/hwmon/hwmon2/temp5_label:Core 3
/sys/class/hwmon/hwmon2/temp6_label:Core 4
/sys/class/hwmon/hwmon2/temp7_label:Core 8
/sys/class/hwmon/hwmon2/temp8_label:Core 9
Restore the previous behavior and rework the code, comments and variable
names to avoid future confusions.
When qmem_alloc and pfvf->hw_ops->sq_aq_init fails, sq->sg should be
freed to prevent memleak.
Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the ICMPv6 error is built from a non-linear skb we get the following
splat,
BUG: KASAN: slab-out-of-bounds in do_csum+0x220/0x240
Read of size 4 at addr ffff88811d402c80 by task netperf/820
CPU: 0 PID: 820 Comm: netperf Not tainted 6.8.0-rc1+ #543
...
kasan_report+0xd8/0x110
do_csum+0x220/0x240
csum_partial+0xc/0x20
skb_tunnel_check_pmtu+0xeb9/0x3280
vxlan_xmit_one+0x14c2/0x4080
vxlan_xmit+0xf61/0x5c00
dev_hard_start_xmit+0xfb/0x510
__dev_queue_xmit+0x7cd/0x32a0
br_dev_queue_push_xmit+0x39d/0x6a0
Use skb_checksum instead of csum_partial who cannot deal with non-linear
SKBs.
Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
For XDP_TX action xdp_buff is converted to xdp_frame. The conversion is
done by xdp_convert_buff_to_frame(). The memory type of the resulting
xdp_frame depends on the memory type of the xdp_buff. For page pool
based xdp_buff it produces xdp_frame with memory type
MEM_TYPE_PAGE_POOL. For zero copy XSK pool based xdp_buff it produces
xdp_frame with memory type MEM_TYPE_PAGE_ORDER0.
tsnep_xdp_xmit_back() is not prepared for that and uses always the page
pool buffer type TSNEP_TX_TYPE_XDP_TX. This leads to invalid mappings
and the transmission of undefined data.
Improve tsnep_xdp_xmit_back() to use the generic buffer type
TSNEP_TX_TYPE_XDP_NDO for zero copy XDP_TX.
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Using hard-coded constant timeout to wait for some expected
event is deemed to fail sooner or later, especially in slow
env.
Our CI has spotted another of such race:
# TEST: ipv6: cleanup of cached exceptions - nexthop objects [FAIL]
# can't delete veth device in a timely manner, PMTU dst likely leaked
Replace the crude sleep with a loop looking for the expected condition
at low interval for a much longer range.
The pmtu.sh test uses a few TCP listener in a problematic way:
It hard-codes a constant timeout to wait for the listener starting-up
in background. That introduces unneeded latency and on very slow and
busy host it can fail.
Additionally the test starts again the same listener in the same
namespace on the same port, just after the previous connection
completed. Fast host can attempt starting the new server before the
old one really closed the socket.
Address the issues using the wait_local_port_listen helper and
explicitly waiting for the background listener process exit.
The patch set [1] added a general lib.sh in net selftests, and converted
several test scripts to source the lib.sh.
unicast_extensions.sh (converted in [1]) and pmtu.sh (converted in [2])
have a /bin/sh shebang which may point to various shells in different
distributions, but "source" is only available in some of them. For
example, "source" is a built-it function in bash, but it cannot be
used in dash.
Refer to other scripts that were converted together, simply change the
shebang to bash to fix the following issues when the default /bin/sh
points to other shells.
not ok 51 selftests: net: unicast_extensions.sh # exit=1
v1 -> v2:
- Fix pmtu.sh which has the same issue as unicast_extensions.sh,
suggested by Hangbin
- Change the style of the "source" line to be consistent with other
tests, suggested by Hangbin
pmtu test use /bin/sh, so we need to source ./lib.sh instead of lib.sh
Here is the test result after conversion.
# ./pmtu.sh
TEST: ipv4: PMTU exceptions [ OK ]
TEST: ipv4: PMTU exceptions - nexthop objects [ OK ]
TEST: ipv6: PMTU exceptions [ OK ]
TEST: ipv6: PMTU exceptions - nexthop objects [ OK ]
...
TEST: ipv4: list and flush cached exceptions - nexthop objects [ OK ]
TEST: ipv6: list and flush cached exceptions [ OK ]
TEST: ipv6: list and flush cached exceptions - nexthop objects [ OK ]
TEST: ipv4: PMTU exception w/route replace [ OK ]
TEST: ipv4: PMTU exception w/route replace - nexthop objects [ OK ]
TEST: ipv6: PMTU exception w/route replace [ OK ]
TEST: ipv6: PMTU exception w/route replace - nexthop objects [ OK ]
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: e71e016ad0f6 ("selftests: net: fix tcp listener handling in pmtu.sh") Signed-off-by: Sasha Levin <sashal@kernel.org>
# ./unicast_extensions.sh
/usr/bin/which: no nettest in (/root/.local/bin:/root/bin:/usr/share/Modules/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)
###########################################################################
Unicast address extensions tests (behavior of reserved IPv4 addresses)
###########################################################################
TEST: assign and ping within 240/4 (1 of 2) (is allowed) [ OK ]
TEST: assign and ping within 240/4 (2 of 2) (is allowed) [ OK ]
TEST: assign and ping within 0/8 (1 of 2) (is allowed) [ OK ]
...
TEST: assign and ping class D address (is forbidden) [ OK ]
TEST: routing using class D (is forbidden) [ OK ]
TEST: routing using 127/8 (is forbidden) [ OK ]
Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: e71e016ad0f6 ("selftests: net: fix tcp listener handling in pmtu.sh") Signed-off-by: Sasha Levin <sashal@kernel.org>
The udpgro_fwd.sh self-tests are somewhat unstable. There are
a few timing constraints the we struggle to meet on very slow
environments.
Instead of skipping the whole tests in such envs, increase the
test resilience WRT very slow hosts: increase the inter-packets
timeouts, avoid resetting the counters every second and finally
disable reduce the background traffic noise.
Tested with:
for I in $(seq 1 100); do
./tools/testing/selftests/kselftest_install/run_kselftest.sh \
-t net:udpgro_fwd.sh || exit -1
done
Function aq_ring_hwts_rx_alloc() maps extra AQ_CFG_RXDS_DEF bytes
for PTP HWTS ring but then generic aq_ring_free() does not take this
into account.
Create and use a specific function to free HWTS ring to fix this
issue.
After commit e3eac9f32ec0 ("wifi: cfg80211: Annotate struct
cfg80211_scan_request with __counted_by"), the compiler may enforce
dynamic array indexing of req->channels to stay below n_channels. As a
result, n_channels needs to be increased _before_ accessing the newly
added array index. Increment it first, then use "i" for the prior index.
Solves this warning in the coming GCC that has __counted_by support:
../drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c: In function 'brcmf_internal_escan_add_info':
../drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:3783:46: warning: operation on 'req->
n_channels' may be undefined [-Wsequence-point]
3783 | req->channels[req->n_channels++] = chan;
| ~~~~~~~~~~~~~~~^~
Fixes: e3eac9f32ec0 ("wifi: cfg80211: Annotate struct cfg80211_scan_request with __counted_by") Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Kalle Valo <kvalo@kernel.org> Cc: Chi-hsien Lin <chi-hsien.lin@infineon.com> Cc: Ian Lin <ian.lin@infineon.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Wright Feng <wright.feng@cypress.com> Cc: Hector Martin <marcan@marcan.st> Cc: linux-wireless@vger.kernel.org Cc: brcm80211-dev-list.pdl@broadcom.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240126223150.work.548-kees@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the driver exits eSR by calling
iwl_mvm_esr_mode_inactive() before updating the FW
(by deactivating one of the links), and therefore before
sending the EML frame notifying that we are no longer in eSR.
This is wrong for several reasons:
1. The driver sends SMPS activation frames when we are still in eSR
and SMPS should be disabled when in eSR
2. The driver restores RLC configuration as it was before eSR
entering, and RLC command shouldn't be sent in eSR
Fix this by calling iwl_mvm_esr_mode_inactive() after FW update
Commit 56e58d6c8a56 ("net: stmmac: Implement Safety Features in
XGMAC core") checks and reports safety errors, but leaves the
Data Path Parity Errors for each channel in DMA unhandled at all, lead to
a storm of interrupt.
Fix it by checking and clearing the DMA_DPP_Interrupt_Status register.
Fixes: 56e58d6c8a56 ("net: stmmac: Implement Safety Features in XGMAC core") Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The EFI stub's kernel placement logic randomizes the physical placement
of the kernel by taking all available memory into account, and picking a
region at random, based on a random seed.
When KASLR is disabled, this seed is set to 0x0, and this results in the
lowest available region of memory to be selected for loading the kernel,
even if this is below LOAD_PHYSICAL_ADDR. Some of this memory is
typically reserved for the GFP_DMA region, to accommodate masters that
can only access the first 16 MiB of system memory.
Even if such devices are rare these days, we may still end up with a
warning in the kernel log, as reported by Tom:
The recently introduced EFI memory attributes protocol should be used
if it exists to ensure that the memory allocation created for the kernel
permits execution. This is needed for compatibility with tightened
requirements related to Windows logo certification for x86 PCs.
Currently, we simply strip the execute protect (XP) attribute from the
entire range, but this might be rejected under some firmware security
policies, and so in a subsequent patch, this will be changed to only
strip XP from the executable region that runs early, and make it
read-only (RO) as well.
In order to catch any issues early, ensure that the memory attribute
protocol works as intended, and give up if it produces spurious errors.
Note that the DXE services based fallback was always based on best
effort, so don't propagate any errors returned by that API.
The commit 8b45a26f2ba9 ("drm/msm/dpu: reserve cdm blocks for writeback
in case of YUV output") introduced a smatch warning about another
conditional block in dpu_encoder_helper_phys_cleanup() which had assumed
hw_pp will always be valid which may not necessarily be true.
Lets fix the other conditional block by making sure hw_pp is valid
before dereferencing it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: ae4d721ce100 ("drm/msm/dpu: add an API to reset the encoder related hw blocks") Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/574878/ Link: https://lore.kernel.org/r/20240117194109.21609-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar <quic_abhinavk@quicinc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
MSA MISC0 bit 1 to 7 contains Colorimetry Indicator Field.
dp_link_get_colorimetry_config() returns wrong colorimetry value
in the DP_TEST_DYNAMIC_RANGE_CEA case in the current implementation.
Hence fix this problem by having dp_link_get_colorimetry_config()
return defined CEA RGB colorimetry value in the case of
DP_TEST_DYNAMIC_RANGE_CEA.
Changes in V2:
-- drop retrieving colorimetry from colorspace
-- drop dr = link->dp_link.test_video.test_dyn_range assignment
Changes in V3:
-- move defined MISCr0a Colorimetry vale to dp_reg.h
-- rewording commit title
-- rewording commit text to more precise describe this patch
Since the value of DP_TEST_BIT_DEPTH_8 is already left shifted, in the
BPC unknown case, the additional shift causes spill over to the other
bits of the [DP_CONFIGURATION_CTRL] register.
Fix this by changing the return value of dp_link_get_test_bits_depth()
in the BPC unknown case to (DP_TEST_BIT_DEPTH_8 >> DP_TEST_BIT_DEPTH_SHIFT).
Introduce a local boolean variable if FS_XFLAG_REALTIME to make the
checks for it more obvious, and de-densify a few of the conditionals
using it to make them more readable while at it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-4-hch@lst.de Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since the introduction of xfs_dqblk in V5, xfs really ought to find the
dqblk pointer from the dquot buffer, then compute the xfs_disk_dquot
pointer from the dqblk pointer. Fix the open-coded xfs_buf_offset calls
and do the type checking in the correct order.
Note that this has made no practical difference since the start of the
xfs_disk_dquot is coincident with the start of the xfs_dqblk.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Discovered when trying to track down a weird recovery corruption
issue that wasn't detected at recovery time.
The specific corruption was a zero extent count field when big
extent counts are in use, and it turns out the dinode verifier
doesn't detect that specific corruption case, either. So fix it too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 57c0f4a8ea3a attempted to fix the select in the kconfig entry
XFS_ONLINE_SCRUB_STATS by selecting XFS_DEBUG, but the original
intention was to select DEBUG_FS, since the feature relies on debugfs to
export the related scrub statistics.
Fixes: 57c0f4a8ea3a ("xfs: fix select in config XFS_ONLINE_SCRUB_STATS") Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
XFS: Internal error i != 1 at line 3526 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_btree_insert+0x1ec/0x280
...
Call Trace:
xfs_corruption_error+0x94/0xa0
xfs_btree_insert+0x221/0x280
xfs_alloc_fixup_trees+0x104/0x3e0
xfs_alloc_ag_vextent_size+0x667/0x820
xfs_alloc_fix_freelist+0x5d9/0x750
xfs_free_extent_fix_freelist+0x65/0xa0
__xfs_free_extent+0x57/0x180
...
This is the XFS_IS_CORRUPT() check in xfs_btree_insert() when
xfs_btree_insrec() fails.
After converting this into a panic and dissecting the core dump, I found
that xfs_btree_insrec() is failing because it's trying to split a leaf
node in the cntbt when the AG free list is empty. In particular, it's
failing to get a block from the AGFL _while trying to refill the AGFL_.
If a single operation splits every level of the bnobt and the cntbt (and
the rmapbt if it is enabled) at once, the free list will be empty. Then,
when the next operation tries to refill the free list, it allocates
space. If the allocation does not use a full extent, it will need to
insert records for the remaining space in the bnobt and cntbt. And if
those new records go in full leaves, the leaves (and potentially more
nodes up to the old root) need to be split.
Fix it by accounting for the additional splits that may be required to
refill the free list in the calculation for the minimum free list size.
P.S. As far as I can tell, this bug has existed for a long time -- maybe
back to xfs-history commit afdf80ae7405 ("Add XFS_AG_MAXLEVELS macros
...") in April 1994! It requires a very unlucky sequence of events, and
in fact we didn't hit it until a particular sparse mmap workload updated
from 5.12 to 5.19. But this bug existed in 5.12, so it must've been
exposed by some other change in allocation or writeback patterns. It's
also much less likely to be hit with the rmapbt enabled, since that
increases the minimum free list size and is unlikely to split at the
same time as the bnobt and cntbt.
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We flush the data device cache before we issue external log IO. If
the flush fails, we shut down the log immediately and return. However,
the iclog->ic_sema is left in a decremented state so let's add an up().
Prior to this patch, xfs/438 would fail consistently when running with
an external log device:
sync
-> xfs_log_force
-> xlog_write_iclog
-> down(&iclog->ic_sema)
-> blkdev_issue_flush (fail causes us to intiate shutdown)
-> xlog_force_shutdown
-> return
unmount
-> xfs_log_umount
-> xlog_wait_iclog_completion
-> down(&iclog->ic_sema) --------> HANG
There is a second early return / shutdown. Make sure the up() happens
for it as well. Also make sure we cleanup the iclog state,
xlog_state_done_syncing, before dropping the iclog lock.
Fixes: b5d721eaae47 ("xfs: external logs need to flush data device") Fixes: 842a42d126b4 ("xfs: shutdown on failure to add page to log bio") Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
xfs_reflink_end_cow_extent looks up the COW extent and the data fork
extent at offset_fsb, and then proceeds to remap the common subset
between the two.
It does however not limit the remapped extent to the passed in
[*offset_fsbm end_fsb] range and thus potentially remaps more blocks than
the one handled by the current I/O completion. This means that with
sufficiently large data and COW extents we could be remapping COW fork
mappings that have not been written to, leading to a stale data exposure
on a powerfail event.
We use to have a xfs_trim_range to make the remap fit the I/O completion
range, but that got (apparently accidentally) removed in commit df2fd88f8ac7 ("xfs: rewrite xfs_reflink_end_cow to use intents").
Note that I've only found this by code inspection, and a test case would
probably require very specific delay and error injection.
Fixes: df2fd88f8ac7 ("xfs: rewrite xfs_reflink_end_cow to use intents") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When recovering intents, we capture newly created intent items as part of
committing recovered intent items. If intent recovery fails at a later
point, we forget to remove those newly created intent items from the AIL
and hang:
When newly created intent items fail to commit via transaction, intent
recovery hasn't created done items for these newly created intent items,
so the capture structure is the sole owner of the captured intent items.
We must release them explicitly or else they leak:
Fix the problem above by abort intent items that don't have a done item
when recovery intents fail.
Fixes: e6fff81e4870 ("xfs: proper replay of deferred ops queued during log recovery") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(), which
not use transaction parameter, so it can be used after the transaction
life cycle.
Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
One of our VM cluster management products needs to snapshot KVM image
files so that they can be restored in case of failure. Snapshotting is
done by redirecting VM disk writes to a sidecar file and using reflink
on the disk image, specifically the FICLONE ioctl as used by
"cp --reflink". Reflink locks the source and destination files while it
operates, which means that reads from the main vm disk image are blocked,
causing the vm to stall. When an image file is heavily fragmented, the
copy process could take several minutes. Some of the vm image files have
50-100 million extent records, and duplicating that much metadata locks
the file for 30 minutes or more. Having activities suspended for such
a long time in a cluster node could result in node eviction.
Clone operations and read IO do not change any data in the source file,
so they should be able to run concurrently. Demote the exclusive locks
taken by FICLONE to shared locks to allow reads while cloning. While a
clone is in progress, writes will take the IOLOCK_EXCL, so they block
until the clone completes.
Link: https://lore.kernel.org/linux-xfs/8911B94D-DD29-4D6E-B5BC-32EAF1866245@oracle.com/ Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If xfs_bmapi_write finds a delalloc extent at the requested range, it
tries to convert the entire delalloc extent to a real allocation.
But if the allocator cannot find a single free extent large enough to
cover the start block of the requested range, xfs_bmapi_write will
return 0 but leave *nimaps set to 0.
In that case we simply need to keep looping with the same startoffset_fsb
so that one of the following allocations will eventually reach the
requested range.
Note that this could affect any caller of xfs_bmapi_write that covers
an existing delayed allocation. As far as I can tell we do not have
any other such caller, though - the regular writeback path uses
xfs_bmapi_convert_delalloc to convert delayed allocations to real ones,
and direct I/O invalidates the page cache first.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 2a6ca4baed62, we tried to fix an overflow problem in the
realtime allocator that was caused by an overly large maxlen value
causing xfs_rtcheck_range to run off the end of the realtime bitmap.
Unfortunately, there is a subtle bug here -- maxlen (and minlen) both
have to be aligned with @prod, but @prod can be larger than 1 if the
user has set an extent size hint on the file, and that extent size hint
is larger than the realtime extent size.
If the rt free space extents are not aligned to this file's extszhint
because other files without extent size hints allocated space (or the
number of rt extents is similarly not aligned), then it's possible that
maxlen after clamping to sb_rextents will no longer be aligned to prod.
The allocation will succeed just fine, but we still trip the assertion.
Fix the problem by reducing maxlen by any misalignment with prod. While
we're at it, split the assertions into two so that we can tell which
value had the bad alignment.
Fixes: 2a6ca4baed62 ("xfs: make sure the rt allocator doesn't run off the end") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The unit conversions in this function do not make sense. First we
convert a block count to bytes, then divide that bytes value by
rextsize, which is in blocks, to get an rt extent count. You can't
divide bytes by blocks to get a (possibly multiblock) extent value.
Fortunately nobody uses delalloc on the rt volume so this hasn't
mattered.
Fixes: fa5c836ca8eb5 ("xfs: refactor xfs_bunmapi_cow") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When realtime support is not compiled into the kernel, these functions
should return negative errnos, not positive errnos. While we're at it,
fix a broken macro declaration.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Quotas aren't (yet) supported with realtime, so we shouldn't allow
userspace to set up a realtime section when quotas are enabled, even if
they attached one via mount options. IOWS, you shouldn't be able to do:
Currently, xfs_bmap_del_extent_real contains a bunch of code to convert
the physical extent of a data fork mapping for a realtime file into rt
extents and pass that to the rt extent freeing function. Since the
details of this aren't needed when CONFIG_XFS_REALTIME=n, move it to
xfs_rtbitmap.c to reduce code size when realtime isn't enabled.
This will (one day) enable realtime EFIs to reuse the same
unit-converting call with less code duplication.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The latest version of the fs geometry structure is v5. Bump this
constant so that xfs_db and mkfs calls to libxfs_fs_geometry will fill
out all the fields.
IOWs, this commit is a no-op for the kernel, but will be useful for
userspace reporting in later changes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com> Acked-by: Chandan Babu R <chandanbabu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is the next upgrade to the Rust toolchain, from 1.72.1 to 1.73.0
(i.e. the latest) [1].
See the upgrade policy [2] and the comments on the first upgrade in
commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2").
# Unstable features
No unstable features (that we use) were stabilized.
Therefore, the only unstable feature allowed to be used outside
the `kernel` crate is still `new_uninit`, though other code to be
upstreamed may increase the list.
Please see [3] for details.
# Required changes
For the upgrade, the following changes are required:
- Allow `internal_features` for `feature(compiler_builtins)` since
now Rust warns about using internal compiler and standard library
features (similar to how it also warns about incomplete ones) [4].
- A cleanup for a documentation link thanks to a new `rustdoc` lint.
See previous commits for details.
- A need to make an intra-doc link to a macro explicit, due to a
change in behavior in `rustdoc`. See previous commits for details.
# `alloc` upgrade and reviewing
The vast majority of changes are due to our `alloc` fork being upgraded
at once.
There are two kinds of changes to be aware of: the ones coming from
upstream, which we should follow as closely as possible, and the updates
needed in our added fallible APIs to keep them matching the newer
infallible APIs coming from upstream.
Instead of taking a look at the diff of this patch, an alternative
approach is reviewing a diff of the changes between upstream `alloc` and
the kernel's. This allows to easily inspect the kernel additions only,
especially to check if the fallible methods we already have still match
the infallible ones in the new version coming from upstream.
Another approach is reviewing the changes introduced in the additions in
the kernel fork between the two versions. This is useful to spot
potentially unintended changes to our additions.
To apply these approaches, one may follow steps similar to the following
to generate a pair of patches that show the differences between upstream
Rust and the kernel (for the subset of `alloc` we use) before and after
applying this patch:
# Get the difference with respect to the old version.
git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
cut -d/ -f3- |
grep -Fv README.md |
xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
git -C linux diff --patch-with-stat --summary -R > old.patch
git -C linux restore rust/alloc
# Apply this patch.
git -C linux am rust-upgrade.patch
# Get the difference with respect to the new version.
git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
cut -d/ -f3- |
grep -Fv README.md |
xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
git -C linux diff --patch-with-stat --summary -R > new.patch
git -C linux restore rust/alloc
Now one may check the `new.patch` to take a look at the additions (first
approach) or at the difference between those two patches (second
approach). For the latter, a side-by-side tool is recommended.
The future `rustdoc` in the Rust 1.73.0 upgrade requires an explicit
link for `pr_info!`:
error: unresolved link to `pr_info`
--> rust/kernel/print.rs:395:63
|
395 | /// Use only when continuing a previous `pr_*!` macro (e.g. [`pr_info!`]).
| ^^^^^^^^ no item named `pr_info` in scope
|
= note: `macro_rules` named `pr_info` exists in this crate, but it is not in scope at this link's location
= note: `-D rustdoc::broken-intra-doc-links` implied by `-D warnings`
Thus do so to avoid a broken link while upgrading.
Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Reviewed-by: Finn Behrens <me@kloenk.dev> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Link: https://lore.kernel.org/r/20231005210556.466856-3-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Starting with Rust 1.73.0, `rustdoc` detects redundant explicit
links with its new lint `redundant_explicit_links` [1]:
error: redundant explicit link target
--> rust/kernel/task.rs:85:21
|
85 | /// [`current`](crate::current) macro because it is safe.
| --------- ^^^^^^^^^^^^^^ explicit target is redundant
| |
| because label contains path that resolves to same destination
|
= note: when a link's destination is not specified,
the label is used to resolve intra-doc links
= note: `-D rustdoc::redundant-explicit-links` implied by `-D warnings`
help: remove explicit link target
|
85 | /// [`current`] macro because it is safe.
In order to avoid the warning in the compiler upgrade commit,
make it an intra-doc link as the tool suggests.
Link: https://github.com/rust-lang/rust/pull/113167 Reviewed-by: Finn Behrens <me@kloenk.dev> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Reviewed-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com> Link: https://lore.kernel.org/r/20231005210556.466856-2-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the third upgrade to the Rust toolchain, from 1.71.1 to 1.72.1
(i.e. the latest) [1].
See the upgrade policy [2] and the comments on the first upgrade in
commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2").
# Unstable features
No unstable features (that we use) were stabilized.
Therefore, the only unstable feature allowed to be used outside
the `kernel` crate is still `new_uninit`, though other code to be
upstreamed may increase the list.
Please see [3] for details.
# Other improvements
Previously, the compiler could incorrectly generate a `.eh_frame`
section under `-Cpanic=abort`. We were hitting this bug when debug
assertions were enabled (`CONFIG_RUST_DEBUG_ASSERTIONS=y`) [4]:
LD .tmp_vmlinux.kallsyms1
ld.lld: error: <internal>:(.eh_frame) is being placed in '.eh_frame'
Gary fixed the issue in Rust 1.72.0 [5].
# Required changes
For the upgrade, the following changes are required:
- A call to `Box::from_raw` in `rust/kernel/sync/arc.rs` now requires
an explicit `drop()` call. See previous patch for details.
# `alloc` upgrade and reviewing
The vast majority of changes are due to our `alloc` fork being upgraded
at once.
There are two kinds of changes to be aware of: the ones coming from
upstream, which we should follow as closely as possible, and the updates
needed in our added fallible APIs to keep them matching the newer
infallible APIs coming from upstream.
Instead of taking a look at the diff of this patch, an alternative
approach is reviewing a diff of the changes between upstream `alloc` and
the kernel's. This allows to easily inspect the kernel additions only,
especially to check if the fallible methods we already have still match
the infallible ones in the new version coming from upstream.
Another approach is reviewing the changes introduced in the additions in
the kernel fork between the two versions. This is useful to spot
potentially unintended changes to our additions.
To apply these approaches, one may follow steps similar to the following
to generate a pair of patches that show the differences between upstream
Rust and the kernel (for the subset of `alloc` we use) before and after
applying this patch:
# Get the difference with respect to the old version.
git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
cut -d/ -f3- |
grep -Fv README.md |
xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
git -C linux diff --patch-with-stat --summary -R > old.patch
git -C linux restore rust/alloc
# Apply this patch.
git -C linux am rust-upgrade.patch
# Get the difference with respect to the new version.
git -C rust checkout $(linux/scripts/min-tool-version.sh rustc)
git -C linux ls-tree -r --name-only HEAD -- rust/alloc |
cut -d/ -f3- |
grep -Fv README.md |
xargs -IPATH cp rust/library/alloc/src/PATH linux/rust/alloc/PATH
git -C linux diff --patch-with-stat --summary -R > new.patch
git -C linux restore rust/alloc
Now one may check the `new.patch` to take a look at the additions (first
approach) or at the difference between those two patches (second
approach). For the latter, a side-by-side tool is recommended.
`Box::from_raw()` is `#[must_use]`, which means the result cannot
go unused.
In Rust 1.71.0, this was not detected because the block expression
swallows the diagnostic [1]:
unsafe { Box::from_raw(self.ptr.as_ptr()) };
It would have been detected, however, if the line had been instead:
unsafe { Box::from_raw(self.ptr.as_ptr()); }
i.e. the semicolon being inside the `unsafe` block, rather than
outside.
In Rust 1.72.0, the compiler started warning about this [2], so
without this patch we will get:
error: unused return value of `alloc::boxed::Box::<T>::from_raw` that must be used
--> rust/kernel/sync/arc.rs:302:22
|
302 | unsafe { Box::from_raw(self.ptr.as_ptr()) };
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: call `drop(Box::from_raw(ptr))` if you intend to drop the `Box`
= note: `-D unused-must-use` implied by `-D warnings`
help: use `let _ = ...` to ignore the resulting value
|
302 | unsafe { let _ = Box::from_raw(self.ptr.as_ptr()); };
| +++++++ +
Thus add an add an explicit `drop()` as the `#[must_use]`'s
annotation suggests (instead of the more general help line).
After the interface selection policy change to do a weighted
round robin, each iface maintains a weight_fulfilled. When the
weight_fulfilled reaches the total weight for the iface, we know
that the weights can be reset and ifaces can be allocated from
scratch again.
During channel allocation failures on a particular channel,
weight_fulfilled is not incremented. If a few interfaces are
inactive, we could end up in a situation where the active
interfaces are all allocated for the total_weight, and inactive
ones are all that remain. This can cause a situation where
no more channels can be allocated further.
This change fixes it by increasing weight_fulfilled, even when
channel allocation failure happens. This could mean that if
there are temporary failures in channel allocation, the iface
weights may not strictly be adhered to. But that's still okay.
Fixes: a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the server reports query network interface info call
as unsupported following a tree connect, it means that
multichannel is unsupported, even if the server capabilities
report otherwise.
When this happens, cifs_chan_skip_or_disable is called to
disable multichannel on the client. However, we only need
to call this when multichannel is currently setup.
Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the external phy working together with phy-omap-usb2 does not implement
send_srp(), we may still attempt to call it. This can happen on an idle
Ethernet gadget triggering a wakeup for example:
configfs-gadget.g1 gadget.0: ECM Suspend
configfs-gadget.g1 gadget.0: Port suspended. Triggering wakeup
...
Unable to handle kernel NULL pointer dereference at virtual address 00000000 when execute
...
PC is at 0x0
LR is at musb_gadget_wakeup+0x1d4/0x254 [musb_hdrc]
...
musb_gadget_wakeup [musb_hdrc] from usb_gadget_wakeup+0x1c/0x3c [udc_core]
usb_gadget_wakeup [udc_core] from eth_start_xmit+0x3b0/0x3d4 [u_ether]
eth_start_xmit [u_ether] from dev_hard_start_xmit+0x94/0x24c
dev_hard_start_xmit from sch_direct_xmit+0x104/0x2e4
sch_direct_xmit from __dev_queue_xmit+0x334/0xd88
__dev_queue_xmit from arp_solicit+0xf0/0x268
arp_solicit from neigh_probe+0x54/0x7c
neigh_probe from __neigh_event_send+0x22c/0x47c
__neigh_event_send from neigh_resolve_output+0x14c/0x1c0
neigh_resolve_output from ip_finish_output2+0x1c8/0x628
ip_finish_output2 from ip_send_skb+0x40/0xd8
ip_send_skb from udp_send_skb+0x124/0x340
udp_send_skb from udp_sendmsg+0x780/0x984
udp_sendmsg from __sys_sendto+0xd8/0x158
__sys_sendto from ret_fast_syscall+0x0/0x58
Let's fix the issue by checking for send_srp() and set_vbus() before
calling them. For USB peripheral only cases these both could be NULL.
Fixes: 657b306a7bdf ("usb: phy: add a new driver for omap usb2 phy") Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20240128120556.8848-1-tony@atomide.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The 'Session topology' test currently fails with this message when
evlist__new_default() opens more than one event:
32: Session topology :
--- start ---
templ file: /tmp/perf-test-vv5YzZ
Using CPUID 0x00000000410fd070
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xb00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 4
Opening: unknown-hardware:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 5
non matching sample_type
FAILED tests/topology.c:73 can't get session
---- end ----
Session topology: FAILED!
This is because when re-opening the file and parsing the header, Perf
expects that any file that has more than one event has the sample ID
flag set. Perf record already sets the flag in a similar way when there
is more than one event, so add the same logic to evlist__new_default().
evlist__new_default() is only currently used in tests, so I don't
expect this change to have any other side effects. The other tests that
use it don't save and re-open the file so don't hit this issue.
The session topology test has been failing on Arm big.LITTLE platforms
since commit 251aa040244a3b17 ("perf parse-events: Wildcard most
"numeric" events") when evlist__new_default() started opening multiple
events for 'cycles'.
Fixes: 251aa040244a3b17 ("perf parse-events: Wildcard most "numeric" events") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: James Clark <james.clark@arm.com>
[ This was failing as well on a Rocket Lake Refresh/14700k Intel hybrid system - Arnaldo ] Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Jihong <yangjihong1@huawei.com> Closes: https://lore.kernel.org/lkml/CAP-5=fWVQ-7ijjK3-w1q+k2WYVNHbAcejb-xY0ptbjRw476VKA@mail.gmail.com/ Link: https://lore.kernel.org/r/20240124094358.489372-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Propagate the TR response status to the device using BCDMA
split-channels. For example CSI-RX driver should be able to check if a
frame was not transferred completely (short packet) and needs to be
discarded.
Fixes: 25dcb5dd7b7c ("dmaengine: ti: New driver for K3 UDMA") Signed-off-by: Jai Luthra <j-luthra@ti.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Link: https://lore.kernel.org/r/20240103-tr_resp_err-v1-1-2fdf6d48ab92@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of long format of qDMA command descriptor, there are one frame
descriptor, three entries in the frame list and two data entries. So the
size of dma_pool_create for these three fields should be the same with
the total size of entries respectively, or the contents may be overwritten
by the next allocated descriptor.
This mostly reverts commit 6bd97bf273bd ("ext4: remove redundant
mb_regenerate_buddy()") and reintroduces mb_regenerate_buddy(). Based on
code in mb_free_blocks(), fast commit replay can end up marking as free
blocks that are already marked as such. This causes corruption of the
buddy bitmap so we need to regenerate it in that case.
Reported-by: Jan Kara <jack@suse.cz> Fixes: 6bd97bf273bd ("ext4: remove redundant mb_regenerate_buddy()") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.
As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.
Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The PA gain can be set in steps of 1.5 dB from -3 dB to 18 dB, that is,
in 15 levels.
Fix the dB values for the PA volume control as experiments using wsa8835
show that the first 16 levels all map to the same lowest gain while the
last three map to the highest gain.
These values specifically need to be correct for the sound server to
provide proper volume control.
Note that level 0 (-3 dB) does not mute the PA so the mute flag should
also not be set.
Fixes: cdb09e623143 ("ASoC: codecs: wsa883x: add control, dapm widgets and map") Cc: stable@vger.kernel.org # 6.0 Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://msgid.link/r/20240119112420.7446-2-johan+linaro@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LPASS WSA macro codec driver is updating the digital gain settings
behind the back of user space on DAPM events if companding has been
enabled.
As compander control is exported to user space, this can result in the
digital gain setting being incremented (or decremented) every time the
sound server is started and the codec suspended depending on what the
UCM configuration looks like.
Soon enough playback will become distorted (or too quiet).
This is specifically a problem on the Lenovo ThinkPad X13s as this
bypasses the limit for the digital gain setting that has been set by the
machine driver.
Fix this by simply dropping the compander gain offset hack. If someone
cares about modelling the impact of the compander setting this can
possibly be done by exporting it as a volume control later.
Note that the volume registers still need to be written after enabling
clocks in order for any prior updates to take effect.
Fixes: 2c4066e5d428 ("ASoC: codecs: lpass-wsa-macro: add dapm widgets and route") Cc: stable@vger.kernel.org # 5.11 Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://msgid.link/r/20240119112420.7446-4-johan+linaro@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The lowest headphones volume setting does not mute so the leave the TLV
mute flag unset.
This is specifically needed to let the sound server use the lowest gain
setting.
Fixes: c03226ba15fe ("ASoC: codecs: wcd938x: fix dB range for HPHL and HPHR") Cc: <stable@vger.kernel.org> # 6.5 Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://msgid.link/r/20240122091130.27463-1-johan+linaro@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The UCM configuration for the Lenovo ThinkPad X13s has up until now
been setting the speaker PA volume to the minimum -3 dB when enabling
the speakers, but this does not prevent the user from increasing the
volume further.
Limit the digital gain and PA volumes to a combined -3 dB in the machine
driver to reduce the risk of speaker damage until we have active speaker
protection in place (or higher safe levels have been established).
Note that the PA volume limit cannot be set lower than 0 dB or
PulseAudio gets confused when the first 16 levels all map to -3 dB.
Also note that this will probably need to be generalised using
machine-specific limits, but a common limit should do for now.
Cc: <stable@vger.kernel.org> # 6.5 Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://msgid.link/r/20240122181819.4038-3-johan+linaro@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The test tries to prove a btf_id not available after the map is closed.
But btf_id is freed only after workqueue and a rcu grace period, compared
to previous case just after a rcu grade period.
Depending on system workload, workqueue could take quite some time
to execute function bpf_map_free_deferred() which may cause the test failure.
Instead of adding arbitrary delays, let us remove the logic to
check btf_id availability after map is closed.
Machines which have more than 8 nodes fail to boot SMP after commit a2ccf46333d7b2cf96 ("LoongArch/smp: Call rcutree_report_cpu_starting()
earlier"). Because such machines use tlb-based per-cpu base address
rather than dmw-based per-cpu base address, resulting per-cpu variables
can only be accessed after tlb_init(). But rcutree_report_cpu_starting()
is now called before tlb_init() and accesses per-cpu variables indeed.
Since the original patch want to avoid the lockdep warning caused by
page allocation in tlb_init(), we can move rcutree_report_cpu_starting()
to tlb_init() where after tlb exception configuration but before page
allocation.
Alexander Potapenko writes in [1]: "For every memory access in the code
instrumented by KMSAN we call kmsan_get_metadata() to obtain the metadata
for the memory being accessed. For virtual memory the metadata pointers
are stored in the corresponding `struct page`, therefore we need to call
virt_to_page() to get them.
According to the comment in arch/x86/include/asm/page.h,
virt_to_page(kaddr) returns a valid pointer iff virt_addr_valid(kaddr) is
true, so KMSAN needs to call virt_addr_valid() as well.
To avoid recursion, kmsan_get_metadata() must not call instrumented code,
therefore ./arch/x86/include/asm/kmsan.h forks parts of
arch/x86/mm/physaddr.c to check whether a virtual address is valid or not.
But the introduction of rcu_read_lock() to pfn_valid() added instrumented
RCU API calls to virt_to_page_or_null(), which is called by
kmsan_get_metadata(), so there is an infinite recursion now. I do not
think it is correct to stop that recursion by doing
kmsan_enter_runtime()/kmsan_exit_runtime() in kmsan_get_metadata(): that
would prevent instrumented functions called from within the runtime from
tracking the shadow values, which might introduce false positives."
Fix the issue by switching pfn_valid() to the _sched() variant of
rcu_read_lock/unlock(), which does not require calling into RCU. Given
the critical section in pfn_valid() is very small, this is a reasonable
trade-off (with preemptible RCU).
KMSAN further needs to be careful to suppress calls into the scheduler,
which would be another source of recursion. This can be done by wrapping
the call to pfn_valid() into preempt_disable/enable_no_resched(). The
downside is that this sacrifices breaking scheduling guarantees; however,
a kernel compiled with KMSAN has already given up any performance
guarantees due to being heavily instrumented.
Note, KMSAN code already disables tracing via Makefile, and since mmzone.h
is included, it is not necessary to use the notrace variant, which is
generally preferred in all other cases.
Link: https://lkml.kernel.org/r/20240115184430.2710652-1-glider@google.com Link: https://lkml.kernel.org/r/20240118110022.2538350-1-elver@google.com Fixes: 5ec8e8ea8b77 ("mm/sparsemem: fix race in accessing memory_section->usage") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Alexander Potapenko <glider@google.com> Reported-by: syzbot+93a9e8a3dea8d6085e12@syzkaller.appspotmail.com Reviewed-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Cc: Charan Teja Kalla <quic_charante@quicinc.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The init_irq_stacks() has been changed to use the correct node:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=75b5e0bf90bf
The init_irq_scs() has the same issue with init_irq_stacks():
cpu_to_node() is not initialized yet, it does not work.
This patch uses early_cpu_to_node() to set the init_irq_scs()
with the correct node.
The purpose of the test_LAG_cleanup() function is to check that some
hardware addresses are removed from underlying devices after they have been
unenslaved. The test function simply checks that those addresses are not
present at the end. However, if the addresses were never added to begin
with due to some error in device setup, the test function currently passes.
This is a false positive since in that situation the test did not actually
exercise the intended functionality.
Add a check that the expected addresses are indeed present after device
setup. This makes the test function more robust.
I noticed this problem when running the team/dev_addr_lists.sh test on a
system without support for dummy and ipv6:
tools/testing/selftests/drivers/net/team# ./dev_addr_lists.sh
Error: Unknown device type.
Error: Unknown device type.
This program is not intended to be run as root.
RTNETLINK answers: Operation not supported
TEST: team cleanup mode lacp [ OK ]
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20240131140848.360618-3-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Similar to commit dd2d40acdbb2 ("selftests: bonding: Add more missing
config options"), add more networking-specific config options which are
needed for team device tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. Afterwards, the team device test passed.
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Link: https://lore.kernel.org/r/20240131140848.360618-2-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
XDP queues are created/destroyed when a XDP program
is attached/detached. In current driver xdp_queues are not
getting destroyed on program exit due to incorrect xdp_queue
and tot_tx_queue count values.
This patch fixes the issue by setting tot_tx_queue and xdp_queue
count to correct values. It also fixes xdp.data_hard_start address.
Fixes: 06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Geetha sowjanya <gakula@marvell.com> Link: https://lore.kernel.org/r/20240130120610.16673-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>