ChaCha support did not adjust the bidirectional test.
We need to set up KTLS in reverse direction correctly,
otherwise these two cases will fail:
tls.12_chacha.bidir
tls.13_chacha.bidir
Fixes: 4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
A bunch of tests uses uninitialized stack memory as random
data to send. This is harmless but generates compiler warnings.
Explicitly init the buffers with random data.
Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The spin_trylock() was assumed to contain the implicit
barrier needed to ensure the correct ordering between
STATE_MISSED setting/clearing and STATE_MISSED checking
in commit a90c57f2cedd ("net: sched: fix packet stuck
problem for lockless qdisc").
But it turns out that spin_trylock() only has load-acquire
semantic, for strongly-ordered system(like x86), the compiler
barrier implicitly contained in spin_trylock() seems enough
to ensure the correct ordering. But for weakly-orderly system
(like arm64), the store-release semantic is needed to ensure
the correct ordering as clear_bit() and test_bit() is store
operation, see queued_spin_lock().
So add the explicit barrier to ensure the correct ordering
for the above case.
Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Non-ND strict packets with a source LLA go through the packet taps
again, while non-ND strict packets with other source addresses do not,
and we can see a clone of those packets on the vrf interface (we should
not). This is due to a series of changes:
Commit 6f12fa775530[1] made non-ND strict packets not being pushed again
in the packet taps. This changed with commit 205704c618af[2] for those
packets having a source LLA, as they need a lookup with the orig_iif.
The issue now is those packets do not skip the 'vrf_ip6_rcv' function to
the end (as the ones without a source LLA) and go through the check to
call packet taps again. This check was changed by commit 6f12fa775530[1]
and do not exclude non-strict packets anymore. Packets matching
'need_strict && !is_ndisc && is_ll_src' are now being sent through the
packet taps again. This can be seen by dumping packets on the vrf
interface.
Fix this by having the same code path for all non-ND strict packets and
selectively lookup with the orig_iif for those with a source LLA. This
has the effect to revert to the pre-205704c618af[2] condition, which
should also be easier to maintain.
[1] 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF")
[2] 205704c618af ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict")
Fixes: 205704c618af ("vrf: packets with lladdr src needs dst at input with orig_iif when needs strict") Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
As documented at drivers/base/platform.c for platform_get_irq:
* Gets an IRQ for a platform device and prints an error message if finding the
* IRQ fails. Device drivers should check the return value for errors so as to
* not pass a negative integer value to the request_irq() APIs.
So, the driver should check that platform_get_irq() return value
is _negative_, not that it's equal to zero, because -ENXIO (return
value from request_irq() if irq was not found) will
pass this check and it leads to passing negative irq to request_irq()
Fixes: 0dd077093636 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
priv is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing priv
pointer. So, fix it by moving free_netdev() after netif_napi_del()
call.
Fixes: 0dd077093636 ("NET: Add ezchip ethernet driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
greth is netdev private data, but it is used
after free_netdev(). It can cause use-after-free when accessing greth
pointer. So, fix it by moving free_netdev() after of_iounmap()
call.
Fixes: d4c41139df6e ("net: Add Aeroflex Gaisler 10/100/1G Ethernet MAC driver") Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
FCS error packets are filtered by default and won't be reported to
driver, so that RX fcs error and PER in testmode always show zero.
Fix this issue by reading fcs error count from hw counter.
We did't fix this issue by disabling fcs error rx filter since it may
let HW suffer some SER errors.
OMAC idx have to be same with BSS idx according to firmware usage.
Fixes: e0f9fdda81bd ("mt76: mt7921: add ieee80211_ops") Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Deren Wu <deren.wu@mediatek.com> Signed-off-by: YN Chen <yn.chen@mediatek.com> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently in the switch statement case where type is
NL80211_IFTYPE_STATION there is a check to see if type
is not NL80211_IFTYPE_STATION. This check is always false
and is redundant dead code that can be removed.
Addresses-Coverity: ("Logically dead code") Fixes: e0f9fdda81bd ("mt76: mt7921: add ieee80211_ops") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
It is possible the RCPI from the certain antenna is an invalid value,
especially packets are receiving while the system is frequently entering
deep sleep mode, so consider calculating RSSI with the reasonable upper
bound to avoid report the wrong value to the mac80211 layer.
Fixes: 163f4d22c118 ("mt76: mt7921: add MAC support") Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix theoretical NULL pointer dereference in mt7615_tx_prepare_skb and
mt7663_usb_sdio_tx_prepare_skb routines. This issue has been identified
by code analysis.
Fixes: 6aa4ed7927f11 ("mt76: mt7615: implement DMA support for MT7622") Fixes: 4bb586bc33b98 ("mt76: mt7663u: sync probe sampling with rate configuration") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Even if this is not a real issue since mt76_tx is never run with wcid set
to NULL, fix a theoretical NULL pointer dereference in mt76_tx routine
Fixes: db9f11d3433f7 ("mt76: store wcid tx rate info in one u32 reduce locking") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 0571a753cb07 cancelled delayed work too late, keeping small
race. Cancel work sooner to close it completely.
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Fixes: 0571a753cb07 ("net: pxa168_eth: Fix a potential data race in pxa168_eth_remove") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
If bpf_map_update_elem() failed, main() should return a negative error.
Fixes: 832622e6bd18 ("xdp: sample program for new bpf_redirect helper") Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210616042534.315097-1-wanghai38@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently rtrs when create_qp use a coarse numbers (bigger in general),
which leads to hardware create more resources which only waste memory with
no benefits.
For max_send_wr, we don't really need alway max_qp_wr size when creating
qp, reduce it to cq_size.
For max_recv_wr, cq_size is enough.
With the patch when sess_queue_depth=128, per session (2 paths) memory
consumption reduced from 188 MB to 65MB
When always_invalidate is enabled, we need send more wr, so treat it
special.
Fixes: 9cb837480424e ("RDMA/rtrs: server: main functionality") Link: https://lore.kernel.org/r/20210614090337.29557-2-jinpu.wang@ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The vmlinux ".BTF_ids" ELF section is declared in btf_ids.h to hold a list
of zero-filled BTF IDs, which is then patched at link-time with correct
values by resolv_btfids. The section is flagged as "allocable" to preclude
compression, but notably the section contents (BTF IDs) are untyped.
When patching the BTF IDs, resolve_btfids writes in host-native endianness
and relies on libelf for any required translation on reading and updating
vmlinux. However, since the type of the .BTF_ids section content defaults
to ELF_T_BYTE (i.e. unsigned char), no translation occurs. This results in
incorrect patched values when cross-compiling to non-native endianness,
and can manifest as kernel Oops and test failures which are difficult to
troubleshoot [1].
Explicitly set the type of patched data to ELF_T_WORD, the architecture-
neutral ELF type corresponding to the u32 BTF IDs. This enables libelf to
transparently perform any needed endian conversions.
Fix broken Tx ring validation for AF_XDP. The commit under the Fixes
tag, fixed an off-by-one error in the validation but introduced
another error. Descriptors are now let through even if they straddle a
chunk boundary which they are not allowed to do in aligned mode. Worse
is that they are let through even if they straddle the end of the umem
itself, tricking the kernel to read data outside the allowed umem
region which might or might not be mapped at all.
Fix this by reintroducing the old code, but subtract the length by one
to fix the off-by-one error that the original patch was
addressing. The test chunk != chunk_end makes sure packets do not
straddle chunk boundraries. Note that packets of zero length are
allowed in the interface, therefore the test if the length is
non-zero.
Fixes: ac31565c2193 ("xsk: Fix for xp_aligned_validate_desc() when len == chunk_size") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20210618075805.14412-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a missing validation of a Tx descriptor when executing in skb mode
and the umem is in unaligned mode. A descriptor could point to a
buffer straddling the end of the umem, thus effectively tricking the
kernel to read outside the allowed umem region. This could lead to a
kernel crash if that part of memory is not mapped.
In zero-copy mode, the descriptor validation code rejects such
descriptors by checking a bit in the DMA address that tells us if the
next page is physically contiguous or not. For the last page in the
umem, this bit is not set, therefore any descriptor pointing to a
packet straddling this last page boundary will be rejected. However,
the skb path does not use this bit since it copies out data and can do
so to two different pages. (It also does not have the array of DMA
address, so it cannot even store this bit.) The code just returned
that the packet is always physically contiguous. But this is
unfortunately also returned for the last page in the umem, which means
that packets that cross the end of the umem are being allowed, which
they should not be.
Fix this by introducing a check for this in the SKB path only, not
penalizing the zero-copy path.
Fixes: 2b43470add8c ("xsk: Introduce AF_XDP buffer allocation API") Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20210617092255.3487-1-magnus.karlsson@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently the rdma_rxe driver attempts to protect atomic responder
resources by taking a reference to the qp which is only freed when the
resource is recycled for a new read or atomic operation. This means that
in normal circumstances there is almost always an extra qp reference once
an atomic operation has been executed which prevents cleaning up the qp
and associated pd and cqs when the qp is destroyed.
This patch removes the call to rxe_add_ref() in send_atomic_ack() and the
call to rxe_drop_ref() in free_rd_atomic_resource(). If the qp is
destroyed while a peer is retrying an atomic op it will cause the
operation to fail which is acceptable.
ipv6_find_hdr() does not validate that this is an IPv6 packet. Add a
sanity check for calling ipv6_find_hdr() to make sure an IPv6 packet
is passed for parsing.
The mlx5_ib_bind_slave_port() doesn't remove multiport device from the
unaffiliated list, but mlx5_ib_unbind_slave_port() did it. This unbalanced
flow caused to the situation where mlx5_ib_unaffiliated_port_list was
changed during iteration.
Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Firmware has added assert if beacon template is received after
vdev_down. Firmware expects beacon template after vdev_start
and before vdev_up. This change is needed to support MBSSID EMA
cases in firmware.
Hence, Change the sequence in ath11k as expected from firmware.
This new change is not causing any issues with older
firmware.
When the code execute this if statement, the value of ret is 0.
However, we can see from the ath10k_warn() log that the value of
ret should be -EINVAL.
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: ccec9038c721 ("ath10k: enable raw encap mode and software crypto engine") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1621939577-62218-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Sasha Levin <sashal@kernel.org>
A previous commit 4f68ef64cd7f ("cw1200: Fix concurrency
use-after-free bugs in cw1200_hw_scan()") tried to fix a seemingly
use-after-free bug between cw1200_bss_info_changed() and
cw1200_hw_scan(), where the former frees a sk_buff pointed
to by frame.skb, and the latter accesses the sk_buff
pointed to by frame.skb. However, this issue should be a
false alarm because:
(1) "frame.skb" is not a shared variable between the above
two functions, because "frame" is a local function variable,
each of the two functions has its own local "frame" - they
just happen to have the same variable name.
(2) the sk_buff(s) pointed to by these two "frame.skb" are
also two different object instances, they are individually
allocated by different dev_alloc_skb() within the two above
functions. To free one object instance will not invalidate
the access of another different one.
Based on these facts, the previous commit should be unnecessary.
Moreover, it also introduced a missing unlock which was
addressed in a subsequent commit 51c8d24101c7 ("cw1200: fix missing
unlock on error in cw1200_hw_scan()"). Now that the
original use-after-free is unreal, these two commits should
be reverted. This patch performs the reversion.
Fixes: 4f68ef64cd7f ("cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()") Fixes: 51c8d24101c7 ("cw1200: fix missing unlock on error in cw1200_hw_scan()") Signed-off-by: Hang Zhang <zh.nvgt@gmail.com> Acked-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210521223238.25020-1-zh.nvgt@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
brcmf_sdiod_remove has been called inside brcmf_sdiod_probe when fails,
so there's no need to call another one. Otherwise, sdiodev->freezer
would be double freed.
Fixes: 7836102a750a ("brcmfmac: reset SDIO bus on a firmware crash") Signed-off-by: Tong Tiangen <tongtiangen@huawei.com> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210601100128.69561-1-tongtiangen@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The rx_lastpkt_rssi field provided by the firmware is suitable for
NL80211_STA_INFO_{SIGNAL,CHAIN_SIGNAL}, while the rssi field is an
average. Fix up the assignments and set the correct STA_INFO bits. This
lets userspace know that the average RSSI is part of the station info.
Fixes: cae355dc90db ("brcmfmac: Add RSSI information to get_station.") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210506132010.3964484-2-alsi@bang-olufsen.dk Signed-off-by: Sasha Levin <sashal@kernel.org>
The sinfo->chains field is a bitmask for filled values in chain_signal
and chain_signal_avg, not a count. Treat it as such so that the driver
can properly report per-chain RSSI information.
Before (MIMO mode):
$ iw dev wlan0 station dump
...
signal: -51 [-51] dBm
After (MIMO mode):
$ iw dev wlan0 station dump
...
signal: -53 [-53, -54] dBm
Fixes: cae355dc90db ("brcmfmac: Add RSSI information to get_station.") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210506132010.3964484-1-alsi@bang-olufsen.dk Signed-off-by: Sasha Levin <sashal@kernel.org>
Right now wcn->hal_buf is allocated in wcn36xx_start(). This is a problem
since we should have setup all of the buffers we required by the time
ieee80211_register_hw() is called.
struct ieee80211_ops callbacks may run prior to mac_start() and therefore
wcn->hal_buf must be initialized.
This is easily remediated by moving the allocation to probe() taking the
opportunity to tidy up freeing memory by using devm_kmalloc().
Fixes: 8e84c2582169 ("wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware") Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210605173347.2266003-1-bryan.odonoghue@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Remove the PLL clock gates as the allowing to gate the sys1_pll_266m breaks
the uSDHC module which is sporadically unable to enumerate devices after
this change. Also it makes AMP clock management harder with no obvious
benefit to Linux, so just revert the change.
Link: https://lore.kernel.org/r/20210528180135.1640876-1-l.stach@pengutronix.de Fixes: b04383b6a558 ("clk: imx8mq: Define gates for pll1/2 fixed dividers") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Abel Vesa <abel.vesa@nxp.com> Signed-off-by: Abel Vesa <abel.vesa@nxp.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In hwsim_subscribe_all_others, the error handling code performs
incorrectly if the second hwsim_alloc_edge fails. When this issue occurs,
it goes to sub_fail, without cleaning the edges allocated before.
Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Acked-by: Alexander Aring <aahringo@redhat.com> Link: https://lore.kernel.org/r/20210611015812.1626999-1-mudongliangabcd@gmail.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
kernel test robot reports over 200 build errors and warnings
that are due to this Kconfig problem when CARL9170=m,
MAC80211=y, and LEDS_CLASS=m.
WARNING: unmet direct dependencies detected for MAC80211_LEDS
Depends on [n]: NET [=y] && WIRELESS [=y] && MAC80211 [=y] && (LEDS_CLASS [=m]=y || LEDS_CLASS [=m]=MAC80211 [=y])
Selected by [m]:
- CARL9170_LEDS [=y] && NETDEVICES [=y] && WLAN [=y] && WLAN_VENDOR_ATH [=y] && CARL9170 [=m]
CARL9170_LEDS selects MAC80211_LEDS even though its kconfig
dependencies are not met. This happens because 'select' does not follow
any Kconfig dependency chains.
Fix this by making CARL9170_LEDS depend on MAC80211_LEDS, where
the latter supplies any needed dependencies on LEDS_CLASS.
Fixes: 1d7e1e6b1b8ed ("carl9170: Makefile, Kconfig files and MAINTAINERS") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Christian Lamparter <chunkeey@googlemail.com> Cc: linux-wireless@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Suggested-by: Christian Lamparter <chunkeey@googlemail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210530031134.23274-1-rdunlap@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org>
When chip_id is not supported, the resources will be freed
on path err_unsupported, these resources will also be freed
when calling ath10k_pci_remove(), it will cause double free,
so return -ENODEV when it doesn't support the device with wrong
chip_id.
Fixes: c0c378f9907c ("ath10k: remove target soc ps code") Fixes: 7505f7c3ec1d ("ath10k: create a chip revision whitelist") Fixes: f8914a14623a ("ath10k: restore QCA9880-AR1A (v1) detection") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210522105822.1091848-3-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The object surf is not fully initialized and the uninitialized
field surf.data is being copied by the call to qxl_bo_create
via the call to qxl_gem_object_create. Set surf.data to zero
to ensure garbage data from the stack is not being copied.
On 5P49V6965, when an output is enabled we enable the corresponding
FOD. When this happens for the first time, and specifically when writing
register VC5_OUT_DIV_CONTROL in vc5_clk_out_prepare(), all other outputs
are stopped for a short time and then restarted.
According to Renesas support this is intended: "The reason for that is VC6E
has synced up all output function".
This behaviour can be disabled at least on VersaClock 6E devices, of which
only the 5P49V6965 is currently implemented by this driver. This requires
writing bit 7 (bypass_sync{1..4}) in register 0x20..0x50. Those registers
are named "Unused Factory Reserved Register", and the bits are documented
as "Skip VDDO<N> verification", which does not clearly explain the relation
to FOD sync. However according to Renesas support as well as my testing
setting this bit does prevent disabling of all clock outputs when enabling
a FOD.
See "VersaClock ® 6E Family Register Descriptions and Programming Guide"
(August 30, 2018), Table 116 "Power Up VDD check", page 58:
https://www.renesas.com/us/en/document/mau/versaclock-6e-family-register-descriptions-and-programming-guide
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net> Reviewed-by: Adam Ford <aford173@gmail.com> Link: https://lore.kernel.org/r/20210527211647.1520720-1-luca@lucaceresoli.net Fixes: 2bda748e6ad8 ("clk: vc5: Add support for IDT VersaClock 5P49V6965") Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the of_get_named_gpio_flags call fails in vc4_hdmi_bind, we jump to
the err_unprepare_hsm label. That label will then call
pm_runtime_disable and put_device on the DDC device.
We just retrieved the DDC device, so the latter is definitely justified.
However at that point we still haven't called pm_runtime_enable, so the
call to pm_runtime_disable is not supposed to be there.
To avoid the following failure when trying to load the rdma_rxe module
while IPv6 is disabled, add a check for EAFNOSUPPORT and ignore the
failure, also delete the needless debug print from rxe_setup_udp_tunnel().
$ modprobe rdma_rxe
modprobe: ERROR: could not insert 'rdma_rxe': Operation not permitted
Fixes: dfdd6158ca2c ("IB/rxe: Fix kernel panic in udp_setup_tunnel") Link: https://lore.kernel.org/r/20210603090112.36341-1-kamalheib1@gmail.com Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Avoid randconfig build failures by requiring VEXPRESS_CONFIG:
aarch64-linux-gnu-ld: drivers/gpu/drm/pl111/pl111_versatile.o: in function `pl111_vexpress_clcd_init':
pl111_versatile.c:(.text+0x220): undefined reference to `devm_regmap_init_vexpress_config'
The mlx4 and mlx5 implemented differently the WQ input checks. Instead of
duplicating mlx4 logic in the mlx5, let's prepare the input in the central
place.
The mlx5 implementation didn't check for validity of state input. It is
not real bug because our FW checked that, but still worth to fix.
Currently vlan modification action checks existence of vlan priority by
comparing it to 0. Therefore it is impossible to modify existing vlan
tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
Fixes: 45a497f2d149a4a8061c (net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action) Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
In commit 68dc022d04eb ("xfrm: BEET mode doesn't support fragments
for inner packets"), it tried to fix the issue that in TX side the
packet is fragmented before the ESP encapping while in the RX side
the fragments always get reassembled before decapping with ESP.
This is not true for IPv6. IPv6 is different, and it's using exthdr
to save fragment info, as well as the ESP info. Exthdrs are added
in TX and processed in RX both in order. So in the above case, the
ESP decapping will be done earlier than the fragment reassembling
in TX side.
Here just remove the fragment check for the IPv6 inner packets to
recover the fragments support for BEET mode.
Fixes: 68dc022d04eb ("xfrm: BEET mode doesn't support fragments for inner packets") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The 600MHz is a too high clock rate for some SoC versions for the video
decoder hardware and this may cause stability issues. Use 300MHz for the
video decoder by default, which is supported by all hardware versions.
sess->stats and sess->stats->pcpu_stats objects are freed
when sysfs entry is removed. If something wrong happens and
session is closed before sysfs entry is created,
sess->stats and sess->stats->pcpu_stats objects are not freed.
This patch adds freeing of them at three places:
1. When client uses wrong address and session creation fails.
2. When client fails to create a sysfs entry.
3. When client adds wrong address via sysfs add_path.
Fixes: 215378b838df0 ("RDMA/rtrs: client: sysfs interface functions") Link: https://lore.kernel.org/r/20210528113018.52290-21-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The queue_depth is a module parameter for rtrs_server. It is used on the
client side to determing the queue_depth of the request queue for the RNBD
virtual block device.
During a reconnection event for an already mapped device, in case the
rtrs_server module queue_depth has changed, fail the reconnect attempt.
Also stop further auto reconnection attempts. A manual reconnect via
sysfs has to be triggerred.
Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210528113018.52290-20-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The problem is we increase device refcount by get_device in process_info_req
for each path, but only does put_deice for last path, which lead to
memory leak.
To fix it, it also calls put_device when dev_ref is not 0.
Fixes: e2853c49477d1 ("RDMA/rtrs-srv-sysfs: fix missing put_device") Link: https://lore.kernel.org/r/20210528113018.52290-19-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When closing a session, currently the rtrs_srv_stats object in the
closing session is freed by kobject release. But if it failed
to create a session by various reasons, it must free the rtrs_srv_stats
object directly because kobject is not created yet.
This problem is found by kmemleak as below:
1. One client machine maps /dev/nullb0 with session name 'bla':
root@test1:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb0" > /sys/devices/virtual/rnbd-client/ctl/map_device
2. Another machine failed to create a session with the same name 'bla':
root@test2:~# echo "sessname=bla path=ip:192.168.122.190 \
device_path=/dev/nullb1" > /sys/devices/virtual/rnbd-client/ctl/map_device
-bash: echo: write error: Connection reset by peer
Fixes: 39c2d639ca183 ("RDMA/rtrs-srv: Set .release function for rtrs srv device during device init") Link: https://lore.kernel.org/r/20210528113018.52290-18-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When re-connecting, it resets hb_missed_max to 0.
Before the first re-connecting, client will trigger re-connection
when it gets hb-ack more than 5 times. But after the first
re-connecting, clients will do re-connection whenever it does
not get hb-ack because hb_missed_max is 0.
There is no need to reset hb_missed_max when re-connecting.
hb_missed_max should be kept until closing the session.
Fixes: c0894b3ea69d3 ("RDMA/rtrs: core: lib functions shared between client and server modules") Link: https://lore.kernel.org/r/20210528113018.52290-16-jinpu.wang@ionos.com Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When get_next_path_min_inflight is called to select the next path, it
iterates over the list of available rtrs_clt_sess (paths). It then reads
the number of inflight IOs for that path to select one which has the least
inflight IO.
But it may so happen that rtrs_clt_sess (path) is no longer in the
connected state because closing or error recovery paths can change the status
of the rtrs_clt_Sess.
For example, the client sent the heart-beat and did not get the
response, it would change the session status and stop IO processing.
The added checking of this patch can prevent accessing the broken path
and generating duplicated error messages.
It is ok if the status is changed after checking the status because
the error recovery path does not free memory and only tries to
reconnection. And also it is ok if the session is closed after checking
the status because closing the session changes the session status and
flush all IO beforing free memory. If the session is being accessed for
IO processing, the closing session will wait.
Fixes: 6a98d71daea18 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20210528113018.52290-13-jinpu.wang@ionos.com Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For outgoing subflow join, when recv SYNACK, in subflow_finish_connect(),
the mptcp_finish_join() may return false in some cases, and send a RESET
to remote, and no local hmac is required.
So generate subflow hmac after mptcp_finish_join().
Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After commit 2c5ebd001d4f ("mptcp: refactor token container"),
pr_debug() is called before mptcp_crypto_key_gen_sha() in
mptcp_token_new_connect(), so the output local_key, token and
idsn are 0, like:
The variable bit_per_pix is a u8 and is promoted in the multiplication
to an int type and then sign extended to a u64. If the result of the
int multiplication is greater than 0x7fffffff then the upper 32 bits will
be set to 1 as a result of the sign extension. Avoid this by casting
tu_size_reg to u64 to avoid sign extension and also a potential overflow.
Fixes: 1a0f7ed3abe2 ("drm/rockchip: cdn-dp: add cdn DP support for rk3399") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200915162049.36434-1-colin.king@canonical.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When we first enable the DSI encoder, we currently program some per-chip
configuration that we look up in rk3399_chip_data based on the device
tree compatible we match. This data configures various parameters of the
MIPI lanes, including on RK3399 whether DSI1 is slaved to DSI0 in a
dual-mode configuration. It also selects which LCDC (i.e. VOP) to scan
out from.
This causes a problem in RK3399 dual-mode configurations, though: panel
prepare() callbacks run before the encoder gets enabled and expect to be
able to write commands to the DSI bus, but the bus isn't fully
functional until the lane and master/slave configuration have been
programmed. As a result, dual-mode panels (and possibly others too) fail
to turn on when the rockchipdrm driver is initially loaded.
Because the LCDC mux is the only thing we don't know until enable time
(and is the only thing that can ever change), we can actually move most
of the initialization to bind() and get it out of the way early. That's
what this change does. (Rockchip's 4.4 BSP kernel does it in mode_set(),
which also avoids the issue, but bind() seems like the more correct
place to me.)
Tested on a Google Scarlet board (Acer Chromebook Tab 10), which has a
Kingdisplay KD097D04 dual-mode panel. Prior to this change, the panel's
backlight would turn on but no image would appear when initially loading
rockchipdrm. If I kept rockchipdrm loaded and reloaded the panel driver,
it would come on. With this change, the panel successfully turns on
during initial rockchipdrm load as expected.
At boot, we can't rely on the vc4_get_crtc_encoder since we don't have a
state yet and thus will not be able to figure out which connector is
attached to our CRTC.
However, we have a muxing bit in the CRTC register we can use to get the
encoder currently connected to the pixelvalve. We can thus read that
register, lookup the associated register through the vc4_pv_data
structure, and then pass it to vc4_crtc_disable so that we can perform
the proper operations.
The vc4_get_crtc_encoder function currently only works when the
connector->state->crtc pointer is set, which is only true when the
connector is currently enabled.
However, we use it as part of the disable path as well, and our lookup
will fail in that case, resulting in it returning a null pointer we
can't act on.
We can access the connector that used to be connected to that crtc
though using the old connector state in the disable path.
Since we want to support both the enable and disable path, we can
support it by passing the state accessor variant as a function pointer,
together with the atomic state.
The vc4_crtc_config_pv will need to access the drm_atomic_state
structure and its only parent function, vc4_crtc_atomic_enable already
has access to it. Let's pass it as a parameter.
The variables will be free on path err_phy_connect, it should
return error code, or it will cause double free when calling
ftgmac100_remove().
Fixes: bd466c3fb5a4 ("net/faraday: Support NCSI mode") Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
Conditions that end up modifying the global dc state must be locked.
However, during mst allocate payload sequence, lock is already taken.
With StarTech 1.2 DP hub, we get an HPD RX interrupt for a reason other
than to indicate down reply availability right after sending payload
allocation. The handler again takes dc lock before calling the
dc's HPD RX handler. Due to this contention, the DRM thread which waits
for MST down reply never gets a chance to finish its waiting
successfully and ends up timing out. Once the lock is released, the hpd
rx handler fires and goes ahead to read from the MST HUB, but now its
too late and the HUB doesnt lightup all displays since DRM lacks error
handling when payload allocation fails.
[How]
Take lock only if there is a change in link status or if automated test
pattern bit is set. The latter fixes the null pointer dereference when
running certain DP Link Layer Compliance test.
Fixes: c8ea79a8a276 ("drm/amd/display: NULL pointer error during compliance test") Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
In gpu reset dc_lock acquired in dm_suspend().
Asynchronously handle_hpd_rx_irq can also be called
through amdgpu_dm_irq_suspend->flush_work, which also
tries to acquire dc_lock. That causes a deadlock.
[How]
Check if amdgpu executing reset before acquiring dc_lock.
Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Reviewed-by: Qingqing Zhuo <Qingqing.Zhuo@amd.com> Acked-by: Wayne Lin <Wayne.Lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While some SoC samples are able to lock with a PLL factor of 55, others
samples can't. ATM, a minimum of 60 appears to work on all the samples
I have tried.
Even with 60, it sometimes takes a long time for the PLL to eventually
lock. The documentation says that the minimum rate of these PLLs DCO
should be 3GHz, a factor of 125. Let's use that to be on the safe side.
With factor range changed, the PLL seems to lock quickly (enough) so far.
It is still unclear if the range was the only reason for the delay.
In cases where the dirty linear memory range spans multiple sample sheets
in a surface, the dirty surface region is incorrectly computed.
To do this correctly and in an optimized fashion we would have to compute
the dirty region of each sample sheet and compute the union of those
regions.
But assuming that cpu writing to a multisample surface is rather a corner
case than a common case, just set the dirty region to the full surface.
This fixes OpenGL piglit errors with SVGA_FORCE_COHERENT=1
and the piglit test:
fbo-depthstencil blit default_fb -samples=2 -auto
Fixes: 9ca7d19ff8ba ("drm/vmwgfx: Add surface dirty-tracking callbacks") Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-4-zackr@vmware.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The SVGA3dCmdDXGenMips command uses a shader-resource view to access
the underlying surface. Normally accesses using that view-type are not
dirtying the underlying surface, but that particular command is an
exception.
Mark the surface gpu-dirty after a SVGA3dCmdDXGenMips command has been
submitted.
This fixes the piglit getteximage-formats test run with
SVGA_FORCE_COHERENT=1
Fixes: a9f58c456e9d ("drm/vmwgfx: Be more restrictive when dirtying resources") Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210505035740.286923-3-zackr@vmware.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Even in the case of heavy load, direct WQE can still be posted. The
hardware will decide whether to drop the DWQE or not. Thus, the limit
needs to be removed.
Fixes: 01584a5edcc4 ("RDMA/hns: Add support of direct wqe") Link: https://lore.kernel.org/r/1619593950-29414-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
R-Car Gen3 Hardware Manual Errata for Rev. 0.52 of Nov 30, 2016, added
the configuration bit for bias pull-down control for the PRESET# pin on
R-Car M3-W. Add driver support for controlling pull-down on this pin.
In each iteration fwnode_for_each_available_child_node() bumps a reference
counting of a loop variable followed by dropping in on a next iteration,
Since in error case the loop is broken, we have to drop a reference count
by ourselves. Do it for port_fwnode in error case during ->probe().
Fixes: 248122212f68 ("net: mvpp2: use device_*/fwnode_* APIs instead of of_*") Cc: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The conversion to drm managed resources introduced two bugs: the plane is now
always initialized with the linear-only list, while the list with the Vivante
GPU modifiers should have been used when the PRG/PRE engines are present. This
masked another issue, as ipu_plane_format_mod_supported() is now called before
the private plane data is set up, so if a non-linear modifier is supplied in
the plane modifier list, we run into a NULL pointer dereference checking for
the PRG presence. To fix this just remove the check from this function, as we
know that it will only be called with a non-linear modifier, if the plane init
code has already determined that the PRG/PRE is present.
Fixes: 699e7e543f1a ("drm/imx: ipuv3-plane: use drm managed resources") Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Link: https://lore.kernel.org/r/20210510145927.988661-1-l.stach@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Only planes that are displayed via the Display Processor (DP) path
support color space conversion. Limit formats on planes that are
shown via the direct Display Controller (DC) path to RGB.
The commit 7cbb93d89838 ("drm/ast: Use managed pci functions")
converted a few PCI accessors to the managed API and dropped the
manual pci_iounmap() calls, but it seems to have forgotten converting
pci_iomap() to the managed one. It resulted in the leftover resources
after the driver unbind. Let's fix them.
In dm_dp_mst_detect(), We should check whether or not @connector
has been unregistered from userspace. If the connector is unregistered,
we should return disconnected status.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Signed-off-by: Yingjie Wang <wangyingjie55@126.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The drm_bridge_chain_pre_enable() is not the proper opposite of
drm_bridge_chain_post_disable(). It continues along the chain to
_before_ the starting bridge. Let's fix that.
The DRM_SIL_SII8620 kconfig has a weak `imply` dependency
on EXTCON, which causes issues when sii8620 is built
as a builtin and EXTCON is built as a module.
The symptoms are 'undefined reference' errors caused
by the symbols in EXTCON not being available
to the sii8620 driver.
Fixes: 688838442147 ("drm/bridge/sii8620: use micro-USB cable detection logic to detect MHL") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Robert Foss <robert.foss@linaro.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210419090124.153560-1-robert.foss@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Jianwen reported that IPv6 Interoperability tests are failing in an
IPsec case where one of the links between the IPsec peers has an MTU
of 1280. The peer generates a packet larger than this MTU, the router
replies with a "Packet too big" message indicating an MTU of 1280.
When the peer tries to send another large packet, xfrm_state_mtu
returns 1280 - ipsec_overhead, which causes ip6_setup_cork to fail
with EINVAL.
We can fix this by forcing xfrm_state_mtu to return IPV6_MIN_MTU when
IPv6 is used. After going through IPsec, the packet will then be
fragmented to obey the actual network's PMTU, just before leaving the
host.
Currently, TFC padding is capped to PMTU - overhead to avoid
fragementation: after padding and encapsulation, we still fit within
the PMTU. That behavior is preserved in this patch.
Fixes: 91657eafb64b ("xfrm: take net hdr len into account for esp payload size calculation") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if
the zone is empty") clears out zone->lowmem_reserve[] if zone is empty.
But when zone is not empty and sysctl_lowmem_reserve_ratio[i] is set to
zero, zone_managed_pages(zone) is not counted in the managed_pages either.
This is inconsistent with the description of lowmem_reserve, so fix it.
Link: https://lkml.kernel.org/r/20210527125707.3760259-1-liushixin2@huawei.com Fixes: f63661566fad ("mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reported-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Baoquan He <bhe@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since the merging of the new slab memory controller in v5.9, the page
structure stores a pointer to objcg pointer array for slab pages. When
the slab has no used objects, it can be freed in free_slab() which will
call kfree() to free the objcg pointer array in
memcg_alloc_page_obj_cgroups(). If it happens that the objcg pointer
array is the last used object in its slab, that slab may then be freed
which may caused kfree() to be called again.
With the right workload, the slab cache may be set up in a way that allows
the recursive kfree() calling loop to nest deep enough to cause a kernel
stack overflow and panic the system. In fact, we have a reproducer that
can cause kernel stack overflow on a s390 system involving kmalloc-rcl-256
and kmalloc-rcl-128 slabs with the following kfree() loop recursively
called 74 times:
[ 285.520739] [<000000000ec432fc>] kfree+0x4bc/0x560 [ 285.520740]
[<000000000ec43466>] __free_slab+0xc6/0x228 [ 285.520741]
[<000000000ec41fc2>] __slab_free+0x3c2/0x3e0 [ 285.520742]
[<000000000ec432fc>] kfree+0x4bc/0x560 : While investigating this issue, I
also found an issue on the allocation side. If the objcg pointer array
happen to come from the same slab or a circular dependency linkage is
formed with multiple slabs, those affected slabs can never be freed again.
This patch series addresses these two issues by introducing a new set of
kmalloc-cg-<n> caches split from kmalloc-<n> caches. The new set will
only contain non-reclaimable and non-dma objects that are accounted in
memory cgroups whereas the old set are now for unaccounted objects only.
By making this split, all the objcg pointer arrays will come from the
kmalloc-<n> caches, but those caches will never hold any objcg pointer
array. As a result, deeply nested kfree() call and the unfreeable slab
problems are now gone.
This patch (of 4):
Since the merging of the new slab memory controller in v5.9, the page
structure may store a pointer to obj_cgroup pointer array for slab pages.
Currently, only the __GFP_ACCOUNT bit is masked off. However, the array
is not readily reclaimable and doesn't need to come from the DMA buffer.
So those GFP bits should be masked off as well.
Do the flag bit clearing at memcg_alloc_page_obj_cgroups() to make sure
that it is consistently applied no matter where it is called.
Link: https://lkml.kernel.org/r/20210505200610.13943-1-longman@redhat.com Link: https://lkml.kernel.org/r/20210505200610.13943-2-longman@redhat.com Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When I was investigating the swap code, I found the below possible race
window:
CPU 1 CPU 2
----- -----
shmem_swapin
swap_cluster_readahead
if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) {
swapoff
..
si->swap_file = NULL;
..
struct inode *inode = si->swap_file->f_mapping->host;[oops!]
Close this race window by using get/put_swap_device() to guard against
concurrent swapoff.
Link: https://lkml.kernel.org/r/20210426123316.806267-5-linmiaohe@huawei.com Fixes: 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or not") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alex Shi <alexs@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>