There are some wrong return values check in sign-file when call OpenSSL
API. The ERR() check cond is wrong because of the program only check the
return value is < 0 which ignored the return val is 0. For example:
1. CMS_final() return 1 for success or 0 for failure.
2. i2d_CMS_bio_stream() returns 1 for success or 0 for failure.
3. i2d_TYPEbio() return 1 for success and 0 for failure.
4. BIO_free() return 1 for success and 0 for failure.
Link: https://www.openssl.org/docs/manmaster/man3/ Fixes: e5a2e3c84782 ("scripts/sign-file.c: Add support for signing with a raw signature") Signed-off-by: Yusong Gao <a869920004@gmail.com> Reviewed-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20231213024405.624692-1-a869920004@gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Generic code will use mdio. If it is not initialized before use,
the kernel will Oops.
Fixes: 30bba69d7db4 ("stmmac: pci: Add dwmac support for Loongson") Signed-off-by: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Issue 1
-------
Description
```````````
Current code does not call dma_sync_single_for_cpu() to sync data from
the device side memory to the CPU side memory before the XDP code path
uses the CPU side data.
This causes the XDP code path to read the unset garbage data in the CPU
side memory, resulting in incorrect handling of the packet by XDP.
Solution
````````
1. Add a call to dma_sync_single_for_cpu() before the XDP code starts to
use the data in the CPU side memory.
2. The XDP code verdict can be XDP_PASS, in which case there is a
fallback to the non-XDP code, which also calls
dma_sync_single_for_cpu().
To avoid calling dma_sync_single_for_cpu() twice:
2.1. Put the dma_sync_single_for_cpu() in the code in such a place where
it happens before XDP and non-XDP code.
2.2. Remove the calls to dma_sync_single_for_cpu() in the non-XDP code
for the first buffer only (rx_copybreak and non-rx_copybreak
cases), since the new call that was added covers these cases.
The call to dma_sync_single_for_cpu() for the second buffer and on
stays because only the first buffer is handled by the newly added
dma_sync_single_for_cpu(). And there is no need for special
handling of the second buffer and on for the XDP path since
currently the driver supports only single buffer packets.
Issue 2
-------
Description
```````````
In case the XDP code forwarded the packet (ENA_XDP_FORWARDED),
ena_unmap_rx_buff_attrs() is called with attrs set to 0.
This means that before unmapping the buffer, the internal function
dma_unmap_page_attrs() will also call dma_sync_single_for_cpu() on
the whole buffer (not only on the data part of it).
This sync is both wasteful (since a sync was already explicitly
called before) and also causes a bug, which will be explained
using the below diagram.
The following diagram shows the flow of events causing the bug.
The order of events is (1)-(4) as shown in the diagram.
CPU side memory area
(3)convert_to_xdp_frame() initializes the
headroom with xdpf metadata
||
\/
___________________________________
| |
0 | V 4K
---------------------------------------------------------------------
| xdpf->data | other xdpf | < data > | tailroom ||...|
| | fields | | GARBAGE || |
---------------------------------------------------------------------
/\ /\
|| ||
(4)ena_unmap_rx_buff_attrs() calls (2)dma_sync_single_for_cpu()
dma_sync_single_for_cpu() on the copies data from device
whole buffer page, overwriting side to CPU side memory
the xdpf->data with GARBAGE. ||
0 4K
---------------------------------------------------------------------
| headroom | < data > | tailroom ||...|
| GARBAGE | | GARBAGE || |
---------------------------------------------------------------------
Device side memory area /\
||
(1) device writes RX packet data
After the call to ena_unmap_rx_buff_attrs() in (4), the xdpf->data
becomes corrupted, and so when it is later accessed in
ena_clean_xdp_irq()->xdp_return_frame(), it causes a page fault,
crashing the kernel.
Solution
````````
Explicitly tell ena_unmap_rx_buff_attrs() not to call
dma_sync_single_for_cpu() by passing it the ENA_DMA_ATTR_SKIP_CPU_SYNC
flag.
Fixes: f7d625adeb7b ("net: ena: Add dynamic recycling mechanism for rx buffers") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-4-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Current xdp code drops packets larger than ENA_XDP_MAX_MTU.
This is an incorrect condition since the problem is not the
size of the packet, rather the number of buffers it contains.
This commit:
1. Identifies and drops XDP multi-buffer packets at the
beginning of the function.
2. Increases the xdp drop statistic when this drop occurs.
3. Adds a one-time print that such drops are happening to
give better indication to the user.
Fixes: 838c93dc5449 ("net: ena: implement XDP drop support") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-3-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ena_setup_and_create_all_xdp_queues() function freed all the
resources upon failure, after creating only xdp_num_queues queues,
instead of freeing just the created ones.
In this patch, the only resources that are freed, are the ones
allocated right before the failure occurs.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shahar Itzko <itzko@amazon.com> Signed-off-by: David Arinzon <darinzon@amazon.com> Link: https://lore.kernel.org/r/20231211062801.27891-2-darinzon@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
// TCP B: not send challenge ack for ack limit or packet loss
// TCP A: close
tcp_close
tcp_send_fin
if (!tskb && tcp_under_memory_pressure(sk))
tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN; // set FIN flag
__tcp_retransmit_skb //skb->len=0
tcp_trim_head
len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
__pskb_trim_head
skb->data_len -= len // skb->len=-1, wrap around
... ...
ip_fragment
icmp_glue_bits //BUG_ON
If we use tcp_trim_head() to remove acked SYN from packet that contains data
or other flags, skb->len will be incorrectly decremented. We can remove SYN
flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
can fix the problem mentioned above.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Co-developed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
qed_cxt_tables_alloc() accesses the freed pointer on failure
of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
which may lead to use-after-free. Fix this issue by setting
p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Make the flow for pci shutdown be the same to the pci remove.
iavf_shutdown was implementing an incomplete version
of iavf_remove. It misses several calls to the kernel like
iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap
that might break the system on reboot or hibernation.
Implement the call of iavf_remove directly in iavf_shutdown to
close this gap.
Fixes below error messages (dmesg) during shutdown stress tests -
[685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
VF 0
[685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
VF 0
Reproduction:
1. Create one VF interface:
echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs
2. Run live dmesg on the host:
dmesg -wH
3. On SUT, script below steps into vf_namespace_assignment.sh
<#!/bin/sh> // Remove <>. Git removes # line
if=<VF name> (edit this per VF name)
loop=0
while true; do
echo test round $loop
let loop++
ip netns add ns$loop
ip link set dev $if up
ip link set dev $if netns ns$loop
ip netns exec ns$loop ip link set dev $if up
ip netns exec ns$loop ip link set dev $if netns 1
ip netns delete ns$loop
done
4. Run the script for at least 1000 iterations on SUT:
./vf_namespace_assignment.sh
Expected result:
No errors in dmesg.
Fixes: 129cf89e5856 ("iavf: rename functions and structs to new name") Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Ranganatha Rao <ranganatha.rao@intel.com> Signed-off-by: Ranganatha Rao <ranganatha.rao@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
ntuple-filter feature on/off:
Default is on. If turned off, the filters will be removed from both
PF and iavf list. The removal is irrespective of current filter state.
Steps to reproduce:
-------------------
1. Ensure ntuple is on.
ethtool -K enp8s0 ntuple-filters on
2. Create a filter to receive the traffic into non-default rx-queue like 15
and ensure traffic is flowing into queue into 15.
Now, turn off ntuple. Traffic should not flow to configured queue 15.
It should flow to default RX queue.
Current FDIR state machines (SM) are not adequate to handle a few
scenarios in the link DOWN/UP event, reset event and ntuple-feature.
For example, when VF link goes DOWN and comes back UP administratively,
the expectation is that previously installed filters should also be
restored. But with current SM, filters are not restored.
So with new SM, during link DOWN filters are marked as INACTIVE in
the iavf list but removed from PF. After link UP, SM will transition
from INACTIVE to ADD_REQUEST to restore the filter.
Similarly, with VF reset, filters will be removed from the PF, but
marked as INACTIVE in the iavf list. Filters will be restored after
reset completion.
Steps to reproduce:
-------------------
1. Create a VF. Here VF is enp8s0.
2. Assign IP addresses to VF and link partner and ping continuously
from remote. Here remote IP is 1.1.1.1.
5. Ensure filter gets added and traffic is received on RX queue 15 now.
Link event testing:
-------------------
6. Bring VF link down and up. If traffic flows to configured queue 15,
test is success, otherwise it is a failure.
Reset event testing:
--------------------
7. Reset the VF. If traffic flows to configured queue 15, test is success,
otherwise it is a failure.
Because rose_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with rose_accept().
A use-after-free for skb occurs with the following flow.
```
rose_ioctl() -> skb_peek()
rose_accept() -> skb_dequeue() -> kfree_skb()
```
Add sk->sk_receive_queue.lock to rose_ioctl() to fix this issue.
Because do_vcc_ioctl() accesses sk->sk_receive_queue
without holding a sk->sk_receive_queue.lock, it can
cause a race with vcc_recvmsg().
A use-after-free for skb occurs with the following flow.
```
do_vcc_ioctl() -> skb_peek()
vcc_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
```
Add sk->sk_receive_queue.lock to do_vcc_ioctl() to fix this issue.
The current implementation's default Pause Forward setting is causing
unnecessary network traffic. This patch disables Pause Forward to
address this issue.
Fixes: 1121f6b02e7a ("octeontx2-af: Priority flow control configuration support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Current implementation is such that, promisc mcam entry action
is set as multicast even when there are no trusted VFs. multicast
action causes the hardware to copy packet data, which reduces
the performance.
This patch fixes this issue by setting the promisc mcam entry action to
unicast instead of multicast when there are no trusted VFs. The same
change is made for the 'allmulti' mcam entry action.
Fixes: ffd2f89ad05c ("octeontx2-pf: Enable promisc/allmulti match MCAM entries.") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The firmware ready value is 1, and get firmware ready status
function should explicitly test for that value. The firmware
ready value read will be 2 after driver load, and on unbind
till firmware rewrites the firmware ready back to 0, the value
seen by driver will be 2, which should be regarded as not ready.
Fixes: 10c073e40469 ("octeon_ep: defer probe if firmware not ready") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The referenced change added custom cleanup code to act_ct to delete any
callbacks registered on the parent block when deleting the
tcf_ct_flow_table instance. However, the underlying issue is that the
drivers don't obtain the reference to the tcf_ct_flow_table instance when
registering callbacks which means that not only driver callbacks may still
be on the table when deleting it but also that the driver can still have
pointers to its internal nf_flowtable and can use it concurrently which
results either warning in netfilter[0] or use-after-free.
Fix the issue by taking a reference to the underlying struct
tcf_ct_flow_table instance when registering the callback and release the
reference when unregistering. Expose new API required for such reference
counting by adding two new callbacks to nf_flowtable_type and implementing
them for act_ct flowtable_ct type. This fixes the issue by extending the
lifetime of nf_flowtable until all users have unregistered.
The rvu_dl will be freed in rvu_nix_health_reporters_destroy(rvu_dl)
after the create_workqueue fails, and after that free, the rvu_dl will
be translate back through the following call chain:
Finally. in the err_dl_health label, rvu_dl being freed again in
rvu_health_reporters_destroy(rvu) by rvu_nix_health_reporters_destroy.
In the second calls of rvu_nix_health_reporters_destroy, however,
it uses rvu_dl->rvu_nix_health_reporter, which is already freed at
the end of rvu_nix_health_reporters_destroy in the first call.
So this patch prevents the first destroy by instantly returning -ENONMEN
when create_workqueue fails. In addition, since the failure of
create_workqueue is the only entrence of label err, it has been
integrated into the error-handling path of create_workqueue.
Fixes: 5ed66306eab6 ("octeontx2-af: Add devlink health reporters for NIX") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The old implementation extracted VLAN TCI info from the payload
before the VLAN tag has been pushed in the payload.
Another problem was that the VLAN TCI was extracted even if the
packet did not have VLAN protocol header.
This resulted in invalid VLAN TCI and as a consequence a random
queue was computed.
This patch fixes the above issues and use the VLAN TCI from the
skb if it is present or VLAN TCI from payload if present. If no
VLAN header is present queue 0 is selected.
Fixes: 52c4a1a85f4b ("net: fec: add ndo_select_queue to fix TX bandwidth fluctuations") Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
As &card->tx_queue_lock is acquired under softirq context along the
following call chain from solos_bh(), other acquisition of the same
lock inside process context should disable at least bh to avoid double
lock.
This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.
To prevent the potential deadlock, the patch uses spin_lock_bh()
on &card->tx_queue_lock under process context code consistently to
prevent the possible deadlock scenario.
Fixes: 213e85d38912 ("solos-pci: clean up pclose() function") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
As &card->cli_queue_lock is acquired under softirq context along the
following call chain from solos_bh(), other acquisition of the same
lock inside process context should disable at least bh to avoid double
lock.
This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlock.
To prevent the potential deadlock, the patch uses spin_lock_bh()
on the card->cli_queue_lock under process context code consistently
to prevent the possible deadlock scenario.
Fixes: 9c54004ea717 ("atm: Driver for Solos PCI ADSL2+ card.") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the chip is configured to timestamp all receive packets, the
timestamp in the RX completion is only valid if the metadata
present flag is not set for packets received on the wire. In
addition, internal loopback packets will never have a valid timestamp
and the timestamp field will always be zero. We must exclude
any 0 value in the timestamp field because there is no way to
determine if it is a loopback packet or not.
Add a new function bnxt_rx_ts_valid() to check for all timestamp
valid conditions.
Fixes: 66ed81dcedc6 ("bnxt_en: Enable packet timestamping for all RX packets") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The wait_event_interruptible_timeout() function returns 0
if the timeout elapsed, -ERESTARTSYS if it was interrupted
by a signal, and the remaining jiffies otherwise if the
condition evaluated to true before the timeout elapsed.
Driver should have checked for zero return value instead of
a positive value.
MChan: Print a warning for -ERESTARTSYS. The close operation
will proceed anyway when wait_event_interruptible_timeout()
returns for any reason. Since we do the close no matter what,
we should not return this error code to the caller. Change
bnxt_close_nic() to a void function and remove all error
handling from some of the callers.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Receive SKBs can go through the VF-rep path or the normal path.
skb_mark_for_recycle() is only called for the normal path. Fix it
to do it for both paths to fix possible stalled page pool shutdown
errors.
Fixes: 86b05508f775 ("bnxt_en: Use the unified RX page pool buffers for XDP and non-XDP") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We are issuing HWRM_FUNC_RESET cmd to reset the device including
all reserved resources, but not clearing the reservations
within the driver struct. As a result, when the driver re-initializes
as part of resume, it believes that there is no need to do any
resource reservation and goes ahead and tries to allocate rings
which will eventually fail beyond a certain number pre-reserved by
the firmware.
Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20231208001658.14230-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of a reset triggered by the QCA7000 itself, the behavior of the
qca_spi driver was not quite correct:
- in case of a pending RX frame decoding the drop counter must be
incremented and decoding state machine reseted
- also the reset counter must always be incremented regardless of sync
state
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-4-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The reason for this is that the readonly setting rx_pending get
initialized and after that the range check in qcaspi_set_ringparam()
fails regardless of the provided parameter. So fix this by accepting
the exposed RX defaults. Instead of adding another magic number
better use a new define here.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-3-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The qca_spi driver stop and restart the SPI kernel thread
(via ndo_stop & ndo_open) in case of TX ring changes. This is
a big issue because it allows userspace to prevent restart of
the SPI kernel thread (via signals). A subsequent change of
TX ring wrongly assume a valid spi_thread pointer which result
in a crash.
So prevent this by stopping the network traffic handling and
temporary park the SPI thread.
Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20231206141222.52029-2-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Lorenzo points out that we effectively clear all unknown
flags from PIO when copying them to userspace in the netlink
RTM_NEWPREFIX notification.
We could fix this one at a time as new flags are defined,
or in one fell swoop - I choose the latter.
We could either define 6 new reserved flags (reserved1..6) and handle
them individually (and rename them as new flags are defined), or we
could simply copy the entire unmodified byte over - I choose the latter.
This unfortunately requires some anonymous union/struct magic,
so we add a static assert on the struct size for a little extra safety.
Cc: David Ahern <dsahern@kernel.org> Cc: Lorenzo Colitti <lorenzo@google.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Previously, when comparing the net namespaces, the case where the netdev
doesn't exist wasn't taken into account, and therefore can cause a crash.
In such a case, the comparing function should return false, as there is no
netdev->net to compare the devlink->net to.
Furthermore, this will result in an attempt to enter switchdev mode
without a netdev to fail, and which is the desired result as there is no
meaning in switchdev mode without a net device.
Current sync reset flow is not supported when PCIe bridge connected
directly to mlx5 device has HotPlug interrupt enabled and can be
triggered on link state change event. Return nack on reset request in
such case.
Fixes: 92501fa6e421 ("net/mlx5: Ack on sync_reset_request only if PF can do reset_now") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Due to the cited patch, devlink health commands take devlink lock and
this may result in deadlock for mlx5e_tx_reporter as it takes local
state_lock before calling devlink health report and on the other hand
devlink health commands such as diagnose for same reporter take local
state_lock after taking devlink lock (see kernel log below).
To fix it, remove local state_lock from mlx5e_tx_timeout_work() before
calling devlink_health_report() and take care to cancel the work before
any call to close channels, which may free the SQs that should be
handled by the work. Before cancel_work_sync(), use current_work() to
check we are not calling it from within the work, as
mlx5e_tx_timeout_work() itself may close the channels and reopen as part
of recovery flow.
While removing state_lock from mlx5e_tx_timeout_work() keep rtnl_lock to
ensure no change in netdev->real_num_tx_queues, but use rtnl_trylock()
and a flag to avoid deadlock by calling cancel_work_sync() before
closing the channels while holding rtnl_lock too.
Kernel log:
======================================================
WARNING: possible circular locking dependency detected
6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Not tainted
------------------------------------------------------
kworker/u16:2/65 is trying to acquire lock: ffff888122f6c2f8 (&devlink->lock_key#2){+.+.}-{3:3}, at: devlink_health_report+0x2f1/0x7e0
but task is already holding lock: ffff888121d20be0 (&priv->state_lock){+.+.}-{3:3}, at: mlx5e_tx_timeout_work+0x70/0x280 [mlx5_core]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
After IPSec TX tables are destroyed, the flow rules in TC rhashtable,
which have the destination to IPSec, are restored to the original
one, the uplink.
However, when the device is in switchdev mode and unload driver with
IPSec rules configured, TC rhashtable cleanup is done before IPSec
cleanup, which means tc_ht->tbl is already freed when walking TC
rhashtable, in order to restore the destination. So add the checking
before walking to avoid unexpected behavior.
Fixes: d1569537a837 ("net/mlx5e: Modify and restore TC rules for IPSec TX rules") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently eswitch mode_lock is so heavy, for example, it's locked
during the whole process of the mode change, which may need to hold
other locks. As the mode_lock is also used by IPSec to block mode and
encap change now, it is easy to cause lock dependency.
Since some of protections are also done by devlink lock, the eswitch
mode_lock is not needed at those places, and thus the possibility of
lockdep issue is reduced.
Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
IPsec NAT-T packets are UDP encapsulated packets over ESP normal ones.
In case they arrive to RX, the SPI and ESP are located in inner header,
while the check was performed on outer header instead.
That wrong check caused to the situation where received rekeying request
was missed and caused to rekey timeout, which "compensated" this failure
by completing rekeying.
Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change normal IPsec flow to use the same creation/destruction functions
for status flow table as that of ESW, which first of all refines the
code to have less code duplication.
And more importantly, the ESW status table handles IPsec syndrome
checks at steering by HW, which is more efficient than the previous
behaviour we had where it was copied to WQE meta data and checked
by the driver.
Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to RFC4303, section "3.3.3. Sequence Number Generation",
the first packet sent using a given SA will contain a sequence
number of 1.
However if user didn't set seq/oseq, the HW used zero as first sequence
packet number. Such misconfiguration causes to drop of first packet
if replay window protection was enabled in SA.
To fix it, set sequence number to be at least 1.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Users can configure IPsec replay window size, but mlx5 driver didn't
honor their choice and set always 32bits. Fix assignment logic to
configure right size from the beginning.
Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 46a0a2c96f0f ("HID: lenovo: Detect quirk-free fw on cptkbd and
stop applying workaround") introduced a regression for ThinkPad
TrackPoint Keyboard II which has similar quirks to cptkbd (so it uses
the same workarounds) but slightly different so that there are
false-positives during detecting well-behaving firmware. This commit
restricts detecting well-behaving firmware to the only model which
known to have one and have stable enough quirks to not cause
false-positives.
If an AFS cell that has an unreachable (eg. ENETUNREACH) server listed (VL
server or fileserver), an asynchronous probe to one of its addresses may
fail immediately because sendmsg() returns an error. When this happens, a
refcount underflow can happen if certain events hit a very small window.
The way this occurs is:
(1) There are two levels of "call" object, the afs_call and the
rxrpc_call. Each of them can be transitioned to a "completed" state
in the event of success or failure.
(2) Asynchronous afs_calls are self-referential whilst they are active to
prevent them from evaporating when they're not being processed. This
reference is disposed of when the afs_call is completed.
Note that an afs_call may only be completed once; once completed
completing it again will do nothing.
(3) When a call transmission is made, the app-side rxrpc code queues a Tx
buffer for the rxrpc I/O thread to transmit. The I/O thread invokes
sendmsg() to transmit it - and in the case of failure, it transitions
the rxrpc_call to the completed state.
(4) When an rxrpc_call is completed, the app layer is notified. In this
case, the app is kafs and it schedules a work item to process events
pertaining to an afs_call.
(5) When the afs_call event processor is run, it goes down through the
RPC-specific handler to afs_extract_data() to retrieve data from rxrpc
- and, in this case, it picks up the error from the rxrpc_call and
returns it.
The error is then propagated to the afs_call and that is completed
too. At this point the self-reference is released.
(6) If the rxrpc I/O thread manages to complete the rxrpc_call within the
window between rxrpc_send_data() queuing the request packet and
checking for call completion on the way out, then
rxrpc_kernel_send_data() will return the error from sendmsg() to the
app.
(7) Then afs_make_call() will see an error and will jump to the error
handling path which will attempt to clean up the afs_call.
(8) The problem comes when the error handling path in afs_make_call()
tries to unconditionally drop an async afs_call's self-reference.
This self-reference, however, may already have been dropped by
afs_extract_data() completing the afs_call
(9) The refcount underflows when we return to afs_do_probe_vlserver() and
that tries to drop its reference on the afs_call.
Fix this by making afs_make_call() attempt to complete the afs_call rather
than unconditionally putting it. That way, if afs_extract_data() manages
to complete the call first, afs_make_call() won't do anything.
The bug can be forced by making do_udp_sendmsg() return -ENETUNREACH and
sticking an msleep() in rxrpc_send_data() after the 'success:' label to
widen the race window.
River reports boot hangs with v6.6 and v6.7, and the bisect points to
commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
which moves the memory allocation and kernel decompression from the
legacy decompressor (which executes *after* ExitBootServices()) to the
EFI stub, using boot services for allocating the memory. The memory
allocation succeeds but the subsequent call to decompress_kernel() never
returns, resulting in a failed boot and a hanging system.
As it turns out, this issue only occurs when physical address
randomization (KASLR) is enabled, and given that this is a feature we
can live without (virtual KASLR is much more important), let's disable
the physical part of KASLR when booting on AMI UEFI firmware claiming to
implement revision v2.0 of the specification (which was released in
2006), as this is the version these systems advertise.
In smb2_lock(), if setup_async_work() executes successfully,
work->cancel_argv will bind the argv that generated by kmalloc(). And
release_async_work() is called in ksmbd_conn_try_dequeue_request() or
smb2_lock() to release argv.
However, when setup_async_work function fails, work->cancel_argv has not
been bound to the argv, resulting in the previously allocated argv not
being released. Call kfree() to fix it.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The syzbot has reported that it can hit the warning in
ext4_dio_write_end_io() because i_size < i_disksize. Indeed the
reproducer creates a race between DIO IO completion and truncate
expanding the file and thus ext4_dio_write_end_io() sees an inconsistent
inode state where i_disksize is already updated but i_size is not
updated yet. Since we are careful when setting up DIO write and consider
it extending (and thus performing the IO synchronously with i_rwsem held
exclusively) whenever it goes past either of i_size or i_disksize, we
can use the same test during IO completion without risking entering
ext4_handle_inode_extension() without i_rwsem held. This way we make it
obvious both i_size and i_disksize are large enough when we report DIO
completion without relying on unreliable WARN_ON.
Reported-by: <syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com> Fixes: 91562895f803 ("ext4: properly sync file size update after O_SYNC direct IO") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20231130095653.22679-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
Update code comment to clarify that the only element whose layout is
not randomized is a proper C99 flexible-array member. This update is
complementary to commit 1ee60356c2dc ("gcc-plugins: randstruct: Only
warn about true flexible arrays")
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZWJr2MWDjXLHE8ap@work Fixes: 1ee60356c2dc ("gcc-plugins: randstruct: Only warn about true flexible arrays") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Limit the speaker digital gains to 0dB so that the users will not damage them.
Currently there is a limit in UCM, but this does not stop the user form
changing the digital gains from command line. So limit this in driver
which makes the speakers more safer without active speaker protection in
place.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231204124736.132185-3-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
[ johan: backport to 6.6; rename snd_soc_rtd_to_cpu() ] Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Otherwise set elements can be deactivated twice which will cause a crash.
Reported-by: Xingyuan Mo <hdthky0@gmail.com> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are many types of revsered memory passed from firmware
that should be reserved in memblock, and UMA memory passed
from firmware that should be added to system memory for system
to use.
Also for memblock there is no need to align those space into page,
which actually cause problems.
Handle them properly to prevent memory corruption on some systems.
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vgabios is passed from firmware to kernel on Loongson64 systems.
Sane firmware will keep this pointer in reserved memory space
passed from the firmware but insane firmware keeps it in low
memory before kernel entry that is not reserved.
Previously kernel won't try to allocate memory from low memory
before kernel entry on boot, but after converting to memblock
it will do that.
Fix by resversing those memory on early boot.
Cc: stable@vger.kernel.org Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map") Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A metric is default by having "Default" within its groups. The default
metricgroup name needn't be set and this can result in segv in
default_metricgroup_cmp and perf_stat__print_shadow_stats_metricgroup
that assume it has a value when there is a Default metric group. To
avoid the segv initialize the value to "".
Fixes: 1c0e47956a8e ("perf metrics: Sort the Default metricgroup") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-and-tested-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Cc: stable@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231204182330.654255-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Json output didn't set the skip_duplicate_pmus callback yielding a
segfault.
Fixes: cd4e1efbbc40 ("perf pmus: Skip duplicate PMUs and don't print list suffix by default") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231129213428.2227448-2-irogers@google.com
[namhyung: updated subject line according to Arnaldo] Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In general, activating long mode involves setting the EFER_LME bit in
the EFER register and then enabling the X86_CR0_PG bit in the CR0
register. At this point, the EFER_LMA bit will be set automatically by
hardware.
In the case of SVM/SEV guests where writes to CR0 are intercepted, it's
necessary for the host to set EFER_LMA on behalf of the guest since
hardware does not see the actual CR0 write.
In the case of SEV-ES guests where writes to CR0 are trapped instead of
intercepted, the hardware *does* see/record the write to CR0 before
exiting and passing the value on to the host, so as part of enabling
SEV-ES support commit f1c6366e3043 ("KVM: SVM: Add required changes to
support intercepts under SEV-ES") dropped special handling of the
EFER_LMA bit with the understanding that it would be set automatically.
However, since the guest never explicitly sets the EFER_LMA bit, the
host never becomes aware that it has been set. This becomes problematic
when userspace tries to get/set the EFER values via
KVM_GET_SREGS/KVM_SET_SREGS, since the EFER contents tracked by the host
will be missing the EFER_LMA bit, and when userspace attempts to pass
the EFER value back via KVM_SET_SREGS it will fail a sanity check that
asserts that EFER_LMA should always be set when X86_CR0_PG and EFER_LME
are set.
Fix this by always inferring the value of EFER_LMA based on X86_CR0_PG
and EFER_LME, regardless of whether or not SEV-ES is enabled.
Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts under SEV-ES") Reported-by: Peter Gonda <pgonda@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210507165947.2502412-2-seanjc@google.com>
[A two year old patch that was revived after we noticed the failure in
KVM_SET_SREGS and a similar patch was posted by Michael Roth. This is
Sean's patch, but with Michael's more complete commit message. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the CMMA state needs to be reset, the no-dat bit also needs to be
reset. Failure to do so could cause issues in the guest, since the
guest expects the bit to be cleared after a reset.
io_uring sets up the io worker kernel thread via a syscall out of an
user space prrocess. This process might have used FPU and since
copy_thread() didn't clear FPU states for kernel threads a BUG()
is triggered for using FPU inside kernel. Move code around
to always clear FPU state for user and kernel threads.
Cc: stable@vger.kernel.org Reported-by: Aurelien Jarno <aurel32@debian.org> Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1055021 Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a number of issues in the cifs filesystem implementation of the FICLONE
ioctl in cifs_remap_file_range(). This is analogous to the previously
fixed bug in cifs_file_copychunk_range() and can share the helper
functions.
Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle. We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.
Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.
Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet. This can leaves the local
idea of the EOF at a later point than the server's EOF. If a clone request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the clone source extends past
the server's EOF.
Fix this by:
(0) Flush the source region (already done). The flush does nothing and
the EOF isn't moved if the source region has no dirty data.
(1) Move the EOF to the end of the source region if it isn't already at
least at this point. If we can't do this, for instance if the server
doesn't support it, just flush the entire source file.
(2) Find the folio (if present) at each end of the range, flushing it and
increasing the region-to-be-invalidated to cover those in their
entirety.
(3) Fully discard all the folios covering the range as we want them to be
reloaded.
(4) Then perform the extent duplication.
Thirdly, set i_size after doing the duplicate_extents operation as this
value may be used by various things internally. stat() hides the issue
because setting ->time to 0 causes cifs_getatr() to revalidate the
attributes.
These were causing the cifs/001 xfstest to fail.
Fixes: 04b38d601239 ("vfs: pull btrfs clone API to vfs layer") Signed-off-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org
cc: Christoph Hellwig <hch@lst.de>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a number of issues in the cifs filesystem implementation of the
copy_file_range() syscall in cifs_file_copychunk_range().
Firstly, the invalidation of the destination range is handled incorrectly:
We shouldn't just invalidate the whole file as dirty data in the file may
get lost and we can't just call truncate_inode_pages_range() to invalidate
the destination range as that will erase parts of a partial folio at each
end whilst invalidating and discarding all the folios in the middle. We
need to force all the folios covering the range to be reloaded, but we
mustn't lose dirty data in them that's not in the destination range.
Further, we shouldn't simply round out the range to PAGE_SIZE at each end
as cifs should move to support multipage folios.
Secondly, there's an issue whereby a write may have extended the file
locally, but not have been written back yet. This can leaves the local
idea of the EOF at a later point than the server's EOF. If a copy request
is issued, this will fail on the server with STATUS_INVALID_VIEW_SIZE
(which gets translated to -EIO locally) if the copy source extends past the
server's EOF.
Fix this by:
(0) Flush the source region (already done). The flush does nothing and
the EOF isn't moved if the source region has no dirty data.
(1) Move the EOF to the end of the source region if it isn't already at
least at this point. If we can't do this, for instance if the server
doesn't support it, just flush the entire source file.
(2) Find the folio (if present) at each end of the range, flushing it and
increasing the region-to-be-invalidated to cover those in their
entirety.
(3) Fully discard all the folios covering the range as we want them to be
reloaded.
(4) Then perform the copy.
Thirdly, set i_size after doing the copychunk_range operation as this value
may be used by various things internally. stat() hides the issue because
setting ->time to 0 causes cifs_getatr() to revalidate the attributes.
These were causing the generic/075 xfstest to fail.
Fixes: 620d8745b35d ("Introduce cifs_copy_file_range()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The KOBJ_CHANGE uevent is sent before gadget unbind is actually
executed, resulting in inaccurate uevent emitted at incorrect timing
(the uevent would have USB_UDC_DRIVER variable set while it would
soon be removed).
Move the KOBJ_CHANGE uevent to the end of the unbind function so that
uevent is sent only after the change has been made.
Commit 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix
stack unwind") added use of a new stack frame on ftrace entry to fix
stack unwind. However, the commit missed updating the offset used while
tearing down the ftrace stack when ftrace is disabled. Fix the same.
In addition, the commit missed saving the correct stack pointer in
pt_regs. Update the same.
Fixes: 41a506ef71eb ("powerpc/ftrace: Create a dummy stackframe to fix stack unwind") Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231130065947.2188860-1-naveen@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dev_coredumpm() creates a devcoredump device and adds it
to the core kernel framework which eventually end up
sending uevent to the user space and later creates a
symbolic link to the failed device. An application
running in userspace may be interested in this symbolic
link to get the name of the failed device.
In a issue scenario, once uevent sent to the user space
it start reading '/sys/class/devcoredump/devcdX/failing_device'
to get the actual name of the device which might not been
created and it is in its path of creation.
To fix this, suppress sending uevent till the failing device
symbolic link gets created and send uevent once symbolic
link is created successfully.
Starting RX DMA on THRI interrupt is too early because TX may not have
finished yet.
This change is inspired by commit 90b8596ac460 ("serial: 8250: Prevent
starting up DMA Rx on THRI interrupt") and fixes DMA issues I had with
an AM62 SoC that is using the 8250 OMAP variant.
Cc: stable@vger.kernel.org Fixes: c26389f998a8 ("serial: 8250: 8250_omap: Add DMA support for UARTs on K3 SoCs") Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com> Link: https://lore.kernel.org/r/20231101171431.16495-1-rwahl@gmx.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes commit 439c7183e5b9 ("serial: 8250: 8250_omap: Disable RX
interrupt after DMA enable") which unfortunately set the
UART_HAS_RHR_IT_DIS bit in the UART_OMAP_IER2 register and never
cleared it.
This device has a silicon bug that makes it report a timeout interrupt
but no data in the FIFO.
The datasheet states the following in the errata section 18.1.4:
"If the host reads the receive FIFO at the same time as a
time-out interrupt condition happens, the host might read 0xCC
(time-out) in the Interrupt Indication Register (IIR), but bit 0
of the Line Status Register (LSR) is not set (means there is no
data in the receive FIFO)."
The errata description seems to indicate it concerns only polled mode of
operation when reading bit 0 of the LSR register. However, tests have
shown and NXP has confirmed that the RXLVL register also yields 0 when
the bug is triggered, and hence the IRQ driven implementation in this
driver is equally affected.
This bug has hit us on production units and when it does, sc16is7xx_irq()
would spin forever because sc16is7xx_port_irq() keeps seeing an
interrupt in the IIR register that is not cleared because the driver
does not call into sc16is7xx_handle_rx() unless the RXLVL register
reports at least one byte in the FIFO.
Fix this by always reading one byte from the FIFO when this condition
is detected in order to clear the interrupt. This approach was
confirmed to be correct by NXP through their support channels.
Tested by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Since there is no guarantee that the memory returned by
dma_alloc_coherent() is associated with a 'struct page', using the
architecture specific phys_to_page() is wrong, but using
virt_to_page() would be as well.
Stop using sg lists altogether and just use the *_single() functions
instead. This also simplifies the code a bit since the scatterlists in
this driver always have only one entry anyway.
https://lore.kernel.org/lkml/86db0fe5-930d-4cbb-bd7d-03367da38951@app.fastmail.com/
Use consistent names for dma buffers
gc: Add a commit log from the initial thread:
https://lore.kernel.org/lkml/86db0fe5-930d-4cbb-bd7d-03367da38951@app.fastmail.com/
Use consistent names for dma buffers
When typec_altmode_put_partner is called by a plug altmode upon release,
the port altmode the plug belongs to will not remove its reference to the
plug. The check to see if the altmode being released evaluates against the
released altmode's partner instead of the calling altmode itself, so change
adev in typec_altmode_put_partner to properly refer to the altmode being
released.
typec_altmode_set_partner is not run for port altmodes, so also add a check
in typec_altmode_release to prevent typec_altmode_put_partner() calls on
port altmode release.
Fixes: 8a37d87d72f0 ("usb: typec: Bus type for alternate modes") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera <rdbabiera@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20231129192349.1773623-2-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling runtime pm as default for all AMD xHC 1.1 controllers caused
regression. An initial attempt to fix those was done in commit a5d6264b638e
("xhci: Enable RPM on controllers that support low-power states") but new
issues are still seen.
Revert this to get those AMD xHC 1.1 systems working
This patch went to stable an needs to be reverted from there as well.
Deduplication isn't supported on cifs, but cifs doesn't reject it, instead
treating it as extent duplication/cloning. This can cause generic/304 to go
silly and run for hours on end.
Fix cifs to indicate EOPNOTSUPP if REMAP_FILE_DEDUP is set in
->remap_file_range().
Note that it's unclear whether or not commit b073a08016a1 is meant to cause
cifs to return an error if REMAP_FILE_DEDUP.
Fixes: b073a08016a1 ("cifs: fix that return -EINVAL when do dedupe operation") Cc: stable@vger.kernel.org Suggested-by: Dave Chinner <david@fromorbit.com>
cc: Xiaoli Feng <fengxiaoli0714@gmail.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: Darrick Wong <darrick.wong@oracle.com>
cc: fstests@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3876191.1701555260@warthog.procyon.org.uk/ Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The console is immediately assigned to the ma35d1 port without
checking its index. This oversight can lead to out-of-bounds
errors when the index falls outside the valid '0' to
MA35_UART_NR range. Such scenario trigges ran error like the
following:
UBSAN: array-index-out-of-bounds in drivers/tty/serial/ma35d1_serial.c:555:51
index -1 is out of range for type 'uart_ma35d1_port [17]
Check the index before using it and bail out with a warning.
Two series lived in parallel for some time, which led to this situation:
- The nvmem-layout container is used for dynamic layouts
- We now expect fixed layouts to also use the nvmem-layout container but
this does not require any additional driver, the support is built-in the
nvmem core.
Ensure we don't refuse to probe for wrong reasons.
The commit 89ff3dfac604 ("usb: gadget: f_hid: fix f_hidg lifetime vs
cdev") has introduced a bug that leads to hid device corruption after
the replug operation.
Reverse device managed memory allocation for the report descriptor
to fix the issue.
Tested:
This change was tested on the AMD EthanolX CRB server with the BMC
based on the OpenBMC distribution. The BMC provides KVM functionality
via the USB gadget device:
- before: KVM page refresh results in a broken USB device,
- after: KVM page refresh works without any issues.
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.
Volume can have ranges that start with negative values, ex: -84dB to
+40dB. Apply correct range check in snd_soc_limit_volume before setting
the platform_max. Without this patch, for example setting a 0dB limit on
a volume range of -84dB to +40dB would fail.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Tested-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20231204124736.132185-2-srinivas.kandagatla@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If gpio_set_transitory() fails, we should free the GPIO again. Most
notably, the flag FLAG_REQUESTED has previously been set in
gpiod_request_commit(), and should be reset on failure.
To my knowledge, this does not affect any current users, since the
gpio_set_transitory() mainly returns 0 and -ENOTSUPP, which is converted
to 0. However the gpio_set_transitory() function calles the .set_config()
function of the corresponding GPIO chip and there are some GPIO drivers in
which some (unlikely) branches return other values like -EPROBE_DEFER,
and -EINVAL. In these cases, the above mentioned FLAG_REQUESTED would not
be reset, which results in the pin being blocked until the next reboot.
A write-access violation page fault kernel crash was observed while running
cpuhotplug LTP testcases on SEV-ES enabled systems. The crash was
observed during hotplug, after the CPU was offlined and the process
was migrated to different CPU. setup_ghcb() is called again which
tries to update ghcb_version in sev_es_negotiate_protocol(). Ideally this
is a read_only variable which is initialised during booting.
Budimir noted that perf_event_validate_size() only checks the size of
the newly added event, even though the sizes of all existing events
can also change due to not all events having the same read_format.
When we attach the new event, perf_group_attach(), we do re-compute
the size for all events.
Enable GENERIC_BUG_RELATIVE_POINTERS which will store 32-bit relative
offsets to the bug address and the source file name instead of 64-bit
absolute addresses. This effectively reduces the size of the
bug_table[] array by half on 64-bit kernels.
Signed-off-by: Helge Deller <deller@gmx.de>
Stable-dep-of: 487635756198 ("parisc: Fix asm operand number out of range build error in bug table") Signed-off-by: Sasha Levin <sashal@kernel.org>
The pointer returned is later signed-extended to 0xfffffffffc00b480 at
`BPF_JMP | BPF_EXIT`. During BPF prog run, this triggers unhandled page
fault and a kernel panic.
Drop the unnecessary signed-extension on return values like other
architectures do.
The expression `task->cgroups->dfl_cgrp` involves two memory load.
Since the field offset fits in imm12 or imm14, we use ldd or ldptrd
instructions. But both instructions have the side effect that it will
signed-extended the imm operand. Finally, we got the wrong addresses
and panics is inevitable.
Use a generic ldxd instruction to avoid this kind of issues.
AmpereOne metrics were missing DefaultMetricgroupName from metrics with
"Default" in group name resulting perf to segfault. Add the missing
field to address the issue.
Fixes: 59faeaf80d02 ("perf vendor events arm64: Fix for AmpereOne metrics") Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: John Garry <john.g.garry@oracle.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231201021550.1109196-2-ilkka@os.amperecomputing.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Clang static analyzer complains that value stored to 'rets' is never
read.Let 'buf_len = -EOVERFLOW' to make sure we can return '-EOVERFLOW'.
Fixes: 8c8d964ce90f ("mei: move hbuf_depth from the mei device to the hw modules") Signed-off-by: Su Hui <suhui@nfschina.com> Link: https://lore.kernel.org/r/20231120095523.178385-2-suhui@nfschina.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>