git.ipfire.org Git - thirdparty/kernel/stable.git/log

wifi: mac80211: send extended MLD capa/ops if AP has it

Currently the code only sends extended MLD capa/ops in
strict mode, but if the AP has it then it should also
be able to parse it. There could be cases where the AP
doesn't have it but we would want to advertise it (e.g.
if the AP supports nothing but we want to have BTM.),
but given the broken deployed APs out there right now
this is the best we can do.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213232.c9b8b3a6ca77.I1153d4283d1fbb9e5db60e7b939cc133a6345db5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: copy first_part into HW scan

cfg80211 now reports whether this is the first part of a scan. Copy
that information into the driver request.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.63f6078bd7be.Ia6e5cee945e6d9617c2f427552d89d23c92eee83@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: add a flag for the first part of a scan

When there are no non-6 GHz channels, then the 6 GHz scan is the first
part of a split scan. Add a boolean denoting whether the scan is the
first part of a scan as it might be useful to drivers for internal
bookkeeping. This flag is also set if the scan is not split.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.07e5a8a452ec.Ibf18f513e507422078fb31b28947e582a20df87a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: remove DISALLOW_PUNCTURING_5GHZ code

Since iwlwifi was the only driver using this and no
longer does, we can remove all this code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.4dff5fb8890f.Ie531f912b252a0042c18c0734db50c3afe1adfb5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: only verify part of Extended MLD Capabilities

We verify that the Extended MLD Capabilities are matching between links.
However, some bits are reserved and in particular the Recommended Max
Links subfield may not necessarily match. So only verify the known
subfields that can reliably be expected to be the same. More information
can be found in Table 9-417o, in IEEE P802.11be/D7.0.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.a2fad48dd3e6.Iae1740cd2ac833bc4a64fd2af718e1485158fd42@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: nl80211: make nl80211_check_scan_flags() type safe

The cast from void * here coupled with the boolean argument
on what to cast to is confusing and really not needed, just
split the code and make a type-safe interface. It seems to
even reduce the code size slightly, at least on x86-64.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.bdb3c96570b0.Ia153e6ce06dc9a636ff5bcc1d52468a1afd06e13@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: hide scan internals

Hide the internal scan fields from mac80211 and drivers, the
'notified' variable is for internal tracking, and the 'info'
is output that's passed to cfg80211_scan_done() and stored
only for delayed userspace notification.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.6a62e41858e2.I004f66e9c087cc6e6ae4a24951cf470961ee9466@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: fix deactivated link CSA

If the link is deactivated and the CSA completes, then that
needs to update the link station's bandwidth (only the AP STA
can exist at this point, no TDLS on inactive links) and set
the CSA to no longer be active. Fix this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.07f120cf687d.I5a868c501ee73fcc2355d61c2ee06e5f444b350f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: add mandatory bitrate support for 6 GHz

When a new station is added, ensure that mandatory bit-rates
are enabled for 6 GHz band.

Signed-off-by: Somashekhar Puttagangaiah <somashekhar.puttagangaiah@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.4aecd7f3b85b.I33a54872a3267c9f6155ce537d6c9c2a31c3f117@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: remove spurious blank line

ieee80211_process_ml_reconf_resp() has a blank line between
an if statement and the covered code, remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250609213231.a1f4ceae700d.I1d7aae17cc466c1648f31c42b935165db85d2809@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: verify state before connection

ieee80211_prep_connection is supposed to be called when both bitmaps
(valid_links and active_links) are cleared.
Make sure of it and WARN if this is not the case, to avoid weird issues.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250609213231.f616c7b693df.Ie983155627ad0d2e7c19c30ce642915246d0ed9d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mac80211: avoid weird state in error path

If we get to the error path of ieee80211_prep_connection, for example
because of a FW issue, then ieee80211_vif_set_links is called
with 0.
But the call to drv_change_vif_links from ieee80211_vif_update_links
will probably fail as well, for the same reason.
In this case, the valid_links and active_links bitmaps will be reverted
to the value of the failing connection.
Then, in the next connection, due to the logic of
ieee80211_set_vif_links_bitmaps, valid_links will be set to the ID of
the new connection assoc link, but the active_links will remain with the
ID of the old connection's assoc link.
If those IDs are different, we get into a weird state of valid_links and
active_links being different. One of the consequences of this state is
to call drv_change_vif_links with new_links as 0, since the & operation
between the bitmaps will be 0.

Since a removal of a link should always succeed, ignore the return value
of drv_change_vif_links if it was called to only remove links, which is
the case for the ieee80211_prep_connection's error path.
That way, the bitmaps will not be reverted to have the value from the
failing connection and will have 0, so the next connection will have a
good state.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250609213231.ba2011fb435f.Id87ff6dab5e1cf757b54094ac2d714c656165059@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'iwlwifi-next-2025-07-09' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next

Miri Korenblit says:
====================
iwlwifi features, notably:

- PNVM integrated in the ucode image
- more cleanups in the transport layer
====================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: iwlwifi: mvm: remove support for iwl_wowlan_info_notif_v4

FWs with this version are no longer supported on any device.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.1668a7430521.I488d69251aed62f0b11a2553f972a1730bc8b6cf@changeid

wifi: iwlwifi: bump minimum API version in BZ

Stop supporting older FWs

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.71404c289481.Iea4f3d36e18029a817ec5d6641d08ac5ee025678@changeid

wifi: iwlwifi: mvm: remove unneeded argument

iwl_mvm_set_key_rx_seq is called only once when the installed argument
is false. Remove this argument.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.2586112afd70.Iddf9a2b24546cb3a1506d68ca41ed215f88cff5c@changeid

wifi: iwlwifi: mvm: remove MLO GTK rekey code

iwlmvm driver does not support MLO. Remove this code

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.4957e50dee91.I2a432256dbc3069e0300e1f833e10a93d203f538@changeid

wifi: iwlwifi: pcie: rename iwl_pci_gen1_2_probe() argument

Using 'trans' for the mac config is confusing, rename the
argument to 'mac_cfg'.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.72d87406f8d7.I8b39f01e06ad7791efe718c267cbf367233920a3@changeid

wifi: iwlwifi: match discrete/integrated to fix some names

Some device names were wrong because our internal data suggested
that discrete Ga devices have B-step RF, when they actually have
C-step. However, matching the step for them is bad anyway.

Change the code to be able to find the devinfo depending on the
device being integrated or discrete. This is only for the names,
since the RF config cannot be different for the same RF because
it's discrete or integrated, so add a kunit test that ensures
both (a) the RF config is the same and (b) the name is different
(the latter really only because that's the whole point of having
a match on the discrete/integrated bit.)

Remove the RF step matching since it's no longer needed now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.e048a94659f1.Ie5919c70e9d8e3a28152aaf3cdffd19ed3d4f5c7@changeid

wifi: iwlwifi: remove Intel driver load message

There's really not much value in printing something just
because the driver loaded, remove that message.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.fe33c279a45d.I16a9cbcfce92a1d1b8b26a20ea9911e8a5a0b1cc@changeid

wifi: iwlwifi: mvm/mld: make PHC messages debug messages

These have no real value for normal users, print them as
debug messages instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.bd2df0705d89.Ic6f042588ef17719653c077ff89a8b9149c22f92@changeid

wifi: iwlwifi: trans: remove iwl_trans_init

We needed it for setting up trans parameters that could change later in
the probe flow.
This is no longer true, now we know all the parameters before we allocate the
trans, so we can just send the right parameters to iwl_trans_alloc and have all
initializations done there.

Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
.../net/wireless/intel/iwlwifi/iwl-trans.c    | 25 ++--------
.../net/wireless/intel/iwlwifi/iwl-trans.h    |  8 +--
.../intel/iwlwifi/pcie/gen1_2/trans.c         | 50 ++++++++++++-------
3 files changed, 41 insertions(+), 42 deletions(-)
Link: https://patch.msgid.link/20250709081300.9602fde079de.Iaede14c91095560852f9b441f1e16546b0a06bdd@changeid

wifi: iwlwifi: pcie move common probe logic

Move the parts of the probe that are not gen specific to the common
probe function.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.91aee0874e79.Ib762365933d4dd4fc0bf07833226cd7118dee0a1@changeid

wifi: iwlwifi: bump FW API to 102 for BZ/SC/DR

Start supporting FW API version 102 on those devices.

Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.da98a7b6be42.I77150bbf55eb160dbe0ef75c3e28afc053f27ec3@changeid

wifi: iwlwifi: pcie: Move txcmd size/align calculation to callers

Refactor iwl_trans_init to accept txcmd_size and txcmd_align as parameters
instead of calculating them internally.

Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.237285d81461.I3552860dd062a523606c8a5c85c9a6f0d4f04262@changeid

wifi: iwlwifi: pcie move gen1_2 probe to gen1_2/trans.c

In the process of splitting the transport's different generations,
move gen1_2's probe flow and relevant helper functions to the gen1_2
subfolder

Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.29b909144e1a.Idaa77eddd6650cf6f113833d2fbc8d3ef08cfd8f@changeid

wifi: iwlwifi: use PNVM data embedded in .ucode files

Given compatibility issues with external PNVM data that doesn't match
the firmware it was designed with/for, future firmware releases will
include the PNVM data in the firmware files directly, avoiding those
mismatch issues. Make the driver load and use that embedded PNVM data
in preference of external files, falling back to the external file if
it isn't present.

Co-developed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.c843f77aa2d3.I7200f8dd40ef82aff1f5574fdd3966913cda592c@changeid

wifi: iwlwifi: Add an helper function for polling bits

Add iwl_poll_bits helper to simplify calls to iwl_poll_bit
for the case when the bits and mask arguments are equal.

Signed-off-by: Rotem Kerem <rotem.kerem@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20250709081300.6bbc4bccc597.Ic7a10a7f8a9a32a9a9feecaf6e3a48fa37479f2d@changeid

wifi: mac80211: fix rx link assignment for non-MLO stations

Currently, ieee80211_rx_data_set_sta() does not correctly handle the
case where the interface supports multiple links (MLO), but the station
does not (non-MLO). This can lead to incorrect link assignment or
unexpected warnings when accessing link information.

Hence, add a fix to check if the station lacks valid link support and
use its default link ID for rx->link assignment. If the station
unexpectedly has valid links, fall back to the default link.

This ensures correct link association and prevents potential issues
in mixed MLO/non-MLO environments.

Signed-off-by: Hari Chandrakanthan <quic_haric@quicinc.com>
Signed-off-by: Sarika Sharma <quic_sarishar@quicinc.com>
Link: https://patch.msgid.link/20250630084119.3583593-1-quic_sarishar@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: cfg80211: move away from using a fake platform device

Downloading regulatory "firmware" needs a device to hang off of, and so
a platform device seemed like the simplest way to do this. Now that we
have a faux device interface, use that instead as this "regulatory
device" is not anything resembling a platform device at all.

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: <linux-wireless@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2025070116-growing-skeptic-494c@gregkh
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Merge tag 'mt76-next-2025-07-07' of https://github.com/nbd168/wireless

Felix Fietkay says:
===================
mt76 patches for 6.17

- firmware recovery improvements for mt7915
- mlo improvements
- fixes
===================

Signed-off-by: Johannes Berg <johannes.berg@intel.com>

wifi: mt76: mt7921s: Introduce SDIO WiFi/BT combo module card reset

Add a hardware reset method to recover from the SDIO bus error that cannot
be resolved by the current WiFi/BT subsystem reset.

Signed-off-by: Leon Yen <leon.yen@mediatek.com>
Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20250418093740.3814909-1-mingyen.hsieh@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt792x: improve monitor interface handling

Enable IEEE80211_HW_NO_VIRTUAL_MONITOR to ensure the driver is notified of
all monitor interfaces and their channels.

Signed-off-by: Ming Yen Hsieh <mingyen.hsieh@mediatek.com>
Link: https://patch.msgid.link/20250625075611.1407687-1-mingyen.hsieh@mediatek.com
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: Get rid of dma_sync_single_for_device() for MMIO devices

Since the page_pool for MT76 MMIO devices are created with
PP_FLAG_DMA_SYNC_DEV flag, we do not need to sync_for_device each page
received from the pool since it is already done by the page_pool
codebase.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250625-mt76-sync-for-device-v1-1-e687e3278e1a@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Move num_sta accounting in mt7996_mac_sta_{add,remove}_links

Move phy num_sta accounting in mt7996_mac_sta_add() and
mt7996_mac_sta_remove() routines in order to take into account all
possibles MLO links.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-9-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Add MLO support to mt7996_tx_check_aggr()

Generalize mt7996_tx_check_aggr() and mt7996_txwi_free() routines to
introduce MLO support for MT7996 driver.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-8-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Fix valid_links bitmask in mt7996_mac_sta_{add,remove}

sta->valid_links bitmask can be set even for non-MLO client.

Fixes: dd82a9e02c054 ("wifi: mt76: mt7996: Rely on mt7996_sta_link in sta_add/sta_remove callbacks")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-7-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Fix possible OOB access in mt7996_tx()

Fis possible Out-Of-Boundary access in mt7996_tx routine if link_id is
set to IEEE80211_LINK_UNSPECIFIED

Fixes: 3ce8acb86b661 ("wifi: mt76: mt7996: Update mt7996_tx to MLO support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-6-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Fix mlink lookup in mt7996_tx_prepare_skb

Use proper link_id in mt7996_tx_prepare_skb routine for mlink lookup.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-5-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Do not set wcid.sta to 1 in mt7996_mac_sta_event()

msta_link->wcid.sta is already set to 1 in mt7996_mac_sta_init_link
routine.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-4-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Rely on for_each_sta_active_link() in mt7996_mcu_sta_mld_setup_tlv()

Reuse for_each_sta_active_link utility macro in
mt7996_mcu_sta_mld_setup_tlv routine.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-3-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7996: Fix secondary link lookup in mt7996_mcu_sta_mld_setup_tlv()

Use proper link_id value for secondary link lookup in
mt7996_mcu_sta_mld_setup_tlv routine.

Fixes: 00cef41d9d8f5 ("wifi: mt76: mt7996: Add mt7996_mcu_sta_mld_setup_tlv() and mt7996_mcu_sta_eht_mld_tlv()")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-2-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: fix vif link allocation

Reuse the vif deflink for link_id = 0 in order to avoid confusion with
vif->bss_conf, which also gets a link id of 0.

Link: https://patch.msgid.link/20250704-mt7996-mlo-fixes-v1-1-356456c73f43@kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.16-rc5).

No conflicts.

No adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: mt76: mt7925: fix off by one in mt7925_mcu_hw_scan()

The ssid->ssids[] and sreq->ssids[] arrays have MT7925_RNR_SCAN_MAX_BSSIDS
elements so this >= needs to be > to prevent an out of bounds access.

Fixes: 8284815ca161 ("wifi: mt76: mt7925: add RNR scan support for 6GHz")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/aDVT2tPhG_8T0Qla@stanley.mountain
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7915: mcu: re-init MCU before loading FW patch

Restart the MCU and release the patch semaphore before loading the MCU
patch firmware from the host.

This fixes failures upon error recovery in case the semaphore was
previously taken and never released by the host.

This happens from time to time upon triggering a full-chip error
recovery. Under this circumstance, the hardware restart fails and the
radio is rendered inoperational.

Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20250402004528.1036715-3-mail@david-bauer.net
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7915: mcu: lower default timeout

The default timeout set in mt76_connac2_mcu_fill_message of 20 seconds
leads to excessive stalling in case messages are lost.

Testing showed that a smaller timeout of 5 seconds is sufficient in
normal operation.

Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20250402004528.1036715-1-mail@david-bauer.net
Signed-off-by: Felix Fietkau <nbd@nbd.name>

wifi: mt76: mt7915: mcu: increase eeprom command timeout

Increase the timeout for MCU_EXT_CMD_EFUSE_BUFFER_MODE command.

Regular retries upon hardware-recovery have been observed. Increasing
the timeout slightly remedies this problem.

Signed-off-by: David Bauer <mail@david-bauer.net>
Link: https://patch.msgid.link/20250402004528.1036715-2-mail@david-bauer.net
Signed-off-by: Felix Fietkau <nbd@nbd.name>

Merge tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth.

  Current release - new code bugs:

    - eth:
       - txgbe: fix the issue of TX failure
       - ngbe: specify IRQ vector when the number of VFs is 7

  Previous releases - regressions:

    - sched: always pass notifications when child class becomes empty

    - ipv4: fix stat increase when udp early demux drops the packet

    - bluetooth: prevent unintended pause by checking if advertising is active

    - virtio: fix error reporting in virtqueue_resize

    - eth:
       - virtio-net:
          - ensure the received length does not exceed allocated size
          - fix the xsk frame's length check
       - lan78xx: fix WARN in __netif_napi_del_locked on disconnect

  Previous releases - always broken:

    - bluetooth: mesh: check instances prior disabling advertising

    - eth:
       - idpf: convert control queue mutex to a spinlock
       - dpaa2: fix xdp_rxq_info leak
       - amd-xgbe: align CL37 AN sequence as per databook"

* tag 'net-6.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
  vsock/vmci: Clear the vmci transport packet properly when initializing it
  dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example
  net: ngbe: specify IRQ vector when the number of VFs is 7
  net: wangxun: revert the adjustment of the IRQ vector sequence
  net: txgbe: request MISC IRQ in ndo_open
  virtio_net: Enforce minimum TX ring size for reliability
  virtio_net: Cleanup '2+MAX_SKB_FRAGS'
  virtio_ring: Fix error reporting in virtqueue_resize
  virtio-net: xsk: rx: fix the frame's length check
  virtio-net: use the check_mergeable_len helper
  virtio-net: remove redundant truesize check with PAGE_SIZE
  virtio-net: ensure the received length does not exceed allocated size
  net: ipv4: fix stat increase when udp early demux drops the packet
  net: libwx: fix the incorrect display of the queue number
  amd-xgbe: do not double read link status
  net/sched: Always pass notifications when child class becomes empty
  nui: Fix dma_mapping_error() check
  rose: fix dangling neighbour pointers in rose_rt_device_down()
  enic: fix incorrect MTU comparison in enic_change_mtu()
  amd-xgbe: align CL37 AN sequence as per databook
  ...

Merge tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:

- Fix umount hang with unflushable inodes (and add new tracepoint used
   for debugging this)

- Fix ABBA deadlock in xfs_reclaim_inode() vs xfs_ifree_cluster()

- Fix dquot buffer pin deadlock

* tag 'xfs-fixes-6.16-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: add FALLOC_FL_ALLOCATE_RANGE to supported flags mask
  xfs: fix unmount hang with unflushable inodes stuck in the AIL
  xfs: factor out stale buffer item completion
  xfs: rearrange code in xfs_buf_item.c
  xfs: add tracepoints for stale pinned inode state debug
  xfs: avoid dquot buffer pin deadlock
  xfs: catch stale AGF/AGF metadata
  xfs: xfs_ifree_cluster vs xfs_iflush_shutdown_abort deadlock
  xfs: actually use the xfs_growfs_check_rtgeom tracepoint
  xfs: Improve error handling in xfs_mru_cache_create()
  xfs: move xfs_submit_zoned_bio a bit
  xfs: use xfs_readonly_buftarg in xfs_remount_rw
  xfs: remove NULL pointer checks in xfs_mru_cache_insert
  xfs: check for shutdown before going to sleep in xfs_select_zone

ipv6: Cleanup fib6_drop_pcpu_from()

Since commit 0e2338749192 ("ipv6: fix races in ip6_dst_destroy()"),
'table' is unused in __fib6_drop_pcpu_from(), no need pass it from
fib6_drop_pcpu_from().

Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701041235.1333687-1-yuehaibing@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'another-ip-sysctl-docs-cleanup'

Bagas Sanjaya says:

====================
Another ip-sysctl docs cleanup

Inspired by Abdelrahman's cleanup [1]. This time, mostly formatting
conversion to bullet lists.

[1]: https://lore.kernel.org/linux-doc/20250624150923.40590-1-abdelrahmanfekry375@gmail.com/
====================

Link: https://patch.msgid.link/20250701031300.19088-1-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ip-sysctl: Add link to SCTP IPv4 scoping draft

addr_scope_policy description contains pointer to SCTP IPv4 scoping
draft but not its IETF Datatracker link. Add it.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-6-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ip-sysctl: Format SCTP-related memory parameters description as bullet list

The description for vector elements of SCTP-related memory usage
parameters (sctp{r,w,}mem) is formatted as normal paragraphs rather than
bullet list. Convert the description to the latter.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-5-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ip-sysctl: Format pf_{enable,expose} boolean lists as bullet lists

These lists' items were separated by newlines but without bullet list
marker. Turn the lists into proper bullet list.

While at it, also reword values description for pf_expose to not repeat
mentioning SCTP_PEER_ADDR_CHANGE and SCTP_GET_PEER_ADDR_INFO sockopt.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-4-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ip-sysctl: Format possible value range of ioam6_id{,_wide} as bullet list

Format possible value range bounds of ioam6_id and ioam6_id_wide as
bullet list instead of running paragraph.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-3-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ip-sysctl: Format Private VLAN proxy arp aliases as bullet list

Alias names list for private VLAN proxy arp technology is formatted as
indented paragraph instead. Make it bullet list as it is better fit for
this purpose.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701031300.19088-2-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'ptp-provide-support-for-auxiliary-clocks-for-ptp_sys_offset_extended'

Thomas Gleixner says:

====================
ptp: Provide support for auxiliary clocks for PTP_SYS_OFFSET_EXTENDED

This is a follow up to the V1 series, which can be found here:

     https://lore.kernel.org/all/20250626124327.667087805@linutronix.de

to address the merge logistics problem, which I created myself.

Changes vs. V1:

    - Make patch 1, which provides the timestamping function temporarily
      define CLOCK_AUX* if undefined so that it can be merged independently,

    - Add a missing check for CONFIG_POSIX_AUX_CLOCK in the PTP IOCTL

    - Picked up tags

Merge logistics if agreed on:

    1) Patch #1 is applied to the tip tree on top of plain v6.16-rc1 and
       tagged

    2) That tag is merged into tip:timers/ptp and the temporary CLOCK_AUX
       define is removed in a subsequent commit

    3) Network folks merge the tag and apply patches #2 + #3

So the only fallout from this are the extra merges in both trees and the
cleanup commit in the tip tree. But that way there are no dependencies and
no duplicate commits with different SHAs.

Thoughts?

Due to the above constraints there is no branch offered to pull from right
now. Sorry for the inconveniance. Should have thought about that earlier.
====================

Link: https://patch.msgid.link/20250701130923.579834908@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ptp: Enable auxiliary clocks for PTP_SYS_OFFSET_EXTENDED

Allow ioctl(PTP_SYS_OFFSET_EXTENDED*) to select CLOCK_AUX clock ids for
generating the pre and post hardware readout timestamps.

Aside of adding these clocks to the clock ID validation, this also requires
to check the timestamp to be valid, i.e. the seconds value being greater
than or equal zero. This is necessary because AUX clocks can be
asynchronously enabled or disabled, so there is no way to validate the
availability upfront.

The same could have been achieved by handing the return value of
ktime_get_aux_ts64() all the way down to the IOCTL call site, but that'd
require to modify all existing ptp::gettimex64() callbacks and their inner
call chains. The timestamp check achieves the same with less churn and less
complicated code all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250701132628.491315452@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

ptp: Use ktime_get_clock_ts64() for timestamping

The inlined ptp_read_system_[pre|post]ts() switch cases expand to a copious
amount of text in drivers, e.g. ~500 bytes in e1000e. Adding auxiliary
clock support to the inlines would increase it further.

Replace the inline switch case with a call to ktime_get_clock_ts64(), which
reduces the code size in drivers and allows to access auxiliary clocks once
they are enabled in the IOCTL parameter filter.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20250701132628.426168092@linutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'ktime-get-clock-ts64-for-ptp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Base implementation for PTP with a temporary CLOCK_AUX* workaround to
allow integration of depending changes into the networking tree.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>

bonding: don't force LACPDU tx to ~333 ms boundaries

The timer which ensures that no more than 3 LACPDUs are transmitted in
a second rearms itself every 333ms regardless of whether an LACPDU is
transmitted when the timer expires. This causes LACPDU tx to be delayed
until the next expiration of the timer, which effectively aligns LACPDUs
to ~333ms boundaries. This results in a variable amount of jitter in the
timing of periodic LACPDUs.

Change this to only rearm the timer when an LACPDU is actually sent,
allowing tx at any point after the timer has expired.

Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Reviewed-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Link: https://patch.msgid.link/20250625-fix-lacpdu-jitter-v1-1-4d0ee627e1ba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

timekeeping: Provide ktime_get_clock_ts64()

PTP implements an inline switch case for taking timestamps from various
POSIX clock IDs, which already consumes quite some text space. Expanding it
for auxiliary clocks really becomes too big for inlining.

Provide a out of line version.

The function invalidates the timestamp in case the clock is invalid. The
invalidation allows to implement a validation check without the need to
propagate a return value through deep existing call chains.

Due to merge logistics this temporarily defines CLOCK_AUX[_LAST] if
undefined, so that the plain branch, which does not contain any of the core
timekeeper changes, can be pulled into the networking tree as prerequisite
for the PTP side changes. These temporary defines are removed after that
branch is merged into the tip::timers/ptp branch. That way the result in
-next or upstream in the next merge window has zero dependencies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Acked-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/all/20250701132628.357686408@linutronix.de

vsock/vmci: Clear the vmci transport packet properly when initializing it

In vmci_transport_packet_init memset the vmci_transport_packet before
populating the fields to avoid any uninitialised data being left in the
structure.

Cc: Bryan Tan <bryan-bt.tan@broadcom.com>
Cc: Vishnu Dasa <vishnu.dasa@broadcom.com>
Cc: Broadcom internal kernel review list
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: stable <stable@kernel.org>
Signed-off-by: HarshaVardhana S A <harshavardhana.sa@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250701122254.2397440-1-gregkh@linuxfoundation.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

dt-bindings: net: sophgo,sg2044-dwmac: Drop status from the example

Examples should be complete and should not have a 'status' property,
especially a disabled one because this disables the dt_binding_check of
the example against the schema. Dropping 'status' property shows
missing other properties - phy-mode and phy-handle.

Fixes: 114508a89ddc ("dt-bindings: net: Add support for Sophgo SG2044 dwmac")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@gmail.com>
Reviewed-by: Chen Wang <unicorn_wang@outlook.com>
Link: https://patch.msgid.link/20250701063621.23808-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'fix-irq-vectors'

Jiawen Wu says:

====================
Fix IRQ vectors

The interrupt vector order was adjusted by [1]commit 937d46ecc5f9 ("net:
wangxun: add ethtool_ops for channel number") in Linux-6.8. Because at
that time, the MISC interrupt acts as the parent interrupt in the GPIO
IRQ chip. When the number of Rx/Tx ring changes, the last MISC
interrupt must be reallocated. Then the GPIO interrupt controller would
be corrupted. So the initial plan was to adjust the sequence of the
interrupt vectors, let MISC interrupt to be the first one and do not
free it.

Later, irq_domain was introduced in [2]commit aefd013624a1 ("net: txgbe:
use irq_domain for interrupt controller") to avoid this problem.
However, the vector sequence adjustment was not reverted. So there is
still one problem that has been left unresolved.

Due to hardware limitations of NGBE, queue IRQs can only be requested
on vector 0 to 7. When the number of queues is set to the maximum 8,
the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by
MISC interrupt, and althrough the vector 8 is used by queue interrupt,
it is unable to receive packets. This will cause some packets to be
dropped when RSS is enabled and they are assigned to queue 8.

This patch set fix the above problems.

[1] https://git.kernel.org/netdev/net-next/c/937d46ecc5f9
[2] https://git.kernel.org/netdev/net-next/c/aefd013624a1
====================

Link: https://patch.msgid.link/20250701063030.59340-1-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ngbe: specify IRQ vector when the number of VFs is 7

For NGBE devices, the queue number is limited to be 1 when SRIOV is
enabled. In this case, IRQ vector[0] is used for MISC and vector[1] is
used for queue, based on the previous patches. But for the hardware
design, the IRQ vector[1] must be allocated for use by the VF[6] when
the number of VFs is 7. So the IRQ vector[0] should be shared for PF
MISC and QUEUE interrupts.

+-----------+----------------------+
| Vector    | Assigned To          |
+-----------+----------------------+
| Vector 0  | PF MISC and QUEUE    |
| Vector 1  | VF 6                 |
| Vector 2  | VF 5                 |
| Vector 3  | VF 4                 |
| Vector 4  | VF 3                 |
| Vector 5  | VF 2                 |
| Vector 6  | VF 1                 |
| Vector 7  | VF 0                 |
+-----------+----------------------+

Minimize code modifications, only adjust the IRQ vector number for this
case.

Fixes: 877253d2cbf2 ("net: ngbe: add sriov function support")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-4-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: wangxun: revert the adjustment of the IRQ vector sequence

Due to hardware limitations of NGBE, queue IRQs can only be requested
on vector 0 to 7. When the number of queues is set to the maximum 8,
the PCI IRQ vectors are allocated from 0 to 8. The vector 0 is used by
MISC interrupt, and althrough the vector 8 is used by queue interrupt,
it is unable to receive packets. This will cause some packets to be
dropped when RSS is enabled and they are assigned to queue 8.

So revert the adjustment of the MISC IRQ location, to make it be the
last one in IRQ vectors.

Fixes: 937d46ecc5f9 ("net: wangxun: add ethtool_ops for channel number")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://patch.msgid.link/20250701063030.59340-3-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: txgbe: request MISC IRQ in ndo_open

Move the creating of irq_domain for MISC IRQ from .probe to .ndo_open,
and free it in .ndo_stop, to maintain consistency with the queue IRQs.
This it for subsequent adjustments to the IRQ vectors.

Fixes: aefd013624a1 ("net: txgbe: use irq_domain for interrupt controller")
Cc: stable@vger.kernel.org
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250701063030.59340-2-jiawenwu@trustnetic.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'virtio-fixes-for-tx-ring-sizing-and-resize-error-reporting'

Laurent Vivier says:

====================
virtio: Fixes for TX ring sizing and resize error reporting

This patch series contains two fixes and a cleanup for the virtio subsystem.

The first patch fixes an error reporting bug in virtio_ring's
virtqueue_resize() function. Previously, errors from internal resize
helpers could be masked if the subsequent re-enabling of the virtqueue
succeeded. This patch restores the correct error propagation, ensuring that
callers of virtqueue_resize() are properly informed of underlying resize
failures.

The second patch does a cleanup of the use of '2+MAX_SKB_FRAGS'

The third patch addresses a reliability issue in virtio_net where the TX
ring size could be configured too small, potentially leading to
persistently stopped queues and degraded performance. It enforces a
minimum TX ring size to ensure there's always enough space for at least one
maximally-fragmented packet plus an additional slot.
====================

Link: https://patch.msgid.link/20250521092236.661410-1-lvivier@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio_net: Enforce minimum TX ring size for reliability

The `tx_may_stop()` logic stops TX queues if free descriptors
(`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2).
If the total ring size (`ring_num`) is not strictly greater than this
value, queues can become persistently stopped or stop after minimal
use, severely degrading performance.

A single sk_buff transmission typically requires descriptors for:
- The virtio_net_hdr (1 descriptor)
- The sk_buff's linear data (head) (1 descriptor)
- Paged fragments (up to MAX_SKB_FRAGS descriptors)

This patch enforces that the TX ring size ('ring_num') must be strictly
greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is
always large enough to hold at least one maximally-fragmented packet
plus at least one additional slot.

Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio_net: Cleanup '2+MAX_SKB_FRAGS'

Improve consistency by using everywhere it is needed
'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or
'2 + MAX_SKB_FRAGS'.

No functional change.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio_ring: Fix error reporting in virtqueue_resize

The virtqueue_resize() function was not correctly propagating error codes
from its internal resize helper functions, specifically
virtqueue_resize_packet() and virtqueue_resize_split(). If these helpers
returned an error, but the subsequent call to virtqueue_enable_after_reset()
succeeded, the original error from the resize operation would be masked.
Consequently, virtqueue_resize() could incorrectly report success to its
caller despite an underlying resize failure.

This change restores the original code behavior:

       if (vdev->config->enable_vq_after_reset(_vq))
               return -EBUSY;

       return err;

Fix: commit ad48d53b5b3f ("virtio_ring: separate the logic of reset/enable from virtqueue_resize")
Cc: xuanzhuo@linux.alibaba.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250521092236.661410-2-lvivier@redhat.com
Tested-by: Lei Yang <leiyang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio-net: xsk: rx: fix the frame's length check

When calling buf_to_xdp, the len argument is the frame data's length
without virtio header's length (vi->hdr_len). We check that len with

xsk_pool_get_rx_frame_size() + vi->hdr_len

to ensure the provided len does not larger than the allocated chunk
size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk,
we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost
to start placing data from

hard_start + XDP_PACKET_HEADROOM - vi->hdr_len
not
hard_start + XDP_PACKET_HEADROOM

But the first buffer has virtio_header, so the maximum frame's length in
the first buffer can only be

xsk_pool_get_rx_frame_size()
not
xsk_pool_get_rx_frame_size() + vi->hdr_len

like in the current check.

This commit adds an additional argument to buf_to_xdp differentiate
between the first buffer and other ones to correctly calculate the maximum
frame's length.

Cc: stable@vger.kernel.org
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Fixes: a4e7ba702701 ("virtio_net: xsk: rx: support recv small mode")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'virtio-net-fixes-for-mergeable-xdp-receive-path'

Bui Quang Minh says:

====================
virtio-net: fixes for mergeable XDP receive path

This series contains fixes for XDP receive path in virtio-net
- Patch 1: add a missing check for the received data length with our
allocated buffer size in mergeable mode.
- Patch 2: remove a redundant truesize check with PAGE_SIZE in mergeable
mode
- Patch 3: make the current repeated code use the check_mergeable_len to
check for received data length in mergeable mode
====================

Link: https://patch.msgid.link/20250630144212.48471-1-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio-net: use the check_mergeable_len helper

Replace the current repeated code to check received length in mergeable
mode with the new check_mergeable_len helper.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio-net: remove redundant truesize check with PAGE_SIZE

The truesize is guaranteed not to exceed PAGE_SIZE in
get_mergeable_buf_len(). It is saved in mergeable context, which is not
changeable by the host side, so the check in receive path is quite
redundant.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio-net: ensure the received length does not exceed allocated size

In xdp_linearize_page, when reading the following buffers from the ring,
we forget to check the received length with the true allocate size. This
can lead to an out-of-bound read. This commit adds that missing check.

Cc: <stable@vger.kernel.org>
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

wifi: iwlwifi: Fix error code in iwl_op_mode_dvm_start()

Preserve the error code if iwl_setup_deferred_work() fails. The current
code returns ERR_PTR(0) (which is NULL) on this path. I believe the
missing error code potentially leads to a use after free involving
debugfs.

Fixes: 90a0d9f33996 ("iwlwifi: Add missing check for alloc_ordered_workqueue")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/a7a1cd2c-ce01-461a-9afd-dbe535f8df01@sabinyo.mountain
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>

net: ipv6: Fix spelling mistake

change 'Maximium' to 'Maximum'

Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://patch.msgid.link/20250702055820.112190-1-zhaochenguang@kylinos.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'support-rate-management-on-traffic-classes-in-devlink-and-mlx5'

Mark Bloch says:

====================
Support rate management on traffic classes in devlink and mlx5

This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.

Additionally the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.

Discussions on traffic class shaping in net-shapers began in V5 [1],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.

Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [2], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.

Building on that discussion, Cosmin has now shared a concrete
implementation plan on the netdev mailing list [3]. The plan, developed
in collaboration with Paolo and Simon, outlines how net-shapers can be
extended to support the same use cases currently covered by
devlink-rate, with the eventual goal of aligning both and simplifying
the shaping infrastructure in the kernel.

This work was presented at Netdev 0x19 in Zagreb [4].
There we presented how TC scheduling is enforced in mlx5 hardware,
which led to discussions on the mailing list.

A summary of how things work:

Classification means labeling a packet with a traffic class based on
the packet's DSCP or VLAN PCP field, then treating packets with
different traffic classes differently during transmit processing.

In a virtualized setup, VFs are untrusted and do not control
classification or shaping. Classification is done by the hardware using
a prio-to-TC mapping set by the hypervisor. VFs only select which send
queue to use and are expected to respect the classification logic by
sending each traffic class on its dedicated queue. As stated in the
net-shapers plan [3], each transmit queue should carry only a single
traffic class. Mixing classes in a single queue can lead to HOL
blocking.

In the mlx5 implementation, if the queue used does not match the
classified traffic class, the hardware moves the queue to the correct
TC scheduler. This movement is not a reclassification; it’s a necessary
enforcement step to ensure traffic class isolation is maintained.

Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
bandwidth management

Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects

Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS

Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
tc-bw

[1]
https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[2]
https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.camel@nvidia.com/
[3]
https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.camel@nvidia.com/T/
[4]
https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-with-ets-and-traffic-classes.html
====================

Link: https://patch.msgid.link/20250629142138.361537-1-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: drv-net: Add test for devlink-rate traffic class bandwidth distribution

This test suite validates the functionality of the devlink-rate API for
traffic class (TC) bandwidth allocation. It ensures that bandwidth can
be distributed between different traffic classes as configured, and
verifies that explicit TC-to-queue mapping is required for the
allocation to be effective.

The first test (test_no_tc_mapping_bandwidth) is marked as expected
failure on mlx5, since the hardware automatically enforces traffic
class separation by dynamically moving queues to the correct TC
scheduler, even without explicit TC-to-queue mapping configuration.

Test output on mlx5:
1..2
# Created VF interface: eth5
# Created VLAN eth5.101 on eth5 with tc 3 and IP 198.51.100.2
# Created VLAN eth5.102 on eth5 with tc 4 and IP 198.51.100.10
# Set representor eth4 up and added to bridge
# Bandwidth check results without TC mapping:
# TC 3: 0.19 Gbits/sec
# TC 4: 0.76 Gbits/sec
# Total bandwidth: 0.95 Gbits/sec
# TC 3 percentage: 20.0%
# TC 4 percentage: 80.0%
ok 1 devlink_rate_tc_bw.test_no_tc_mapping_bandwidth # XFAIL Bandwidth matched 80/20 split without TC mapping
# Created VF interface: eth5
# Created VLAN eth5.101 on eth5 with tc 3 and IP 198.51.100.2
# Created VLAN eth5.102 on eth5 with tc 4 and IP 198.51.100.10
# Set representor eth4 up and added to bridge
# Bandwidth check results with TC mapping:
# TC 3: 0.21 Gbits/sec
# TC 4: 0.78 Gbits/sec
# Total bandwidth: 0.98 Gbits/sec
# TC 3 percentage: 21.1%
# TC 4 percentage: 78.9%
# Bandwidth is distributed as 80/20 with TC mapping
ok 2 devlink_rate_tc_bw.test_tc_mapping_bandwidth
# Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-9-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw

Introduce support for managing Traffic Class (TC) arbiter nodes and
associated vports TC nodes within the E-Switch QoS hierarchy. This
patch adds support for the new scheduling node type,
`SCHED_NODE_TYPE_VPORTS_TC_TSAR`, and implements full support for
setting tc-bw on both vports and nodes.

Key changes include:

- Introduced the new scheduling node type,
  `SCHED_NODE_TYPE_VPORTS_TC_TSAR`, for managing vports within the TC
  arbiter node.

- New helper functions for creating and destroying vports TC nodes
  under the TC arbiter.

- Updated the minimum rate normalization function to skip nodes of type
  `SCHED_NODE_TYPE_VPORTS_TC_TSAR`. Vports TC TSARs have bandwidth
  shares configured on them but not minimum rates, so their `min_rate`
  cannot be normalized.

- Implementation of `esw_qos_tc_arbiter_scheduling_setup()` and
  `esw_qos_tc_arbiter_scheduling_teardown()` for initializing and
  cleaning up TC arbiter scheduling elements. These functions now fully
  support tc-bw configuration on TC arbiter nodes.

- Introduced a new helper `esw_qos_calculate_tc_bw_divider()` to
  compute the total TC bandwidth share, which is used as a divider for
  normalizing each TC's share.

- Added `esw_qos_tc_arbiter_get_bw_shares()` and
  `esw_qos_set_tc_arbiter_bw_shares()` to handle the settings of
  bandwidth shares for vports traffic class TSARs.

- `esw_qos_set_tc_arbiter_bw_shares()` normalizes  each TC share based
  on the total and the firmware's maximum allowed TSAR bandwidth share.

- Refactored `mlx5_esw_devlink_rate_node_tc_bw_set()` and
  `mlx5_esw_devlink_rate_leaf_tc_bw_set()` to fully support configuring
  tc-bw on devlink rate nodes and vports, respectively.

- Refactored `mlx5_esw_qos_node_update_parent()` to ensure that tc-bw
  configuration remains compatible with setting a parent on a rate
  node, preserving level hierarchy functionality.

- Refactored `esw_qos_calc_bw_share()` to generalize its input so it
  can be used for both minimum rate and bandwidth share calculations.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-8-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add traffic class scheduling support for vport QoS

Introduce support for traffic class (TC) scheduling on vports by
allowing the vport to own multiple TC scheduling nodes. This patch
enables more granular control of QoS by defining three distinct QoS
states for vports, each providing unique scheduling behavior:

1. Regular QoS: The `sched_node` represents the vport directly,
   handling QoS as a single scheduling entity.
2. TC QoS on the vport: The `sched_node` acts as a TC arbiter, enabling
   TC scheduling directly on the vport.
3. TC QoS on the parent node: The `sched_node` functions as a rate
   limiter, with TC arbitration enabled at the parent level, associating
   multiple scheduling nodes with each vport.

Key changes include:

- Added support for new scheduling elements, vport traffic class and
  rate limiter.

- New helper functions for creating, destroying, and restoring vport TC
  scheduling nodes, handling transitions between regular QoS and TC
  arbitration states.

- Updated `esw_qos_vport_enable()` and `esw_qos_vport_disable()` to
  support both regular QoS and TC arbitration states, ensuring consistent
  transitions between scheduling modes.

- Introduced a `sched_nodes` array under `vport->qos` to store multiple
  TC scheduling nodes per vport, enabling finer control over per-TC QoS.

- Enhanced `esw_qos_vport_update_parent()` to handle transitions between
  the three QoS states based on the current and new parent node types.

This patch lays the groundwork for future support for configuring tc-bw
on vports. Although the infrastructure is in place, full support for
tc-bw is not yet implemented; attempts to set tc-bw on vports will
return `-EOPNOTSUPP`.

No functional changes are introduced at this stage.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-7-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add support for setting tc-bw on nodes

Introduce support for enabling and disabling Traffic Class (TC)
arbitration for existing devlink rate nodes. This patch adds support
for a new scheduling node type, `SCHED_NODE_TYPE_TC_ARBITER_TSAR`.

Key changes include:

- New helper functions for transitioning existing rate nodes to TC
  arbiter nodes and vice versa. These functions handle the allocation
  of TC arbiter nodes, copying of child nodes, and restoring vport QoS
  settings when TC arbitration is disabled.

- Implementation of `mlx5_esw_devlink_rate_node_tc_bw_set()` to manage
  tc-bw configuration on nodes.

- Introduced stubs for `esw_qos_tc_arbiter_scheduling_setup()` and
  `esw_qos_tc_arbiter_scheduling_teardown()`, which will be extended in
  future patches to provide full support for tc-bw on devlink rate
  objects.

- Validation functions for tc-bw settings, allowing graceful handling
  of unsupported traffic class bandwidth configurations.

- Updated `__esw_qos_alloc_node()` to insert the new node into the
  parent’s children list only if the parent is not NULL. For the root
  TSAR, the new node is inserted directly after the allocation call.

- Don't allow `tc-bw` configuration for nodes containing non-leaf
  children.

This patch lays the groundwork for future support for configuring tc-bw
on devlink rate nodes. Although the infrastructure is in place, full
support for tc-bw is not yet implemented; attempts to set tc-bw on
nodes will return `-EOPNOTSUPP`.

No functional changes are introduced at this stage.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-6-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5: Add no-op implementation for setting tc-bw on rate objects

Introduce `mlx5_esw_devlink_rate_node_tc_bw_set()` and
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` with no-op logic.

Future patches will add support for setting traffic class bandwidth
on rate objects.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-5-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: netdevsim: Add devlink rate tc-bw test

Test verifies that netdevsim correctly implements devlink ops callbacks
that set tc-bw on leaf or node rate object.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-4-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

devlink: Extend devlink rate API with traffic classes bandwidth management

Introduce support for specifying relative bandwidth shares between
traffic classes (TC) in the devlink-rate API. This new option allows
users to allocate bandwidth across multiple traffic classes in a
single command.

This feature provides a more granular control over traffic management,
especially for scenarios requiring Enhanced Transmission Selection.

Users can now define a relative bandwidth share for each traffic class.
For example, assigning share values of 20 to TC0 (TCP/UDP) and 80 to TC5
(RoCE) will result in TC0 receiving 20% and TC5 receiving 80% of the
total bandwidth. The actual percentage each class receives depends on
the ratio of its share value to the sum of all shares.

Example:
DEV=pci/0000:08:00.0

$ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \
  tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0

$ devlink port function rate set $DEV/vfs_group \
  tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0

Example usage with ynl:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-set --json '{
  "bus-name": "pci",
  "dev-name": "0000:08:00.0",
  "port-index": 1,
  "rate-tc-bws": [
    {"rate-tc-index": 0, "rate-tc-bw": 50},
    {"rate-tc-index": 1, "rate-tc-bw": 50},
    {"rate-tc-index": 2, "rate-tc-bw": 0},
    {"rate-tc-index": 3, "rate-tc-bw": 0},
    {"rate-tc-index": 4, "rate-tc-bw": 0},
    {"rate-tc-index": 5, "rate-tc-bw": 0},
    {"rate-tc-index": 6, "rate-tc-bw": 0},
    {"rate-tc-index": 7, "rate-tc-bw": 0}
  ]
}'

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
  --do rate-get --json '{
  "bus-name": "pci",
  "dev-name": "0000:08:00.0",
  "port-index": 1
}'

output for rate-get:
{'bus-name': 'pci',
'dev-name': '0000:08:00.0',
'port-index': 1,
'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0},
                 {'rate-tc-bw': 50, 'rate-tc-index': 1},
                 {'rate-tc-bw': 0, 'rate-tc-index': 2},
                 {'rate-tc-bw': 0, 'rate-tc-index': 3},
                 {'rate-tc-bw': 0, 'rate-tc-index': 4},
                 {'rate-tc-bw': 0, 'rate-tc-index': 5},
                 {'rate-tc-bw': 0, 'rate-tc-index': 6},
                 {'rate-tc-bw': 0, 'rate-tc-index': 7}],
'rate-tx-max': 0,
'rate-tx-priority': 0,
'rate-tx-share': 0,
'rate-tx-weight': 0,
'rate-type': 'leaf'}

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-3-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: introduce type-checking attribute iteration for nlmsg

Add the nlmsg_for_each_attr_type() macro to simplify iteration over
attributes of a specific type in a Netlink message.

Convert existing users in vxlan and nfsd to use the new macro.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Link: https://patch.msgid.link/20250629142138.361537-2-mbloch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vhost-net: reduce one userspace copy when building XDP buff

We used to do twice copy_from_iter() to copy virtio-net and packet
separately. This introduce overheads for userspace access hardening as
well as SMAP (for x86 it's stac/clac). So this patch tries to use one
copy_from_iter() to copy them once and move the virtio-net header
afterwards to reduce overheads.

Testpmd + vhost_net shows 10% improvement from 5.45Mpps to 6.0Mpps.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250701010352.74515-2-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tun: remove unnecessary tun_xdp_hdr structure

With f95f0f95cfb7("net, xdp: Introduce xdp_init_buff utility routine"),
buffer length could be stored as frame size so there's no need to have
a dedicated tun_xdp_hdr structure. We can simply store virtio net
header instead.

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/20250701010352.74515-1-jasowang@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'preserve-msg_zerocopy-with-forwarding'

Willem de Bruijn says:

====================
preserve MSG_ZEROCOPY with forwarding

Avoid false positive copying of zerocopy skb frags when entering the
ingress path if the skb is not queued locally but forwarded.

Patch 1 for more details and feature.

Patch 2 converts the existing selftest to a pass/fail test and adds
coverage for this new feature.
====================

Link: https://patch.msgid.link/20250630194312.1571410-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftest: net: extend msg_zerocopy test with forwarding

Zerocopy skbs are converted to regular copy skbs when data is queued
to a local socket. This happens in the existing test with a sender and
receiver communicating over a veth device.

Zerocopy skbs are sent without copying if egressing a device. Verify
that this behavior is maintained even in the common container setup
where data is forwarded over a veth to the physical device.

Update msg_zerocopy.sh to

1. Have a dummy network device to simulate a physical device.
2. Have forwarding enabled between veth and dummy.
3. Add a tx-only test that sends out dummy via the forwarding path.
4. Verify the exitcode of the sender, which signals zerocopy success.

As dummy drops all packets, this cannot be a TCP connection. Test
the new case with unconnected UDP only.

Update msg_zerocopy.c to
- Accept an argument whether send with zerocopy is expected.
- Return an exitcode whether behavior matched that expectation.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630194312.1571410-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: preserve MSG_ZEROCOPY with forwarding

MSG_ZEROCOPY data must be copied before data is queued to local
sockets, to avoid indefinite timeout until memory release.

This test is performed by skb_orphan_frags_rx, which is called when
looping an egress skb to packet sockets, error queue or ingress path.

To preserve zerocopy for skbs that are looped to ingress but are then
forwarded to an egress device rather than delivered locally, defer
this last check until an skb enters the local IP receive path.

This is analogous to existing behavior of skb_clear_delivery_time.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250630194312.1571410-2-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'vsock-test-check-for-null-ptr-deref-when-transport-changes'

Luigi Leonardi says:

====================
vsock/test: check for null-ptr-deref when transport changes

This series introduces a new test that checks for a null pointer
dereference that may happen when there is a transport change[1]. This
bug was fixed in [2].

Note that this test *cannot* fail, it hangs if it triggers a kernel
oops. The intended use-case is to run it and then check if there is any
oops in the dmesg.

This test is based on Hyunwoo Kim's[3] and Michal's python
reproducers[4].

[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
[2]https://lore.kernel.org/netdev/20250110083511.30419-1-sgarzare@redhat.com/
[3]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/#t
[4]https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/

v4: https://lore.kernel.org/20250624-test_vsock-v4-1-087c9c8e25a2@redhat.com
v3: https://lore.kernel.org/20250611-test_vsock-v3-1-8414a2d4df62@redhat.com
v2: https://lore.kernel.org/20250314-test_vsock-v2-1-3c0a1d878a6d@redhat.com
v1: https://lore.kernel.org/20250306-test_vsock-v1-0-0320b5accf92@redhat.com
====================

Link: https://patch.msgid.link/20250630-test_vsock-v5-0-2492e141e80b@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock/test: Add test for null ptr deref when transport changes

Add a new test to ensure that when the transport changes a null pointer
dereference does not occur. The bug was reported upstream [1] and fixed
with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
transport changes").

KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_stream_has_data+0x44/0x70
Call Trace:
virtio_transport_do_close+0x68/0x1a0
virtio_transport_recv_pkt+0x1045/0x2ae4
vsock_loopback_work+0x27d/0x3f0
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x35a/0x700
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30

Note that this test may not fail in a kernel without the fix, but it may
hang on the client side if it triggers a kernel oops.

This works by creating a socket, trying to connect to a server, and then
executing a second connect operation on the same socket but to a
different CID (0). This triggers a transport change. If the connect
operation is interrupted by a signal, this could cause a null-ptr-deref.

Since this bug is non-deterministic, we need to try several times. It
is reasonable to assume that the bug will show up within the timeout
period.

If there is a G2H transport loaded in the system, the bug is not
triggered and this test will always pass. This is because
`vsock_assign_transport`, when using CID 0, like in this case, sets
vsk->transport to `transport_g2h` that is not NULL if a G2H transport is
available.

[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/

Suggested-by: Hyunwoo Kim <v4bel@theori.io>
Suggested-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250630-test_vsock-v5-2-2492e141e80b@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

vsock/test: Add macros to identify transports

Add three new macros: TRANSPORTS_G2H, TRANSPORTS_H2G and
TRANSPORTS_LOCAL.
They can be used to identify the type of the transport(s) loaded when
using the `get_transports()` function.

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20250630-test_vsock-v5-1-2492e141e80b@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dt-bindings: net: Convert socfpga-dwmac bindings to yaml

Convert the bindings for socfpga-dwmac to yaml. Since the original
text contained descriptions for two separate nodes, two separate
yaml files were created.

Signed-off-by: Mun Yew Tham <mun.yew.tham@altera.com>
Signed-off-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250630213748.71919-1-matthew.gerlach@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-07-01 (idpf, igc)

For idpf:
Michal returns 0 for key size when RSS is not supported.

Ahmed changes control queue to a spinlock due to sleeping calls.

For igc:
Vitaly disables L1.2 PCI-E link substate on I226 devices to resolve
performance issues.

* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: disable L1.2 PCI-E link substate to avoid performance issue
  idpf: convert control queue mutex to a spinlock
  idpf: return 0 size for RSS key if not supported
====================

Link: https://patch.msgid.link/20250701164317.2983952-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

amd-xgbe: add support for giant packet size

AMD XGBE hardware supports giant Ethernet frames up to 16K bytes.
Add support for configuring and enabling giant packet handling
in the driver.

- Define new register fields and macros for giant packet support.
- Update the jumbo frame configuration logic to enable giant
packet mode when MTU exceeds the jumbo threshold.

Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250701121929.319690-1-Raju.Rangoju@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>