From: Jakub Kicinski
Date: Mon, 15 Jun 2026 18:45:05 +0000 (-0700)
Subject: Merge branch 'net-mlx5-add-switchdev-mode-support-for-socket-direct-single-netdev...
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=75983f837d20af89df60b4e8f08e5ca4e0a6cb72;p=thirdparty%2Fkernel%2Fstable.git

Merge branch 'net-mlx5-add-switchdev-mode-support-for-socket-direct-single-netdev-part-2-2'

Tariq Toukan says:

====================
net/mlx5: Add switchdev mode support for Socket Direct single netdev, part 2/2

This is part 2. Find part 1 here:
https://lore.kernel.org/all/20260531113954.395443-1-tariqt@nvidia.com/

This series enables Socket Direct (SD) single netdev to operate in
switchdev mode with a shared FDB.

SD single netdev combines multiple PCI functions behind a single netdev
interface. To support switchdev offloads, these functions must
participate in a virtual LAG (shared FDB).

Design

Rather than introducing a separate LAG instance for SD, this series
integrates SD secondary devices into the existing LAG structure
(priv.lag) created at probe time.

Each lag_func entry carries a group_id field that identifies its SD
group membership (0 means not part of any SD group). An xarray mark
(XA_MARK_PORT) distinguishes physical port entries from SD secondaries.
This enables a single unified iterator that filters by group:

  - MLX5_LAG_FILTER_PORTS: iterate port-level entries only (existing
    behavior, used by bonding, FW LAG commands and v2p_map)
  - MLX5_LAG_FILTER_ALL: iterate all devices, including SD secondaries
    (used by the MPESW shared FDB across all devices)
  - a specific group_id: iterate only the devices in that SD group
    (used by per-group SD shared FDB operations)

Existing callers use mlx5_ldev_for_each(), which maps to
MLX5_LAG_FILTER_PORTS, preserving current behavior for non-SD
configurations.

Lifecycle and ownership

The SD LAG lifecycle is tied to the SD group, not to bonding events:

1. At PCI probe, mlx5_lag_add_mdev() creates the LAG structure
   (priv.lag) for each LAG-capable PF.
   For example, this includes the SD primary devices.

2. During mlx5_sd_init(), after the SD group is fully formed (primary
   and secondaries paired), sd_lag_init() registers the secondary
   devices into the primary's existing priv.lag by calling
   mlx5_ldev_add_mdev() with the SD group_id. The primary's lag_func
   also gets its group_id set. No separate LAG instance is created.

3. After all devices in the SD group transition to switchdev mode,
   mlx5_lag_shared_fdb_create() is invoked with the group_id to create
   a software-only shared FDB scoped to that SD group. This sets
   sd_fdb_active on all lag_func entries in the group. No FW LAG
   commands are issued, since SD devices share the same physical port.

4. If MPESW (multi-port eswitch) is enabled on top of SD groups, the
   per-group SD shared FDB is torn down first, and then an MPESW shared
   FDB spanning all devices (ports and SD secondaries) is created using
   MLX5_LAG_FILTER_ALL. When MPESW is disabled, the per-group SD shared
   FDB is restored.

5. On SD teardown (mlx5_sd_cleanup() or device unbind), sd_lag_cleanup()
   removes the secondaries from priv.lag and clears the primary's
   group_id. The LAG structure itself is not destroyed.

The sd_fdb_active flag is set on all lag_func entries in a group (not
just the primary), so any device can detect the SD shared FDB state
during lag_disable_change teardown without needing to look up peer
entries.

SD shared FDB is a pure software construct: unlike the regular LAG
modes (ROCE, SRIOV, MPESW), it does not issue the FW create_lag and
destroy_lag commands. The software vport LAG for SD is implemented via
eswitch egress ACL bounce rules, managed by the IB layer through
mlx5_eth_lag_init(). The software LAG demux is implemented via steering
rules that use a new destination, VHCA_RX.

Patches

E-Switch preparation (patch 1):
  - Skip uplink IB rep load for SD secondary devices

Devcom support (patches 2-3):
  - Expose locked variant of send_event
  - Add DEVCOM_CANT_FAIL for non-rollback events

SD core hardening (patches 4-6):
  - Make primary/secondary role determination more robust
  - Add L2 table silent mode query support
  - Expand vport metadata for SD secondary devices

SD switchdev transition (patches 7-8):
  - Support switchdev mode transition with shared FDB
  - Notify SD on eswitch disable

LAG integration (patches 9-12):
  - Store demux resources per master lag_func
  - Disable both regular and SD LAG on lag_disable_change
  - Introduce software vport LAG implementation
  - Add MPESW over SD LAG support

Deferred init (patches 13-14):
  - Tie rep load/unload to SD LAG state
  - Defer vport metadata init until SD is ready

Enablement (patch 15):
  - Enable SD over ECPF and allow switchdev transition

v2: https://lore.kernel.org/20260608135547.482825-1-tariqt@nvidia.com
v1: https://lore.kernel.org/20260604114455.434711-1-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/20260612113904.537595-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski
---
75983f837d20af89df60b4e8f08e5ca4e0a6cb72