udev-builtin-net_id: name auxiliary sub-function (SF) host network devices
author: Jiri Pirko <jiri@nvidia.com>
Thu, 7 May 2026 14:52:02 +0000 (16:52 +0200)
committer: Jiri Pirko <jiri@nvidia.com>
Tue, 19 May 2026 12:39:20 +0000 (14:39 +0200)
Some drivers (currently mlx5_core) expose sub-functions (SFs) of a PCI
Physical Function as auxiliary devices. Each SF carries a host network
interface that sits below the aux device in sysfs:

  /sys/devices/.../<PF BDF>/mlx5_core.sf.<idx>/net/eth<N>

Because the network device's immediate parent is the aux device and not
a PCI device, names_pci() bails out and these interfaces fall through
to the kernel-assigned eth<N> name, which is not stable across reboots,
module reloads or topology changes.

The naming applies when the SF network device's direct sysfs parent is
the aux device that exposes sfnum, i.e. the kernel driver passes the
aux device to SET_NETDEV_DEV(). mlx5_core does so. ice's
ice_sf_cfg_netdev() currently passes the parent PF's PCI device, so ice
SF network devices sit as siblings of the PF rather than below the aux
device and fall outside this precondition; pending a kernel change in
ice to mirror mlx5's SET_NETDEV_DEV(netdev, &adev->dev), they continue
to receive the kernel-assigned name as they do today.

The aux device exposes 'sfnum', the user-defined sub-function number
(the value passed to "devlink port add ... sfnum N"), which is stable
and unique within its parent PF. The aux device's direct sysfs parent
is the PF's PCI device.
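
As a rough illustration of consuming that attribute outside of udev, the
sketch below parses the text of an 'sfnum' attribute into an unsigned
number. parse_sfnum() is a hypothetical helper and not part of this patch
(the patch uses systemd's internal device_get_sysattr_unsigned_filtered()
instead), and it assumes the attribute holds a decimal number optionally
followed by a newline:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): parse the text of an 'sfnum'
 * attribute, e.g. "88\n", into an unsigned value.
 * Returns 0 on success, a negative errno-style value on failure. */
static int parse_sfnum(const char *text, unsigned *ret) {
        char *end;
        unsigned long v;

        assert(text);
        assert(ret);

        /* Reject empty input, signs and leading whitespace up front,
         * since strtoul() would silently accept some of those. */
        if (*text < '0' || *text > '9')
                return -EINVAL;

        errno = 0;
        v = strtoul(text, &end, 10);
        if (errno != 0)
                return -errno;

        /* Assumption: a single trailing newline is allowed, nothing else. */
        if (*end == '\n')
                end++;
        if (*end != '\0')
                return -EINVAL;

        if (v > UINT_MAX)
                return -ERANGE;

        *ret = (unsigned) v;
        return 0;
}
```

With this sketch, parse_sfnum("88\n", &n) stores 88 in n, while input such
as "8x" or an empty string is rejected with -EINVAL.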

Treat an SF host network device analogously to an SR-IOV VF host
network device: walk to the parent PCI function, derive the base name
from there, then append an "S<sfnum>" suffix. Lowercase 's' is already
taken (slot) and the existing grammar uses a single character per
token, so uppercase 'S' is the best option.

E.g. for an SF whose parent PF is at PCI 0000:c1:00.0 and which was
added with "sfnum 88":

  ID_NET_NAME_PATH=enp193s0f0S88
  ID_NET_NAME_SLOT=enp193s0f0S88

This is parallel to how SR-IOV VFs get a "v<N>" suffix on top of the
parent PF's name.
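
The mapping from the PCI address in the example above to the resulting
name can be sketched as follows. sf_name_from_bdf() is a hypothetical,
simplified helper and not the code in udev-builtin-net_id.c: it assumes
the address has the form "dddd:bb:ss.f", always emits the "f<function>"
token, and ignores port specifiers and slot lookups:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): build a path-style name
 * "<prefix>[P<domain>]p<bus>s<slot>f<function>S<sfnum>" from a PCI address
 * string. Simplification: the function token is always emitted.
 * Returns 0 on success, a negative errno-style value on failure. */
static int sf_name_from_bdf(const char *prefix, const char *bdf, unsigned sfnum,
                            char *buf, size_t size) {
        unsigned domain, bus, slot, func;
        int n;

        assert(prefix);
        assert(bdf);
        assert(buf);

        /* PCI addresses are written in hexadecimal. */
        if (sscanf(bdf, "%x:%x:%x.%x", &domain, &bus, &slot, &func) != 4)
                return -EINVAL;

        /* The name itself uses decimal numbers; the domain is only
         * included when it is non-zero. */
        if (domain > 0)
                n = snprintf(buf, size, "%sP%up%us%uf%uS%u",
                             prefix, domain, bus, slot, func, sfnum);
        else
                n = snprintf(buf, size, "%sp%us%uf%uS%u",
                             prefix, bus, slot, func, sfnum);
        if (n < 0 || (size_t) n >= size)
                return -ENOBUFS;

        return 0;
}
```

For the prefix "en", the address "0000:c1:00.0" and sfnum 88, bus 0xc1 is
printed as decimal 193, which gives the enp193s0f0S88 name shown above.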

Gate the new behaviour behind a new NAMING_SUBFUNC flag, which is
enabled starting with the NAMING_V261 scheme. Document the new suffix in
both the ID_NET_NAME_SLOT and ID_NET_NAME_PATH grammars in
systemd.net-naming-scheme(7) and add a v261 history entry.
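
The gating can be pictured with the minimal sketch below. The SKETCH_*
names and scheme_has() are hypothetical stand-ins for NamingSchemeFlags
and naming_scheme_has(); the two bit positions are taken from this
patch, but SKETCH_V260 is reduced to a single flag for brevity, whereas
the real NAMING_V260 mask combines many more:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down copy of the flag layout. */
enum {
        SKETCH_NAMING_MCTP    = 1 << 22,
        SKETCH_NAMING_SUBFUNC = 1 << 23,

        /* Stand-in only: the real v260 mask contains more flags. */
        SKETCH_V260 = SKETCH_NAMING_MCTP,
        SKETCH_V261 = SKETCH_V260 | SKETCH_NAMING_SUBFUNC,
};

/* True if every bit of 'flag' is set in the selected scheme mask. */
static bool scheme_has(unsigned scheme, unsigned flag) {
        assert(flag != 0);
        return (scheme & flag) == flag;
}
```

A system pinned to the older scheme keeps its previous names, since
scheme_has(SKETCH_V260, SKETCH_NAMING_SUBFUNC) is false, while the same
check against SKETCH_V261 is true.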

Co-developed-by: Claude Opus 4.7 <noreply@anthropic.com>
man/systemd.net-naming-scheme.xml
src/shared/netif-naming-scheme.c
src/shared/netif-naming-scheme.h
src/udev/udev-builtin-net_id.c

diff --git a/man/systemd.net-naming-scheme.xml b/man/systemd.net-naming-scheme.xml
index 1a024a0c305db2367531565a4c2d73bf85516a9e..9887aa8701fce487e815e4659149e71b31ecb5da 100644
           <term><varname>ID_NET_NAME_SLOT=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>u</constant><replaceable>port</replaceable>…[<constant>c</constant><replaceable>config</replaceable>][<constant>i</constant><replaceable>interface</replaceable>]</term>
           <term><varname>ID_NET_NAME_SLOT=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>v</constant><replaceable>slot</replaceable></term>
           <term><varname>ID_NET_NAME_SLOT=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>r</constant><replaceable>slot</replaceable></term>
+          <term><varname>ID_NET_NAME_SLOT=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>S</constant><replaceable>sfnum</replaceable></term>
 
           <listitem><para>This property describes the slot position. Different schemes are used depending on
           the bus type, as described in the table below. In case of USB, BCMA, and SR-IOV devices, the full
                   <entry>… <constant>r</constant><replaceable>slot</replaceable></entry>
                   <entry>SR-IOV slot number</entry>
                 </row>
+
+                <row>
+                  <entry>… <constant>S</constant><replaceable>sfnum</replaceable></entry>
+                  <entry>Auxiliary sub-function (SF) number</entry>
+                </row>
               </tbody>
             </tgroup>
           </table>
           is linked to the particular representor, with any leading zeros removed. The physical port
           name and the bus number are ignored.</para>
 
+          <para>Auxiliary sub-function (SF) network devices, where the network device's parent is an
+          auxiliary device exposing an <constant>sfnum</constant> sysfs attribute (currently mlx5_core SFs),
+          are named based on the underlying PCI function (the PF, or for a VF-SF, the PF behind the VF),
+          with a suffix of <constant>S</constant> and the user-defined sub-function number from
+          <constant>sfnum</constant>. This is analogous to how SR-IOV virtual function devices are named
+          with a <constant>v</constant> suffix.</para>
+
+          <para>If the SF's parent PCI function is itself an SR-IOV virtual function (VF-SF), the
+          name is rooted at the PF and both suffixes are chained, with the <constant>v</constant>
+          suffix preceding the <constant>S</constant> suffix
+          (e.g. <literal>enp193s0f0v0S88</literal>). The PF, the VF, and the SF therefore form a
+          stable, hierarchical sequence regardless of the VF's underlying PCI bus/device/function
+          numbering.</para>
+
           <para>In some configurations a parent PCI bridge of a given network controller may be associated
           with a slot. In such case we do not generate this device property to avoid possible naming conflicts.</para>
 
           <term><varname>ID_NET_NAME_PATH=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>p</constant><replaceable>bus</replaceable><constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>phys_port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]</term>
           <term><varname>ID_NET_NAME_PATH=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>p</constant><replaceable>bus</replaceable><constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>phys_port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>b</constant><replaceable>number</replaceable></term>
           <term><varname>ID_NET_NAME_PATH=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>p</constant><replaceable>bus</replaceable><constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>phys_port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>u</constant><replaceable>port</replaceable>…[<constant>c</constant><replaceable>config</replaceable>][<constant>i</constant><replaceable>interface</replaceable>]</term>
+          <term><varname>ID_NET_NAME_PATH=</varname><replaceable>prefix</replaceable>[<constant>P</constant><replaceable>domain</replaceable>]<constant>p</constant><replaceable>bus</replaceable><constant>s</constant><replaceable>slot</replaceable>[<constant>f</constant><replaceable>function</replaceable>][<constant>n</constant><replaceable>phys_port_name</replaceable>|<constant>d</constant><replaceable>dev_port</replaceable>]<constant>S</constant><replaceable>sfnum</replaceable></term>
 
           <listitem><para>This property describes the device installation location. Different schemes are
           used depending on the bus type, as described in the table below. For BCMA and USB devices, PCI path
                   <entry>USB port number chain</entry>
                 </row>
 
+                <row>
+                  <entry>… <constant>S</constant><replaceable>sfnum</replaceable></entry>
+                  <entry>Auxiliary sub-function (SF) number</entry>
+                </row>
+
               </tbody>
             </tgroup>
           </table>
           <xi:include href="version-info.xml" xpointer="v260"/>
           </listitem>
         </varlistentry>
+
+        <varlistentry>
+          <term><constant>v261</constant></term>
+
+          <listitem><para>Stable names are now generated for auxiliary sub-function (SF) network devices
+          (such as <literal>mlx5_core</literal> SFs). The name is built from the parent PCI Physical Function's
+          path with an <literal>S<replaceable>sfnum</replaceable></literal> suffix, where
+          <replaceable>sfnum</replaceable> is the user-defined SF number (the value passed to
+          <command>devlink port add … sfnum <replaceable>N</replaceable></command>, exposed by the kernel as
+          the <constant>sfnum</constant> sysfs attribute on the auxiliary device). This is analogous to the
+          <literal>v<replaceable>N</replaceable></literal> suffix used for SR-IOV virtual functions; for SFs
+          hosted on SR-IOV VFs (VF-SF), the two suffixes are chained on top of the PF's base name.</para>
+
+          <xi:include href="version-info.xml" xpointer="v261"/>
+          </listitem>
+        </varlistentry>
       </variablelist>
 
     <para>Note that <constant>latest</constant> may be used to denote the latest scheme known (to this
diff --git a/src/shared/netif-naming-scheme.c b/src/shared/netif-naming-scheme.c
index ddf3bdde9e84cf80b2c85f93aae80823775d6b94..978399ca4139ff6504a4eb646faa0530d35d0818 100644
@@ -31,6 +31,7 @@ static const NamingScheme naming_schemes[] = {
         { "v258", NAMING_V258 },
         { "v259", NAMING_V259 },
         { "v260", NAMING_V260 },
+        { "v261", NAMING_V261 },
         /* … add more schemes here, as the logic to name devices is updated … */
 
         EXTRA_NET_NAMING_MAP
diff --git a/src/shared/netif-naming-scheme.h b/src/shared/netif-naming-scheme.h
index 52c462a6bbd2363426459c1d496247556ee539fe..1848cb570bc9a4bc6d7e0bfb7acdaa31b4e29d18 100644
@@ -44,6 +44,9 @@ typedef enum NamingSchemeFlags {
         NAMING_USE_INTERFACE_PROPERTY    = 1 << 20, /* Use INTERFACE udev property, rather than sysname, when no renaming is requested. */
         NAMING_DEVICETREE_ALIASES_WLAN   = 1 << 21, /* Generate names from devicetree aliases for WLAN devices */
         NAMING_MCTP                      = 1 << 22, /* Use "mc" prefix for MCTP devices */
+        NAMING_SUBFUNC                   = 1 << 23, /* Generate names for auxiliary sub-function (SF) network
+                                                     * devices (e.g. mlx5_core SFs), based on the parent PF's
+                                                     * PCI path and the user-defined sfnum, with an "S" suffix. */
 
         /* And now the masks that combine the features above */
         NAMING_V238 = 0,
@@ -67,6 +70,7 @@ typedef enum NamingSchemeFlags {
         NAMING_V258 = NAMING_V257 | NAMING_USE_INTERFACE_PROPERTY,
         NAMING_V259 = NAMING_V258 | NAMING_DEVICETREE_ALIASES_WLAN,
         NAMING_V260 = NAMING_V259 | NAMING_MCTP,
+        NAMING_V261 = NAMING_V260 | NAMING_SUBFUNC,
 
         EXTRA_NET_NAMING_SCHEMES
 
diff --git a/src/udev/udev-builtin-net_id.c b/src/udev/udev-builtin-net_id.c
index 2491484cf607413512462593d3d5f1c4d6ee4a58..d368fb23372308cd199cfd59f74ad273d4f09c25 100644
@@ -165,6 +165,46 @@ static int get_virtfn_info(sd_device *pcidev, sd_device **ret_physfn_pcidev, cha
         return -ENOENT;
 }
 
+static int get_subfunc_info(sd_device *aux_dev, sd_device **ret_parent_pcidev, char **ret_suffix) {
+        sd_device *parent;
+        unsigned sfnum;
+        char *suffix;
+        int r;
+
+        assert(aux_dev);
+        assert(ret_parent_pcidev);
+        assert(ret_suffix);
+
+        /* The auxiliary device must expose an 'sfnum' attribute. This is currently used by the
+         * mlx5_core driver for sub-function (SF) auxiliary devices. The sfnum is the user-defined
+         * stable identifier passed to "devlink port add ... sfnum N". */
+        r = device_get_sysattr_unsigned_filtered(aux_dev, "sfnum", &sfnum);
+        if (r < 0)
+                return r;
+
+        /* Walk one hop up: the auxiliary device's parent must be a PCI function. It can be either
+         * the PF directly, or an SR-IOV VF — mlx5 also supports SFs hosted on VFs (VF-SF), see
+         * Documentation/networking/representors.rst in the kernel tree. The VF case is handled by
+         * the existing virtfn logic in names_pci(), so here we just return the immediate PCI
+         * parent and a single "S<sfnum>" suffix piece. */
+        r = sd_device_get_parent(aux_dev, &parent);
+        if (r < 0)
+                return r;
+
+        r = device_in_subsystem(parent, "pci");
+        if (r < 0)
+                return r;
+        if (r == 0)
+                return -ENODEV;
+
+        if (asprintf(&suffix, "S%u", sfnum) < 0)
+                return log_oom_debug();
+
+        *ret_parent_pcidev = sd_device_ref(parent);
+        *ret_suffix = suffix;
+        return 0;
+}
+
 static int get_port_specifier(sd_device *dev, char **ret) {
         const char *phys_port_name;
         unsigned dev_port;
@@ -928,24 +968,56 @@ static int names_devicetree(UdevEvent *event, const char *prefix) {
 
 static int names_pci(UdevEvent *event, const char *prefix) {
         sd_device *parent, *dev = ASSERT_PTR(ASSERT_PTR(event)->dev);
-        _cleanup_(sd_device_unrefp) sd_device *physfn_pcidev = NULL;
-        _cleanup_free_ char *virtfn_suffix = NULL;
+        _cleanup_(sd_device_unrefp) sd_device *physfn_pcidev = NULL, *parent_pcidev = NULL;
+        _cleanup_free_ char *virtfn_suffix = NULL, *subfunc_suffix = NULL, *combined_suffix = NULL;
+        const char *suffix = NULL;
 
         assert(prefix);
 
-        /* check if our direct parent is a PCI device with no other bus in-between */
-        if (get_matching_parent(dev, STRV_MAKE("pci"), /* skip_virtio= */ true, &parent) < 0)
+        /* If the network device's direct parent is an auxiliary device that exposes a stable
+         * 'sfnum' attribute (currently mlx5_core sub-functions), peel off the SF identity into
+         * a 'S<sfnum>' suffix piece and pick up the aux device's underlying PCI function as
+         * 'parent'. The aux device is just a bump on the path; everything below — PF/VF
+         * resolution, slot/onboard lookup — proceeds the same way as for any PCI-rooted
+         * network device. */
+        if (naming_scheme_has(NAMING_SUBFUNC)) {
+                sd_device *aux;
+
+                if (get_matching_parent(dev, STRV_MAKE("auxiliary"),
+                                        /* skip_virtio= */ false, &aux) >= 0)
+                        (void) get_subfunc_info(aux, &parent_pcidev, &subfunc_suffix);
+        }
+
+        /* SF: parent is the aux device's PCI function. Otherwise the network device's direct
+         * parent must itself be a PCI device. */
+        if (subfunc_suffix)
+                parent = parent_pcidev;
+        else if (get_matching_parent(dev, STRV_MAKE("pci"), /* skip_virtio= */ true, &parent) < 0)
                 return 0;
 
-        /* If this is an SR-IOV virtual device, get base name using physical device and add virtfn suffix. */
+        /* If parent is an SR-IOV VF, walk to the parent PF and add a 'v<N>' suffix piece. The
+         * onboard BIOS label is intentionally not exposed for any "child function" (VF, SF, or
+         * VF-SF), since the label refers to the parent's physical port, not to a logical child. */
+        bool is_child_function = !!subfunc_suffix;
         if (naming_scheme_has(NAMING_SR_IOV_V) &&
-            get_virtfn_info(parent, &physfn_pcidev, &virtfn_suffix) >= 0)
+            get_virtfn_info(parent, &physfn_pcidev, &virtfn_suffix) >= 0) {
                 parent = physfn_pcidev;
-        else
+                is_child_function = true;
+        }
+        if (!is_child_function)
                 (void) names_pci_onboard_label(event, parent, prefix);
 
-        (void) names_pci_onboard(event, parent, prefix, virtfn_suffix);
-        (void) names_pci_slot(event, parent, prefix, virtfn_suffix);
+        /* Compose the final suffix in PF -> [VF ->] SF order, e.g. "v0", "S88", or "v0S88". */
+        if (virtfn_suffix && subfunc_suffix) {
+                combined_suffix = strjoin(virtfn_suffix, subfunc_suffix);
+                if (!combined_suffix)
+                        return log_oom_debug();
+                suffix = combined_suffix;
+        } else
+                suffix = virtfn_suffix ?: subfunc_suffix;
+
+        (void) names_pci_onboard(event, parent, prefix, suffix);
+        (void) names_pci_slot(event, parent, prefix, suffix);
         return 0;
 }
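
The suffix composition at the end of names_pci() can be tried in isolation
with the sketch below. compose_suffix() is a hypothetical helper written
against plain libc rather than systemd's strjoin(), and unlike the patch it
returns an empty string (not NULL) when neither suffix is present:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): join the optional VF and SF
 * suffix pieces in PF -> [VF ->] SF order. Either argument may be NULL.
 * Returns a newly allocated string the caller must free(), or NULL on
 * allocation failure. */
static char *compose_suffix(const char *virtfn_suffix, const char *subfunc_suffix) {
        const char *v = virtfn_suffix ? virtfn_suffix : "";
        const char *s = subfunc_suffix ? subfunc_suffix : "";
        size_t len = strlen(v) + strlen(s) + 1;
        char *out;

        out = malloc(len);
        if (!out)
                return NULL;

        /* The VF piece always precedes the SF piece. */
        snprintf(out, len, "%s%s", v, s);
        return out;
}
```

Called with "v0" and "S88" this yields "v0S88"; with only one piece set it
yields that piece unchanged, matching the "v0", "S88", or "v0S88" cases
named in the comment above.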