Allen Hubbe [Mon, 11 Sep 2023 18:08:15 +0000 (11:08 -0700)]
vdpa: consume device_features parameter
Consume the parameter to device_features when parsing command line
options. Otherwise the parameter may be used again as an option name.
# vdpa dev add ... device_features 0xdeadbeef mac 00:11:22:33:44:55
Unknown option "0xdeadbeef"
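The failure can be reproduced with a minimal sketch of an option loop. The
parser below is hypothetical (parse_opts() and struct opts are illustrative
names, not the vdpa source); it only shows why the value that follows
device_features has to be stepped over before the loop continues.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option state; not the real vdpa structures. */
struct opts {
	uint64_t device_features;
	const char *mac;
};

/*
 * Parse "name value" pairs. When consume is 0 the value that follows
 * device_features is left in argv, so the next iteration treats it as an
 * option name and fails. Returns 0 on success, -1 on an unknown option
 * or a missing value.
 */
static int parse_opts(int argc, char **argv, int consume, struct opts *o)
{
	while (argc > 0) {
		if (strcmp(*argv, "device_features") == 0) {
			argc--; argv++;		/* step over the option name */
			if (argc == 0)
				return -1;
			o->device_features = strtoull(*argv, NULL, 16);
			if (consume) {
				argc--; argv++;	/* step over the value too */
			}
		} else if (strcmp(*argv, "mac") == 0) {
			argc--; argv++;
			if (argc == 0)
				return -1;
			o->mac = *argv;
			argc--; argv++;
		} else {
			fprintf(stderr, "Unknown option \"%s\"\n", *argv);
			return -1;
		}
	}
	return 0;
}
```

With consume set to 0 the sketch rejects the command line above at
"0xdeadbeef"; with consume set to 1 both options are parsed.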
Fixes: a4442ce58ebb ("vdpa: allow provisioning device features") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Mon, 11 Sep 2023 15:19:48 +0000 (09:19 -0600)]
Merge branch 'devlink-dump-selector' into next
Jiri Pirko says:
====================
From: Jiri Pirko <jiri@nvidia.com>
The first 5 patches are preparations for the last one.
Motivation:
For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know the port
handle for a specific SF, they need to dump all devlink ports on the host,
which does not scale well.
Solution:
Allow user to pass devlink handle (and possibly other attributes)
alongside the dump command and dump only objects which are matching
the selection.
Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false
devlink: implement dump selector for devlink objects show commands
Introduce a new helper dl_argv_parse_with_selector() to be used
by show() functions instead of dl_argv().
Implement it to check if all needed options for get commands are
specified. In case they are not, ask the kernel for a dump, passing only
the options (attributes) that are present, creating a sort of partial
key to instruct the kernel to do a partial dump.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
mnl_utils: introduce a helper to check if dump policy exists for command
Benefit from the GET_POLICY command of ctrl netlink and introduce a helper
that dumps policies and finds out whether there is a separate policy
specified for the dump op of the specified command.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
In preparation for the follow-up dump selector patch, make sure that the
command line arguments parsing function returns -ENOENT in case the
option is missing so the caller can distinguish this case.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
In preparation for the follow-up dump selector patch, introduce function
dl_argv_dry_parse() which allows dry parsing of command line
arguments without printing out any error messages to the user.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
devlink: make parsing of handle non-destructive to argv
Currently, handle parsing is destructive as the "\0" string ends are
being put in certain positions during parsing. That prevents it from
being used repeatedly. This is problematic with the follow-up patch
implementing dry-parsing. Fix by making a copy of handle argv during
parsing.
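A minimal sketch of the idea, assuming a "bus/dev" style handle. The helper
name handle_split() and its signature are hypothetical and are not taken
from the devlink source; the point is only that the terminator is written
into a private copy, so the caller's argv string can be parsed again.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split "bus/dev" into two strings without writing to the input.
 * Hypothetical helper for illustration; not the devlink code.
 * Returns 0 on success, -1 if there is no '/' or allocation fails.
 * The caller frees *bus; *dev points into the same allocation.
 */
static int handle_split(const char *handle, char **bus, char **dev)
{
	size_t len = strlen(handle) + 1;
	char *copy = malloc(len);	/* work on a private copy */
	char *slash;

	if (!copy)
		return -1;
	memcpy(copy, handle, len);

	slash = strchr(copy, '/');
	if (!slash) {
		free(copy);
		return -1;
	}
	*slash = '\0';			/* terminator goes into the copy only */
	*bus = copy;
	*dev = slash + 1;
	return 0;
}
```

Because the input is const and untouched, calling the helper twice on the
same string gives the same result, which is what a dry parse followed by a
real parse needs.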
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
This is basically a cosmetic change. The SB index is not required to be
passed by the user; index 0 is used implicitly. This is ensured by
special treatment at the end of dl_argv_parse(). Move this option from
optional to required options.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
François Michel [Thu, 31 Aug 2023 14:01:32 +0000 (16:01 +0200)]
tc: fix several typos in netem's usage string
Add missing brackets and surround brackets by single spaces
in the netem usage string.
Also state the P14 argument as optional.
Signed-off-by: François Michel <francois.michel@uclouvain.be> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Hangbin Liu [Fri, 1 Sep 2023 08:02:26 +0000 (16:02 +0800)]
iplink_bridge: fix incorrect root id dump
Fix the typo when dumping root_id.
Fixes: 70dfb0b8836d ("iplink: bridge: export bridge_id and designated_root") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
François Michel [Wed, 30 Aug 2023 15:05:21 +0000 (17:05 +0200)]
tc: fix typo in netem's usage string
Fixes a misplaced newline in netem's usage string.
Signed-off-by: François Michel <francois.michel@uclouvain.be> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Fri, 25 Aug 2023 00:38:58 +0000 (17:38 -0700)]
Merge branch 'vrf-exec-selinux' into next
Andrea Claudi says:
====================
In order to execute a service with VRF, a user should start it using
"ip vrf exec". For example, using systemd, the user can encapsulate the
ExecStart command in ip vrf exec as shown below:
This is incorrect, as the context for httpd should be httpd_t, not
ifconfig_t.
This happens because ipvrf_exec invokes cmd_exec without setting the
correct SELinux context before. Without the correct setting, the process
is executed using ip's SELinux context.
This patch series makes "ip vrf exec" SELinux-aware using the
setexecfilecon() function, which retrieves the correct context to be used
on the next execvp() call.
Matthieu Baerts [Wed, 23 Aug 2023 07:24:07 +0000 (09:24 +0200)]
ss: mptcp: display seq related counters as decimal
This is aligned with what is printed for TCP sockets.
The main difference here is that these counters can be larger (u32 vs
u64) but Wireshark and tcpdump are also printing these MPTCP counters as
decimal and they look fine.
So it sounds better to do the same here with ss for those who want to
easily count how many bytes have been exchanged between two runs without
having to think in hexadecimal.
Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Matthieu Baerts [Wed, 23 Aug 2023 07:24:06 +0000 (09:24 +0200)]
ss: mptcp: display info counters as unsigned
Some counters from mptcp_info structure were stored as an unsigned
number (u8) but displayed as a signed one.
Even if it is unlikely these u8 counters -- number of subflows and
ADD_ADDR -- have a value bigger than 2^7, it still sounds better to
display them as unsigned.
Fixes: 9c3be2c0 ("ss: mptcp: add msk diag interface support") Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Maximilian Bosch [Tue, 22 Aug 2023 12:33:07 +0000 (14:33 +0200)]
ip-vrf: recommend using CAP_BPF rather than CAP_SYS_ADMIN
The CAP_SYS_ADMIN capability allows far too much, to quote
`capabilities(7)`:
Note: this capability is overloaded; see Notes to kernel developers, below.
In the case of `ip-vrf(8)` this is needed to load a BPF program.
According to the same section of the same man-page, using `CAP_BPF` is
preferred if that's the reason for `CAP_SYS_ADMIN`;
perform the same BPF operations as are governed by CAP_BPF (but the latter, weaker capability is preferred for accessing
that functionality).
Local testing revealed that `ip vrf exec` for an unprivileged user is
sufficient if the `CAP_BPF` capability is given rather than
`CAP_SYS_ADMIN`.
In a previous version of the patch[1] it was mentioned that
CAP_SYS_ADMIN was still required for Linux <5.8, however it was
suggested to not make man-pages dependent on the kernel version. Also,
it was suggested to improve the wording and the formatting of the entire
paragraph mentioning capabilities which was also done.
Signed-off-by: Maximilian Bosch <maximilian@mbosch.me>
[1] https://lore.kernel.org/netdev/e6t4ucjdrcitzneh2imygsaxyb2aasxfn2q2a4zh5yqdx3vold@kutwh5kwixva/T/#m628a1900a7e5012bb87e6cb3c94af6c7281cf2bf
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Phil Sutter [Tue, 22 Aug 2023 12:19:16 +0000 (14:19 +0200)]
ss: Fix socket type check in packet_show_line()
The field is accessed before being assigned a meaningful value,
effectively disabling the checks.
Fixes: 4a0053b606a34 ("ss: Unify packet stats output from netlink and proc") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Pedro Tammela [Sat, 19 Aug 2023 20:54:48 +0000 (17:54 -0300)]
utils: fix get_integer() logic
After 3a463c15, get_integer() doesn't return the converted value and
always writes 0 in 'val' in case of success.
Fix the logic so it writes the converted value in 'val'.
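The pattern can be sketched as follows. parse_long() and parse_int() are
hypothetical stand-ins written for this note, not the code in utils; they
show a narrow helper built on a wider one, with the status carried by the
return code and the converted value carried through the pointer.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert arg to long. Returns 0 on success, -1 on any parse error. */
static int parse_long(long *val, const char *arg, int base)
{
	char *end;
	long res;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	res = strtol(arg, &end, base);
	if (end == arg || *end)		/* no digits, or trailing garbage */
		return -1;
	if (errno == ERANGE)		/* overflow or underflow */
		return -1;
	if (val)
		*val = res;
	return 0;
}

/* Convert arg to int on top of parse_long(). */
static int parse_int(int *val, const char *arg, int base)
{
	long res;

	/* Status comes back as the return code, the value through &res. */
	if (parse_long(&res, arg, base) < 0)
		return -1;
	if (res > INT_MAX || res < INT_MIN)
		return -1;
	if (val)
		*val = (int)res;	/* store the value, not the status */
	return 0;
}
```

Storing the helper's return code in 'val' instead of the value obtained
through the pointer would write 0 on every successful call, which matches
the symptom described above.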
Fixes: 3a463c15 ("Add get_long utility and adapt get_integer accordingly") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David Ahern <dsahern@kernel.org>
ss: change aafilter port from int to long (inode support)
The aafilter struct treats the port as a (usually) 32-bit signed
integer. In the case of a unix socket, the port field holds an inode
number, which is an unsigned int. In this case, the 'ss' command
fails because it assumes that the value does not look like a port
(<0).
Here is an example of a command call where the inode is passed and
is larger than a signed integer:
Jiri Pirko [Thu, 10 Aug 2023 14:01:02 +0000 (16:01 +0200)]
devlink: accept "name" command line option instead of "trap"/"group"
It is common for all iproute2 apps to have command line option
names matching with show command outputs. However, that is not true
in case of trap and trap group devlink objects.
The correct approach would be to have "trap" and "group" in the outputs, but that is
not possible to change now. Instead, accept "name" in place of the
"trap" and "group" options.
Examples:
$ devlink trap show netdevsim/netdevsim1
netdevsim/netdevsim1:
name source_mac_is_multicast type drop generic true action drop group l2_drops
name vlan_tag_mismatch type drop generic true action drop group l2_drops
name ingress_vlan_filter type drop generic true action drop group l2_drops
name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
name port_list_is_empty type drop generic true action drop group l2_drops
name port_loopback_filter type drop generic true action drop group l2_drops
name fid_miss type exception generic false action trap group l2_drops
name blackhole_route type drop generic true action drop group l3_drops
name ttl_value_is_too_small type exception generic true action trap group l3_exceptions
name tail_drop type drop generic true action drop group buffer_drops
name ingress_flow_action_drop type drop generic true action drop group acl_drops
name egress_flow_action_drop type drop generic true action drop group acl_drops
name igmp_query type control generic true action mirror group mc_snooping
name igmp_v1_report type control generic true action trap group mc_snooping
$ devlink trap show netdevsim/netdevsim1 trap source_mac_is_multicast
netdevsim/netdevsim1:
name source_mac_is_multicast type drop generic true action drop group l2_drops
$ devlink trap show netdevsim/netdevsim1 name source_mac_is_multicast
netdevsim/netdevsim1:
name source_mac_is_multicast type drop generic true action drop group l2_drops
$ devlink trap group
netdevsim/netdevsim1:
name l2_drops generic true
name l3_drops generic true policer 1
name l3_exceptions generic true policer 1
name buffer_drops generic true policer 2
name acl_drops generic true policer 3
name mc_snooping generic true policer 3
$ devlink trap group show netdevsim/netdevsim1 group l2_drops
netdevsim/netdevsim1:
name l2_drops generic true
$ devlink trap group show netdevsim/netdevsim1 name l2_drops
name l2_drops generic true
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Vladimir Oltean [Mon, 7 Aug 2023 22:09:36 +0000 (01:09 +0300)]
tc/taprio: fix JSON output when TCA_TAPRIO_ATTR_ADMIN_SCHED is present
When the kernel reports that a configuration change is pending
(and that the schedule is still in the administrative state and
not yet operational), we (tc -j -p qdisc show) produce the following
output:
which is invalid JSON, because the second group of "base_time",
"cycle_time", etc. is placed in an unlabeled sub-object. If we pipe
it into jq, it complains:
parse error: Objects must consist of key:value pairs at line 53, column 14
Since it represents the administrative schedule, give this unnamed JSON
object the "admin" name. We now print valid JSON which looks like this:
Fixes: 602fae856d80 ("taprio: Add support for changing schedules") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vladimir Oltean [Mon, 7 Aug 2023 22:09:35 +0000 (01:09 +0300)]
tc/taprio: don't print netlink attributes which weren't reported by the kernel
When an admin schedule is pending and hasn't yet become operational, the
kernel will report only the parameters of the admin schedule in a nested
TCA_TAPRIO_ATTR_ADMIN_SCHED attribute.
However, we default to printing zeroes even for the parameters of the
operational base time, when that doesn't exist.
Fixes: 0dd16449356f ("tc: Add support for configuring the taprio scheduler") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Nicolas Escande [Fri, 4 Aug 2023 16:49:52 +0000 (18:49 +0200)]
man: bridge: update bridge link show
Add missing man page documentation for bridge link show features added in
commit 13a5d8fcb41b ("bridge: link: allow filtering on bridge name") and
commit 64108901b737 ("bridge: Add support for setting bridge port attributes")
Signed-off-by: Nicolas Escande <nico.escande@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Ido Schimmel [Wed, 2 Aug 2023 16:41:15 +0000 (19:41 +0300)]
bridge: Add backup nexthop ID support
Extend the bridge and ip utilities to set and show the backup nexthop ID
bridge port attribute. A value of 0 (default) disables the feature, in
which case the attribute is not printed since it is not emitted by the
kernel.
Example:
# bridge -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
# bridge -d -j -p link show dev swp1 | jq '.[]["backup_nhid"]'
null
# bridge link set dev swp1 backup_nhid 10
# bridge -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
backup_nhid 10
# bridge -d -j -p link show dev swp1 | jq '.[]["backup_nhid"]'
10
# bridge link set dev swp1 backup_nhid 0
# bridge -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
# bridge -d -j -p link show dev swp1 | jq '.[]["backup_nhid"]'
null
# ip -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
# ip -d -j -p lin show dev swp1 | jq '.[]["linkinfo"]["info_slave_data"]["backup_nhid"]'
null
# ip link set dev swp1 type bridge_slave backup_nhid 10
# ip -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
backup_nhid 10
# ip -d -j -p lin show dev swp1 | jq '.[]["linkinfo"]["info_slave_data"]["backup_nhid"]'
10
# ip link set dev swp1 type bridge_slave backup_nhid 0
# ip -d link show dev swp1 | grep -o "backup_nhid [0-9]*"
# ip -d -j -p lin show dev swp1 | jq '.[]["linkinfo"]["info_slave_data"]["backup_nhid"]'
null
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Paolo Lungaroni [Mon, 31 Jul 2023 18:36:16 +0000 (20:36 +0200)]
seg6: man: ip-link.8: add description of NEXT-C-SID flavor for SRv6 End.X behavior
This patch extends the manpage by providing the description of NEXT-C-SID
support for the SRv6 End.X behavior as defined in RFC 8986 [1].
The code/logic required to handle the "flavors" framework has already been
merged into iproute2 by commit: 04a6b456bf74 ("seg6: add support for flavors in SRv6 End* behaviors").
Some examples:
ip -6 route add 2001:db8::1 encap seg6local action End.X nh6 fc00::1 flavors next-csid dev eth0
Standard Output:
ip -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End.X nh6 fc00::1 flavors next-csid lblen 32 nflen 16 dev eth0 metric 1024 pref medium
Nicolas Escande [Wed, 26 Jul 2023 07:25:07 +0000 (09:25 +0200)]
bridge: link: allow filtering on bridge name
When using 'bridge link show' we can either dump all links enslaved to any bridge
(called without arg) or display a single link (called with dev arg).
However, there is no way to dump all links of a single bridge.
To do so, this adds a new optional 'master XXX' arg to the 'bridge link show' command.
usage: bridge link show master br0
Signed-off-by: Nicolas Escande <nico.escande@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add support for the so-called "stateless" configuration pattern (read
from /etc, fall back to /usr), giving system administrators a way to
define local configuration without changing any distro-provided files.
In practice this means that each configuration file FOO is loaded
from /usr/lib/iproute2/FOO unless /etc/iproute2/FOO exists.
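The lookup order can be sketched like this. conf_open() is a hypothetical
helper written for this note, not the function added by the patch, and the
two directories are passed as parameters only so the sketch can be tried
outside of /etc/iproute2 and /usr/lib/iproute2.

```c
#include <stdio.h>

/*
 * Open configuration file "name": the copy under etc_dir if it can be
 * opened, otherwise the one under usr_dir. The path that was tried last
 * is left in path[]. Returns an open FILE * or NULL if neither exists.
 * Hypothetical helper for illustration only.
 */
static FILE *conf_open(const char *etc_dir, const char *usr_dir,
		       const char *name, char *path, size_t len)
{
	FILE *fp;

	/* The administrator's local copy wins when it exists. */
	snprintf(path, len, "%s/%s", etc_dir, name);
	fp = fopen(path, "r");
	if (fp)
		return fp;

	/* Otherwise fall back to the distro-provided default. */
	snprintf(path, len, "%s/%s", usr_dir, name);
	return fopen(path, "r");
}
```

A caller would pass "/etc/iproute2" and "/usr/lib/iproute2" together with
the file name FOO, then read from the returned stream as before.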
Signed-off-by: Gioele Barabucci <gioele@svario.it> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While building iproute2 6.4.0 with musl using Yocto Project, errors such
as the following were encountered:
| mdb.c: In function 'mdb_parse_vni':
| mdb.c:666:47: error: 'ULONG_MAX' undeclared (first use in this function)
| 666 | if ((endptr && *endptr) || vni_num == ULONG_MAX)
| | ^~~~~~~~~
| mdb.c:666:47: note: 'ULONG_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?
Include limits.h in bridge/mdb.c to fix this issue. This change is based
on one in Alpine Linux, but the author there had no plans to submit it:
https://git.alpinelinux.org/aports/commit/main/iproute2/include.patch?id=bd46efb8a8da54948639cebcfa5b37bd608f1069
The files bpf_api.h and bpf_elf.h are useful for TC BPF programs
to use. And there is no requirement that those be GPL only;
we intend to allow BSD licensed BPF helpers as well.
This makes the file license the same as libbpf's.
Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
It is not currently possible to add a filter matching on port 0 despite
it being a valid port number. This is caused by the cited commit, which
treats a value of 0 as an indication that the port was not specified.
Instead of inferring that a port range was specified by checking that both
the minimum and the maximum ports are non-zero, simply add a boolean
argument to parse_range() and set it after parsing a port range.
Before:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 0 action pass
Illegal "src_port"
# tc filter add dev swp1 ingress pref 2 proto ip flower ip_proto udp dst_port 0 action pass
Illegal "dst_port"
# tc filter add dev swp1 ingress pref 3 proto ip flower ip_proto udp src_port 0-100 action pass
Illegal "src_port"
# tc filter add dev swp1 ingress pref 4 proto ip flower ip_proto udp dst_port 0-100 action pass
Illegal "dst_port"
After:
# tc filter add dev swp1 ingress pref 1 proto ip flower ip_proto udp src_port 0 action pass
# tc filter add dev swp1 ingress pref 2 proto ip flower ip_proto udp dst_port 0 action pass
# tc filter add dev swp1 ingress pref 3 proto ip flower ip_proto udp src_port 0-100 action pass
# tc filter add dev swp1 ingress pref 4 proto ip flower ip_proto udp dst_port 0-100 action pass
# tc filter show dev swp1 ingress | grep _port
src_port 0
dst_port 0
src_port 0-100
dst_port 0-100
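A minimal sketch of a parser that reports the "range seen" state through a
boolean instead of inferring it from non-zero values. The function names
and signature are hypothetical, not the flower code, and the MIN < MAX
check is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one decimal port at s; *end is set past the digits. */
static int parse_port(const char *s, char **end, uint16_t *port)
{
	unsigned long v;

	if (*s < '0' || *s > '9')	/* reject signs, blanks, empty input */
		return -1;
	v = strtoul(s, end, 10);
	if (v > UINT16_MAX)
		return -1;
	*port = (uint16_t)v;
	return 0;
}

/*
 * Parse "N" or "MIN-MAX". *is_range tells the caller which form was
 * seen, so 0 stays usable as a port value in both forms.
 * Returns 0 on success, -1 on malformed input or MIN >= MAX.
 */
static int parse_port_range(const char *str, uint16_t *min, uint16_t *max,
			    bool *is_range)
{
	char *end;

	if (parse_port(str, &end, min) < 0)
		return -1;
	if (*end == '\0') {		/* single port, 0 included */
		*max = *min;
		*is_range = false;
		return 0;
	}
	if (*end != '-' || parse_port(end + 1, &end, max) < 0 || *end != '\0')
		return -1;
	if (*min >= *max)
		return -1;
	*is_range = true;
	return 0;
}
```

With this shape, "0" and "0-100" are both accepted, and the caller decides
between a single-port match and a range match from the flag alone.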
Fixes: 767b6fd620dd ("tc: flower: fix port value truncation") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vladimir Oltean [Wed, 5 Jul 2023 10:51:55 +0000 (13:51 +0300)]
tc/taprio: fix parsing of "fp" option when it doesn't appear last
When installing a Qdisc this way:
tc qdisc replace dev $ifname handle 8001: parent root stab overhead 24 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 01 1216 \
sched-entry S fe 12368 \
fp P E E E E E E E \
flags 0x2
the parser will error out when it tries to parse the "fp" array and it
finds "flags" as one of the elements, expecting it to be one of "P" or
"E".
The way this is handled in the parsing of other array arguments of
variable size (max-sdu, map, queues etc) is to not fail, call PREV_ARG()
and attempt re-parsing the argument as something else. Do that for "fp"
as well.
Apparently mqprio handles this case correctly, so I must have forgotten
to apply the same treatment for taprio as well, during development.
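The approach can be sketched with an index-based parser. parse_fp() below
is a hypothetical helper, not the tc code: it stops at the first token that
is neither "P" nor "E" and leaves the index on that token, which is the
effect PREV_ARG() has in the argument loop described above.

```c
#include <string.h>

#define MAX_TC 16

/*
 * Consume up to MAX_TC "P"/"E" tokens starting at argv[*idx]. Parsing
 * stops at the first token that is neither, and *idx is left on that
 * token so the caller can try it as the next keyword.
 * Returns the number of elements stored in fp[].
 */
static int parse_fp(int argc, char **argv, int *idx, char fp[MAX_TC])
{
	int n = 0;

	while (*idx < argc && n < MAX_TC) {
		const char *tok = argv[*idx];

		if (strcmp(tok, "P") != 0 && strcmp(tok, "E") != 0)
			break;		/* not ours: leave it for the caller */
		fp[n++] = tok[0];
		(*idx)++;
	}
	return n;
}
```

For the arguments "P E E flags 0x2" the helper stores three elements and
leaves the index on "flags", so the caller goes on to parse it as a keyword
instead of failing.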
Fixes: 5fbca3b469ec ("tc/taprio: add support for preemptible traffic classes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Zahari Doychev [Thu, 29 Jun 2023 19:57:36 +0000 (21:57 +0200)]
f_flower: simplify cfm dump function
The standard print function can be used to print the cfm attributes in
both standard and json use cases. In this way no string buffer is needed
which simplifies the code.
Signed-off-by: Zahari Doychev <zdoychev@maxlinear.com> Reviewed-by: Petr Machata <me@pmachata.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Edwin Peer [Sun, 11 Jun 2023 10:57:38 +0000 (13:57 +0300)]
iplink: filter stats using RTEXT_FILTER_SKIP_STATS
Don't request statistics we do not intend to render. This avoids the
possibility of a truncated IFLA_VFINFO_LIST when statistics are not
requested as well as the fetching of unnecessary data.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com> Cc: Edwin Peer <espeer@gmail.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Ido Schimmel [Wed, 7 Jun 2023 15:35:50 +0000 (18:35 +0300)]
f_flower: Add l2_miss support
Add the ability to match on packets that encountered a layer 2 miss in
bridge driver's FDB / MDB. Example:
# tc filter add dev swp2 egress pref 1 proto all flower indev swp1 l2_miss 1 action drop
# tc filter add dev swp2 egress pref 1 proto all flower indev swp1 l2_miss 0 action drop
# tc filter show dev swp2 egress
filter protocol all pref 1 flower chain 0
filter protocol all pref 1 flower chain 0 handle 0x1
indev swp1
l2_miss 1
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1
filter protocol all pref 1 flower chain 0 handle 0x2
indev swp1
l2_miss 0
not_in_hw
action order 1: gact action drop
random type none pass val 0
index 2 ref 1 bind 1
This series introduces a new DCB subcommand: rewr, which is used to
configure the in-kernel DCB rewrite table [1].
Rewrite support is added as a separate DCB subcommand, rather than an
APP opt-in flag or similar. This goes in line with what we did to dcbnl,
where rewrite is a separate object. Obviously this requires a bit more
code to implement the new command, but much of the existing dcb-app code
(especially the bookkeeping code) can be reused. In some cases a little
adaptation is needed.
Initially, I have only added support for the prio-pcp and prio-dscp
parameters, as DSCP and PCP are the only selectors that currently have
a user [2], and to be honest, I am not even sure it makes sense to add
dgram, stream, ethtype rewrite support; at least the rewriter of Sparx5
does not support this. Any input here is much appreciated!
Examples:
Rewrite DSCP to 63 for packets with priority 1
$ dcb rewr add dev eth0 prio-dscp 1:63
Rewrite PCP to 7 and DEI to 1 for packets with priority 1
$ dcb rewr add dev eth0 prio-pcp 1:7de
A new manpage has been added, to cover the new dcb-rewr subcommand, and
its parameters. Also I took the liberty to clean up a few things in the
dcb-app manpage.
Daniel Machon [Tue, 6 Jun 2023 07:19:45 +0000 (09:19 +0200)]
man: dcb-rewr: add new manpage for dcb-rewr
Add a new manpage for dcb-rewr. Most of the content is copied over from
dcb-app, as the same set of commands and parameters (in reverse) applies
to dcb-rewr.
Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Tue, 6 Jun 2023 07:19:43 +0000 (09:19 +0200)]
dcb: rewr: add new dcb-rewr subcommand
Add a new subcommand 'rewr' for configuring the in-kernel DCB rewrite
table. The rewrite table of the kernel is similar to the APP table,
therefore, much of the existing bookkeeping code from dcb-app, can be
reused in the dcb-rewr implementation.
Initially, only support for configuring PCP and DSCP-based rewrite has
been added.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Tue, 6 Jun 2023 07:19:40 +0000 (09:19 +0200)]
dcb: app: modify dcb_app_print_filtered() for dcb-rewr reuse
Where dcb-app requires protocol to be the printed key, dcb-rewr requires
it to be the priority. Adapt existing dcb-app print functions for this.
dcb_app_print_filtered() has been modified, to take two callbacks; one
for printing the entire string (pid and prio), and one for the pid type
(dec, hex, dscp, pcp). This saves us from making one dedicated function
for each pid type for both app and rewr.
Also, printing the colon is now expected to be handled by the
print_pid_prio() callback.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Tue, 6 Jun 2023 07:19:39 +0000 (09:19 +0200)]
dcb: app: rename dcb_app_print_key_*() functions
In preparation for changing the prototype of dcb_app_print_filtered(),
rename the _print_key_*() functions to _print_pid_*(), as the protocol
can be both key and value with the introduction of dcb-rewr.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Tue, 6 Jun 2023 07:19:38 +0000 (09:19 +0200)]
dcb: app: move colon printing out of callbacks
In preparation for changing the prototype of dcb_app_print_filtered(),
move the colon printing out of the callbacks, and into
dcb_app_print_filtered().
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Machon [Tue, 6 Jun 2023 07:19:37 +0000 (09:19 +0200)]
dcb: app: replace occurrences of %d with %u for printing unsigned int
In preparation for changing the prototype of dcb_app_print_filtered(),
replace occurrences of %d for printing unsigned integer, with %u as it
ought to be.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Add userspace support for the [no]localbypass vxlan netlink
attribute. With localbypass on (default), the vxlan driver processes
the packets destined to the local machine by itself, bypassing the
userspace network stack. With nolocalbypass the packets are always
forwarded to the userspace network stack, so userspace programs,
such as tcpdump, have a chance to process them.
Signed-off-by: Vladimir Nikishkin <vladimir@nikishkin.pw> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David Ahern <dsahern@kernel.org>