man: use clsact qdisc for port mirroring examples on matchall and mirred
The clsact qdisc supports ingress and egress. Instead of using two qdiscs
to do ingress and egress port mirroring, clsact can be used. Therefore, use
clsact for the port mirroring examples on the tc-matchall.8 and tc-mirred.8
documents.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The version field in mnlu was being passed in but never set.
This meant that all places mnlu_gen_socket was used, the version would
be uninitialized data from malloc().
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Three new "last time" counters have been added to "struct mptcp_info":
last_data_sent, last_data_recv and last_ack_recv. They have been added
in commit 18d82cde7432 ("mptcp: add last time fields in mptcp_info") in
net-next recently.
This patch prints out these new counters into mptcp_stats output in ss.
Kernel has add IFLA_EXT_MASK attribute for indicating that certain
extended ifinfo values are requested by the user application. The ip
link show cmd always request VFs extended ifinfo.
In this case, RTM_GETLINK for greater than about 220 VFs truncates
IFLA_VFINFO_LIST due to the maximum reach of nlattr's nla_len being
exceeded. As a result, ip link show command only show the truncated
VFs info sucn as:
#ip link show dev eth0
1: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
link/ether ...
vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff ...
Truncated VF list: eth0
This patch add novf to support filter links with no VF info:
ip link show novf
v2:
- use an one word option instead of an option with on/off.
- fix the issue that break changes made for the link filter
already done for VF's.
v3:
- "novf" set vfinfo to 0 and the RTEXT_FILTER_VF flag is not added.
Signed-off-by: Mingshuai Ren <renmingshuai@huawei.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Max Gautier [Mon, 18 Mar 2024 15:49:13 +0000 (16:49 +0100)]
arpd: create /var/lib/arpd on first use
The motivation is to build distributions packages without /var to go
towards stateless systems, see link below (TL;DR: provisionning anything
outside of /usr on boot).
We only try do create the database directory when it's in the default
location, and assume its parent (/var/lib in the usual case) exists.
Links: https://0pointer.net/blog/projects/stateless.html Signed-off-by: Max Gautier <mg@max.gautier.name> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Date Huang [Fri, 22 Mar 2024 12:39:22 +0000 (20:39 +0800)]
bridge: vlan: fix compressvlans usage
Fix the incorrect short opt for compressvlans and color
in usage
Signed-off-by: Date Huang <tjjh89017@hotmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Fri, 15 Mar 2024 15:05:23 +0000 (15:05 +0000)]
Merge branch 'nexthop-grp-stats' into next
Petr Machata says:
====================
Next hop group stats allow verification of balancedness of a next hop
group. The feature was merged in kernel commit 7cf497e5a122 ("Merge branch
'nexthop-group-stats'"). This patchset adds to ip the corresponding
support.
NH group stats come in two flavors: as statistics for SW and for HW
datapaths. The former is shown when -s is given to "ip nexthop". The latter
demands more work from the kernel, and possibly driver and HW, and might
not be always necessary. Therefore tie it to -s -s, similarly to how ip
link shows more detailed stats when -s is given twice.
Here's an example usage:
# ip link add name gre1 up type gre \
local 172.16.1.1 remote 172.16.1.2 tos inherit
# ip nexthop replace id 1001 dev gre1
# ip nexthop replace id 1002 dev gre1
# ip nexthop replace id 1111 group 1001/1002 hw_stats on
# ip -s -s -j -p nexthop show id 1111
[ {
[ ...snip... ]
"hw_stats": {
"enabled": true,
"used": true
},
"group_stats": [ {
"id": 1001,
"packets": 0,
"packets_hw": 0
},{
"id": 1002,
"packets": 0,
"packets_hw": 0
} ]
} ]
hw_stats.enabled shows whether hw_stats have been requested for the given
group. hw_stats.used shows whether any driver actually implemented the
counter. group_stats[].packets show the total stats, packets_hw only the
HW-datapath stats.
Petr Machata [Thu, 14 Mar 2024 14:52:15 +0000 (15:52 +0100)]
ip: ipnexthop: Allow toggling collection of nexthop group HW statistics
Besides SW datapath stats, the kernel also support collecting statistics
from HW datapath, for nexthop groups offloaded to HW. Since collection of
these statistics may consume HW resources, there is an interface to request
that the HW stats be recorded. Add this toggle to "ip nexthop".
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Thu, 14 Mar 2024 14:52:14 +0000 (15:52 +0100)]
ip: ipnexthop: Support dumping next hop group HW stats
Besides SW datapath stats, the kernel also support collecting statistics
from HW datapath, for nexthop groups offloaded to HW. Request that these be
collected when ip is given "-s -s", similarly to how "ip link" shows more
statistics in that case.
Besides the statistics themselves, also show whether the collection of HW
statistics was in fact requested, and whether any driver actually
implemented the request.
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Thu, 14 Mar 2024 14:52:13 +0000 (15:52 +0100)]
ip: ipnexthop: Support dumping next hop group stats
Next hop group stats allow verification of balancedness of a next hop
group. The feature was merged in kernel commit 7cf497e5a122 ("Merge branch
'nexthop-group-stats'"). Add to ip the corresponding support. The
statistics are requested if "ip nexthop" is started with -s.
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Thu, 14 Mar 2024 14:52:12 +0000 (15:52 +0100)]
libnetlink: Add rta_getattr_uint()
NLA_UINT attributes have a 4-byte payload if possible, and an 8-byte one if
necessary. Add a function to extract these. Since we need to dispatch on
length anyway, make the getter truly universal by supporting also u8 and
u16.
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
The removal of tick usage in netem, means that some of the
helper functions in tc are no longer used and can be safely removed.
Other functions can be made static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The current version of netem in iproute2 has a maximum of 4.3 seconds
because of scaled 32 bit clock values. Some users would like to be
able to use larger delays to emulate things like storage delays.
Since kernel version 4.15, netem qdisc had netlink parameters
to express wider range of delays in nanoseconds. But the iproute2
side was never updated to use them.
This does break compatibility with older kernels (4.14 and earlier).
With these out of support kernels, the latency/delay parameter
will end up being ignored.
Reported-by: Marc Blanchet <marc.blanchet@viagenie.ca> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lars Ellenberg [Fri, 1 Mar 2024 12:33:24 +0000 (13:33 +0100)]
ss: fix output of MD5 signature keys configured on TCP sockets
da9cc6ab introduced printing of MD5 signature keys when found.
But when changing printf() to out() calls with 90351722,
the implicit printf call in print_escape_buf() was overlooked.
That results in a funny output in the first line:
"<all-your-tcp-signature-keys-concatenated>State"
and ambiguity as to which of those bytes belong to which socket.
Add a static void out_escape_buf() immediately before we use it.
da9cc6ab (ss: print MD5 signature keys configured on TCP sockets, 2017-10-06) 90351722 (ss: Replace printf() calls for "main" output by calls to helper, 2017-12-12)
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Tue, 27 Feb 2024 04:08:34 +0000 (04:08 +0000)]
Merge branch 'ss-socket-local-storage' into next
Quentin Deslandes says:
====================
BPF allows programs to store socket-specific data using
BPF_MAP_TYPE_SK_STORAGE maps. The data is attached to the socket itself,
and Martin added INET_DIAG_REQ_SK_BPF_STORAGES, so it can be fetched
using the INET_DIAG mechanism.
Currently, ss doesn't request the socket-local data, this patch aims to
fix this.
The first patch requests the socket-local data for the requested map ID
(--bpf-map-id=) or all the maps (--bpf-maps). It then prints the map_id
in COL_EXT.
Patch #2 uses libbpf and BTF to pretty print the map's content, like
`bpftool map dump` would do.
Patch #3 updates ss' man page to explain new options.
While I think it makes sense for ss to provide the socket-local storage
content for the sockets, it's difficult to conciliate the column-based
output of ss and having readable socket-local data. Hence, the
socket-local data is printed in a readable fashion over multiple lines
under its socket statistics, independently of the column-based approach.
Here is an example of ss' output with --bpf-maps:
[...]
ESTAB 340116 0 [...]
map_id: 114 [
(struct my_sk_storage){
.field_hh = (char)3,
(union){
.a = (int)17,
.b = (int)17,
},
}
]
Changed this series to an RFC as the merging window for net-next is
closed.
Changes from v8:
* Remove usage of libbpf_bpf_map_type_str() which requires libbpf-1.0+
and provide very little added value (David).
* Use ENABLE_BPF_SKSTORAGE_SUPPORT to gate the BPF socket-local storage
support, instead of HAVE_LIBBPF. iproute2 depends on libbpf-0.1, but
this change needs libbpf-0.5+. If the requirements are not met, ss can
still be compiled and used without BPF socket-local storage support, but
a warning will be printed at compile time.
Changes from v7:
* Fix comment format and checkpatch warnings (Stephen, David).
* Replaced Co-authored-by with Co-developed-by + Signed-off-by for
Martin's contribution on patch #1 to follow checkpatch requirements,
with Martin's approval.
Changes from v6:
* Remove column dedicated to BPF socket-local storage (COL_SKSTOR),
use COL_EXT instead (Matthieu).
Changes from v5:
* Add support for --oneline when printing socket-local data.
* Use \t to indent instead of " " to be consistent with other columns.
* Removed Martin's ack on patch #2 due to amount of lines changed.
Changes from v4:
* Fix return code for 2 calls.
* Fix issue when inet_show_netlink() retries a request.
* BPF dump object is created in bpf_map_opts_load_info().
Changes from v3:
* Minor refactoring to reduce number of HAVE_LIBBF usage.
* Update ss' man page.
* btf_dump structure created to print the socket-local data is cached
in bpf_map_opts. Creation of the btf_dump structure is performed if
needed, before printing the data.
* If a map can't be pretty-printed, print its ID and a message instead
of skipping it.
* If show_all=true, send an empty message to the kernel to retrieve all
the maps (as Martin suggested).
Changes from v2:
* bpf_map_opts_is_enabled is not inline anymore.
* Add more #ifdef HAVE_LIBBPF to prevent compilation error if
libbpf support is disabled.
* Fix erroneous usage of args instead of _args in vout().
* Add missing btf__free() and close(fd).
Changes from v1:
* Remove the first patch from the series (fix) and submit it separately.
* Remove double allocation of struct rtattr.
* Close BPF map FDs on exit.
* If bpf_map_get_fd_by_id() fails with ENOENT, print an error message
and continue to the next map ID.
* Fix typo in new command line option documentation.
* Only use bpf_map_info.btf_value_type_id and ignore
bpf_map_info.btf_vmlinux_value_type_id (unused for socket-local storage).
* Use btf_dump__dump_type_data() instead of manually using BTF to
pretty-print socket-local storage data. This change alone divides the size
of the patch series by 2.
ss is able to print the map ID(s) for which a given socket has BPF
socket-local storage defined (using --bpf-maps or --bpf-map-id=). However,
the actual content of the map remains hidden.
This change aims to pretty-print the socket-local storage content following
the socket details, similar to what `bpftool map dump` would do. The exact
output format is inspired by drgn, while the BTF data processing is similar
to bpftool's.
ss will use libbpf's btf_dump__dump_type_data() to ease pretty-printing
of binary data. This requires out_bpf_sk_storage_print_fn() as a print
callback function used by btf_dump__dump_type_data(). vout() is also
introduced, which is similar to out() but accepts a va_list as
parameter.
ss' output remains unchanged unless --bpf-maps or --bpf-map-id= is used,
in which case each socket containing BPF local storage will be followed by
the content of the storage before the next socket's info is displayed.
Signed-off-by: Quentin Deslandes <qde@naccy.de> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Yedaya Katsman [Sat, 17 Feb 2024 21:21:02 +0000 (23:21 +0200)]
ip: Add missing command exaplantions in man page
There are a few commands missing from the ip command syntax list, add
them. They are also missing from the see also section, add them there as
well.
Note there isn't a ip-ila man page, so I didn't link to it.
Also fix a few punctuation mistakes.
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While sock_diag is able to return BPF socket-local storage in response
to INET_DIAG_REQ_SK_BPF_STORAGES requests, ss doesn't request it.
This change introduces the --bpf-maps and --bpf-map-id= options to request
BPF socket-local storage for all SK_STORAGE maps, or only specific ones.
The bigger part of this change will check the requested map IDs and
ensure they are valid. The column COL_EXT is used to print the
socket-local data into.
When --bpf-maps is used, ss will send an empty
INET_DIAG_REQ_SK_BPF_STORAGES request, in return the kernel will send
all the BPF socket-local storage entries for a given socket. The BTF
data for each map is loaded on demand, as ss can't predict which map ID
are used.
When --bpf-map-id=ID is used, a file descriptor to the requested maps is
open to 1) ensure the map doesn't disappear before the data is printed,
and 2) ensure the map type is BPF_MAP_TYPE_SK_STORAGE. The BTF data for
each requested map is loaded before the request is sent to the kernel.
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Quentin Deslandes <qde@naccy.de> Signed-off-by: David Ahern <dsahern@kernel.org>
Takanori Hirano [Sun, 11 Feb 2024 01:38:48 +0000 (01:38 +0000)]
tc: Change of json format in tc-fw
In the case of a process such as mapping a json to a structure,
it can be difficult if the keys have the same name but different types.
Since handle is used in hex string, change it to fw.
Signed-off-by: Takanori Hirano <me@hrntknr.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Throughout ifstat.c, ifstat_ent.val is accessed as a long long unsigned
type, however it is defined as __u64. This works by coincidence on many
systems, however on ppc64le, __u64 is a long unsigned.
This patch makes the type definition consistent with all of the places
where it is accessed.
Fixes: 5a52102b7c8f ("ifstat: Add extended statistics to ifstat") Reviewed-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Gallagher <sgallagh@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 9 Feb 2024 15:25:46 +0000 (16:25 +0100)]
docs, man: fix some typos
Fix some typos and spelling errors in iproute2 documentation.
Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 9 Feb 2024 15:25:45 +0000 (16:25 +0100)]
treewide: fix typos in various comments
Fix various typos and spelling errors in some iproute2 comments.
Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Maks Mishin [Tue, 6 Feb 2024 23:54:16 +0000 (02:54 +0300)]
ctrl: Fix fd leak in ctrl_listen()
Use the same pattern for handling rtnl_listen() errors that
is used across other iproute2 commands. All other commands
exit with status of 2 if rtnl_listen fails.
Reported-off-by: Maks Mishin <maks.mishinFZ@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aahil Awatramani [Thu, 25 Jan 2024 23:11:47 +0000 (23:11 +0000)]
ip/bond: add coupled_control support
coupled_control specifies whether the LACP state machine's MUX in the
802.3ad mode should have separate Collecting and Distributing states per
IEEE 802.1AX-2008 5.4.15 for coupled and independent control state.
By default this setting is on and does not separate the Collecting and
Distributing states, maintaining the bond in coupled control. If set off,
will toggle independent control state machine which will seperate
Collecting and Distributing states.
Signed-off-by: Aahil Awatramani <aahila@google.com>
v2:
Dropped uapi header change
Use of print_on_off and parse_on_off Signed-off-by: David Ahern <dsahern@kernel.org>
Yedaya Katsman [Sat, 3 Feb 2024 20:03:05 +0000 (22:03 +0200)]
ip: Add missing stats command to usage
The stats command was added in 54d82b0699a0 ("ip: Add a new family of
commands, "stats""), but wasn't included in the subcommand list in the
help usage.
Add it in the right position alphabetically.
Fixes: 54d82b0699a0 ("ip: Add a new family of commands, "stats"") Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Yedaya Katsman [Sat, 27 Jan 2024 16:45:08 +0000 (18:45 +0200)]
ip: remove non-existent amt subcommand from usage
Commit 6e15d27aae94 ("ip: add AMT support") added "amt" to the list
of "first level" commands list, which isn't correct, as it isn't present
in the cmds list. remove it from the usage help.
Fixes: 6e15d27aae94 ("ip: add AMT support") Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Tue, 30 Jan 2024 15:49:23 +0000 (15:49 +0000)]
Merge branch 'echo-tc-filter-actions' into next
Victor Nogueira says:
====================
Continuing on what Hangbin Liu started [1], this patch set adds support for
the NLM_F_ECHO flag for tc actions and filters. For qdiscs it will require
some kernel surgery, and we'll send it soon after this surgery is merged.
When user space configures the kernel with netlink messages, it can set
NLM_F_ECHO flag to request the kernel to send the applied configuration
back to the caller. This allows user space to receive back configuration
information that is populated by the kernel. Often because there are
parameters that can only be set by the kernel which become visible with the
echo, or because user space lets the kernel choose a default value.
To illustrate a use case where the kernel will give us a default value,
the example below shows the user not specifying the action index:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 1 ref 1 bind 0
not_in_hw
Note that the echoed response indicates that the kernel gave us a value
of index 1
Victor Nogueira [Wed, 24 Jan 2024 15:34:55 +0000 (12:34 -0300)]
tc: add NLM_F_ECHO support for actions
This patch adds the -echo flag to tc command line and support for it in
tc actions. If the user specifies this flag for an action command, the
kernel will return the command's result back to user space.
For example:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 1 bind 0
not_in_hw
As illustrated above, the kernel will give us an index of 10
The same can be done for other action commands (replace, change, and
delete). For example:
tc -echo actions delete action mirred index 10
total acts 0
Deleted action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 0 bind 0
not_in_hw
Signed-off-by: Victor Nogueira <victor@mojatatu.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org>
The function basename() expects a mutable character string,
which now causes a warning:
bpf_legacy.c: In function ‘bpf_load_common’:
bpf_legacy.c:975:38: warning: passing argument 1 of ‘__xpg_basename’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
975 | basename(cfg->object), cfg->mode == EBPF_PINNED ?
| ~~~^~~~~~~~
In file included from bpf_legacy.c:21:
/usr/include/libgen.h:34:36: note: expected ‘char *’ but argument is of type ‘const char *’
34 | extern char *__xpg_basename (char *__path) __THROW;
Fixes: f20ff2f19552 ("bpf: keep parsed program mode in struct bpf_cfg_in") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Victor Nogueira [Tue, 23 Jan 2024 21:38:11 +0000 (18:38 -0300)]
m_mirred: Allow mirred to block
So far the mirred action has dealt with syntax that handles
mirror/redirection for netdev. A matching packet is redirected or mirrored
to a target netdev.
In this patch we enable mirred to mirror to a tc block as well.
IOW, the new syntax looks as follows:
... mirred <ingress | egress> <mirror | redirect> [index INDEX] < <blockid BLOCKID> | <dev <devname>> >
Examples of mirroring or redirecting to a tc block:
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Co-developed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: David Ahern <dsahern@kernel.org>
There are cases where NULL is passed as format string when
nothing is to be printed. This is commonly done in the print_bool
function when a flag is false. Glibc seems to handle this case nicely
but for musl it will cause a segmentation fault
Since nothing needs to be printed, in this case; just check
for NULL and return.
Reported-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add a new option `-Q/--no-queues` to ss(8) to suppress the two standard
columns Send-Q and Recv-Q. This helps to keep the output steady for
monitoring purposes (like listening sockets).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: David Ahern <dsahern@kernel.org>