If we forked, returning from the function makes the calling code
continue in both the child and the parent process. Make cmd_exec exit if
setup failed and it has already forked.
An example of the issues this causes, where a failure in setup leads to
multiple unnecessary tries:
```
$ ip netns
ef
ab
$ ip -all netns exec ls
netns: ef
setting the network namespace "ef" failed: Operation not permitted
netns: ab
setting the network namespace "ab" failed: Operation not permitted
netns: ab
setting the network namespace "ab" failed: Operation not permitted
```
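The pattern behind the fix can be sketched as follows. This is a minimal illustration, not the actual cmd_exec code: run_forked(), failing_setup(), ok_setup() and noop_work() are hypothetical names. The point is that once fork() has succeeded, the child leaves through _exit() on every path instead of returning into the caller's loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run setup() and then work() in a forked child.
 * Returns the child's exit status in the parent, or -1 on error.
 */
static int run_forked(int (*setup)(void), int (*work)(void))
{
	pid_t pid;
	int status;

	assert(setup != NULL && work != NULL);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: never return to the caller, even if setup fails. */
		if (setup() != 0)
			_exit(EXIT_FAILURE);
		_exit(work() == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
	}

	/* Parent: only this process continues in the calling code. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Stand-ins for a setup step that fails or succeeds. */
static int failing_setup(void) { return -1; }
static int ok_setup(void) { return 0; }
static int noop_work(void) { return 0; }
```

With this shape, a caller that iterates over several namespaces sees one result per iteration, because a child whose setup failed can no longer fall back into the loop.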
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
man: use clsact qdisc for port mirroring examples on matchall and mirred
The clsact qdisc supports ingress and egress. Instead of using two qdiscs
to do ingress and egress port mirroring, clsact can be used. Therefore, use
clsact for the port mirroring examples on the tc-matchall.8 and tc-mirred.8
documents.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The version field in mnlu was being passed in but never set.
This meant that all places mnlu_gen_socket was used, the version would
be uninitialized data from malloc().
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Max Gautier [Mon, 18 Mar 2024 15:49:13 +0000 (16:49 +0100)]
arpd: create /var/lib/arpd on first use
The motivation is to build distribution packages without /var, to move
towards stateless systems; see the link below (TL;DR: provisioning anything
outside of /usr on boot).
We only try to create the database directory when it is in the default
location, and assume its parent (/var/lib in the usual case) exists.
Links: https://0pointer.net/blog/projects/stateless.html
Signed-off-by: Max Gautier <mg@max.gautier.name>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Date Huang [Fri, 22 Mar 2024 12:39:22 +0000 (20:39 +0800)]
bridge: vlan: fix compressvlans usage
Fix the incorrect short opt for compressvlans and color
in usage
Signed-off-by: Date Huang <tjjh89017@hotmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The removal of tick usage in netem means that some of the
helper functions in tc are no longer used and can be safely removed.
Other functions can be made static.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The current version of netem in iproute2 has a maximum delay of 4.3 seconds
because of scaled 32-bit clock values. Some users would like to be
able to use larger delays to emulate things like storage delays.
Since kernel version 4.15, the netem qdisc has had netlink parameters
to express a wider range of delays in nanoseconds. But the iproute2
side was never updated to use them.
This does break compatibility with older kernels (4.14 and earlier).
With these out of support kernels, the latency/delay parameter
will end up being ignored.
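The 4.3 second figure can be checked with a little arithmetic. The sketch below assumes the 32-bit value effectively counts nanoseconds, which is consistent with the stated limit but is not spelled out in this message; delay_fits_u32() and u32_limit_seconds() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* True if a delay in nanoseconds can be stored in an unsigned 32-bit field. */
static bool delay_fits_u32(int64_t delay_ns)
{
	assert(delay_ns >= 0);
	return delay_ns <= (int64_t)UINT32_MAX;
}

/* Largest delay a 32-bit nanosecond counter can express, in seconds:
 * 4294967295 / 1e9, roughly 4.29.
 */
static double u32_limit_seconds(void)
{
	return (double)UINT32_MAX / (double)NSEC_PER_SEC;
}
```

A 64-bit nanosecond value removes this ceiling for any practical delay.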
Reported-by: Marc Blanchet <marc.blanchet@viagenie.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Lars Ellenberg [Fri, 1 Mar 2024 12:33:24 +0000 (13:33 +0100)]
ss: fix output of MD5 signature keys configured on TCP sockets
da9cc6ab introduced printing of MD5 signature keys when found.
But when changing printf() to out() calls with 90351722,
the implicit printf call in print_escape_buf() was overlooked.
That results in a funny output in the first line:
"<all-your-tcp-signature-keys-concatenated>State"
and ambiguity as to which of those bytes belong to which socket.
Add a static void out_escape_buf() immediately before we use it.
da9cc6ab (ss: print MD5 signature keys configured on TCP sockets, 2017-10-06)
90351722 (ss: Replace printf() calls for "main" output by calls to helper, 2017-12-12)
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Tue, 27 Feb 2024 04:08:34 +0000 (04:08 +0000)]
Merge branch 'ss-socket-local-storage' into next
Quentin Deslandes says:
====================
BPF allows programs to store socket-specific data using
BPF_MAP_TYPE_SK_STORAGE maps. The data is attached to the socket itself,
and Martin added INET_DIAG_REQ_SK_BPF_STORAGES, so it can be fetched
using the INET_DIAG mechanism.
Currently, ss doesn't request the socket-local data; this patch aims to
fix this.
The first patch requests the socket-local data for the requested map ID
(--bpf-map-id=) or all the maps (--bpf-maps). It then prints the map_id
in COL_EXT.
Patch #2 uses libbpf and BTF to pretty print the map's content, like
`bpftool map dump` would do.
Patch #3 updates ss' man page to explain new options.
While I think it makes sense for ss to provide the socket-local storage
content for the sockets, it's difficult to reconcile the column-based
output of ss and having readable socket-local data. Hence, the
socket-local data is printed in a readable fashion over multiple lines
under its socket statistics, independently of the column-based approach.
Here is an example of ss' output with --bpf-maps:
[...]
ESTAB 340116 0 [...]
    map_id: 114 [
        (struct my_sk_storage){
            .field_hh = (char)3,
            (union){
                .a = (int)17,
                .b = (int)17,
            },
        }
    ]
Changed this series to an RFC as the merge window for net-next is
closed.
Changes from v8:
* Remove usage of libbpf_bpf_map_type_str() which requires libbpf-1.0+
and provides very little added value (David).
* Use ENABLE_BPF_SKSTORAGE_SUPPORT to gate the BPF socket-local storage
support, instead of HAVE_LIBBPF. iproute2 depends on libbpf-0.1, but
this change needs libbpf-0.5+. If the requirements are not met, ss can
still be compiled and used without BPF socket-local storage support, but
a warning will be printed at compile time.
Changes from v7:
* Fix comment format and checkpatch warnings (Stephen, David).
* Replaced Co-authored-by with Co-developed-by + Signed-off-by for
Martin's contribution on patch #1 to follow checkpatch requirements,
with Martin's approval.
Changes from v6:
* Remove column dedicated to BPF socket-local storage (COL_SKSTOR),
use COL_EXT instead (Matthieu).
Changes from v5:
* Add support for --oneline when printing socket-local data.
* Use \t to indent instead of " " to be consistent with other columns.
* Removed Martin's ack on patch #2 due to amount of lines changed.
Changes from v4:
* Fix return code for 2 calls.
* Fix issue when inet_show_netlink() retries a request.
* BPF dump object is created in bpf_map_opts_load_info().
Changes from v3:
* Minor refactoring to reduce the number of HAVE_LIBBPF usages.
* Update ss' man page.
* btf_dump structure created to print the socket-local data is cached
in bpf_map_opts. Creation of the btf_dump structure is performed if
needed, before printing the data.
* If a map can't be pretty-printed, print its ID and a message instead
of skipping it.
* If show_all=true, send an empty message to the kernel to retrieve all
the maps (as Martin suggested).
Changes from v2:
* bpf_map_opts_is_enabled is not inline anymore.
* Add more #ifdef HAVE_LIBBPF to prevent compilation error if
libbpf support is disabled.
* Fix erroneous usage of args instead of _args in vout().
* Add missing btf__free() and close(fd).
Changes from v1:
* Remove the first patch from the series (fix) and submit it separately.
* Remove double allocation of struct rtattr.
* Close BPF map FDs on exit.
* If bpf_map_get_fd_by_id() fails with ENOENT, print an error message
and continue to the next map ID.
* Fix typo in new command line option documentation.
* Only use bpf_map_info.btf_value_type_id and ignore
bpf_map_info.btf_vmlinux_value_type_id (unused for socket-local storage).
* Use btf_dump__dump_type_data() instead of manually using BTF to
pretty-print socket-local storage data. This change alone divides the size
of the patch series by 2.
ss is able to print the map ID(s) for which a given socket has BPF
socket-local storage defined (using --bpf-maps or --bpf-map-id=). However,
the actual content of the map remains hidden.
This change aims to pretty-print the socket-local storage content following
the socket details, similar to what `bpftool map dump` would do. The exact
output format is inspired by drgn, while the BTF data processing is similar
to bpftool's.
ss will use libbpf's btf_dump__dump_type_data() to ease pretty-printing
of binary data. This requires out_bpf_sk_storage_print_fn() as a print
callback function used by btf_dump__dump_type_data(). vout() is also
introduced, which is similar to out() but accepts a va_list as
parameter.
ss' output remains unchanged unless --bpf-maps or --bpf-map-id= is used,
in which case each socket containing BPF local storage will be followed by
the content of the storage before the next socket's info is displayed.
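The relationship between out(), vout() and a va_list-based print callback can be sketched as below. This is an illustration under assumptions, not the ss implementation: the fixed-size field_buf, out_reset(), storage_print_fn() and call_print_fn() are hypothetical, and print_fn_t is only shaped like the callback type that btf_dump__dump_type_data() users supply.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical output buffer standing in for the current ss field. */
static char field_buf[256];
static size_t field_len;

static void out_reset(void)
{
	memset(field_buf, 0, sizeof(field_buf));
	field_len = 0;
}

/* Like out(), but takes an already started va_list. */
static void vout(const char *fmt, va_list args)
{
	size_t space = sizeof(field_buf) - field_len;
	int n;

	assert(fmt != NULL);
	if (space <= 1)
		return;

	n = vsnprintf(field_buf + field_len, space, fmt, args);
	if (n < 0)
		return;
	/* On truncation, vsnprintf() reports the length it wanted to write. */
	field_len = (size_t)n >= space ? sizeof(field_buf) - 1 : field_len + n;
}

/* Variadic front end: starts the va_list and defers to vout(). */
static void out(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vout(fmt, args);
	va_end(args);
}

/* Callback shape assumed for the pretty-printer: (ctx, fmt, va_list). */
typedef void (*print_fn_t)(void *ctx, const char *fmt, va_list args);

static void storage_print_fn(void *ctx, const char *fmt, va_list args)
{
	(void)ctx;
	vout(fmt, args);
}

/* Simulates a library invoking the callback with its own arguments. */
static void call_print_fn(print_fn_t fn, void *ctx, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	fn(ctx, fmt, args);
	va_end(args);
}
```

A callback that receives a va_list cannot call a variadic out() directly, which is why the va_list variant is needed.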
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: David Ahern <dsahern@kernel.org>
Yedaya Katsman [Sat, 17 Feb 2024 21:21:02 +0000 (23:21 +0200)]
ip: Add missing command explanations in man page
There are a few commands missing from the ip command syntax list, add
them. They are also missing from the see also section, add them there as
well.
Note there isn't an ip-ila man page, so I didn't link to it.
Also fix a few punctuation mistakes.
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
While sock_diag is able to return BPF socket-local storage in response
to INET_DIAG_REQ_SK_BPF_STORAGES requests, ss doesn't request it.
This change introduces the --bpf-maps and --bpf-map-id= options to request
BPF socket-local storage for all SK_STORAGE maps, or only specific ones.
Most of this change checks the requested map IDs and ensures they are
valid. The COL_EXT column is used to print the socket-local data.
When --bpf-maps is used, ss will send an empty
INET_DIAG_REQ_SK_BPF_STORAGES request; in return, the kernel will send
all the BPF socket-local storage entries for a given socket. The BTF
data for each map is loaded on demand, as ss can't predict which map IDs
are used.
When --bpf-map-id=ID is used, a file descriptor to each requested map is
opened to 1) ensure the map doesn't disappear before the data is printed,
and 2) ensure the map type is BPF_MAP_TYPE_SK_STORAGE. The BTF data for
each requested map is loaded before the request is sent to the kernel.
Co-developed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
Takanori Hirano [Sun, 11 Feb 2024 01:38:48 +0000 (01:38 +0000)]
tc: Change of json format in tc-fw
When mapping JSON to a structure, it can be difficult
if keys have the same name but different types.
Since handle is used as a hex string, change it to fw.
Signed-off-by: Takanori Hirano <me@hrntknr.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Throughout ifstat.c, ifstat_ent.val is accessed as a long long unsigned
type, however it is defined as __u64. This works by coincidence on many
systems, however on ppc64le, __u64 is a long unsigned.
This patch makes the type definition consistent with all of the places
where it is accessed.
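The mismatch matters mostly for printf-style format strings. A minimal sketch, with a hypothetical struct counter and format_counter() rather than the real ifstat code: when the field is declared as unsigned long long, "%llu" matches on every architecture, whereas a 64-bit typedef may resolve to unsigned long on some of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter entry; the field type matches the "%llu"
 * conversion used when printing it.
 */
struct counter {
	unsigned long long val;
};

/* Formats the counter into buf and returns the snprintf() result. */
static int format_counter(char *buf, size_t len, const struct counter *c)
{
	assert(buf != NULL && c != NULL);
	return snprintf(buf, len, "%llu", c->val);
}
```

An alternative that keeps a fixed-width type is to print it with the PRIu64 macro from <inttypes.h>.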
Fixes: 5a52102b7c8f ("ifstat: Add extended statistics to ifstat")
Reviewed-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Gallagher <sgallagh@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 9 Feb 2024 15:25:46 +0000 (16:25 +0100)]
docs, man: fix some typos
Fix some typos and spelling errors in iproute2 documentation.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Fri, 9 Feb 2024 15:25:45 +0000 (16:25 +0100)]
treewide: fix typos in various comments
Fix various typos and spelling errors in some iproute2 comments.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Maks Mishin [Tue, 6 Feb 2024 23:54:16 +0000 (02:54 +0300)]
ctrl: Fix fd leak in ctrl_listen()
Use the same pattern for handling rtnl_listen() errors that
is used across other iproute2 commands. All other commands
exit with status of 2 if rtnl_listen fails.
Reported-off-by: Maks Mishin <maks.mishinFZ@gmail.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Aahil Awatramani [Thu, 25 Jan 2024 23:11:47 +0000 (23:11 +0000)]
ip/bond: add coupled_control support
coupled_control specifies whether the LACP state machine's MUX in the
802.3ad mode should have separate Collecting and Distributing states per
IEEE 802.1AX-2008 5.4.15 for coupled and independent control state.
By default this setting is on and does not separate the Collecting and
Distributing states, keeping the bond in coupled control. If set to off,
the bond uses the independent control state machine, which separates the
Collecting and Distributing states.
Signed-off-by: Aahil Awatramani <aahila@google.com>
v2:
Dropped uapi header change
Use of print_on_off and parse_on_off
Signed-off-by: David Ahern <dsahern@kernel.org>
Yedaya Katsman [Sat, 3 Feb 2024 20:03:05 +0000 (22:03 +0200)]
ip: Add missing stats command to usage
The stats command was added in 54d82b0699a0 ("ip: Add a new family of
commands, "stats""), but wasn't included in the subcommand list in the
help usage.
Add it in the right position alphabetically.
Fixes: 54d82b0699a0 ("ip: Add a new family of commands, "stats"")
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Reviewed-by: Petr Machata <me@pmachata.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Yedaya Katsman [Sat, 27 Jan 2024 16:45:08 +0000 (18:45 +0200)]
ip: remove non-existent amt subcommand from usage
Commit 6e15d27aae94 ("ip: add AMT support") added "amt" to the list
of "first level" commands, which isn't correct, as it isn't present
in the cmds list. Remove it from the usage help.
Fixes: 6e15d27aae94 ("ip: add AMT support")
Signed-off-by: Yedaya Katsman <yedaya.ka@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Tue, 30 Jan 2024 15:49:23 +0000 (15:49 +0000)]
Merge branch 'echo-tc-filter-actions' into next
Victor Nogueira says:
====================
Continuing on what Hangbin Liu started [1], this patch set adds support for
the NLM_F_ECHO flag for tc actions and filters. For qdiscs it will require
some kernel surgery, and we'll send it soon after this surgery is merged.
When user space configures the kernel with netlink messages, it can set
NLM_F_ECHO flag to request the kernel to send the applied configuration
back to the caller. This allows user space to receive back configuration
information that is populated by the kernel, often because some
parameters can only be set by the kernel and become visible with the
echo, or because user space lets the kernel choose a default value.
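Requesting the echo amounts to setting one more bit in the netlink header flags. The sketch below only shows that step; prepare_request() is a hypothetical helper, and the rest of a real request (attributes, socket handling, reading the reply) is left out.

```c
#include <assert.h>
#include <linux/netlink.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill in a bare netlink header and optionally
 * ask the kernel to echo the applied configuration back.
 */
static void prepare_request(struct nlmsghdr *nlh, uint16_t type, bool echo)
{
	assert(nlh != NULL);

	memset(nlh, 0, sizeof(*nlh));
	nlh->nlmsg_len = NLMSG_LENGTH(0);	/* header only, no payload yet */
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	if (echo)
		nlh->nlmsg_flags |= NLM_F_ECHO;
}
```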
To illustrate a use case where the kernel will give us a default value,
the example below shows the user not specifying the action index:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 1 ref 1 bind 0
not_in_hw
Note that the echoed response indicates that the kernel gave us a value
of index 1.
Victor Nogueira [Wed, 24 Jan 2024 15:34:55 +0000 (12:34 -0300)]
tc: add NLM_F_ECHO support for actions
This patch adds the -echo flag to tc command line and support for it in
tc actions. If the user specifies this flag for an action command, the
kernel will return the command's result back to user space.
For example:
tc -echo actions add action mirred egress mirror dev lo
total acts 0
Added action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 1 bind 0
not_in_hw
As illustrated above, the kernel will give us an index of 10.
The same can be done for other action commands (replace, change, and
delete). For example:
tc -echo actions delete action mirred index 10
total acts 0
Deleted action
action order 1: mirred (Egress Mirror to device lo) pipe
index 10 ref 0 bind 0
not_in_hw
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
The function basename() expects a mutable character string,
which now causes a warning:
bpf_legacy.c: In function ‘bpf_load_common’:
bpf_legacy.c:975:38: warning: passing argument 1 of ‘__xpg_basename’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
975 | basename(cfg->object), cfg->mode == EBPF_PINNED ?
| ~~~^~~~~~~~
In file included from bpf_legacy.c:21:
/usr/include/libgen.h:34:36: note: expected ‘char *’ but argument is of type ‘const char *’
34 | extern char *__xpg_basename (char *__path) __THROW;
Fixes: f20ff2f19552 ("bpf: keep parsed program mode in struct bpf_cfg_in")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Victor Nogueira [Tue, 23 Jan 2024 21:38:11 +0000 (18:38 -0300)]
m_mirred: Allow mirred to block
So far the mirred action has dealt with syntax that handles
mirror/redirection for netdev. A matching packet is redirected or mirrored
to a target netdev.
In this patch we enable mirred to mirror to a tc block as well.
IOW, the new syntax looks as follows:
... mirred <ingress | egress> <mirror | redirect> [index INDEX] < <blockid BLOCKID> | <dev <devname>> >
Examples of mirroring or redirecting to a tc block:
$ tc filter add block 22 protocol ip pref 25 \
flower dst_ip 192.168.0.0/16 action mirred egress mirror blockid 22
Co-developed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Co-developed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
There are cases where NULL is passed as format string when
nothing is to be printed. This is commonly done in the print_bool
function when a flag is false. Glibc seems to handle this case nicely,
but with musl it causes a segmentation fault.
Since nothing needs to be printed in this case, just check
for NULL and return.
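The guard can be sketched as a thin wrapper around vfprintf(). print_fmt() is a hypothetical name used for illustration; the actual print helpers in iproute2 have a different signature.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical print helper: a NULL format means "print nothing",
 * so return before the string ever reaches the C library.
 */
static int print_fmt(FILE *fp, const char *fmt, ...)
{
	va_list args;
	int ret;

	assert(fp != NULL);

	if (!fmt)
		return 0;

	va_start(args, fmt);
	ret = vfprintf(fp, fmt, args);
	va_end(args);
	return ret;
}
```

Passing NULL as a format string to the printf family is undefined behavior in C, so the check belongs in the wrapper rather than relying on any particular libc.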
Reported-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Add a new option `-Q/--no-queues` to ss(8) to suppress the two standard
columns Send-Q and Recv-Q. This helps to keep the output steady for
monitoring purposes (like listening sockets).
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
ss: show extra info when '--processes' is not used
A recent modification broke "extra" options for all protocols showing
info about the processes when '-p' / '--processes' option was not used
as well. In other words, all the additional bits displayed at the end or
at the next line were no longer printed if the user didn't ask to show
info about processes as well.
The reason is that the "current_field" pointer never switched to the
"Ext" column. If the user didn't ask to display the processes, nothing
happened when trying to print extra bits using the "out()" function,
because the current field was still pointing to the "Process" one, now
marked as disabled.
Before the commit mentioned below, it was not an issue not to switch to
the "Ext" or "Process" columns because they were never marked as
"disabled".
Here is a quick list of options that were no longer displayed if '-p' /
'--processes' was not set:
That was just by quickly reading the code, I probably missed some. But
this shows that the impact can be quite important for all scripts using
'ss' to monitor connections or to report info.
Fixes: 1607bf53 ("ss: prevent "Process" column from being printed unless requested")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>