Hangbin Liu [Mon, 31 May 2021 09:47:39 +0000 (17:47 +0800)]
configure: add options ability
There are more and more global environment variables that land everywhere
in configure, which is making user hard to know which one does what.
Using command-line options would make it easier for users to learn or
remember the config options.
This patch converts the INCLUDE variable to command option first. Check
if the first variable has '-' to compile with the old INCLUDE path
setting method.
Signed-off-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Heiko Thiery [Sat, 8 May 2021 06:49:26 +0000 (08:49 +0200)]
lib/fs: fix issue when {name,open}_to_handle_at() is not implemented
With commit d5e6ee0dac64 the usage of functions name_to_handle_at() and
open_by_handle_at() are introduced. But these function are not available
e.g. in uclibc-ng < 1.0.35. To have a backward compatibility check for the
availability in the configure script and in case of absence do a direct
syscall.
Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions") Cc: Dmitry Yakunin <zeil@yandex-team.ru> Cc: Petr Vorel <petr.vorel@gmail.com> Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com> Reviewed-by: Petr Vorel <petr.vorel@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Sun, 9 May 2021 22:50:18 +0000 (22:50 +0000)]
config.mk: Rerun configure when it is newer than config.mk
config.mk needs to be re-generated any time configure is changed.
Rename the existing make target and add a check that the config.mk
file needs to exist and must be newer than configure script.
Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Petr Vorel <petr.vorel@gmail.com> Tested-by: Petr Vorel <petr.vorel@gmail.com>
Jakub Kicinski [Sat, 1 May 2021 03:10:59 +0000 (20:10 -0700)]
ip: dynamically size columns when printing stats
This change makes ip -s -s output size the columns
automatically. I often find myself using json
output because the normal output is unreadable.
Even on a laptop after 2 days of uptime byte
and packet counters almost overflow their columns,
let alone a busy server.
Paolo Lungaroni [Sat, 8 May 2021 15:44:58 +0000 (17:44 +0200)]
seg6: add counters support for SRv6 Behaviors
We introduce the "count" optional attribute for supporting counters in SRv6
Behaviors as defined in [1], section 6. For each SRv6 Behavior instance,
counters defined in [1] are:
- the total number of packets that have been correctly processed;
- the total amount of traffic in bytes of all packets that have been
correctly processed;
In addition, we introduce a new counter that counts the number of packets
that have NOT been properly processed (i.e. errors) by an SRv6 Behavior
instance.
Each SRv6 Behavior instance can be configured, at the time of its creation,
to make use of counters specifing the "count" attribute as follows:
$ ip -6 route add 2001:db8::1 encap seg6local action End count dev eth0
per-behavior counters can be shown by adding "-s" to the iproute2 command
line, i.e.:
$ ip -s -6 route show 2001:db8::1
2001:db8::1 encap seg6local action End packets 0 bytes 0 errors 0 dev eth0
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Thu, 6 May 2021 10:42:06 +0000 (12:42 +0200)]
tc: htb: improve burst error messages
When a wrong value is provided for "burst" or "cburst" parameters, the
resulting error message is unclear and can be misleading:
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "buffer"
The message claims an illegal "buffer" is provided, but neither the
inline help nor the man page list "buffer" among the htb parameters, and
the only way to know that "burst", "maxburst" and "buffer" are synonyms
is to look into tc/q_htb.c.
This commit tries to improve this simply changing the error string to
the parameter name provided in the user-given command, clearly pointing
out where the wrong value is.
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100KBps burst errtrigger
Illegal "burst"
$ tc class add dev dummy0 parent 1: classid 1:1 htb rate 100Kbps maxburst errtrigger
Illegal "maxburst"
Reported-by: Sebastian Mitterle <smitterl@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Fix this returning an error if key length is longer than
TIPC_AEAD_KEYLEN_MAX.
Fixes: 24bee3bf9752 ("tipc: add new commands to set TIPC AEAD key") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Fix this returning an error if provided algname is longer than
TIPC_AEAD_ALG_NAME.
Fixes: 24bee3bf9752 ("tipc: add new commands to set TIPC AEAD key") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Hoang Le [Thu, 6 May 2021 03:27:24 +0000 (10:27 +0700)]
tipc: call a sub-routine in separate socket
When receiving a result from first query to netlink, we may exec
a another query inside the callback. If calling this sub-routine
in the same socket, it will be discarded the result from previous
exection.
To avoid this we perform a nested query in separate socket.
Fixes: 202102830663 ("tipc: use the libmnl functions in lib/mnl_utils.c") Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Tyson Moore [Thu, 29 Apr 2021 18:28:47 +0000 (14:28 -0400)]
tc-cake: update docs to include LE diffserv
Linux kernel commit b8392808eb3fc28e ("sch_cake: add RFC 8622 LE PHB
support to CAKE diffserv handling") added packets with LE diffserv to
the Bulk priority tin. Update the documentation to reflect this change.
Signed-off-by: Tyson Moore <tyson@tyson.me> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Sat, 1 May 2021 16:39:23 +0000 (18:39 +0200)]
dcb: fix memory leak
main() dinamically allocates dcb, but when dcb_help() is called it
returns without freeing it.
Fix this using a goto, as it is already done in the same function.
Fixes: 67033d1c1c8a ("Add skeleton of a new tool, dcb") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Sat, 1 May 2021 16:39:22 +0000 (18:39 +0200)]
dcb: fix return value on dcb_cmd_app_show
dcb_cmd_app_show() is supposed to return EINVAL if an incorrect argument
is provided.
Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Petr Machata <me@pmachata.org> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Sat, 1 May 2021 17:05:45 +0000 (19:05 +0200)]
lib: bpf_legacy: avoid to pass invalid argument to close()
In function bpf_obj_open, if bpf_fetch_prog_arg() return an error, we
end up in the out: path with a negative value for fd, and pass it to
close.
Avoid this checking for fd to be positive.
Fixes: 32e93fb7f66d ("{f,m}_bpf: allow for sharing maps") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Sat, 1 May 2021 16:44:35 +0000 (18:44 +0200)]
tc: q_ets: drop dead code from argument parsing
Checking for nbands to be at least 1 at this point is useless. Indeed:
- ets requires "bands", "quanta" or "strict" to be specified
- if "bands" is specified, nbands cannot be negative, see parse_nbands()
- if "strict" is specified, nstrict cannot be negative, see
parse_nbands()
- if "quantum" is specified, nquanta cannot be negative, see
parse_quantum()
- if "bands" is not specified, nbands is set to nstrict+nquanta
- the previous if statement takes care of the case when none of them are
specified and nbands is 0, terminating execution.
Thus nbands cannot be < 1 at this point and this code cannot be executed.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
mptcp: make sure flag signal is set when add addr with port
When add address with port, it is mean to send an ADD_ADDR to remote,
so it must have flag signal set.
Fixes: 42fbca91cd61 ("mptcp: add support for port based endpoint") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David Ahern <dsahern@kernel.org>
The default behavior for source MACVLAN is to duplicate packets to
appropriate type source devices, and then do the normal destination MACVLAN
flow. This patch adds an option to skip destination MACVLAN processing if
any matching source MACVLAN device has the option set.
This allows setting up a "catch all" device for source MACVLAN: create one
or more devices with type source nodst, and one device with e.g. type vepa,
and incoming traffic will be received on exactly one device.
Signed-off-by: Jethro Beekman <kernel@jbeekman.nl> Signed-off-by: David Ahern <dsahern@kernel.org>
$ rdma res show srq
dev ibp8s0f0 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f0 srqn 4 type BASIC lqpn 125-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141-156 pdn 10 pid 3584 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 6 type BASIC lqpn 157-172 pdn 11 pid 3590 comm ibv_srq_pingpon
dev ibp8s0f1 srqn 0 type BASIC pdn 3 comm [ib_ipoib]
dev ibp8s0f1 srqn 1 type BASIC lqpn 329-344 pdn 4 pid 3586 comm ibv_srq_pingpon
$ rdma res show srq lqpn 126-141
dev ibp8s0f0 srqn 4 type BASIC lqpn 126-128,130-140 pdn 9 pid 3581 comm ibv_srq_pingpon
dev ibp8s0f0 srqn 5 type BASIC lqpn 141 pdn 10 pid 3584 comm ibv_srq_pingpon
$ rdma res show srq lqpn 127
dev ibp8s0f0 srqn 4 type BASIC lqpn 127 pdn 9 pid 3581 comm ibv_srq_pingpon
Reviewed-by: Ido Kalir <idok@nvidia.com> Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
$ rdma res show ctx
dev ibp8s0f0 ctxn 0 pid 980 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 1 pid 981 comm ibv_rc_pingpong
dev ibp8s0f0 ctxn 2 pid 992 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
$ rdma res show ctx dev ibp8s0f1
dev ibp8s0f1 ctxn 0 pid 984 comm ibv_rc_pingpong
dev ibp8s0f1 ctxn 1 pid 987 comm ibv_rc_pingpong
Reviewed-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Ido Kalir <idok@nvidia.com> Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Mon, 19 Apr 2021 13:49:57 +0000 (15:49 +0200)]
lib: bpf_legacy: fix missing socket close when connect() fails
In functions bpf_{send,recv}_map_fds(), when connect fails after a
socket is successfully opened, we return with error missing a close on
the socket.
Fix this closing the socket if opened and using a single return point
for both the functions.
Fixes: 6256f8c9e45f ("tc, bpf: finalize eBPF support for cls and act front-end") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 19 Apr 2021 13:49:56 +0000 (15:49 +0200)]
lib: bpf_legacy: treat 0 as a valid file descriptor
As stated in the man page(), open returns a non-negative integer as a
file descriptor. Hence, when checking for its return value to be ok, we
should include 0 as a valid value.
This fixes a covscan warning about a missing close() in this function.
Fixes: ecb05c0f997d ("bpf: improve error reporting around tail calls") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 19 Apr 2021 13:37:25 +0000 (15:37 +0200)]
ip: netns: fix missing netns close on some error paths
In functions netns_pids() and netns_identify_pid(), the netns file is
not closed on some error paths.
Fix this using a conditional close and a single return point on both
functions.
Fixes: 44b563269ea1 ("ip-nexthop: support flush by id") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
bridge: vlan: dump port only if there are any vlans
When I added support for new vlan rtm dumping, I made a mistake in the
output format when there are no vlans on the port. This patch fixes it by
not printing ports without vlan entries (similar to current situation).
Example (no vlans):
$ bridge -d vlan show
port vlan-id
Fixes: e5f87c834193 ("bridge: vlan: add support for the new rtm dump call") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Tony Ambardar [Tue, 20 Apr 2021 08:26:36 +0000 (01:26 -0700)]
ip: drop 2-char command assumption
The 'ip' utility hardcodes the assumption of being a 2-char command, where
any follow-on characters are passed as an argument:
$ ./ip-full help
Object "-full" is unknown, try "ip help".
This confusing behaviour isn't seen with 'tc' for example, and was added in
a 2005 commit without documentation. It was noticed during testing of 'ip'
variants built/packaged with different feature sets (e.g. w/o BPF support).
Mitigate the problem by redoing the command without the 2-char assumption
if the follow-on characters fail to parse as a valid command.
Fixes: 351efcde4e62 ("Update header files to 2.6.14") Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org>
The build of iproute2 relies on having correct copy of santized
kernel headers. The vdpa utility introduced a dependency on
the vdpa related headers, but these headers were not present
in iproute2 repo.
Fixes: c2ecc82b9d4c ("vdpa: Add vdpa tool") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Thu, 22 Apr 2021 05:20:13 +0000 (05:20 +0000)]
Merge branch 'bridge-vlan' into next
Nikolay Aleksandrov says:
====================
From: Nikolay Aleksandrov <nikolay@nvidia.com>
This set extends the bridge vlan code to use the new vlan RTM calls
which allow to dump detailed per-port, per-vlan information and also to
manipulate the per-vlan options. It also allows to monitor any vlan
changes (add/del/option change). The rtm vlan dumps have an extensible
format which allows us to add new options and attributes easily, and
also to request the kernel to filter on different vlan information when
dumping. The new kernel dump code tries to use compressed vlan format as
much as possible (it includes netlink attributes for vlan start and
end) to reduce the number of generated messages and netlink traffic.
The iproute2 support is activated by using the "-d" flag when showing
vlan information, that will cause it to use the new rtm dump call and
get all the detailed information, if "-s" is also specified it will dump
per-vlan statistics as well. Obviously in that case the vlans cannot be
compressed. To change per-vlan options (currently only STP state is
supported) a new vlan command is added - "set". It can be used to set
options of bridge or port vlans and vlan ranges can be used, all of the
new vlan option code uses extack to show more understandable errors.
The set adds the first supported per-vlan option - STP state.
Man pages and usage information are updated accordingly.
Example:
$ bridge -d vlan show
port vlan-id
ens13 1 PVID Egress Untagged
state forwarding
bridge 1 PVID Egress Untagged
state forwarding
$ bridge vlan set vid 1 dev ens13 state blocking
$ bridge -d vlan show
port vlan-id
ens13 1 PVID Egress Untagged
state blocking
bridge 1 PVID Egress Untagged
state forwarding
Add support for vlan activity monitoring, we display vlan notifications on
vlan add/del/options change. The man page and help are also updated
accordingly.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
bridge: vlan: add support for the new rtm dump call
Use the new bridge vlan rtm dump helper to dump all of the available
vlan information when -details (-d) is used with vlan show. It is also
capable of dumping vlan stats if -statistics (-s) is added.
Currently this is the only interface capable of dumping per-vlan
options. The vlan dump format is compatible with current vlan show, it
uses the same helpers to dump vlan information. The new addition is one
line which will contain the per-vlan options (similar to ip -d link show
for ports). Currently only the vlan STP state is printed.
The call uses compressed vlan format by default.
Example:
$ bridge -s -d vlan show
port vlan-id
virbr1 1 PVID Egress Untagged
state forwarding
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
bridge: vlan: add option set command and state option
Add a new per-vlan option set command. It allows to manipulate vlan
options, those can be bridge-wide or per-port depending on what device
is specified. The first option that can be set is the vlan STP state,
it is identical to the bridge port STP state. The man page is also
updated accordingly.
Example:
$ bridge vlan set vid 10 dev br0 state learning
or a range:
$ bridge vlan set vid 10-20 dev swp1 state blocking
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Rename print_portstate to print_stp_state in preparation for use by vlan
code as well (per-vlan state), and export it. To be in line with the new
naming rename also port_states to stp_states as they'll be used for
vlans, too.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
This adds iproute2 support for mptcp event monitoring, e.g. creation,
establishment, address announcements from the peer, subflow establishment
and so on.
While the kernel-generated events are primarily aimed at mptcpd (e.g. for
subflow management), this is also useful for debugging.
Andrea Claudi [Sun, 18 Apr 2021 12:56:30 +0000 (14:56 +0200)]
rdma: stat: fix return code
libmnl defines MNL_CB_OK as 1 and MNL_CB_ERROR as -1. rdma uses these
return codes, and stat_qp_show_parse_cb() should do the same.
Fixes: 16ce4d23661a ("rdma: stat: initialize ret in stat_qp_show_parse_cb()") Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 13 Apr 2021 22:50:45 +0000 (00:50 +0200)]
nexthop: fix memory leak in add_nh_group_attr()
grps is dinamically allocated with a calloc, and not freed in a return
path in the for cycle. This commit fix it.
While at it, make the function use a single return point.
Fixes: 63df8e8543b0 ("Add support for nexthop objects") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 13 Apr 2021 22:50:20 +0000 (00:50 +0200)]
q_cake: remove useless check on argv
In cake_parse_opt(), *argv is checked not to be null when parsing for
overhead and mpu parameters. However this is useless, since *argv
matches right before for "overhead" or "mpu".
Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Tue, 13 Apr 2021 22:48:37 +0000 (00:48 +0200)]
devlink: always check strslashrsplit() return value
strslashrsplit() return value is not checked in __dl_argv_handle(),
despite the fact that it can return EINVAL.
This commit fix it and make __dl_argv_handle() return error if
strslashrsplit() return an error code.
Fixes: 2f85a9c53587 ("devlink: allow to parse both devlink and port handle in the same time") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
The format for erspan/erspan6 output is not valid JSON, as on version 2 a
valueless key was presented. The direction should be value and erspan_dir
should be the key.
Fixes: 289763626721 ("erspan: add erspan version II support") Cc: u9012063@gmail.com Reported-by: Christian Pössinger <christian@poessinger.com> Signed-off-by: Christian Pössinger <christian@poessinger.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata [Wed, 17 Mar 2021 12:24:14 +0000 (13:24 +0100)]
ip: Fix batch processing
After the comment cited below, batch mode neglects to set the global
variable batch_mode to a non-zero value. Netns and VRF commands use this
variable, and break in batch mode. Fix by setting the value again.
Fixes: 1d9a81b8c9f3 ("Unify batch processing across tools") Reported-by: Tim Rice <trice@posteo.net> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Sabrina Dubroca [Fri, 19 Mar 2021 16:57:17 +0000 (17:57 +0100)]
ip: xfrm: add support for tfcpad
This patch adds support for setting and displaying the Traffic Flow
Confidentiality attribute for an XFRM state, which allows padding ESP
packets to a specified length.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Fri, 19 Mar 2021 15:05:29 +0000 (15:05 +0000)]
Merge branch 'nexthop-resilient-hash' into next
Petr Machata says:
====================
Support for resilient next-hop groups was recently accepted to Linux
kernel[1]. Resilient next-hop groups add a layer of indirection between the
SKB hash and the next hop. Thus the hash is used to reference a hash table
bucket, which is then used to reference a particular next hop. This allows
the system more flexibility when assigning SKB hash space to next hops.
Previously, each next hop had to be assigned a continuous range of SKB hash
space. With a hash table as an intermediate layer, it is possible to
reassign next hops with a hash table bucket granularity. In turn, this
mends issues with traffic flow redirection resulting from next hop removal
or adjustments in next-hop weights.
In this patch set, introduce support for resilient next-hop groups to
iproute2.
- Patch #1 brings include/uapi/linux/nexthop.h and /rtnetlink.h up to date.
- Patches #2 and #3 add new helpers that will be useful later.
- Patch #4 extends the ip/nexthop sub-tool to accept group type as a
command line argument, and to dispatch based on the specified type.
- Patch #5 adds the support for resilient next-hop groups.
- Patch #6 adds the support for resilient next-hop group bucket interface.
To illustrate the usage, consider the following commands:
# ip nexthop add id 1 via 192.0.2.2 dev dummy1
# ip nexthop add id 2 via 192.0.2.3 dev dummy1
# ip nexthop add id 10 group 1/2 type resilient \
buckets 8 idle_timer 60 unbalanced_timer 300
The last command creates a resilient next-hop group. It will have 8
buckets, each bucket will be considered idle when no traffic hits it for at
least 60 seconds, and if the table remains out of balance for 300 seconds,
it will be forcefully brought into balance.
And this is how the next-hop group bucket interface looks:
# ip nexthop bucket show id 10
id 10 index 0 idle_time 5.59 nhid 1
id 10 index 1 idle_time 5.59 nhid 1
id 10 index 2 idle_time 8.74 nhid 2
id 10 index 3 idle_time 8.74 nhid 2
id 10 index 4 idle_time 8.74 nhid 1
id 10 index 5 idle_time 8.74 nhid 1
id 10 index 6 idle_time 8.74 nhid 1
id 10 index 7 idle_time 8.74 nhid 1
Ido Schimmel [Wed, 17 Mar 2021 12:54:35 +0000 (13:54 +0100)]
nexthop: Add support for nexthop buckets
Add ability to dump multiple nexthop buckets and get a specific one.
Example:
# ip nexthop add id 10 group 1/2 type resilient buckets 8
# ip nexthop
id 1 via 192.0.2.2 dev dummy10 scope link
id 2 via 192.0.2.19 dev dummy20 scope link
id 10 group 1/2 type resilient buckets 8 idle_timer 120 unbalanced_timer 0 unbalanced_time 0
# ip nexthop bucket
id 10 index 0 idle_time 28.1 nhid 2
id 10 index 1 idle_time 28.1 nhid 2
id 10 index 2 idle_time 28.1 nhid 2
id 10 index 3 idle_time 28.1 nhid 2
id 10 index 4 idle_time 28.1 nhid 1
id 10 index 5 idle_time 28.1 nhid 1
id 10 index 6 idle_time 28.1 nhid 1
id 10 index 7 idle_time 28.1 nhid 1
# ip nexthop bucket show nhid 1
id 10 index 4 idle_time 53.59 nhid 1
id 10 index 5 idle_time 53.59 nhid 1
id 10 index 6 idle_time 53.59 nhid 1
id 10 index 7 idle_time 53.59 nhid 1
# ip nexthop bucket get id 10 index 5
id 10 index 5 idle_time 81 nhid 1
# ip -j -p nexthop bucket get id 10 index 5
[ {
"id": 10,
"bucket": {
"index": 5,
"idle_time": 104.89,
"nhid": 1
},
"flags": [ ]
} ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Ido Schimmel [Wed, 17 Mar 2021 12:54:33 +0000 (13:54 +0100)]
nexthop: Add ability to specify group type
Next patches are going to add a 'resilient' nexthop group type, so allow
users to specify the type using the 'type' argument. Currently, only
'mpath' type is supported.
These two commands are equivalent:
# ip nexthop add id 10 group 1/2/3
# ip nexthop add id 10 group 1/2/3 type mpath
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Petr Machata [Wed, 17 Mar 2021 12:54:32 +0000 (13:54 +0100)]
nexthop: Extract a helper to parse a NH ID
NH ID extraction is a common operation, and will become more common still
with the resilient NH groups support. Add a helper that does what it
usually done and returns the parsed NH ID.
Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Tony Ambardar [Thu, 11 Mar 2021 21:47:54 +0000 (13:47 -0800)]
lib/bpf: add missing limits.h includes
Several functions in bpf_glue.c and bpf_libbpf.c rely on PATH_MAX, which is
normally included from <limits.h> in other iproute2 source files.
It fixes errors seen using gcc 10.2.0, binutils 2.35.1 and musl 1.1.24:
bpf_glue.c: In function 'get_libbpf_version':
bpf_glue.c:46:11: error: 'PATH_MAX' undeclared (first use in this function);
did you mean 'AF_MAX'?
46 | char buf[PATH_MAX], *s;
| ^~~~~~~~
| AF_MAX
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Sabrina Dubroca [Tue, 9 Mar 2021 15:44:33 +0000 (16:44 +0100)]
ip: xfrm: limit the length of the security context name when printing
Security context names are not guaranteed to be NUL-terminated by the
kernel, so we can't just print them using %s directly. The length of
the string is determined by sctx->ctx_len, so we can use that to limit
what fprintf outputs.
While at it, factor that out to a separate function, since the exact
same code is used to print the security context for both policies and
states.
Fixes: b2bb289a57fe ("xfrm security context support") Reported-by: Paul Wouters <pwouters@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
q_cake: Fix incorrect printing of signed values in class statistics
The deficit returned from the kernel is signed, but was printed with a %u
specifier in the format string, leading to negative values to be printed as
high unsigned values instead. In addition, we passed a negative value to
sprint_time() even though that expects an unsigned value. Fix this by
changing the format specifier and reversing the sign of negative time
values.
Fixes: 714444c0cb26 ("Add support for CAKE qdisc") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Roi Dayan [Mon, 22 Feb 2021 12:10:30 +0000 (14:10 +0200)]
dcb: Fix compilation warning about reallocarray
In older distros we need bsd/stdlib.h but newer distro doesn't
need it. Also old distro will need libbsd-devel installed and newer
doesn't. To remove a possible dependency on libbsd-devel replace usage
of reallocarray to realloc.
dcb_app.c: In function ‘dcb_app_table_push’:
dcb_app.c:68:25: warning: implicit declaration of function ‘reallocarray’; did you mean ‘realloc’?
Fixes: 8e9bed1493f5 ("dcb: Add a subtool for the DCB APP object") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 18:14:32 +0000 (19:14 +0100)]
lib/fs: Fix single return points for get_cgroup2_*
Functions get_cgroup2_id() and get_cgroup2_path() may call close() with
a negative argument.
Avoid that making the calls conditional on the file descriptors.
get_cgroup2_path() may also return NULL leaking a file descriptor.
Ensure this does not happen using a single return point.
Fixes: d5e6ee0dac64 ("ss: introduce cgroup2 cache and helper functions") Fixes: 8f1cd119b377 ("lib: fix checking of returned file handle size for cgroup") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 18:14:31 +0000 (19:14 +0100)]
lib/fs: avoid double call to mkdir on make_path()
make_path() function calls mkdir two times in a row. The first one it
stores mkdir return code, and then it calls it again to check for errno.
This seems unnecessary, as we can use the return code from the first
call and check for errno if not 0.
Fixes: ac3415f5c1b1d ("lib/fs: Fix and simplify make_path()") Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 17:43:10 +0000 (18:43 +0100)]
lib/bpf: Fix and simplify bpf_mnt_check_target()
As stated in commit ac3415f5c1b1 ("lib/fs: Fix and simplify make_path()"),
calling stat() before mkdir() is racey, because the entry might change in
between.
As the call to stat() seems to only check for target existence, we can
simply call mkdir() unconditionally and catch all errors but EEXIST.
Fixes: 95ae9a4870e7 ("bpf: fix mnt path when from env") Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Andrea Claudi [Mon, 22 Feb 2021 11:40:36 +0000 (12:40 +0100)]
lib/namespace: fix ip -all netns return code
When ip -all netns {del,exec} are called and no netns is present, ip
exit with status 0. However this does not happen if no netns has been
created since boot time: in that case, indeed, the NETNS_RUN_DIR is not
present and netns_foreach() exit with code 1.
$ ls /var/run/netns
ls: cannot access '/var/run/netns': No such file or directory
$ ip -all netns exec ip link show
$ echo $?
1
$ ip -all netns del
$ echo $?
1
$ ip netns add test
$ ip netns del test
$ ip -all netns del
$ echo $?
0
$ ls -a /var/run/netns
. ..
This leaves us in the unpleasant situation where the same command, when
no netns is present, does the same stuff (in this case, nothing), but
exit with two different statuses.
Fix this treating ENOENT in a different way from other errors, similarly
to what we already do in ipnetns.c netns_identify_pid()
Fixes: e998e118ddc3 ("lib: Exec func on each netns") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 20:23:01 +0000 (21:23 +0100)]
ip: lwtunnel: seg6: bail out if table ids are invalid
When table and vrftable are used in SRv6, ip should bail out if table
ids are not valid, and return a proper error message to the user.
Achieve this simply checking rtnl_rttable_a2n return value, as we
already do in the rest of iproute.
Fixes: 0486388a877a ("add support for table name in SRv6 End.DT* behaviors") Fixes: 69629b4e43c4 ("seg6: add support for vrftable attribute in SRv6 End.DT4/DT6 behaviors") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Andrea Claudi [Mon, 22 Feb 2021 20:22:47 +0000 (21:22 +0100)]
tc: m_gate: use SPRINT_BUF when needed
sprint_time64() uses SPRINT_BSIZE-1 as a constant buffer lenght in its
implementation, however m_gate uses shorter buffers when calling it.
Fix this using SPRINT_BUF macro to get the buffer, thus getting a
SPRINT_BSIZE-long buffer.
Fixes: 07d5ee70b5b3 ("iproute2-next:tc:action: add a gate control action") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Vladimir Oltean [Thu, 11 Feb 2021 10:45:02 +0000 (12:45 +0200)]
man8/bridge.8: be explicit that "flood" is an egress setting
Talking to varios people, it became apparent that there is a certain
ambiguity in the description of these flags. They refer to egress
flooding, which should perhaps be stated more clearly.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>