Geliang Tang [Sat, 15 Jan 2022 16:04:33 +0000 (00:04 +0800)]
mptcp: add id check for deleting address
This patch adds an id check for deleting an address in mptcp_parse_opt().
The ADDRESS argument is invalid for an address with a non-zero id; it is
only needed for the id 0 address.
# ip mptcp endpoint delete id 1
# ip mptcp endpoint delete id 0 10.0.1.1
Signed-off-by: Geliang Tang <geliang.tang@suse.com> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Petr Machata [Tue, 18 Jan 2022 11:09:30 +0000 (12:09 +0100)]
dcb: Rewrite array-formatting code to not cause warnings with Clang
Some installations of Clang are unhappy about the use of hand-rolled
format strings, and emit warnings such as this one:
dcb.c:334:31: warning: format string is not a string literal
[-Wformat-nonliteral]
Rewrite the impacted code so that it always uses literal format strings.
Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
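The idea behind the rewrite can be sketched as follows. This is a minimal illustration and not the dcb code: the helper name format_array and the "index:value" layout are hypothetical. Rather than assembling a format string at run time, every call uses a literal format and only the arguments vary, which is what -Wformat-nonliteral asks for.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "index:value" pairs separated by single
 * spaces into buf. Returns 0 on success, -1 if buf is too small.
 */
static int format_array(char *buf, size_t len,
			const unsigned int *arr, size_t n)
{
	size_t off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		/* Literal format string; the separator is an argument. */
		int ret = snprintf(buf + off, len - off, "%s%zu:%u",
				   i ? " " : "", i, arr[i]);

		if (ret < 0 || (size_t)ret >= len - off)
			return -1;	/* output would be truncated */
		off += (size_t)ret;
	}
	return 0;
}
```

Because the format is a literal, the compiler can also type-check the arguments against the conversion specifiers.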
Kevin Bracey [Thu, 6 Jan 2022 11:16:04 +0000 (13:16 +0200)]
q_cake: allow changing to diffserv3
A diffserv3 option (enum value 0) was never sent to the kernel, so it
was not possible to use "tc qdisc change" to select it.
This also meant that we were relying on the kernel's default being
diffserv3 when adding. If the default were to change, we wouldn't have
been able to request diffserv3 explicitly.
Signed-off-by: Kevin Bracey <kevin@bracey.fi> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
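The class of bug described here can be sketched as below. The names MODE_* and should_send_*() are hypothetical and the real q_cake code may be structured differently; the only fact taken from the commit message is that diffserv3 has enum value 0. When 0 doubles as "not specified", a valid value of 0 can never be sent, so a separate sentinel is needed.

```c
#include <stdbool.h>

/* Hypothetical values; only "diffserv3 == 0" comes from the commit text. */
enum {
	MODE_UNSET = -1,	/* sentinel: user gave no diffserv option */
	MODE_DIFFSERV3 = 0,
	MODE_DIFFSERV4 = 1,
};

/* Buggy pattern: zero means both "unset" and "diffserv3". */
static bool should_send_buggy(int mode)
{
	return mode != 0;
}

/* Fixed pattern: only the sentinel suppresses the attribute. */
static bool should_send_fixed(int mode)
{
	return mode != MODE_UNSET;
}
```

With the sentinel, an explicit diffserv3 request is sent to the kernel, and nothing is sent when the user leaves the option out.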
Vincent Mailhol [Sun, 9 Jan 2022 15:30:40 +0000 (00:30 +0900)]
iplink_can: add ctrlmode_{supported,_static} to the "--details --json" output
This patch is the userland counterpart of [1]. Indeed, [1] enables the
can netlink interface to report the CAN controller capabilities.
Previously, only the options which were switched on were reported
(i.e. can_priv::ctrlmode). Here, we add two additional pieces of
information to the json report:
- ctrlmode_supported: the options that can be modified by netlink
- ctrlmode_static: the options which are statically enabled by the driver
(i.e. cannot be turned off)
For your information, we borrowed the naming convention from struct
can_priv [2].
Contrary to the ctrlmode, the ctrlmode_{supported,_static} are only
reported in the json context. The reason is that this newly added
information can quickly become very verbose and we do not want to
overload the default output. You can think of the "ip --details link
show canX" output as the verbose mode and the "ip --details --json
link show canX" output as the *very* verbose mode.
*Example:*
This is what the output would look like for a dummy driver which would
have:
- CAN_CTRLMODE_LOOPBACK, CAN_CTRLMODE_LISTENONLY,
CAN_CTRLMODE_3_SAMPLES, CAN_CTRLMODE_FD, CAN_CTRLMODE_CC_LEN8_DLC
and TDC-AUTO supported by the driver
- CAN_CTRLMODE_CC_LEN8_DLC turned on by the user
- CAN_CTRLMODE_FD_NON_ISO statically enabled by the driver
Leon Romanovsky [Sun, 9 Jan 2022 18:41:39 +0000 (20:41 +0200)]
rdma: Don't allocate sparse array
The addition of the driver QP type with index 0xFF caused the following
clang compilation warning:
res.c:152:10: warning: result of comparison of constant 256 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
if (idx < ARRAY_SIZE(qp_types_str) && qp_types_str[idx])
~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~
Instead of allocating a very sparse array, simply add a separate check
for the driver QP type.
Fixes: 39307384cea7 ("rdma: Add driver QP type string") Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
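A minimal sketch of the approach, assuming a small dense table plus one special case. The macro QPT_DRIVER, the function name qp_type_to_str and the table contents are illustrative and not copied from rdma/res.c; only the index 0xFF is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define QPT_DRIVER 0xFF	/* index named in the commit message */

/* Illustrative lookup: dense table for small indices, explicit check
 * for the driver type, so the table never needs 256 entries.
 */
static const char *qp_type_to_str(uint8_t idx)
{
	static const char * const qp_types_str[] = {
		"SMI", "GSI", "RC", "UC", "UD",	/* illustrative entries */
	};

	if (idx == QPT_DRIVER)
		return "DRIVER";
	if (idx < ARRAY_SIZE(qp_types_str))
		return qp_types_str[idx];
	return "UNKNOWN";
}
```

Since the table is now far smaller than 256 entries, the bound check on a uint8_t index is no longer tautological.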
Leon Romanovsky [Sun, 9 Jan 2022 18:41:38 +0000 (20:41 +0200)]
rdma: Limit copy data by the destination size
The strncat() function copies up to n bytes, where n is supplied as the
third argument. That n must be bounded by the space left in the
destination, not by the size of the source.
This change fixes the following clang compilation warnings:
res-srq.c:75:25: warning: size argument in 'strncat' call appears to be size of the source [-Wstrncat-size]
strncat(qp_str, tmp, sizeof(tmp) - 1);
^~~~~~~~~~~~~~~
res-srq.c:99:23: warning: size argument in 'strncat' call appears to be size of the source [-Wstrncat-size]
strncat(qp_str, tmp, sizeof(tmp) - 1);
^~~~~~~~~~~~~~~
res-srq.c:142:25: warning: size argument in 'strncat' call appears to be size of the source [-Wstrncat-size]
strncat(qp_str, tmp, sizeof(tmp) - 1);
^~~~~~~~~~~~~~~
Fixes: 9b272e138d23 ("rdma: Add SRQ resource tracking information") Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
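For illustration, a destination-bounded append can be written as below. The helper name append_bounded is hypothetical and this is not the code from res-srq.c; it only shows the usual idiom of bounding strncat() by the room remaining in the destination, leaving one byte for the terminating NUL.

```c
#include <string.h>

/* Append src to the NUL-terminated string in dst.
 * dst_size is the total size of the dst buffer in bytes.
 */
static void append_bounded(char *dst, size_t dst_size, const char *src)
{
	size_t used = strlen(dst);

	if (used + 1 >= dst_size)
		return;		/* no room left for even one character */

	/* Bound by the remaining destination space, minus the NUL. */
	strncat(dst, src, dst_size - used - 1);
}
```

If the source does not fit, the result is silently truncated, which may or may not be acceptable depending on the caller.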
Andrea Claudi [Mon, 3 Jan 2022 18:00:22 +0000 (19:00 +0100)]
testsuite: Fix tc/vlan.t test
Following commit 8323b20f1d76 ("net/sched: act_vlan: No dump for unset
priority"), the kernel no longer dumps the vlan priority unless it was
explicitly set before.
When modifying a vlan, tc/vlan.t test expects to find priority set to 0
without setting it explicitly. Thus, after 8323b20f1d76 this test fails.
Fix this by simply removing the check on priority.
Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Davide Caratti [Thu, 16 Dec 2021 14:29:59 +0000 (15:29 +0100)]
mptcp: add support for changing the backup flag
Linux supports 'MPTCP_PM_CMD_SET_FLAGS' since v5.12, and this control has
recently been extended to allow setting flags for a given endpoint id.
Although there is no use for changing 'signal' or 'subflow' flags, it can
be helpful to set/clear the backup bit on existing endpoints: add the 'ip
mptcp endpoint change <...>' command for this purpose.
Paul Chaignon [Thu, 16 Dec 2021 15:33:36 +0000 (16:33 +0100)]
lib/bpf: fix verbose flag when using libbpf
Since commit 6d61a2b55799 ("lib: add libbpf support"), passing the
verbose flag to tc filter doesn't dump the verifier logs anymore in case
of successful loading.
This commit fixes it by setting the log_level attribute before loading.
To that end, we need to call bpf_object__load_xattr directly instead of
relying on bpf_object__load.
Fixes: 6d61a2b55799 ("lib: add libbpf support") Signed-off-by: Paul Chaignon <paul@isovalent.com> Acked-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
tc: Add support for ce_threshold_value/mask in fq_codel
Commit dfcb63ce1de6 ("fq_codel: generalise ce_threshold marking for subset
of traffic") added support in fq_codel for setting a value and mask that
will be applied to the diffserv/ECN byte to turn on the ce_threshold
feature for a subset of traffic.
This adds support to iproute for setting these values. The parameter is
called ce_threshold_selector and takes a value followed by a
slash-separated mask. Some examples:
# apply ce_threshold to ECT(1) traffic
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3
# apply ce_threshold to ECN-capable traffic marked as diffserv AF22
tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
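How a value/mask pair selects traffic can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the matching rule (the byte ANDed with the mask must equal the value) is the conventional reading of "value and mask applied to the diffserv/ECN byte", not code copied from tc or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for "VALUE/MASK", e.g. "0x50/0xfc".
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_selector(const char *arg, uint8_t *value, uint8_t *mask)
{
	char *end;
	unsigned long v, m;

	v = strtoul(arg, &end, 0);
	if (end == arg || *end != '/')
		return -1;

	const char *mstr = end + 1;

	m = strtoul(mstr, &end, 0);
	if (end == mstr || *end != '\0' || v > 0xff || m > 0xff)
		return -1;

	*value = (uint8_t)v;
	*mask = (uint8_t)m;
	return 0;
}

/* Assumed matching rule: masked diffserv/ECN byte equals the value. */
static bool selector_matches(uint8_t dsfield, uint8_t value, uint8_t mask)
{
	return (dsfield & mask) == value;
}
```

Under this rule, 0x1/0x3 looks only at the two ECN bits and matches ECT(1), while 0x50/0xfc looks only at the six DSCP bits and matches AF22 whatever the ECN bits are.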
Paolo Abeni [Fri, 26 Nov 2021 10:35:44 +0000 (11:35 +0100)]
mptcp: add support for fullmesh flag
The Linux kernel has supported this endpoint flag since v5.15, so let's
expose it to user-space. It allows the creation of a fullmesh topology
via MPTCP subflows.
Additionally update the related man-page, clarifying the behavior
of related options.
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Maxim Petrov [Wed, 17 Nov 2021 19:11:24 +0000 (22:11 +0300)]
ip/ipnexthop: fix unsigned overflow in parse_nh_group_type_res()
0UL has type 'unsigned long' which is likely to be 64bit on modern machines. At
the same time, the '{idle,unbalanced}_timer' variables are declared as u32, so
these variables cannot be greater than '~0UL / 100' when 'unsigned long' is 64
bits. In that case it is still possible to pass the check but get an
overflow later when the timers are multiplied by 100 in 'addattr32'.
Fix the possible overflow by changing '~0UL' to 'UINT32_MAX'.
Fixes: 91676718228b ("nexthop: Add support for resilient nexthop groups") Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
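The arithmetic can be sketched in isolation. The helper name scale_timer is hypothetical; the bound UINT32_MAX / 100 and the multiplication by 100 are taken from the commit message. On a machine where unsigned long is 64 bits, a comparison against ~0UL / 100 can never reject a 32-bit value, so the product may wrap.

```c
#include <stdint.h>

/* Store timer * 100 in *out and return 0, or return -1 when the input
 * is at or above the bound used by the fix (UINT32_MAX / 100).
 */
static int scale_timer(uint32_t timer, uint32_t *out)
{
	/* The old bound, ~0UL / 100, is about 1.8e17 with a 64-bit
	 * unsigned long, so no uint32_t value could ever reach it.
	 */
	if (timer >= UINT32_MAX / 100)
		return -1;

	*out = timer * 100;	/* cannot wrap: timer < UINT32_MAX / 100 */
	return 0;
}
```

UINT32_MAX / 100 is 42949672, so the largest accepted input is 42949671, whose product 4294967100 still fits in 32 bits.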
Maxim Petrov [Tue, 16 Nov 2021 19:32:26 +0000 (22:32 +0300)]
lib/bpf_legacy: remove always-true check
The 'name' field of the 'struct bpf_prog_info' is a plain C array. Thus, the
logical condition in bpf_dump_prog_info() is useless as the array address is
always true, so just remove it.
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
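Why the condition is always true can be shown with a small stand-in structure. The struct prog_info and the function has_name below are hypothetical; the commit itself simply removes the check. Testing the first character is shown only as an example of a check that would actually depend on the data.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a struct with an embedded name array. */
struct prog_info {
	char name[16];
};

/* "if (info->name)" tests the address of the embedded array, which is
 * never NULL for a valid struct, so such a condition is always true.
 * A check that depends on the contents looks at the first character.
 */
static bool has_name(const struct prog_info *info)
{
	return info->name[0] != '\0';
}
```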
Davide Caratti [Thu, 11 Nov 2021 09:52:13 +0000 (10:52 +0100)]
mptcp: fix JSON output when dumping endpoints by id
iproute ignores the '-j' command line argument when dumping endpoints by id:
[dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show
[{"address":"1.2.3.4","id":42,"signal":true,"backup":true}]
[dcaratti@dcaratti iproute2]$ ./ip/ip -j mptcp endpoint show id 42
1.2.3.4 id 42 signal backup
Fix mptcp_addr_show() to use the proper JSON helpers.
Fixes: 7e0767cd862b ("add support for mptcp netlink interface") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Andrea Claudi <aclaudi@redhat.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Anssi Hannula [Thu, 4 Nov 2021 14:42:05 +0000 (16:42 +0200)]
man: tc-u32: Fix page to match new firstfrag behavior
Commit 690b11f4a6b8 ("tc: u32: Fix firstfrag filter.") applied in 2012
changed the "ip firstfrag" selector to not match non-fragmented packets
anymore.
However, the documentation added in f15a23966fff ("tc: add a man page
for u32 filter") in 2015 includes an example that relies on the previous
behavior (non-fragmented packet counted as first fragment).
Due to this, the example does not work correctly and does not actually
classify regular SSH packets.
Modify the example to use a raw u16 selector on the fragment offset to
make it work, and also make the firstfrag description clearer about
the current behavior.
Fixes: f15a23966fff ("tc: add a man page for u32 filter") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Cc: Phil Sutter <phil@nwl.cc> Cc: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
David Ahern [Thu, 4 Nov 2021 15:44:56 +0000 (09:44 -0600)]
Merge branch 'can-tdc-plus-cleanups' into next
Vincent Mailhol says:
====================
The main purpose is to add commandline support for Transmitter Delay
Compensation (TDC) in iproute. Other issues found during the
development of this feature also get addressed.
This patch series contains five patches which respectively:
1. Correct the bittiming ranges in the print_usage function and add
the units to give more clarity: some parameters are in milliseconds,
some in nanoseconds, some in time quanta, and the TDC parameters
newly introduced in this series are in clock periods.
2. Do some code refactoring on function print_ctrlmode().
3. Factorize the many print_*(PRINT_JSON, ...) and fprintf
occurrences into a single print_*(PRINT_ANY, ...) call and fix the
signedness while doing that.
4. Report the value of the bitrate prescalers (brp and dbrp).
5. Add command line support for the TDC in iproute and go together
with the series below in the kernel:
https://lore.kernel.org/linux-can/20210814091750.73931-1-mailhol.vincent@wanadoo.fr/T/#t
** Changelog **
From RFC v5 to v6:
* Dropped the RFC tag because the related patch series on the kernel
side were pulled into net-next.
* Remove the changes in include/uapi/linux/can/netlink.h because
these should be pulled separately.
* Add another patch (the second of this series) to do some cleanup
on function print_ctrlmode().
* Minor fixes in the patch comments (grammar, rephrasing).
From RFC v4 to RFC v5:
* Add the unit (bps, tq, ns or ms) in print_usage()
* Rewrote void can_print_timing_min_max() to better factorize the
code.
* Rewrote the commit message of the two last patches (those related
to TDC) to either add clarification or fix inaccuracies.
From v3 to RFC v4:
* Reflect the changes made on the kernel side.
From RFC v2 to v3:
* Dropped the RFC tag. Now that the kernel patch has reached the testing
branch, I am finally ready.
* Regression fix: configuring a link with only nominal bittiming
returned -EOPNOTSUPP
* Added two more patches to the series:
- iplink_can: fix configuration ranges in print_usage()
- iplink_can: print brp and dbrp bittiming variables
* Other small fixes on formatting.
From RFC v1 to RFC v2:
* Add an additional patch to the series to fix the issues reported
by Stephen Hemminger
Ref: https://lore.kernel.org/linux-can/20210506112007.1666738-1-mailhol.vincent@wanadoo.fr/T/#t
Vincent Mailhol [Wed, 3 Nov 2021 16:44:28 +0000 (01:44 +0900)]
iplink_can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)
At high bit rates, the propagation delay from the TX pin to the RX pin
of the transceiver causes measurement errors: the sample point on the
RX pin might occur on the previous bit.
This issue is addressed in ISO 11898-1 section 11.3.3 "Transmitter
delay compensation" (TDC).
This patch brings command line support to nine TDC parameters which
were recently added to the kernel's CAN netlink interface in order to
implement TDC:
- IFLA_CAN_TDC_TDCV_MIN: Transmitter Delay Compensation Value
minimum value
- IFLA_CAN_TDC_TDCV_MAX: Transmitter Delay Compensation Value
maximum value
- IFLA_CAN_TDC_TDCO_MIN: Transmitter Delay Compensation Offset
minimum value
- IFLA_CAN_TDC_TDCO_MAX: Transmitter Delay Compensation Offset
maximum value
- IFLA_CAN_TDC_TDCF_MIN: Transmitter Delay Compensation Filter
window minimum value
- IFLA_CAN_TDC_TDCF_MAX: Transmitter Delay Compensation Filter
window maximum value
- IFLA_CAN_TDC_TDCV: Transmitter Delay Compensation Value
- IFLA_CAN_TDC_TDCO: Transmitter Delay Compensation Offset
- IFLA_CAN_TDC_TDCF: Transmitter Delay Compensation Filter window
All those new parameters are nested together into the attribute
IFLA_CAN_TDC.
The TDC parameters extend the FD parameters. As such, the TDC
parameters must be specified together with the "fd on" flag.
When the "fd on" flag is provided, the tdc-mode parameter specifies
how TDC operates. Valid options for tdc-mode are:
* auto: the transmitter dynamically measures TDCV for each of the
transmitted frames. As such, TDCV can not be manually provided. In
this mode, the user must specify TDCO and may also specify TDCF if
supported.
* manual: use a static TDCV provided by the user. In this mode, the
user must specify both TDCV and TDCO and may also specify TDCF if
supported.
* off: TDC is explicitly disabled.
* tdc-mode parameter omitted (default mode): the kernel decides
whether TDC should be enabled or not and if so, it calculates the
TDC values. TDC parameters are an expert option and the average
user is not expected to provide those, thus the presence of this
"default mode".
If the fd flag is omitted, all the FD values (including TDC values)
remain unchanged.
If "fd off" flag is specified, all FD values (including TDC values)
are zeroed.
TDCV is always reported in manual mode. In auto mode, TDCV is reported
only if the value is available. In particular, the TDCV might not be
available if the controller has no feature to report it or if the
value is not yet available (i.e. no data sent yet and measurement did
not occur).
TDCF is reported only if tdcf_max is not zero (i.e. if supported by
the controller).
For reference, here are a few samples of what the output looks like:
| $ ip link set can0 type can bitrate 1000000 dbitrate 8000000 fd on tdco 7 tdcf 8 tdc-mode auto
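The tdc-mode rules listed in this commit message can be summarised as a small validity check. This is a sketch, not the iplink_can parser: the enum, the struct and the function name are hypothetical, and the handling of "off" and of the omitted (default) mode, where no TDC value is accepted from the user, is an assumption rather than something the message states.

```c
#include <stdbool.h>

/* Hypothetical representation of the tdc-mode argument. */
enum tdc_mode {
	TDC_MODE_DEFAULT,	/* tdc-mode omitted: kernel decides */
	TDC_MODE_AUTO,
	TDC_MODE_MANUAL,
	TDC_MODE_OFF,
};

/* Which TDC values the user supplied on the command line. */
struct tdc_args {
	bool has_tdcv, has_tdco, has_tdcf;
};

static bool tdc_args_valid(enum tdc_mode mode, const struct tdc_args *a,
			   bool tdcf_supported)
{
	/* TDCF may only be given if the controller supports it. */
	if (a->has_tdcf && !tdcf_supported)
		return false;

	switch (mode) {
	case TDC_MODE_AUTO:	/* TDCV is measured, TDCO is mandatory */
		return !a->has_tdcv && a->has_tdco;
	case TDC_MODE_MANUAL:	/* both TDCV and TDCO are mandatory */
		return a->has_tdcv && a->has_tdco;
	case TDC_MODE_OFF:
	case TDC_MODE_DEFAULT:	/* assumption: no user-supplied values */
		return !a->has_tdcv && !a->has_tdco && !a->has_tdcf;
	}
	return false;
}
```

Under these rules the sample command above (tdco 7 tdcf 8 tdc-mode auto) is accepted as long as the controller supports TDCF.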
Vincent Mailhol [Wed, 3 Nov 2021 16:44:27 +0000 (01:44 +0900)]
iplink_can: print brp and dbrp bittiming variables
Report the value of the bit-rate prescaler (brp) for both the nominal
and the data bittiming.
Currently, only the constant brp values (brp_{min,max,inc}) are being
reported. Also, brp is the only member of struct can_bittiming not
being reported.
Notably, brp can be calculated by hand from the other bittiming
parameters with the formula below:
Vincent Mailhol [Wed, 3 Nov 2021 16:44:26 +0000 (01:44 +0900)]
iplink_can: use PRINT_ANY to factorize code and fix signedness
The current implementation relies heavily on "if (is_json_context())"
switches to decide the context, and then calls print_*(PRINT_JSON,
...) in json context and fprintf(...) otherwise.
Furthermore, the current implementation uses either print_int() or the
conversion specifier %d to print unsigned integers.
This patch factorizes each pair of print_*(PRINT_JSON, ...) and
fprintf() into a single print_*(PRINT_ANY, ...) call. While doing this
replacement, it uses the proper unsigned function print_uint() as well as
the conversion specifier %u when the parameter is an unsigned integer.
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: David Ahern <dsahern@kernel.org>
Vincent Mailhol [Wed, 3 Nov 2021 16:44:24 +0000 (01:44 +0900)]
iplink_can: fix configuration ranges in print_usage() and add unit
The configuration ranges in print_usage() are taken from "Table 8 -
Time segments' minimum configuration ranges" in section 11.3.1.2
"Configuration of the bit time parameters" of ISO 11898-1.
The standard clearly specifies that "implementations may allow time
segments that exceed the minimum required configuration ranges
specified in Table 8".
Because no maximum ranges are given in the standard, all given ranges
{ a..b } are simply replaced with { NUMBER }.
The actual ranges are specific to each device and can be confirmed
by running:
$ ip --details link show can0
1: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN mode DEFAULT group default qlen 10
link/can promiscuity 0 minmtu 0 maxmtu 0
can state STOPPED restart-ms 0
ES582.1/ES584.1: tseg1 2..256 tseg2 2..128 sjw 1..128 brp 1..512 brp-inc 1
ES582.1/ES584.1: dtseg1 2..32 dtseg2 1..16 dsjw 1..8 dbrp 1..32 dbrp-inc 1
clock 80000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
Finally, the units (bps, tq, ns or ms) are given. The rationale to add
the units is that the TDC parameters (that will be introduced in the
upcoming patches) are measured in a different unit than the other
bittiming parameters: clock period (a.k.a. minimum time quantum)
instead of time quantum. Adding the units disambiguates things.
For reference, before the change:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
[ loopback { on | off } ]
[ listen-only { on | off } ]
[ triple-sampling { on | off } ]
[ one-shot { on | off } ]
[ berr-reporting { on | off } ]
[ fd { on | off } ]
[ fd-non-iso { on | off } ]
[ presume-ack { on | off } ]
...and after it:
$ ip link set can0 type can help
Usage: ip link set DEVICE type can
[ bitrate BITRATE [ sample-point SAMPLE-POINT] ] |
[ tq TQ prop-seg PROP_SEG phase-seg1 PHASE-SEG1
phase-seg2 PHASE-SEG2 [ sjw SJW ] ]
[ loopback { on | off } ]
[ listen-only { on | off } ]
[ triple-sampling { on | off } ]
[ one-shot { on | off } ]
[ berr-reporting { on | off } ]
[ fd { on | off } ]
[ fd-non-iso { on | off } ]
[ presume-ack { on | off } ]
[ cc-len8-dlc { on | off } ]
[ restart-ms TIME-MS ]
[ restart ]
[ termination { 0..65535 } ]
Where: BITRATE := { NUMBER in bps }
SAMPLE-POINT := { 0.000..0.999 }
TQ := { NUMBER in ns }
PROP-SEG := { NUMBER in tq }
PHASE-SEG1 := { NUMBER in tq }
PHASE-SEG2 := { NUMBER in tq }
SJW := { NUMBER in tq }
RESTART-MS := { 0 | NUMBER in ms }
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: David Ahern <dsahern@kernel.org>
Moshe Shemesh [Sun, 31 Oct 2021 06:48:47 +0000 (08:48 +0200)]
devlink: Fix cmd_dev_param_set() to check configuration mode
This patch fixes a bug: when the param set user command includes a
configuration mode which is not supported, the tool may not respond
with an error if the requested value is 0. In such a case
cmd_dev_param_set_cb() won't find the requested configuration mode and
returns ctx->value as initialized (equal to 0). Then cmd_dev_param_set()
may find that the requested value equals the current value and return success.
Fix the bug by adding a flag, cmode_found, which is set only if
cmd_dev_param_set_cb() finds the requested configuration mode.
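The logic of the fix can be sketched with simplified stand-ins. Only the names ctx->value and cmode_found come from the commit message; the structures, lookup_cmode() and param_set() are hypothetical and do not mirror the devlink source. The point is that a zero-initialised value cannot serve as evidence that the configuration mode was found.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct param_ctx {
	uint32_t value;		/* current value, 0 until found */
	bool cmode_found;	/* set only when the cmode exists */
};

/* Hypothetical table entry: one current value per supported cmode. */
struct cmode_entry {
	int cmode;
	uint32_t value;
};

/* Stand-in for the callback: look up the requested cmode. */
static void lookup_cmode(struct param_ctx *ctx, const struct cmode_entry *e,
			 size_t n, int wanted)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].cmode == wanted) {
			ctx->value = e[i].value;
			ctx->cmode_found = true;
			return;
		}
	}
}

/* Returns 0 on success (including "nothing to change"), -1 when the
 * requested cmode is not supported.
 */
static int param_set(const struct cmode_entry *e, size_t n, int wanted,
		     uint32_t new_value)
{
	struct param_ctx ctx = { 0 };

	lookup_cmode(&ctx, e, n, wanted);
	if (!ctx.cmode_found)
		return -1;	/* without this, new_value == 0 "succeeds" */
	if (ctx.value == new_value)
		return 0;	/* already set, nothing to do */
	/* ... the actual set request would be sent here ... */
	return 0;
}
```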
Daniel Borkmann [Mon, 25 Oct 2021 15:47:27 +0000 (17:47 +0200)]
ip, neigh: Add missing NTF_USE support
Currently, ip neigh does not support the NTF_USE flag. Similar to other flags
such as extern_learn, add cmdline support. The flag dump support is explicitly
missing here, since the kernel does not propagate the flag back to user space.
Usage example:
# ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn
# ./ip/ip n
192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David Ahern <dsahern@kernel.org>
Daniel Borkmann [Mon, 25 Oct 2021 15:47:26 +0000 (17:47 +0200)]
ip, neigh: Fix up spacing in netlink dump
Fix up spacing to consistently add a single ' ' after an attribute has
been printed. Currently, it is a bit of a mix of before and after which
can lead to double spaces being printed.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David Ahern <dsahern@kernel.org>
David Ahern [Sat, 16 Oct 2021 18:52:02 +0000 (12:52 -0600)]
Merge branch 'rdma-optional-stats' into next
Mark Zhang says:
====================
This is the supplementary part of kernel series [1]. It provides an
extension to the rdma statistics tool that allows setting or listing
optional counters dynamically, using netlink.
David Ahern [Fri, 15 Oct 2021 23:59:33 +0000 (17:59 -0600)]
Merge branch 'config-libdir' into next
Andrea Claudi says:
====================
This series adds support for the libdir parameter in iproute2 configure
script. The idea is to make use of the fact that packaging systems may
assume that 'configure' comes from autotools allowing a syntax similar
to the autotools one, and using it to tell iproute2 where the distro
expects to find its lib files.
Patches 1-2 fix a parsing issue on current configure options, that may
trigger an endless loop when no value is provided with some options;
Patch 3 fixes a parsing issue bailing out when more than one value is
provided for a single option;
Patch 4 simplifies options parsing, moving semantic checks out of the
while loop processing options;
Patch 5 introduces support for the --opt=value style on current options,
for uniformity;
Patch 6 adds the --prefix option, that may be used by some packaging
systems when calling the configure script;
Patch 7 finally adds the --libdir option, and also drops the static
LIBDIR var from the Makefile.
Changelog:
----------
v4 -> v5
- bail out when multiple values are provided with a single option
- simplify option parsing and reduce code duplication, as suggested
by Phil Sutter
- remove a nasty eval on libdir option processing
v3 -> v4
- fix parsing issue on '--include_dir' and '--libbpf_dir'
- split '--opt value' and '--opt=value' use cases, avoid code
duplication moving semantic checks on value to dedicated functions
v2 -> v3
- fix parsing error on prefix and libdir options.
v1 -> v2
- consolidate '--opt value' and '--opt=value' use cases, as suggested
by David Ahern.
- added patch 2 to manage the --prefix option, used by the Debian
packaging system, as reported by Luca Boccassi, and use it when
setting lib directory.
Andrea Claudi [Thu, 14 Oct 2021 08:50:55 +0000 (10:50 +0200)]
configure: add the --libdir option
This commit allows users/packagers to choose a lib directory to store
iproute2 lib files.
At the moment iproute2 ships lib files in /usr/lib and offers no way to
modify this setting. However, according to the FHS, distros may choose
"one or more variants of the /lib directory on systems which support
more than one binary format" (e.g. /usr/lib64 on Fedora).
As Luca states in commit a3272b93725a ("configure: restore backward
compatibility"), packaging systems may assume that 'configure' is from
autotools, and try to pass it some parameters.
By allowing the '--libdir=/path/to/libdir' syntax, we can use this to our
advantage, and let the lib directory be chosen by the distro
packaging system.
Note that LIBDIR uses "\${prefix}/lib" as default value because autoconf
allows this to be expanded to the --prefix value at configure runtime.
"\${prefix}" is replaced with the PREFIX value in check_lib_dir().
Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Thu, 14 Oct 2021 08:50:52 +0000 (10:50 +0200)]
configure: simplify options parsing
This commit simplifies options parsing by moving all the code not related
to parsing out of the case statement.
- The conditional shift after the assignments is moved right after the
case, reducing code duplication.
- The semantic checks on the LIBBPF_FORCE value are moved after the loop
like we already did for INCLUDE and LIBBPF_DIR.
- Finally, the loop condition is changed to check remaining arguments, thus
making it possible to get rid of the null string case break.
As a bonus, now the help message states that on or off should follow
--libbpf_force
Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Thu, 14 Oct 2021 08:50:51 +0000 (10:50 +0200)]
configure: fix parsing issue with more than one value per option
With commit a9c3d70d902a ("configure: add options ability") users are no
longer able to provide wrong command lines like:
$ ./configure --include_dir foo bar
The script simply bails out when the user provides more than one value for a
single option. However, in doing so, it breaks backward compatibility with
some packaging systems, which expect unknown options to be ignored.
Commit a3272b93725a ("configure: restore backward compatibility") fixes this
issue, but makes it possible again for users to provide wrong command lines
such as the one above.
This fixes the issue by simply ignoring autoconf-like options such as
'--opt=value'.
Fixes: a3272b93725a ("configure: restore backward compatibility") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David Ahern <dsahern@kernel.org>
Andrea Claudi [Thu, 14 Oct 2021 08:50:50 +0000 (10:50 +0200)]
configure: fix parsing issue on libbpf_dir option
configure is stuck in an endless loop if '--libbpf_dir' option is used
without a value:
$ ./configure --libbpf_dir
./configure: line 515: shift: 2: shift count out of range
./configure: line 515: shift: 2: shift count out of range
[...]
Fix it by splitting 'shift 2' into two consecutive shifts, and making the
second one conditional on the number of remaining arguments.
A check is also provided after the while loop to verify the libbpf dir
exists; also, as LIBBPF_DIR does not have a default value, configure bails
out if the user does not specify a value after --libbpf_dir, thus avoiding
an erroneous configuration.
Fixes: 7ae2585b865a ("configure: convert LIBBPF environment variables to command-line options") Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David Ahern <dsahern@kernel.org>
Neta Ostrovsky [Thu, 14 Oct 2021 07:53:58 +0000 (10:53 +0300)]
rdma: Add optional-counters set/unset support
This patch provides an extension to the rdma statistics tool
that allows setting/unsetting optional counters dynamically,
using new netlink commands.
Note that the optional counter statistic implementation is
driver-specific and may impact the performance.
Examples:
To enable a set of optional counters on link rocep8s0f0/1:
$ sudo rdma statistic set link rocep8s0f0/1 optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts
To disable all optional counters on link rocep8s0f0/1:
$ sudo rdma statistic unset link rocep8s0f0/1 optional-counters
Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Neta Ostrovsky [Thu, 14 Oct 2021 07:53:57 +0000 (10:53 +0300)]
rdma: Add stat "mode" support
This patch introduces the "mode" command, which presents the enabled or
supported (when the "supported" argument is available) optional
counters.
An optional counter is a vendor-specific counter that may be
dynamically enabled/disabled. This enhancement of hwcounters allows
exposing counters which are, for example, mutually exclusive and cannot
be enabled at the same time, counters that might degrade performance,
optional debug counters, etc.
Examples:
To present currently enabled optional counters on link rocep8s0f0/1:
$ rdma statistic mode link rocep8s0f0/1
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts
To present supported optional counters on link rocep8s0f0/1:
$ rdma statistic mode supported link rocep8s0f0/1
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
Signed-off-by: Neta Ostrovsky <netao@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
David reported that ipmptcp badly breaks the build when updating the
relevant kernel headers.
We should be more careful in the header section, explicitly
including all the required dependencies and respecting the usual order
between system and local headers.
Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David Ahern <dsahern@kernel.org>
Paul Chaignon [Wed, 13 Oct 2021 14:39:27 +0000 (16:39 +0200)]
lib/bpf: fix map-in-map creation without prepopulation
When creating map-in-maps, the outer map can be prepopulated using the
inner_idx field of inner maps. That field defines the index of the inner
map in the outer map. It is ignored if set to -1.
Commit 6d61a2b55799 ("lib: add libbpf support") however started using
that field to identify inner maps. While iterating over all maps looking
for inner maps, maps with inner_idx set to -1 are erroneously skipped.
As a result, trying to create a map-in-map with prepopulation disabled
fails because the inner_id of the outer map is not correctly set.
This bug can be observed with strace -ebpf (notice the zero inner_map_fd
for the outer map creation):
David Ahern [Sat, 9 Oct 2021 23:37:12 +0000 (17:37 -0600)]
Merge branch 'ioam-encap-modes' into next
Justin Iurman says:
====================
Following the series applied to net-next (see [1]), here are the corresponding
changes to iproute2.
In the current implementation, IOAM can only be inserted directly (i.e., only
inside packets generated locally) by default, to be compliant with RFC8200.
This patch adds support for in-transit packets and provides the ip6ip6
encapsulation of IOAM (RFC8200 compliant). Therefore, three ioam6 encap modes
are defined:
Since we use the cache netlink socket for each nexthop we can keep it open
instead of opening and closing it on every add call. The socket is opened
once, on the first add call and then reused for the rest.
Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
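The open-once pattern described here can be sketched as follows. Everything in this block is a hypothetical stand-in (struct cache_sock, cache_sock_open, cache_add); a real implementation would open a netlink socket where the counter is incremented. The sketch only shows that the open happens on the first add call and is skipped afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cache's private socket. */
struct cache_sock {
	bool is_open;
	int open_count;		/* how many times the socket was opened */
};

/* Stand-in for opening the socket; a real version could fail. */
static void cache_sock_open(struct cache_sock *s)
{
	s->is_open = true;
	s->open_count++;
}

/* Add an entry to the cache, opening the socket on first use only. */
static void cache_add(struct cache_sock *s)
{
	if (!s->is_open)
		cache_sock_open(s);
	/* ... query the nexthop through s and store it in the cache ... */
}
```

The trade-off is that the socket stays open for the lifetime of the process instead of being closed after each lookup.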
David Ahern [Mon, 4 Oct 2021 00:31:44 +0000 (18:31 -0600)]
Merge branch 'nexthop-cache' into next
Nikolay Aleksandrov says:
====================
This set tries to help with an old ask that we've had for some time
which is to print nexthop information while monitoring or dumping routes.
The core problem is that people cannot follow nexthop changes while
monitoring route changes: by the time they check the nexthop it could be
deleted or updated to something else. In order to help them out I've
added a nexthop cache which is populated (only used if -d / show_details
is specified) while decoding routes and kept up to date while monitoring.
The nexthop information is printed on its own line starting with the
"nh_info" attribute, and it's embedded inside it if printing JSON. To
cache the nexthop entries I parse them into structures; in order to
reuse most of the code, the print helpers have been altered so they rely
on prepared structures. Nexthops are now always parsed into a structure,
even if they won't be cached; that structure is later used to print the
nexthop and destroyed if not going to be cached. New nexthops (not found
in the cache) are retrieved from the kernel using a private netlink
socket so they don't disrupt an ongoing dump, similar to how interfaces
are retrieved and cached.
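A minimal sketch of such a cache, assuming a small chained hash table keyed
by nexthop id. All names and the bucket count are hypothetical, the kernel
query is replaced by a stub that only counts calls, and id 0 is assumed to
mean "no nexthop". It is meant to illustrate the lookup-then-fetch flow, not
to reproduce the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NH_CACHE_SIZE 64 /* hypothetical bucket count */

struct nh_cache_entry {
	unsigned int id;
	struct nh_cache_entry *next; /* next entry in the same bucket */
};

static struct nh_cache_entry *nh_cache[NH_CACHE_SIZE];
static unsigned int kernel_fetches; /* counts simulated kernel queries */

static struct nh_cache_entry *nh_cache_find(unsigned int id)
{
	struct nh_cache_entry *e;

	for (e = nh_cache[id % NH_CACHE_SIZE]; e; e = e->next)
		if (e->id == id)
			return e;
	return NULL;
}

/* Stub for "retrieve the nexthop from the kernel over a private socket". */
static struct nh_cache_entry *nh_fetch(unsigned int id)
{
	struct nh_cache_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->id = id;
	kernel_fetches++;
	return e;
}

/* Return the cached entry, fetching and inserting it on a miss. */
static struct nh_cache_entry *nh_cache_get(unsigned int id)
{
	struct nh_cache_entry *e;

	assert(id != 0); /* assumption: id 0 means "no nexthop" */
	e = nh_cache_find(id);
	if (e)
		return e;
	e = nh_fetch(id);
	if (!e)
		return NULL;
	e->next = nh_cache[id % NH_CACHE_SIZE];
	nh_cache[id % NH_CACHE_SIZE] = e;
	return e;
}

/* Drop an entry, e.g. when a delete notification arrives while monitoring. */
static void nh_cache_del(unsigned int id)
{
	struct nh_cache_entry **pp = &nh_cache[id % NH_CACHE_SIZE];

	while (*pp) {
		if ((*pp)->id == id) {
			struct nh_cache_entry *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
		pp = &(*pp)->next;
	}
}
```

A second nh_cache_get() for the same id is served from the table without
another fetch; after nh_cache_del() the next lookup fetches again.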
I have tested the set with the kernel forwarding selftests and also by
stressing it with nexthop create/update/delete in loops while monitoring.
Comments are very welcome as usual. :)
Changes since RFC:
- reordered parse/print splits; in order to do that I had to parse
resilient groups first, then add nh entry parsing, so code has been
reordered as well and patch order has changed, but there have been
no functional changes (as before, refactoring of old code is done in
the first 8 patches and then patches 9-12 add the new cache and use it)
- re-ran all tests above
Patch breakdown:
Patches 1-2: update current route helpers to take parsed arguments so we
can directly pass them from the nh_entry structure later
Patch 3: adds new nha_res_grp structure which describes a resilient
nexthop group
Patch 4: splits print_nh_res_group into parse and print parts
which use the new nha_res_grp structure
Patch 5: adds new nh_entry structure which describes a nexthop
Patch 6: factors out print_nexthop's attribute parsing into nh_entry
structure used before printing
Patch 7: factors out print_nexthop's nh_entry structure printing
Patch 8: factors out ipnh_get's rtnl talk part and allows using a
different rt handle for the communication
Patch 9: adds nexthop cache and helpers to manage it, it uses the
new __ipnh_get to retrieve nexthops
Patch 10: adds a new helper print_cache_nexthop_id that prints nexthop
information from its id, if the nexthop is not found in the
cache it fetches it
Patch 11: the new print_cache_nexthop_id helper is used when printing
routes with show_details (-d) to output detailed nexthop
information, the format after nh_info is the same as
ip nexthop show
Patch 12: changes print_nexthop into print_cache_nexthop which always
outputs the nexthop information and can also update the cache
(based on process_cache argument), it's used to keep the
cache up to date while monitoring
Example outputs (monitor):
[NEXTHOP]id 101 via 169.254.2.22 dev veth2 scope link proto unspec
[NEXTHOP]id 102 via 169.254.3.23 dev veth4 scope link proto unspec
[NEXTHOP]id 103 group 101/102 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 192.0.2.0/24 nhid 203 table 4 proto boot scope global
nh_info id 203 group 201/202 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
nexthop via 169.254.2.12 dev veth3 weight 1
nexthop via 169.254.3.13 dev veth5 weight 1
[NEXTHOP]id 204 via fe80:2::12 dev veth3 scope link proto unspec
[NEXTHOP]id 205 via fe80:3::13 dev veth5 scope link proto unspec
[NEXTHOP]id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
[ROUTE]unicast 2001:db8:1::/64 nhid 206 table 4 proto boot scope global metric 1024 pref medium
nh_info id 206 group 204/205 type resilient buckets 512 idle_timer 0 unbalanced_timer 0 unbalanced_time 0 scope global proto unspec
nexthop via fe80:2::12 dev veth3 weight 1
nexthop via fe80:3::13 dev veth5 weight 1
[NEXTHOP]id 2 encap mpls 200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
[ROUTE]unicast 2.3.4.10 nhid 2 table main proto boot scope global
nh_info id 2 encap mpls 200/300 via 10.1.1.1 dev ens20 scope link proto unspec onlink
ip: nexthop: add print_cache_nexthop which prints and manages the nh cache
Add a new helper print_cache_nexthop replacing print_nexthop which can
update the nexthop cache if the process_cache argument is true. It is
used when monitoring netlink messages to keep the nexthop cache up to
date with nexthop changes happening. For the old callers, and for anyone
who's just dumping nexthops, its _nocache version is used, which is a
wrapper for print_cache_nexthop.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David Ahern <dsahern@kernel.org>
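The relationship between the two entry points can be sketched like this. The
struct, the function names with the _sketch suffix, their signatures and the
printed format are all hypothetical simplifications; only the idea of a flag
plus a thin _nocache wrapper comes from the commit message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, already-parsed nexthop notification. */
struct nh_msg {
	unsigned int id;
	bool is_delete;
};

static unsigned int cache_updates; /* counts simulated cache updates */

/* Always prints; touches the cache only when process_cache is true. */
static int print_cache_nexthop_sketch(const struct nh_msg *msg, FILE *fp,
				      bool process_cache)
{
	assert(msg != NULL && fp != NULL);
	fprintf(fp, "%sid %u\n", msg->is_delete ? "Deleted " : "", msg->id);
	if (process_cache)
		cache_updates++; /* real code would add, update or delete the entry */
	return 0;
}

/* Wrapper for callers that only dump nexthops and never update the cache. */
static int print_nexthop_nocache_sketch(const struct nh_msg *msg, FILE *fp)
{
	return print_cache_nexthop_sketch(msg, fp, false);
}
```

A dump path would call the _nocache wrapper, while a monitor path would call
the full helper with process_cache set to true.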