xfrm: enable to set non-wildcard mark 0 on SAs and SPs
ip xfrm considers that the user-defined mark is "any" as soon as
(mark.v & mark.m == 0), which prevents from specifying non-wildcard
marks that include the value 0 (typically 0/0xffffffff).
Yet, matching exactly mark 0 is useful for instance to separate
vti policies from global policies.
Fan Du [Tue, 1 Oct 2013 04:09:05 +0000 (21:09 -0700)]
xfrm: use memcpy to suppress gcc phony buffer overflow warning.
This bug is reported from below link:
https://bugzilla.redhat.com/show_bug.cgi?id=982761
An simplified command from its original reproducing method in bugzilla:
ip xfrm state add src 10.0.0.2 dst 10.0.0.1 proto ah spi 0x12345678 auth md5 12
will cause below spew from gcc.
Reported-by: Sohny Thomas <sthomas@linux.vnet.ibm.com>
limit : max number of packets on whole Qdisc (default 10000)
flow_limit : max number of packets per flow (default 100)
quantum : the max deficit per RR round (default is 2 MTU)
initial_quantum : initial credit for new flows (default is 10 MTU)
maxrate : max per flow rate (default : unlimited)
buckets : number of RB trees (default : 1024) in hash table.
(consumes 8 bytes per bucket)
[no]pacing : disable/enable pacing (default is enable)
Usage :
tc qdisc add dev $ETH root fq
tc qdisc del dev $ETH root 2>/dev/null
tc qdisc add dev $ETH root handle 1: mq
for i in `seq 1 4`
do
tc qdisc add dev $ETH parent 1:$i est 1sec 4sec fq
done
New command line flag to output statistics in JSON format.
In our envrionment, we have scripts that parse output of commands.
It is better to use a format supported by existing parsers.
linklayer interface between kernel and tc/userspace
This iproute2 tc patch is connected to the kernel
- commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)
The rate table calculated by tc, have gotten replaced in the kernel
and is no-longer used for lookups.
This happened in kernel release v3.8 caused by kernel
- commit 56b765b79 ("htb: improved accuracy at high rates").
This change unfortunately caused breakage of tc overhead and
linklayer parameters.
Kernel overhead handling got fixed in kernel v3.10 by
- commit 01cb71d2d47 (net_sched: restore "overhead xxx" handling)
Kernel linklayer handling got fixed in kernel v3.11 by
- commit 8a8e3d84b17 (net_sched: restore "linklayer atm" handling)
The linklayer fix introduced a struct change, that allow the linklayer
attribute to be transferred between tc and kernel. This patch make use
of this linklayer attribute.
The linklayer setting is transfer to the kernel. And linklayer
setting received from the kernel is printed with a prefixed
"linklayer" when listing current configuration. The default
TC_LINKLAYER_ETHERNET is only printed in detailed output mode.
Nicolas Dichtel [Thu, 29 Aug 2013 12:29:07 +0000 (14:29 +0200)]
ipnetns: fix ip batch mode when using 'netns exec'
Since commit a05f6511f543, ip batch mode is broken when using 'netns exec' cmd.
When WIFEXITED() returns true, it means that the child exited normally, hence
we must not call exit() but just returns the status. If we call exit, the next
commands in the file file are not executed.
If WIFEXITED() returns false, we can call exit() because it means that the
child failed.
Thomas Egerer [Thu, 29 Aug 2013 12:00:36 +0000 (14:00 +0200)]
ip/xfrm: Fix potential SIGSEGV when printing extra flags
The git-commit dc8867d0, that added support for displaying the
extra-flags of a state, introduced a potential segfault.
Trying to show a state without the extra-flag attribute and show_stats
enabled, would cause the NULL pointer in tb[XFRMA_SA_EXTRA_FLAGS] to be
dereferenced.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Martin Schwenke [Mon, 19 Aug 2013 05:43:30 +0000 (15:43 +1000)]
ip: Add label option to ip monitor
Prefix labelling is currently only activated when monitoring "all"
objects. However, the output can still be confusing when monitoring
more than 1 object, so add an option to always print prefix labels.
Signed-off-by: Martin Schwenke <martin@meltin.net>
Thomas Richter [Tue, 30 Jul 2013 06:16:41 +0000 (08:16 +0200)]
iproute vxlan add support for fdb replace command
Add support for the bridge fdb replace command to replace an
existing entry in the vxlan device driver forwarding data base.
The entry is identified with its unicast mac address and its
corresponding remote destination information is updated.
This is useful for virtual machine migration and replaces the
bridge fdb del and bridge fdb add commands.
It follows the same interface as ip neigh replace commands.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Stefan Tomanek [Sat, 3 Aug 2013 12:23:16 +0000 (14:23 +0200)]
ip rule: add route suppression options
When configuring a system with multiple network uplinks and default routes, it
is often convenient to reference a routing table multiple times - but reject
its routing decision if certain constraints are not met by it.
Consider this setup:
$ ip route add table secuplink default via 10.42.23.1
$ ip rule add pref 100 table main suppress_prefixlength 0
$ ip rule add pref 150 fwmark 0xA table secuplink
With this setup, packets marked 0xA will be processed by the additional routing
table "secuplink", but only if no suitable route in the main routing table can
be found. By suppressing entries with a prefixlength of 0 (or less), the
default route (/0) of the table "main" is hidden to packets processed by rule
100; packets traveling to destinations via more specific routes are processed
as usual.
It is also possible to suppress a routing entry if a device belonging to
a specific interface group is to be used:
$ ip rule add pref 150 table main suppress_group 1
Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de>
vxlan: Allow setting destination to unicast address.
This patch allows setting VXLAN destination to unicast address.
It allows that VXLAN can be used as peer-to-peer tunnel without
multicast.
v6: change back to the v3 except for using new attribute because
replacing command-line parameters breaks existing scripts,
based by Cong Wang's comments.
v5: rebase on the latest.
v4: replace "group" with "remote" based by David Stevens's comments.
v3: move a new attribute REMOTE into the last of an enum list
based by Stephen Hemminger's comments.
fix the usage to show explicitly that both "remote" and "group"
cannot be specified, based by Ben Hutchings's comments.
v2: use a new argument "remote" instead of "group" based by
Stephen Hemminger's comments.
netns: follow return value conventions of the rest of the code
The netns code was using EXIT_SUCCESS/EXIT_FAILURE but the rest of the ip
code used -1 explictly, so change to follow convention. Also, certain types
of errors like fork failure should abort a batch operation, rather than just
returning an error.
In tc-ematch.8, remove no-op .ti requests to prevent translation warnings
These do nothing on an 80-column display. They were clearly somebody's
boilerplate way of setting up hanging indents, but the syntax lines
are way too short to require them. And since most were argumentless
they would have been no-ops on any sized display.
Thomas Richter [Mon, 8 Jul 2013 12:05:43 +0000 (14:05 +0200)]
iproute2 vxlan documentation update for ip command
The ip link command line help and the ip-link.8.in
man page are outdated in regards to the vxlan support.
The patch updates both the command line help for the
ip command and its man page.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Thomas Richter [Fri, 5 Jul 2013 07:08:50 +0000 (09:08 +0200)]
iproute2 vxlan documentation update for bridge command
The bridge fdb command line help and the bridge.8
man page are outdated in regards to the vxlan support.
The patch updates both the command line help for the
bridge command and its man page.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
execvp() does not return when the command succeed, hence all commands in the
batch file after the line 'ip netns exec' are not executed.
Let's fork before calling execvp() if batch mode is used..
Example:
$ cat test.batch
netns add netns1
netns exec netns1 ip l
netns
$ ip -b test.batch
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT
link/sit 0.0.0.0 brd 0.0.0.0
All command after 'netns exec' are never executed.
With the patch:
$ ip -b test.batch
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT
link/sit 0.0.0.0 brd 0.0.0.0
netns1
Now, existing netns are displayed.
Signed-off-by: JunweiZhang <junwei.zhang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Amerigo Wang [Tue, 2 Jul 2013 09:39:28 +0000 (17:39 +0800)]
iptunnel: check SIT_ISATAP flag only for SIT tunnel
Without patch, I got:
# ./ip/ip tunnel show
ip_vti0: ioctl 89f4 failed: Invalid argument
ip_vti0: ip/ip remote any local any ttl inherit nopmtudisc key 0
this is due to VTI_ISVTI has the same numeric value with SIT_ISATAP,
but only sit tunnel has SIOCGETPRL, therefore it should check for SIT
tunnel first.
Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cong Wang <amwang@redhat.com>
tc-stab.8: Fix synopsis errors, an invalid escape, and incorrect English usge.
The command synopsis is regularized and part of it split off into an
OPTIONS section. This allows the page to lift to XML-DocBook.
An invalid \p escape was removed.
This page was written by someone who didn't understand the use of
definite and indefinite articles in English, nor its punctuation rules.
I've fixed these mistakes, and some glitches in punctuation and
capitalization.
Eric Dumazet [Tue, 25 Jun 2013 20:29:17 +0000 (13:29 -0700)]
ss: add more TCP_INFO components
Allow ss -i to display more TCP informations :
unacked:N Number of un-acked packets
retrans:X/Y X: number of outstanding retransmit packets
Y: total number of retransmits for the session
lost:N Number of lost packets (tcpi_lost)
sacked:N Number of sacked packets (tcpi_sacked)
facked:N Number of facked packets (tcpi_facked)
reordering:N Reordering level (if different of 3)
esr@thyrsus.com [Fri, 21 Jun 2013 19:57:21 +0000 (15:57 -0400)]
First set of manpage markup fixes
Enclosed patch fixes inappropriate uses of the .SS macro. Fuller explanation
in the change comment.
There are other problems in these pages that block lifting to
XML-DocBook, most notably in the command synopses. They will take
some creativity to fix. I'm working on it
>From 75745adba4b45b87577b61a2daa886dd444f44da Mon Sep 17 00:00:00 2001
From: "Eric S. Raymond" <esr@thyrsus.com>
Date: Fri, 21 Jun 2013 15:27:38 -0400
Subject: [PATCH] Abolish presentation-level misuse of the .SS macro.
This change fixes most (but not all) fatal errors in attempts to lift
the iproute2 manual pages to XML-DocBook. Where .SS is still used it
is a real subsection header, not just a way to outdent and bold text.
Presentation-level instances are turned into .TP calls and tables.
Cong Wang [Sat, 15 Jun 2013 01:39:19 +0000 (09:39 +0800)]
add quickack option to ip route
This patch adds quickack option to enable/disable TCP quick ack
mode for per-route.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Cong Wang <amwang@redhat.com>
Eric Dumazet [Mon, 3 Jun 2013 05:51:33 +0000 (05:51 +0000)]
get_rate: detect 32bit overflows
On Mon, 2013-06-03 at 16:36 +0100, Ben Hutchings wrote:
> Oops, I read this as being strtol() currently, not strtod(). Currently
> '1.5gbit' will work, but this change will break that. So I think you
> need to keep bps as a double.
Arg
> Then here I think the check should be *rate != floor(bps), i.e. accept
> rounding down of a non-integer number of bytes but any other change is
> assumed to be overflow.
Thanks Ben, here is v4 then ;)
[PATCH v4] get_rate: detect 32bit overflows
Current rate limit is 34.359.738.360 bit per second, and
unfortunately 40Gbps links are above it.
overflows in get_rate() are currently not detected, and some
users are confused. Let's detect this and complain.
Note that some qdisc are ready to get extended range, but this will
need additional attributes and new iproute2
With help from Ben Hutchings
Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
iproute2: fix build failure on sparc due to -Wformat and tv_usec
tv_usec is "suseconds_t" which is apparently usually
a signed long, but sometimes not....
Change the printf modifier to use signed and
cast the tv_usec to long in case it's not already long.
gcc -Wall -Wstrict-prototypes -Werror -Wmissing-prototypes -Wmissing-declarations -Wold-style-definition -O2 -I../include -DRESOLVE_HOSTNAMES -DLIBDIR=\"/usr/lib\" -DCONFDIR=\"/etc/iproute2\" -D_GNU_SOURCE -fPIC -c -o utils.o utils.c
utils.c: In function 'print_timestamp':
utils.c:802:2: error: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type '__suseconds_t' [-Werror=format]
cc1: all warnings being treated as errors
Signed-off-by: Andreas Henriksson <andreas@fatal.se>
bridge: Add support for setting bridge port attributes
Also set flags to BRIDGE_FLAGS_SELF instead of using OR operation.
This allows setting the bridge mode when not being used with a
master device.
To allow setting both master and self devices simultaneously we
will need to add a {self|master} field similar to fdb commands.
For now the command sets are mutually exclusive as noted in the
original commit.
With this patch 'bridge link set' works now,
# ./bridge/bridge link set dev veth1 cost 3
# ./bridge/bridge link show
10: veth1 state UP : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 3 cost 3
CC: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
iptuntap: allow creation of multi-queue tun/tap device
This patch adds multi_queue option to ip tuntap.
This allows IFF_MULTI_QUEUE flag to be specified during
tun/tap device creation enabling multi-queue support in tun/tap
device.
Example: ip tuntap add dev tap0 mode tap multi_queue
Nicolas Dichtel [Fri, 17 May 2013 15:42:34 +0000 (08:42 -0700)]
ss: allow to retrieve AF_PACKET info via netlink
This patch add support of netlink messages for AF_PACKET and thus it allows
to get filter information of this kind of sockets.
To dump these filters info the option --bfp must be specified and the user
must have admin rights.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Nicolas Dichtel [Fri, 17 May 2013 08:47:33 +0000 (01:47 -0700)]
ipnetconf: by default dump all entries
This is now possible, because the dump function has been added in kernel.
Note that IPv4 and IPv6 entries are displayed.
Before this patch, only all entries were displayed.
Example:
$ ip netconf
ipv4 dev lo forwarding on rp_filter off mc_forwarding 0
ipv4 dev eth0 forwarding on rp_filter off mc_forwarding 1
ipv4 all forwarding on rp_filter off mc_forwarding 1
ipv4 default forwarding on rp_filter off mc_forwarding 0
ipv6 dev lo forwarding on mc_forwarding 0
ipv6 dev eth0 forwarding on mc_forwarding 0
ipv6 all forwarding on mc_forwarding 0
ipv6 default forwarding on mc_forwarding 0
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
This change shifts burden onto the users to choose the UDP port value.
Kernel default value is incorrect UDP port 5287 but now there is
an official assigned port for VXLAN.
The kernel can't change because of legacy compatibility
but new deployments should not use the legacy port value.
David L Stevens [Mon, 6 May 2013 12:11:46 +0000 (05:11 -0700)]
iproute2: support NTF_ROUTER flag in VXLAN fdb entries
This patch allows setting the "NTF_ROUTER" flag in VXLAN forwarding table
entries to enable L3 switching for router destinations while still allowing
L2 redirection appliances for non-router MAC destinations.
Signed-Off-By: David L Stevens <dlstevens@us.ibm.com>
Add ability to set UDP destination port on a per device basis.
If no port is assigned, the default IANA assigned port will be used.
If you want the kernel default value, then use port 0.
Source port range option is now called 'srcport', to avoid
confusion. The old option syntax is accepted for compatiablity.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Daniel Borkmann [Tue, 30 Apr 2013 06:22:50 +0000 (06:22 +0000)]
ip: ipv6: add tokenized interface identifier support
This patch adds support for tokenized IIDs, that enable
administrators to assign well-known host-part addresses
to nodes whilst still obtaining global network prefix
from Router Advertisements. This is the iproute2 part for
the kernel patch f53adae4eae5 (``net: ipv6: add tokenized
interface identifier support'').
Example commands with iproute2:
Setting a device token:
# ip token set ::1a:2b:3c:4d/64 dev eth1
Getting a device token:
# ip token get dev eth1
token ::1a:2b:3c:4d dev eth1
Listing all tokens:
# ip token list (or: ip token)
token :: dev eth0
token ::1a:2b:3c:4d dev eth1
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Alexander Duyck [Mon, 29 Apr 2013 05:50:13 +0000 (05:50 +0000)]
iproute2: act_ipt fix xtables breakage on older versions.
In trying to build on a RHEL6.3 I ran into several build issues that are
addressed in this patch.
The first is that xtables_merge_options only has 3 parameters. It appears
this is how this code was originally. As such for the case where the version
is less than 6 I am assuming it would be correct to maintain the original
setup that only had 3 parameters being passed instead of 4.
I also ran into an issue with the define for __ALIGN_KERNEL not being present.
I believe this may be due to the fact that __ALIGN_KERNEL was moved into a
separate header from ALIGN after the UAPI changes. In order to just cover all
of the bases I have moved the main definition for the macros into
__ALIGN_KERNEL_MASK and __ALIGN_KERNEL and if ALIGN is also needed then it is
just a direct redefine to __ALIGN_KERNEL.
Cc: Hasan Chowdhury <shemonc@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>