git.ipfire.org Git - thirdparty/iproute2.git/log
Nikolay Aleksandrov [Tue, 16 Feb 2016 15:08:53 +0000 (16:08 +0100)] 
iplink: bridge_slave: add support for IFLA_BRPORT_PROXYARP_WIFI

Add support to be able to view and change IFLA_BRPORT_PROXYARP_WIFI port
attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Tue, 16 Feb 2016 15:08:52 +0000 (16:08 +0100)] 
iplink: bridge_slave: add support for IFLA_BRPORT_PROXYARP

Add support to be able to view and change IFLA_BRPORT_PROXYARP port
attribute.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Tue, 16 Feb 2016 15:08:51 +0000 (16:08 +0100)] 
iplink: bridge_slave: export read-only values

Export all the read-only values that get returned about a bridge port
such as the timers, the ids, designated_port and cost,
topology_change_ack and config_pending. For the bridge ids the
br_dump_bridge_id function is exported from iplink_bridge.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nicolas Cavallari [Fri, 12 Feb 2016 13:47:39 +0000 (14:47 +0100)] 
netns: Fix an off-by-one strcpy() in netns_map_add().

netns_map_add() does a malloc of (sizeof (struct nsid_cache) +
strlen(name)) and then proceeds with strcpy() of name into the
zero-length member at the end of the nsid_cache structure.  The
nul-terminator is written outside of the allocated memory and may
overwrite the allocator's internal structure.

This can trigger a segmentation fault on i386 uclibc with names of size 8:
after the corruption occurs, the call to closedir() on netns_map_init()
crashes while freeing the DIR structure.

Here is the relevant valgrind output:

==1251== Memcheck, a memory error detector
==1251== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1251== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==1251== Command: ./ip netns
==1251==
==1251== Invalid write of size 1
==1251==    at 0x4011975: strcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==1251==    by 0x8058B00: netns_map_add (ipnetns.c:181)
==1251==    by 0x8058E2A: netns_map_init (ipnetns.c:226)
==1251==    by 0x8058E79: do_netns (ipnetns.c:776)
==1251==    by 0x804D9FF: do_cmd (ip.c:110)
==1251==    by 0x804D814: main (ip.c:300)
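
The fix implied by the description above is to reserve one extra byte for the terminator. A minimal sketch, assuming a simplified `struct nsid_cache` with a flexible array member in place of the zero-length member; the field layout and the helper name `nsid_cache_new` are illustrative and are not taken from ipnetns.c:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the structure in ipnetns.c; fields are assumed. */
struct nsid_cache {
	int nsid;
	char name[];	/* flexible array member holding the netns name */
};

/* Hypothetical helper: allocate room for the struct, the name and its NUL. */
struct nsid_cache *nsid_cache_new(int nsid, const char *name)
{
	size_t len = strlen(name);
	struct nsid_cache *c = malloc(sizeof(*c) + len + 1);	/* +1 for '\0' */

	if (!c)
		return NULL;
	c->nsid = nsid;
	memcpy(c->name, name, len + 1);	/* copies the terminator as well */
	return c;
}
```

With this sizing, an eight-character name such as the one from the crash report and its terminator both stay inside the allocation.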

Stephen Hemminger [Tue, 9 Feb 2016 18:51:32 +0000 (10:51 -0800)] 
Revert "tipc: add peer remove functionality"

This reverts commit f9dec657e4578d50a2432e5842d97c857faa6c2c.

Since this code is not in the upstream kernel, it shouldn't be in iproute2.

Roopa Prabhu [Mon, 8 Feb 2016 00:28:16 +0000 (16:28 -0800)] 
ip route: add mpls multipath support

This patch adds support for adding MPLS multipath
routes.

example:
ip -f mpls route add 100 \
nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 16:13:58 +0000 (17:13 +0100)] 
iplink: bond_slave: fix ad_actor/partner_oper_port_state output

It seems that I made a mistake when I exported these: instead of a
space at the end I put a newline character, which is wrong and breaks
the single-line output.

Fixes: 7d6bc3b87abad ("bonding: export 3ad actor and partner port state")
Reported-by: Sam Tannous <stannous@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:39 +0000 (00:14 +0100)] 
iplink: bridge: add support for netfilter call attributes

This patch implements support for the IFLA_BR_NF_CALL_(IP|IP6|ARP)TABLES
attributes in iproute2 so it can change their values.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:38 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_INTVL

This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_INTVL
attribute in iproute2 so it can change the startup query interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:37 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_QUERY_RESPONSE_INTVL

This patch implements support for the IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
attribute in iproute2 so it can change the query response interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:36 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_QUERY_INTVL

This patch implements support for the IFLA_BR_MCAST_QUERY_INTVL attribute
in iproute2 so it can change the query interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:35 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_QUERIER_INTVL

This patch implements support for the IFLA_BR_MCAST_QUERIER_INTVL
attribute in iproute2 so it can change the querier interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:34 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_MEMBERSHIP_INTVL

This patch implements support for the IFLA_BR_MCAST_MEMBERSHIP_INTVL
attribute in iproute2 so it can change the membership interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:33 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_INTVL

This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_INTVL
attribute in iproute2 so it can change the last member interval.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:32 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_CNT

This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_CNT
attribute in iproute2 so it can change the startup query count.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:31 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_CNT

This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_CNT
attribute in iproute2 so it can change the last member count value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:30 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_HASH_MAX

This patch implements support for the IFLA_BR_MCAST_HASH_MAX attribute
in iproute2 so it can change the maximum hashed entries.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:29 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_HASH_ELASTICITY

This patch implements support for the IFLA_BR_MCAST_HASH_ELASTICITY
attribute in iproute2 so it can change the hash elasticity value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:28 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_QUERIER

This patch implements support for the IFLA_BR_MCAST_QUERIER attribute
in iproute2 so it can toggle the mcast querier value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:27 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_QUERY_USE_IFADDR

This patch implements support for the IFLA_BR_MCAST_QUERY_USE_IFADDR
attribute in iproute2 so it can toggle the multicast_query_use_ifaddr value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:26 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_SNOOPING

This patch implements support for the IFLA_BR_MCAST_SNOOPING attribute
in iproute2 so it can change the multicast snooping value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:25 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_MCAST_ROUTER

This patch implements support for the IFLA_BR_MCAST_ROUTER attribute
in iproute2 so it can change the multicast router value.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:24 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_VLAN_DEFAULT_PVID

This patch implements support for the IFLA_BR_VLAN_DEFAULT_PVID
attribute in iproute2 so it can change the default pvid.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:23 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_GROUP_ADDR

This patch implements support for the IFLA_BR_GROUP_ADDR attribute
in iproute2 so it can change the group address.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:22 +0000 (00:14 +0100)] 
iplink: bridge: add support for IFLA_BR_GROUP_FWD_MASK

This patch implements support for the IFLA_BR_GROUP_FWD_MASK attribute
in iproute2 so it can change the group forwarding mask.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:21 +0000 (00:14 +0100)] 
iplink: bridge: export read-only timers

Netlink already provides hello_timer, tcn_timer, topology_change_timer
and gc_timer, so let's make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:20 +0000 (00:14 +0100)] 
iplink: bridge: export root_(port|path_cost), topology_change and change_detected

Netlink already exports these values; we just need to make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Nikolay Aleksandrov [Mon, 8 Feb 2016 23:14:19 +0000 (00:14 +0100)] 
iplink: bridge: export bridge_id and designated_root

Netlink returns the bridge_id and designated_root, we just need to
make them visible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Roopa Prabhu [Wed, 27 Jan 2016 17:09:37 +0000 (09:09 -0800)] 
bridge: support for static fdb entries

There is no intuitive option to add static fdb entries today.
'temp' seems to have a side effect of adding
'static' fdb entries. But the name and intent
of 'temp' does not say anything about it being static.

example:
bridge fdb add operates as follows:

$bridge fdb add 00:01:02:03:04:05 dev eth0 master
$bridge fdb add 00:01:02:03:04:06 dev eth0 master temp
$bridge fdb add 00:01:02:03:04:07 dev eth0 master local

$bridge fdb show
00:01:02:03:04:05 dev eth0 permanent
00:01:02:03:04:06 dev eth0 static
00:01:02:03:04:07 dev eth0 permanent
00:01:02:03:04:08 dev eth0 <<== dynamic, ageable learned mac

This patch adds a new bridge fdb type 'static' which
makes sure NUD_NOARP and NUD_REACHABLE are set for static
entries. This is effectively what 'temp'
does today, but the name 'temp' is misleading.

After the patch:
$bridge fdb add 00:01:02:03:04:06 dev eth0 master static

$bridge fdb show
00:01:02:03:04:06 dev eth0 static

'temp' could ideally be a dynamic mac that can age (i.e. just
NUD_REACHABLE). But 'temp' sets both 'NUD_NOARP' and 'NUD_REACHABLE',
and it is too late to change 'temp' now. We are thinking of introducing a
'dynamic' keyword after this patch that only sets NUD_REACHABLE.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
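
The keyword-to-state mapping described in this entry can be sketched as follows. The helper name `fdb_keyword_to_state` is hypothetical, the `SK_NUD_*` constants mirror the `NUD_*` values from `<linux/neighbour.h>` so that the sketch builds without kernel headers, and mapping 'local' (and no keyword at all) to the permanent bit is inferred from the 'permanent' entries in the example output rather than taken from the patch:

```c
#include <string.h>

/* Values mirror NUD_* in <linux/neighbour.h>; redefined locally so the
 * sketch does not depend on kernel headers. */
#define SK_NUD_REACHABLE 0x02
#define SK_NUD_NOARP     0x40
#define SK_NUD_PERMANENT 0x80

/* Hypothetical helper: map a 'bridge fdb add' keyword to neighbour state.
 * kw == NULL stands for "no keyword given". Returns -1 if kw is unknown. */
int fdb_keyword_to_state(const char *kw)
{
	if (kw == NULL || strcmp(kw, "local") == 0)
		return SK_NUD_PERMANENT;
	if (strcmp(kw, "static") == 0 || strcmp(kw, "temp") == 0)
		return SK_NUD_NOARP | SK_NUD_REACHABLE;
	return -1;
}
```

Both 'static' and 'temp' yield the same bits here, which is the point of the commit message: the new keyword changes the name, not the behaviour.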
Daniel Borkmann [Sun, 7 Feb 2016 01:11:53 +0000 (02:11 +0100)] 
tc, bpf: use bind/type macros from gelf

Don't reimplement them and rather use the macros from the gelf header,
that is, GELF_ST_BIND()/GELF_ST_TYPE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Sun, 7 Feb 2016 01:11:52 +0000 (02:11 +0100)] 
tc, bpf: give some more hints wrt false relos

Provide some more hints to the user/developer when relos have been found
that don't point to an ld64 imm instruction. I ran a couple of times into
relos generated by clang [1], where the compiler tried to uninline inlined
functions with eBPF and emitted BPF_JMP | BPF_CALL opcodes. If this seems
to be the case, give a hint that the user should work around it with the
always_inline annotation.

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243#c3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Sun, 7 Feb 2016 01:11:51 +0000 (02:11 +0100)] 
tc, bpf: improve verifier logging

With somewhat larger, branchy eBPF programs, e.g. already ~BPF_MAXINSNS/7
in size, bpf(2) quickly ends up rejecting valid programs merely because
the verifier log buffer we have in tc is too small.

Change that, so that by default we don't do any logging, and only in the
error case retry with logging enabled. If we fail to provide a
reasonable dump of the verifier analysis, retry a few times with a larger
log buffer so that we can at least give the user a chance to debug the
program.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
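
The retry strategy described in this entry can be sketched as below. This is an illustration under stated assumptions, not the tc implementation: the callback type `prog_load_fn`, the function `load_with_log_retry`, the doubling growth policy and the stand-in loader `fake_rejecting_load` are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader callback: returns 0 on success, -1 on rejection.
 * When log is non-NULL the callee writes its verifier output there and sets
 * *truncated if the buffer was too small to hold the whole dump. */
typedef int (*prog_load_fn)(char *log, size_t log_size, int *truncated);

/* Try without logging first; on failure retry with a growing log buffer.
 * On failure *log_out holds the last dump (caller frees), else NULL. */
int load_with_log_retry(prog_load_fn load, size_t start_size, int max_tries,
			char **log_out)
{
	size_t size = start_size;
	char *log = NULL;
	int ret, i;

	*log_out = NULL;
	ret = load(NULL, 0, NULL);	/* fast path: no logging at all */
	if (ret == 0)
		return 0;

	for (i = 0; i < max_tries; i++, size *= 2) {
		int truncated = 0;
		char *tmp = realloc(log, size);

		if (!tmp)
			break;		/* keep whatever dump we already have */
		log = tmp;
		log[0] = '\0';
		ret = load(log, size, &truncated);
		if (ret == 0 || !truncated)
			break;		/* loaded after all, or the dump fit */
	}
	if (ret == 0) {
		free(log);
		log = NULL;
	}
	*log_out = log;
	return ret;
}

/* Stand-in loader for demonstration: always rejects the program and needs
 * 4096 bytes of log space to explain why. */
int fake_rejecting_load(char *log, size_t log_size, int *truncated)
{
	static const size_t needed = 4096;

	if (log) {
		*truncated = log_size < needed;
		snprintf(log, log_size, "%s",
			 *truncated ? "(truncated)" : "invalid insn");
	}
	return -1;
}
```

Starting at 1024 bytes with up to four tries, the stand-in loader is called with 1024, 2048 and then 4096 bytes, at which point the full dump fits and the loop stops.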
Daniel Borkmann [Sun, 7 Feb 2016 01:11:50 +0000 (02:11 +0100)] 
tc, bpf, examples: further bpf_api improvements

Add a couple of improvements to tc's BPF API that facilitate program
development.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Paolo Abeni [Thu, 28 Jan 2016 13:48:55 +0000 (14:48 +0100)] 
geneve: add support for lwt tunnel creation and dst port selection

This change adds the ability to create lwt/flow based/externally
controlled geneve devices and to select the UDP destination port used
by a full geneve tunnel.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Nicolas Dichtel [Wed, 3 Feb 2016 08:25:00 +0000 (09:25 +0100)] 
tc: fix compilation with old gcc (< 4.6) (bis)

Commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 4.6)") was reverted
to ease the merge of the net-next branch.

Here is the new version.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Roopa Prabhu [Wed, 3 Feb 2016 00:53:40 +0000 (16:53 -0800)] 
ipmonitor: match user option 'all' before 'all-nsid'

'ip monitor all' is broken on older kernels.
This patch fixes 'ip monitor all' to match
'all' and not 'all-nsid'.

It moves parsing arg 'all-nsid' to after parsing
'all'.

Before:
$ip monitor all
NETLINK_LISTEN_ALL_NSID: Protocol not available

After:
$ip monitor all
[NEIGH]Deleted 10.0.0.1 dev eth1 lladdr c4:54:44:4f:b2:dd STALE

Fixes: 449b824ad196 ("ipmonitor: allows to monitor in several netns")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
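
Why the order of the checks matters can be shown with a small sketch, assuming option keywords are accepted by prefix (abbreviation) matching, as this entry implies. The helpers `is_abbrev` and `first_match` are hypothetical and are not the functions used in ipmonitor.c:

```c
#include <string.h>

/* Hypothetical prefix test: nonzero when arg is an abbreviation of keyword. */
static int is_abbrev(const char *arg, const char *keyword)
{
	size_t n = strlen(arg);

	return n > 0 && n <= strlen(keyword) && memcmp(arg, keyword, n) == 0;
}

/* Return the first keyword in 'order' that arg abbreviates, or NULL. */
const char *first_match(const char *arg, const char *const *order, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (is_abbrev(arg, order[i]))
			return order[i];
	return NULL;
}
```

With the keywords tested in the order `all-nsid`, `all`, the argument `all` is taken as an abbreviation of `all-nsid`; testing `all` first resolves it as intended, while a full `all-nsid` still falls through to the second keyword.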
Daniel Borkmann [Thu, 21 Jan 2016 23:46:28 +0000 (00:46 +0100)] 
tc, bpf: make sure relo is in relation with map section

Add a test that symbol from relocation entry is actually related
to map section and bail out with an error message if it's not the
case; in relation to [1].

  [1] https://llvm.org/bugs/show_bug.cgi?id=26243

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Gustavo Zacarias [Thu, 21 Jan 2016 18:19:48 +0000 (15:19 -0300)] 
iproute2: fix building with musl

We need limits.h for PATH_MAX, fixes:

rt_names.c:364:13: error: ‘PATH_MAX’ undeclared (first use in this function)

Signed-off-by: Gustavo Zacarias <gustavo@zacarias.com.ar>
Zhang Shengju [Thu, 21 Jan 2016 02:23:49 +0000 (02:23 +0000)] 
ip-link: remove warning message

The warning was:
iproute.c:301:12: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
   features &= ~RTAX_FEATURE_ECN;
            ^
iproute.c:575:10: note: 'val' was declared here
   __u32 val;
  ^

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Stephen Hemminger [Tue, 2 Feb 2016 04:57:23 +0000 (15:57 +1100)] 
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2

Lorenzo Colitti [Fri, 8 Jan 2016 08:32:37 +0000 (17:32 +0900)] 
ss: support closing inet sockets via SOCK_DESTROY.

This patch adds a -K / --kill option to ss that attempts to
forcibly close matching sockets using SOCK_DESTROY.

Because ss typically prints sockets instead of acting on them,
and because the kernel only supports forcibly closing some types
of sockets, the output of -K is as follows:

- If closing the socket succeeds, the socket is printed.
- If the kernel does not support forcibly closing this type of
  socket (e.g., if it's a UDP socket, or a TIME_WAIT socket),
  the socket is silently skipped.
- If an error occurs (e.g., permission denied), the error is
  reported and ss exits.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
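
The three output rules listed in this entry amount to a small decision function. A minimal sketch, assuming the kernel signals "cannot close this type of socket" with `EOPNOTSUPP`; that error code, the enum and the function name `classify_kill` are assumptions of the sketch, not taken from ss.c:

```c
#include <errno.h>

enum kill_result {
	KILL_PRINT,	/* socket was closed: print it */
	KILL_SKIP,	/* kernel cannot close this socket type: stay silent */
	KILL_FATAL	/* any other error: report it and exit */
};

/* Hypothetical classifier: ret and err are the return value and errno of
 * the SOCK_DESTROY request. */
enum kill_result classify_kill(int ret, int err)
{
	if (ret == 0)
		return KILL_PRINT;
	if (err == EOPNOTSUPP)	/* assumed "unsupported" indication */
		return KILL_SKIP;
	return KILL_FATAL;
}
```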
Lorenzo Colitti [Fri, 8 Jan 2016 08:32:36 +0000 (17:32 +0900)] 
libnetlink: don't print NETLINK_SOCK_DIAG errors in rtnl_talk

This change is a no-op, as currently no code uses rtnl_talk on
NETLINK_SOCK_DIAG_BY_FAMILY sockets. It is needed to suppress
spurious errors when using SOCK_DESTROY via rtnl_talk.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Thomas Faivre [Thu, 14 Jan 2016 17:10:20 +0000 (18:10 +0100)] 
ip-link: fix man page warnings

The groff wrapper returns warnings when parsing the ip-link.8.in file.

How to reproduce:
$ man --warnings ip-link > /dev/null
`R' is a string (producing the registered sign), not a macro.
[...]

Signed-off-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Thomas Faivre [Thu, 14 Jan 2016 17:10:19 +0000 (18:10 +0100)] 
vxlan: fix help and man text

Options 'group' and 'remote' cannot take 'any' as value but 'local' can.

Signed-off-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Daniel Borkmann [Tue, 12 Jan 2016 01:03:08 +0000 (02:03 +0100)] 
tc, bpf: more header checks on loading elf

eBPF llvm backend can support different BPF formats, make sure the object
we're trying to load matches with regards to endianness and while at it, also
check for other attributes related to BPF ELFs.

  # llc --version
  LLVM (http://llvm.org/):
    LLVM version 3.8.0svn
    Optimized build.
    Built Jan  9 2016 (02:08:10).
    Default target: x86_64-unknown-linux-gnu
    Host CPU: ivybridge

    Registered Targets:
      bpf    - BPF (host endian)
      bpfeb  - BPF (big endian)
      bpfel  - BPF (little endian)
      [...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
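
An endianness check of the kind described in this entry can be sketched with the constants from `<elf.h>`. The function name `elf_matches_host_endian` is hypothetical, the host byte order comes from the GCC/Clang predefined `__BYTE_ORDER__` macro, and the actual tc loader may check additional fields in a different way:

```c
#include <elf.h>

/* Hypothetical check: nonzero when the data encoding recorded in e_ident
 * matches the byte order of the host that runs the loader. */
int elf_matches_host_endian(const unsigned char e_ident[EI_NIDENT])
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return e_ident[EI_DATA] == ELFDATA2LSB;
#else
	return e_ident[EI_DATA] == ELFDATA2MSB;
#endif
}
```

On a little-endian host this accepts objects built for the `bpfel` target and rejects `bpfeb` ones; on a big-endian host the result is reversed.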
Daniel Borkmann [Tue, 12 Jan 2016 01:03:07 +0000 (02:03 +0100)] 
tc, bpf: check section names and type everywhere

When extracting sections, we better check for name and type. Noticed
that some llvm versions emit .strtab and .shstrtab (e.g. saw it on pre
3.7), while more recent ones only seem to emit .strtab. Thus, make sure
we get the right sections.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Tue, 12 Jan 2016 00:42:20 +0000 (01:42 +0100)] 
tc, clsact: add clsact frontend

Add the tc part for the kernel commit 1f211a1b929c ("net, sched: add
clsact qdisc"). Quoting example usage from that commit description:

  Example, adding qdisc:

  # tc qdisc add dev foo clsact
  # tc qdisc show dev foo
  qdisc mq 0: root
  qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
  qdisc clsact ffff: parent ffff:fff1

  Adding filters (deleting, etc. works analogously by specifying ingress/egress):

  # tc filter add dev foo ingress bpf da obj bar.o sec ingress
  # tc filter add dev foo egress  bpf da obj bar.o sec egress
  # tc filter show dev foo ingress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
  # tc filter show dev foo egress
  filter protocol all pref 49152 bpf
  filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

The ingress parent alias can also be used with ingress qdisc.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann [Tue, 12 Jan 2016 00:42:19 +0000 (01:42 +0100)] 
tc, ingress: clean up ingress handling a bit

Clean it up a bit, we can also get rid of some ugly ifdefs as in our case
TC_H_INGRESS is always defined.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Stephen Hemminger [Mon, 18 Jan 2016 17:40:13 +0000 (09:40 -0800)] 
update headers (post 4.4 merge window)

Stephen Hemminger [Mon, 18 Jan 2016 17:37:45 +0000 (09:37 -0800)] 
Merge branch 'net-next'

Stephen Hemminger [Mon, 18 Jan 2016 17:37:38 +0000 (09:37 -0800)] 
Revert "tc: fix compilation with old gcc (< 4.6)"

This reverts commit 8f80d450c3cb0996d839996807b77ca28bd4da09.

Richard Alpe [Tue, 5 Jan 2016 09:57:40 +0000 (10:57 +0100)] 
tipc: add peer remove functionality

This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Stephen Hemminger [Mon, 11 Jan 2016 16:33:03 +0000 (08:33 -0800)] 
v4.4.0

Stephen Hemminger [Mon, 11 Jan 2016 16:31:46 +0000 (08:31 -0800)] 
Revert "tipc: add peer remove functionality"

This reverts commit d4585a4bb120e2f60b088a7e934bf2ae4e6b5b68.
This commit is meant for later kernel.

Jamal Hadi Salim [Sun, 10 Jan 2016 19:56:31 +0000 (14:56 -0500)] 
tc: flower no need to specify the ethertype

All tc classifiers are already required to specify the ethertype as part
of the grammar. By not allowing eth_type to be specified, we remove the
contradiction that arises, for example, when a user specifies:
tc filter add ... priority xxx protocol ip flower eth_type ipv6

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Julien Floret [Thu, 7 Jan 2016 13:03:13 +0000 (14:03 +0100)] 
tc: fix compilation with old gcc (< 4.6)

gcc < 4.6 does not handle C11 syntax for the static initialization of
anonymous struct/union, hence the following error:

tc_bpf.c:260: error: unknown field map_type specified in initializer

Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
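
One common way to avoid designated initializers for anonymous members is to zero the object and assign the members afterwards. The sketch below shows that pattern on an assumed union shape; `union attr_sketch` and `attr_sketch_init` are hypothetical, and the commit itself may have used a different workaround:

```c
#include <string.h>

/* Assumed shape, loosely modelled on a union with anonymous members.
 * Anonymous members are a C11 feature and a long-standing GNU extension. */
union attr_sketch {
	struct {			/* anonymous struct */
		unsigned int map_type;
		unsigned int key_size;
	};
	unsigned long long pad[4];
};

/* Zero the object, then assign members one by one instead of writing
 * a designated initializer such as { .map_type = ... }. */
void attr_sketch_init(union attr_sketch *a, unsigned int type, unsigned int key)
{
	memset(a, 0, sizeof(*a));
	a->map_type = type;
	a->key_size = key;
}
```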
Roopa Prabhu [Sun, 10 Jan 2016 00:02:12 +0000 (16:02 -0800)] 
iplink: replace exit with return

This patch replaces exits with returns in the iplink
command. This helps to continue on errors when
invoked with ip -force -batch.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Phil Sutter [Wed, 6 Jan 2016 16:46:50 +0000 (17:46 +0100)] 
tc: m_connmark: Fix help text

When specifying a conntrack zone, the 'zone' keyword has to be used
before the actual zone index.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Stephen Hemminger [Wed, 6 Jan 2016 18:29:06 +0000 (10:29 -0800)] 
man: fix whatis for fq

The FQ man page was not following whatis formatting rules.

Richard Alpe [Tue, 5 Jan 2016 09:57:40 +0000 (10:57 +0100)] 
tipc: add peer remove functionality

This enables a user to remove an offline peer from the kernel data
structures. This could for example be useful when deliberately scaling
in peer nodes in a cloud environment.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Richard Alpe [Tue, 5 Jan 2016 09:57:39 +0000 (10:57 +0100)] 
tipc: fix help text spelling error in node.c

Bjørn Mork [Mon, 4 Jan 2016 09:58:06 +0000 (10:58 +0100)] 
man: iplink: document new addrgenmodes

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Bjørn Mork [Mon, 4 Jan 2016 09:58:05 +0000 (10:58 +0100)] 
iplink: support show and set of "addrgenmode random"

"random" is a new IPv6 addrgenmode, enabling "stable_secret" type
addresses with an auto-generated secret.

$ ip link set eth0 addrgenmode random

$ ip -d link show dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:21:86:a3:25:7d brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode random

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Bjørn Mork [Mon, 4 Jan 2016 09:58:04 +0000 (10:58 +0100)] 
iplink: support setting addrgenmode stable_secret

It is possible to switch to another addrgenmode after setting a
valid secret.  Allow switching back without reconfiguring the
secret for completeness.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Stephen Hemminger [Wed, 6 Jan 2016 17:14:29 +0000 (09:14 -0800)] 
update most kernel headers

Still have issues with xtables.

Stephen Hemminger [Sun, 3 Jan 2016 23:14:27 +0000 (15:14 -0800)] 
Update to current iptables headers

Keep in sync with current iptables upstream

Stephen Hemminger [Thu, 31 Dec 2015 02:06:12 +0000 (18:06 -0800)] 
add coverity model file

Track any coverity overrides for this project.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Thu, 31 Dec 2015 01:28:11 +0000 (17:28 -0800)] 
lnstat: fix error handling

Error handling was silent and had leaks.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Thu, 31 Dec 2015 01:19:04 +0000 (17:19 -0800)] 
monitor: fix file handle leak

In some cases, passing a file to monitor left the file open.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Stephen Hemminger [Thu, 31 Dec 2015 01:17:45 +0000 (17:17 -0800)] 
genl: make string const

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Hangbin Liu [Fri, 25 Dec 2015 03:12:16 +0000 (11:12 +0800)] 
iproute2: ip-route.8.in: Add expires option for ip route

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Hangbin Liu [Fri, 25 Dec 2015 03:12:15 +0000 (11:12 +0800)] 
iproute2: ip-route.8.in: Add missing '[' before 'pref'

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Hangbin Liu [Mon, 21 Dec 2015 08:29:36 +0000 (16:29 +0800)] 
route: allow routes to be configured with expire values

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Stephen Hemminger [Tue, 22 Dec 2015 05:37:21 +0000 (21:37 -0800)] 
Merge branch 'master' into net-next

Phil Sutter [Mon, 21 Dec 2015 19:42:56 +0000 (20:42 +0100)] 
iptunnel: Fix compile error in ip/tunnel.c

I repeatedly failed to get this right, so now I have to clean up my mess
afterwards.

Fixes: 7d6aadcd0a1dc ("ip{,6}tunnel: have a shared stats parser/printer")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Fri, 18 Dec 2015 10:58:06 +0000 (11:58 +0100)] 
ip{,6}tunnel: have a shared stats parser/printer

This has a slight side-effect of not aborting when /proc/net/dev is
malformed, but OTOH stats are not parsed for uninteresting interfaces.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Paolo Abeni [Fri, 18 Dec 2015 09:50:38 +0000 (10:50 +0100)] 
lwtunnel: implement support for ip6 encap

Currently ip6 encap support for lwtunnel is missing.
This patch implements it, mostly duplicating the ipv4 parts.

Also be sure to insert a space after the encap type, when
showing lwtunnel, to avoid the tunnel type and the following
argument being merged into a single word.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 years agogre: add support for collect metadata flag
Paolo Abeni [Fri, 18 Dec 2015 09:50:37 +0000 (10:50 +0100)] 
gre: add support for collect metadata flag

This patch adds support for IFLA_GRE_COLLECT_METADATA via the
'external' keyword to the gre link.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 years agovxlan: add support for collect metadata flag
Paolo Abeni [Fri, 18 Dec 2015 09:50:36 +0000 (10:50 +0100)] 
vxlan: add support for collect metadata flag

This patch adds support for IFLA_VXLAN_COLLECT_METADATA via the
'external' keyword to the vxlan link.

Also enforce mutual exclusion between 'vni' and 'external'.
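
A minimal sketch of such a mutual-exclusion check, for illustration only.
The function name, its flag arguments and the error text are assumptions
and are not taken from the patch:

```c
#include <stdio.h>

/*
 * Hypothetical option check: 'vni_set' is non-zero when a vni was given,
 * 'external' is non-zero when the 'external' keyword was given.
 * Returns 0 if the combination is acceptable, -1 if both were supplied.
 */
static int vxlan_check_opts(int vni_set, int external)
{
	if (vni_set && external) {
		/* Illustrative message, not the wording used by ip(8). */
		fprintf(stderr, "vxlan: 'vni' and 'external' are mutually exclusive\n");
		return -1;
	}
	return 0;
}
```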

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 years agoiproute: print addrgenmode stable_secret and fallback otherwise
Hannes Frederic Sowa [Wed, 16 Dec 2015 09:52:36 +0000 (10:52 +0100)] 
iproute: print addrgenmode stable_secret and fallback otherwise

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
9 years agobpf: minor fix in api and bpf_dump_error() usage
Daniel Borkmann [Mon, 14 Dec 2015 15:57:32 +0000 (16:57 +0100)] 
bpf: minor fix in api and bpf_dump_error() usage

Fix a whitespace in bpf_dump_error() usage, and also a missing closing
bracket in ntohl() macro for eBPF programs.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
9 years agoinclude: update kernel headers
Stephen Hemminger [Fri, 18 Dec 2015 01:21:53 +0000 (17:21 -0800)] 
include: update kernel headers

Current headers for net-next

9 years agoMerge branch 'master' into net-next
Stephen Hemminger [Fri, 18 Dec 2015 01:21:15 +0000 (17:21 -0800)] 
Merge branch 'master' into net-next

9 years agolwtunnel: fix argument parsing
Paolo Abeni [Tue, 15 Dec 2015 11:18:04 +0000 (12:18 +0100)] 
lwtunnel: fix argument parsing

Currently parse_encap_ip() does not correctly update argv/argc;
if multiple lwtunnel arguments are provided, the parsing fails after
the first one, i.e.

 ip route add 172.16.101.0/24 dev vxlan1 encap ip id 42 dst 192.168.255.1

fails with:

 Error: either "to" is duplicate, or "dst" is a garbage.

This commit addresses the issue, stepping to next argument at each iteration
of the parsing loop.
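
The idea of stepping to the next argument at each iteration can be sketched
as follows. This is not the code from parse_encap_ip(); the struct, the
function name and the return convention are assumptions made for the
example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the parsed encap options. */
struct encap_opts {
	unsigned long id;
	const char *dst;
};

/*
 * Parse "id N dst ADDR ..." keyword/value pairs.
 * Returns the number of argv entries consumed, or -1 if a value is missing.
 */
static int parse_encap_sketch(struct encap_opts *o, int argc, char **argv)
{
	int start = argc;

	while (argc > 0) {
		if (strcmp(*argv, "id") == 0) {
			if (argc < 2)
				return -1;
			argc--; argv++;		/* move from keyword to value */
			o->id = strtoul(*argv, NULL, 0);
		} else if (strcmp(*argv, "dst") == 0) {
			if (argc < 2)
				return -1;
			argc--; argv++;
			o->dst = *argv;
		} else {
			break;			/* unknown keyword: leave it to the caller */
		}
		argc--; argv++;			/* step past the value on every iteration */
	}
	return start - argc;
}
```

Without the final step the loop would see the value of the first option
again as if it were a keyword, and any following option would be left
unparsed.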

Fixes: 1e5293056a02 ("lwtunnel: Add encapsulation support to ip route")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
9 years agoroute: Fix printing of locked entries
Phil Sutter [Sat, 12 Dec 2015 13:09:48 +0000 (14:09 +0100)] 
route: Fix printing of locked entries

Commit 0f7543322c5fd ("route: ignore RTAX_HOPLIMIT of value -1")
accidentally reordered fprintf statements. This patch restores the
original ordering.

Fixes: 0f7543322c5fd ("route: ignore RTAX_HOPLIMIT of value -1")
Signed-off-by: Phil Sutter <phil@nwl.cc>
9 years agoip neigh: device is optional for proxy entries
Konstantin Khlebnikov [Mon, 30 Nov 2015 22:17:06 +0000 (01:17 +0300)] 
ip neigh: device is optional for proxy entries

Though dumping such entries crashes present kernels.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
9 years agoila: Add support for ILA lwtunnels
Tom Herbert [Mon, 30 Nov 2015 22:57:28 +0000 (14:57 -0800)] 
ila: Add support for ILA lwtunnels

This patch:
 - Adds a utility function for parsing a 64 bit address
 - Adds a utility function for converting a 64 bit address to ASCII
 - Adds an ILA encap type in lwt tunnels

Signed-off-by: Tom Herbert <tom@herbertland.com>
9 years agoexamples, bpf: further improve examples
Daniel Borkmann [Tue, 1 Dec 2015 23:25:36 +0000 (00:25 +0100)] 
examples, bpf: further improve examples

Improve example files further and add a more generic set of possible
helpers for them that can be used.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
9 years agoMerge branch 'master' into net-next
Stephen Hemminger [Thu, 10 Dec 2015 16:56:18 +0000 (08:56 -0800)] 
Merge branch 'master' into net-next

9 years agoip: fix format string when reading statistics
Stephen Hemminger [Thu, 10 Dec 2015 16:52:10 +0000 (08:52 -0800)] 
ip: fix format string when reading statistics

The tunnel code was doing sscanf(buf, "%ld", &x) where x was unsigned
long.
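
For illustration, a matching conversion for an unsigned long is "%lu". The
helper below is a hypothetical sketch, not the tunnel code itself:

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned counter from a text buffer. */
static int read_counter(const char *buf, unsigned long *x)
{
	/* "%lu" expects unsigned long *; "%ld" expects long *. */
	return sscanf(buf, "%lu", x) == 1 ? 0 : -1;
}
```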

9 years agotc.8: Fix reference to tc-tcindex.8
Phil Sutter [Thu, 10 Dec 2015 12:24:51 +0000 (13:24 +0100)] 
tc.8: Fix reference to tc-tcindex.8

Just a typo there, it's spelled correctly in SEE ALSO section.

Signed-off-by: Phil Sutter <phil@nwl.cc>
9 years agovrf: Add support for table names
David Ahern [Tue, 8 Dec 2015 20:24:44 +0000 (12:24 -0800)] 
vrf: Add support for table names

Currently, the table id for VRF devices requires an integer. Convert
it to use rtnl_rttable_a2n which handles table names from the iproute2
directory.

This also fixes a bug in the original commit where table names are not
properly handled.
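
A rough sketch of a name-or-number lookup, under stated assumptions: the
function name is hypothetical, only the three reserved table names are
built in, and no table file is read (unlike rtnl_rttable_a2n):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical lookup: accept a reserved table name or a plain number.
 * Returns 0 and stores the id on success, -1 on an unrecognised argument.
 */
static int table_a2n_sketch(unsigned int *id, const char *arg)
{
	static const struct {
		const char *name;
		unsigned int id;
	} names[] = {
		{ "default", 253 }, { "main", 254 }, { "local", 255 },
	};
	char *end;
	unsigned long v;
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (strcmp(names[i].name, arg) == 0) {
			*id = names[i].id;
			return 0;
		}
	}

	/* Not a known name: fall back to a numeric table id. */
	v = strtoul(arg, &end, 0);
	if (end == arg || *end != '\0' || v > 0xffffffffUL)
		return -1;
	*id = (unsigned int)v;
	return 0;
}
```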

Fixes: 15faa0a30bed ("add support for VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
9 years agolibnetlink: don't confuse variables in rtnl_talk()
Nicolas Dichtel [Thu, 3 Dec 2015 16:13:48 +0000 (17:13 +0100)] 
libnetlink: don't confuse variables in rtnl_talk()

There are two variables named 'len' in rtnl_talk. In fact, commit
c079e121a73a didn't work. For example, it was possible to trigger
a seg fault with this command:
$ ip link set gre2 type ip6gre hoplimit 32

Let's rename the argument len to maxlen.
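
The hazard can be illustrated with a small sketch. It is not the body of
rtnl_talk(); the function and its parameters are assumptions chosen to show
why the buffer size and the message length need distinct names:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy-out step: 'maxlen' is the size of the caller's buffer,
 * 'len' is the size of the received message. If both were called 'len',
 * the inner variable would shadow the argument and the bound would be lost.
 */
static size_t copy_answer(void *answer, size_t maxlen,
			  const void *msg, size_t msglen)
{
	size_t len = msglen;		/* length of what was received */

	if (len > maxlen)
		len = maxlen;		/* never write past the caller's buffer */
	memcpy(answer, msg, len);
	return len;
}
```

Compilers can report this kind of name clash; GCC, for example, warns about
it when -Wshadow is enabled.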

Fixes: c079e121a73a ("libnetlink: add size argument to rtnl_talk")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
9 years agoroute: ignore RTAX_HOPLIMIT of value -1
Phil Sutter [Wed, 2 Dec 2015 12:50:22 +0000 (13:50 +0100)] 
route: ignore RTAX_HOPLIMIT of value -1

Older kernels use -1 internally as indicator to use the sysctl default,
but they still export the setting. Newer kernels use 0 to indicate that
(which is why the conversion from -1 to 0 was done here), but they also
stopped exporting the value. Since the meaning of -1 is clear, treat it
equally like default on newer kernels (which is to not print anything).
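
A minimal sketch of that rule, assuming the metric arrives as an unsigned
32-bit value. The function name and the output format are illustrative and
not copied from the patch:

```c
#include <stdio.h>

/*
 * Format a hoplimit metric into 'out'. Returns the number of characters
 * written; 0 means nothing is printed. A value of -1 (all bits set in an
 * unsigned attribute) is treated as "use the sysctl default".
 */
static int format_hoplimit(char *out, size_t size, unsigned int val)
{
	if (val == (unsigned int)-1)
		return 0;
	return snprintf(out, size, " hoplimit %u", val);
}
```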

Signed-off-by: Phil Sutter <phil@nwl.cc>
9 years agoiptunnel: cleanup code
Stephen Hemminger [Sun, 29 Nov 2015 20:05:39 +0000 (12:05 -0800)] 
iptunnel: cleanup code

Make iptunnel pass checkpatch (mostly).

9 years agoip_tunnel: determine tunnel address family from the tunnel type
Konstantin Shemyak [Thu, 26 Nov 2015 16:22:05 +0000 (18:22 +0200)] 
ip_tunnel: determine tunnel address family from the tunnel type

On 24.11.2015 02:26, Stephen Hemminger wrote:
> On Thu, 12 Nov 2015 21:10:08 +0000
> Konstantin Shemyak <konstantin@shemyak.com> wrote:
>
>> When creating an IP tunnel over IPv6, the address family must be passed in
>> the option, e.g.
>>
>> ip -6 tunnel add mode ip6gre local 1::1 remote 2::2
>>
>> This makes it impossible to create both IPv4 and IPv6 tunnels in one batch.
>>
>> In fact the address family option is redundant here, as each tunnel mode is
>> relevant for only one address family.
>> The patch determines whether the applicable address family is AF_INET6
>> instead of the default AF_INET and makes the "-6" option unnecessary for
>> "ip tunnel add".
>>
>> Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
>> ---
>>   ip/iptunnel.c                          | 26 ++++++++++++++++++++++++++
>>   testsuite/tests/ip/tunnel/add_tunnel.t | 14 ++++++++++++++
>>   2 files changed, 40 insertions(+)
>>   create mode 100755 testsuite/tests/ip/tunnel/add_tunnel.t
>>
>> diff --git a/ip/iptunnel.c b/ip/iptunnel.c
>> index 78fa988..7826a37 100644
>> --- a/ip/iptunnel.c
>> +++ b/ip/iptunnel.c
>> @@ -629,8 +629,34 @@ static int do_6rd(int argc, char **argv)
>>          return tnl_6rd_ioctl(cmd, medium, &ip6rd);
>>   }
>>
>> +static int tunnel_mode_is_ipv6(char *tunnel_mode) {
>> +       char *ipv6_modes[] = {
>> +               "ipv6/ipv6", "ip6ip6",
>> +               "vti6",
>> +               "ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
>> +               "ip6gre", "gre/ipv6",
>> +               "any/ipv6", "any"
>> +       };
>> +       int i;
>> +
>> +       for (i = 0; i < sizeof(ipv6_modes) / sizeof(char *); i++) {
>> +               if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
>> +                       return 1;
>> +       }
>> +       return 0;
>> +}
>> +
>
> The ipv6_modes table should be static const.

Thank you for the note! Attached is the corrected patch.

> Also is it possible to use strstr for ipv6 and ip6 or even strchr(tunnel_mode, '6')
> to simplify this?

There is IPv6 tunnel mode 'any', and IPv4 tunnel mode 'ipv6/ip' (aka
'sit'). It looks to me that attempts to find some substring match
would not make the code much shorter, but definitely less readable.

Konstantin Shemyak.
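
The corrected patch is not reproduced on this page. As a sketch only, the
quoted function with the reviewer's "static const" suggestion applied might
look like this (the const-qualified parameter and the size_t index are
additional assumptions):

```c
#include <string.h>

/* Returns 1 if the tunnel mode name implies AF_INET6, 0 otherwise. */
static int tunnel_mode_is_ipv6(const char *tunnel_mode)
{
	/* Same mode list as in the quoted diff, now static const. */
	static const char * const ipv6_modes[] = {
		"ipv6/ipv6", "ip6ip6",
		"vti6",
		"ip/ipv6", "ipv4/ipv6", "ipip6", "ip4ip6",
		"ip6gre", "gre/ipv6",
		"any/ipv6", "any"
	};
	size_t i;

	for (i = 0; i < sizeof(ipv6_modes) / sizeof(ipv6_modes[0]); i++) {
		if (strcmp(ipv6_modes[i], tunnel_mode) == 0)
			return 1;
	}
	return 0;
}
```

An exact-match table handles both cases raised in the reply: 'any' is
listed even though it contains no '6', and 'ipv6/ip' is not listed even
though it does.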

From 42d27db0055c3a114fe6eb86d680bef9ec098ad4 Mon Sep 17 00:00:00 2001
From: Konstantin Shemyak <konstantin@shemyak.com>
Date: Thu, 12 Nov 2015 20:52:02 +0200
Subject: [PATCH] Tunnel address family is determined from the tunnel mode

When the tunnel mode already tells the IP address family, "ip tunnel"
command determines it and does not require option "-4"/"-6" to be passed.

This makes it possible to create both IPv4 and IPv6 tunnels in one batch.

Signed-off-by: Konstantin Shemyak <konstantin@shemyak.com>
9 years ago{f,m}_bpf: add more example code
Daniel Borkmann [Thu, 26 Nov 2015 14:38:46 +0000 (15:38 +0100)] 
{f,m}_bpf: add more example code

I've added three examples to examples/bpf/ that demonstrate how one can
implement eBPF tail calls in tc with f.e. multiple levels of nesting.
That should act as a good starting point, but also as test cases for the
ELF loader and kernel. A real test suite for {f,m,e}_bpf is still to be
developed in future work.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
9 years ago{f,m}_bpf: allow updates on program arrays
Daniel Borkmann [Thu, 26 Nov 2015 14:38:45 +0000 (15:38 +0100)] 
{f,m}_bpf: allow updates on program arrays

Since we have all infrastructure in place now, allow atomic live updates
on program arrays. This can be very useful e.g. in case programs that are
being tail-called need to be replaced, f.e. when classifier functionality
needs to be changed, new protocols added/removed during runtime, etc.

Thus, provide a way for in-place code updates, minimal example: Given is
an object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id of 0, and
two tail called programs under section '0/0' (prog array key 0) and '0/1'
(prog array key 1), the section encoding for the loader is <id/key>.
Adding the filter loads everything into cls_bpf:

  tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (also full path to tc's subfolder
of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

  tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also
be injected into the program array like:

  tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:

  tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is being atomically replaced and the
old one's refcount dropped.
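
The <id/key> section encoding used above can be split with a few lines of
C. This is a hypothetical sketch, not the loader's actual parser; the
function name and its strictness about trailing characters are assumptions:

```c
#include <stdio.h>

/*
 * Split a section name of the form "<id>/<key>", e.g. "0/1".
 * Returns 0 on success, -1 if the name does not follow that shape.
 */
static int parse_prog_section(const char *sec, unsigned int *id,
			      unsigned int *key)
{
	char trailing;

	/* Exactly two fields must convert; a third means trailing junk. */
	if (sscanf(sec, "%u/%u%c", id, key, &trailing) != 2)
		return -1;
	return 0;
}
```

A name such as 'classifier' or 'foo' fails the check, which matches the
case above where an explicit 'key 1' has to be given for section 'foo'.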

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
9 years ago{f, m}_bpf: allow for user-defined object pinnings
Daniel Borkmann [Thu, 26 Nov 2015 14:38:44 +0000 (15:38 +0100)] 
{f, m}_bpf: allow for user-defined object pinnings

The recently introduced object pinning can be further extended in order
to allow sharing maps beyond tc namespace. F.e. maps that are being pinned
from tracing side, can be accessed through this facility as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
9 years ago{f, m}_bpf: check map attributes when fetching as pinned
Daniel Borkmann [Thu, 26 Nov 2015 14:38:43 +0000 (15:38 +0100)] 
{f, m}_bpf: check map attributes when fetching as pinned

Make use of the new show_fdinfo() facility and verify that when a
pinned map is being fetched that its basic attributes are the same
as the map we declared from the ELF file. I.e. when placed into the
globalns, collisions could occur. In such a case warn the user and
bail out.
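
Such a comparison can be sketched as below. Which attributes count as
"basic" is an assumption here (type, key size, value size, max entries),
and the struct, function name and warning text are hypothetical:

```c
#include <stdio.h>

/* Hypothetical subset of map attributes; the field choice is an assumption. */
struct map_attrs {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Returns 0 when the pinned map matches the ELF declaration, -1 otherwise. */
static int map_attrs_check(const struct map_attrs *elf,
			   const struct map_attrs *pinned)
{
	if (elf->type != pinned->type ||
	    elf->key_size != pinned->key_size ||
	    elf->value_size != pinned->value_size ||
	    elf->max_entries != pinned->max_entries) {
		/* Warn and let the caller bail out. */
		fprintf(stderr, "pinned map attributes differ from ELF declaration\n");
		return -1;
	}
	return 0;
}
```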

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>