From foo@baz Fri Mar 15 20:48:31 PDT 2019
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 20 Feb 2019 00:15:30 +0100
Subject: ipvlan: disallow userns cap_net_admin to change global mode/flags

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 7cc9f7003a969d359f608ebb701d42cafe75b84a ]

When running Docker with userns isolation, e.g. --userns-remap="default",
and spawning some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on an ipvlan slave device inside such a container
can affect all devices from this ipvlan group, including those in other net
namespaces where the container should have no permission to make changes,
such as the init netns.

This effectively allows undoing ipvlan private mode and switching globally
to bridge mode, where slaves can communicate directly without going through
hostns, or switching the global operation mode (l2/l3/l3s) for everyone
bound to the given ipvlan master device. The libnetwork plugin here creates
an ipvlan master and an ipvlan slave in hostns, plus one slave per container
that is moved into the container's netns upon the creation event.

    8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.0.1/32 scope link cilium_host
           valid_lft forever preferred_lft forever

  * Spawn container & change ipvlan mode setting inside of it:

    # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
    9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

    # docker exec -ti client ip -d a
    10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
           valid_lft forever preferred_lft forever

    # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

    # docker exec -ti client ip -d a
    10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
           valid_lft forever preferred_lft forever

  * In hostns (mode switched to l2):

    8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.0.1/32 scope link cilium_host
           valid_lft forever preferred_lft forever

The same l3 -> l2 switch would also happen when creating another slave
inside the container's network namespace and specifying the existing
cilium0 link, from which the actual master (bond0) is derived:

    # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

    # docker exec -ti client ip -d a
    2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
           valid_lft forever preferred_lft forever

    8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
        ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        inet 10.41.0.1/32 scope link cilium_host
           valid_lft forever preferred_lft forever

One way to mitigate this is to check for CAP_NET_ADMIN in the user
namespace owning the ipvlan master device's netns, and only then allow
changing the mode or flags for all devices bound to it. The two cases
above are disallowed after this patch.

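To illustrate the check described above, here is a minimal userspace
sketch in C. It is not kernel code: the struct names and the
model_ns_capable() / model_changelink() helpers are hypothetical
stand-ins, and the capability walk is deliberately simplified (it
ignores the kernel rule that the owner of a child user namespace holds
all capabilities over it). It only shows why a task with CAP_NET_ADMIN
in a remapped user namespace fails a check against the master's netns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel objects. */
struct user_ns_model {
	const struct user_ns_model *parent;	/* NULL for the initial user ns */
};

struct net_model {
	const struct user_ns_model *user_ns;	/* user ns owning this netns */
};

struct task_model {
	const struct user_ns_model *user_ns;	/* user ns the caller lives in */
	bool has_net_admin;			/* CAP_NET_ADMIN in its own user ns */
};

/*
 * Simplified model of ns_capable(): a capability held in the caller's
 * user ns applies to that ns and to its descendants, never to ancestors.
 */
static bool model_ns_capable(const struct task_model *t,
			     const struct user_ns_model *target)
{
	const struct user_ns_model *ns;

	if (!t->has_net_admin)
		return false;

	/* Walk from the target up towards the initial user ns. */
	for (ns = target; ns; ns = ns->parent)
		if (ns == t->user_ns)
			return true;

	return false;
}

/*
 * Models the added gate: the caller must be capable in the user ns
 * that owns the netns of the ipvlan master device, not merely in the
 * netns of the slave it is operating on.
 */
static int model_changelink(const struct task_model *caller,
			    const struct net_model *master_net)
{
	assert(caller != NULL && master_net != NULL);

	if (!model_ns_capable(caller, master_net->user_ns))
		return -EPERM;

	return 0;
}
```

In this model, a container root whose user ns is a child of the initial
user ns gets -EPERM when the master lives in the init netns, while a
caller in the initial user ns with CAP_NET_ADMIN is still allowed.
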
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ipvlan/ipvlan_main.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -463,7 +463,12 @@ static int ipvlan_nl_changelink(struct n
 	struct ipvl_port *port = ipvlan_port_get_rtnl(ipvlan->phy_dev);
 	int err = 0;
 
-	if (data && data[IFLA_IPVLAN_MODE]) {
+	if (!data)
+		return 0;
+	if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (data[IFLA_IPVLAN_MODE]) {
 		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
 		err = ipvlan_set_port_mode(port, nmode);
@@ -530,6 +535,8 @@ static int ipvlan_link_new(struct net *s
 		struct ipvl_dev *tmp = netdev_priv(phy_dev);
 
 		phy_dev = tmp->phy_dev;
+		if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
 	} else if (!netif_is_ipvlan_port(phy_dev)) {
 		err = ipvlan_port_create(phy_dev);