Jozsef Kadlecsik [Sat, 22 Feb 2020 10:24:20 +0000 (11:24 +0100)]
netfilter: ipset: Fix forceadd evaluation path
When the forceadd option is enabled, the hash:* types should find and replace
the first entry in the bucket with the new one if there are no reuseable
(deleted or timed out) entries. However, the position index was just not set
to zero and remained the invalid -1 if there were no reuseable entries.
Reported-by: syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Serhey Popovych [Fri, 31 Jan 2020 16:28:34 +0000 (18:28 +0200)]
ip_set: Include kernel header instead of UAPI
This header is used to build kernel modules not userspace thus it is
correct to include linux/in.h kernel variant and not UAPI.
This fixes build on old and not widely supported systems like RHEL6 and
Debian GNU/Linux 7 (wheezy) before headers split to UAPI and kernel.
Fixes: 62d787ba5e66 ("netfilter: added missing includes to a number of header-files.") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports
In the case of huge hash:* types of sets, due to the single spinlock of
a set the processing of the whole set under spinlock protection could take
too long.
There were four places where the whole hash table of the set was processed
from bucket to bucket under holding the spinlock:
- During resizing a set, the original set was locked to exclude kernel side
add/del element operations (userspace add/del is excluded by the
nfnetlink mutex). The original set is actually just read during the
resize, so the spinlocking is replaced with rcu locking of regions.
However, thus there can be parallel kernel side add/del of entries.
In order not to loose those operations a backlog is added and replayed
after the successful resize.
- Garbage collection of timed out entries was also protected by the spinlock.
In order not to lock too long, region locking is introduced and a single
region is processed in one gc go. Also, the simple timer based gc running
is replaced with a workqueue based solution. The internal book-keeping
(number of elements, size of extensions) is moved to region level due to
the region locking.
- Adding elements: when the max number of the elements is reached, the gc
was called to evict the timed out entries. The new approach is that the gc
is called just for the matching region, assuming that if the region
(proportionally) seems to be full, then the whole set does. We could scan
the other regions to check every entry under rcu locking, but for huge
sets it'd mean a slowdown at adding elements.
- Listing the set header data: when the set was defined with timeout
support, the garbage collector was called to clean up timed out entries
to get the correct element numbers and set size values. Now the set is
scanned to check non-timed out entries, without actually calling the gc
for the whole set.
Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
SOFTIRQ-unsafe lock order issues during working on the patch.
Reported-by: syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com Reported-by: syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com Reported-by: syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jozsef Kadlecsik [Sat, 25 Jan 2020 17:55:06 +0000 (18:55 +0100)]
netfilter: ipset: fix suspicious RCU usage in find_set_and_id
find_set_and_id() is called when the NFNL_SUBSYS_IPSET mutex is held.
However, in the error path there can be a follow-up recvmsg() without
the mutex held. Use the start() function of struct netlink_dump_control
instead of dump() to verify and report if the specified set does not
exist.
Thanks to Pablo Neira Ayuso for helping me to understand the subleties
of the netlink protocol.
Jozsef Kadlecsik [Sun, 19 Jan 2020 11:04:13 +0000 (12:04 +0100)]
netfilter: ipset: use bitmap infrastructure completely
The bitmap allocation did not use full unsigned long sizes
when calculating the required size and that was triggered by KASAN
as slab-out-of-bounds read in several places. The patch fixes all
of them.
Cong Wang [Fri, 10 Jan 2020 19:53:08 +0000 (11:53 -0800)]
netfilter: fix a use-after-free in mtype_destroy()
map->members is freed by ip_set_free() right before using it in
mtype_ext_cleanup() again. So we just have to move it down.
Reported-by: syzbot+4c3cc6dbe7259dbf9054@syzkaller.appspotmail.com Fixes: 40cd63bf33b2 ("netfilter: ipset: Support extensions which need a per data destroy function") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Serhey Popovych [Fri, 29 Nov 2019 09:21:34 +0000 (11:21 +0200)]
ip_set: Pass init_net when @net is missing in match check params data structure
It is better to restrict ipsets to default network namespace on old
kernels that does not contain @net parameter in @struct xt_mtchk_param
(i.e. ones prior to commit a83d8e8d099f ("netfilter: xtables:
add struct xt_mtchk_param::net"), tag v2.6.34) instead of panicing
on them.
Found and tested on RHEL 6 with 2.6.32 kernels.
Fixes: 90e279db0cf5 ("Add more compatibility checkings to support older kernel releases") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Serhey Popovych [Fri, 29 Nov 2019 09:21:33 +0000 (11:21 +0200)]
netfilter: xt_set: Do not restrict --map-set to the mangle table
While mangle table is primary place for packet modification setting
mark, traffic class priority or hardware NIC queue can be done in any
table with exception similar to using mark in policy-based routing
setups (configured with ip-rule(8)) should be done before routing
happens (i.e. in PREROUTING chain that usable in mangle or raw tables
only).
There is no such restriction in MARK target used to set packet mark and
CLASSIFY target used to set traffic class priority. Both are free to use
in any table. There is no known target that can modify hardware queue
for packet.
This helps in keeping filtering and packet modification rules together
in filter table.
Tested with rule in filter table with SET target using --map-prio and
HTB for scheduling packets at egress.
Serhey Popovych [Fri, 29 Nov 2019 09:21:32 +0000 (11:21 +0200)]
em_ipset: Build on old kernels
Make sure TCF_EM_IPSET defined and corresponds to current upstream value
if not defined in target kernel. You need iproute2 version that supports
em_ipset to communicate correctly. Include ip_set_compat.h after
pkt_cls.h to prevent TCF_EM_IPSET redefine error.
Detect skb->iif => skb->skb_iif rename after commit 8964be4a9a5c ("net:
rename skb->iif to skb->skb_iif").
Add dev_get_by_index_rcu() define pointing to __dev_get_by_index() to
build on RHEL6 kernels with explicit note that this may not work on all
architectures.
Always build em_ipset regardless of CONFIG_NET_EMATCH_IPSET option.
Serhey Popovych [Fri, 29 Nov 2019 09:21:31 +0000 (11:21 +0200)]
compat: Use skb_vlan_tag_present() instead of vlan_tx_tag_present()
Since RHEL6 provides it as preprocessor define and does not provide
vlan_tx_tag_present(). Add defines in case of vlan_tx_tag_present()
isn't available to back tc_skb_protocol() to old behaviour before
commit d8b9605d2697 ("net: sched: fix skb->protocol use in case
of accelerated vlan path").
Serhey Popovych [Fri, 29 Nov 2019 09:21:30 +0000 (11:21 +0200)]
configure.ac: Support building with old autoconf 2.63
This version found on RHEL6 making autoreconf fail with following error:
configure.ac:61: error: possibly undefined macro: AS_VAR_COPY
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
Apply fix from https://github.com/gdnsd/gdnsd/issues/85 to fix problem.
Serhey Popovych [Fri, 29 Nov 2019 09:21:29 +0000 (11:21 +0200)]
configure.ac: Build on kernels without skb->vlan_proto correctly
Support for EtherType other than ETH_P_8021Q for VLAN header introduced
with commit 86a9bad3ab6b ("net: vlan: add protocol argument to packet
tagging functions") in upstream kernel since v3.10.
To support build on older kernels check for ->vlan_proto presence in
@struct sk_buff and return htons(ETH_P_8021Q) when it is missing.
Serhey Popovych [Fri, 29 Nov 2019 09:21:27 +0000 (11:21 +0200)]
configure.ac: Better match for ipv6_skip_exthdr() frag_offp arg presence
On older kernels (i.e. ones before commit 5c3a0fd7d0fc ("ip*.h: Remove
extern from function prototypes") in upstream since v3.13) we fail to
match number of arguments ipv6_skip_exthdr() correctly. Configure
chooses 3 args, while function has actually 4 args.
This happens because on these kernels tab (\t) is used for intendation
between function result type and function name.
Fix by matching either space for kernels with mentioned change or tab
for older kernels to select number of arguments correctly.
Fix nla_policies to fully support NL_VALIDATE_STRICT
Since v5.2 (commit "netlink: re-add parse/validate functions in
strict mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies
which did not support strict mode and thus the corresponding ipset
commands failed.
Thomas Gleixner [Thu, 31 Oct 2019 17:57:52 +0000 (18:57 +0100)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kristian Evensen [Thu, 26 Sep 2019 10:06:45 +0000 (12:06 +0200)]
ipset: Add wildcard support to net,iface
The net,iface equal functions currently compares the full interface
names. In several cases, wildcard (or prefix) matching is useful. For
example, when converting a large iptables rule-set to make use of ipset,
I was able to significantly reduce the number of set elements by making
use of wildcard matching.
Wildcard matching is enabled by adding "wildcard" when adding an element
to a set. Internally, this causes the IPSET_FLAG_IFACE_WILDCARD-flag to
be set. When this flag is set, only the initial part of the interface
name is used for comparison.
Wildcard matching is done per element and not per set, as there are many
cases where mixing wildcard and non-wildcard elements are useful. This
means that is up to the user to handle (avoid) overlapping interface
names.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Stefano Brivio [Thu, 10 Oct 2019 17:18:14 +0000 (19:18 +0200)]
ipset: Copy the right MAC address in hash:ip,mac IPv6 sets
Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC
address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on
destination MAC address for mac and ipmac sets").
When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that
hash:ip,mac sets also support IPv6 as family, and this is covered by a
separate function, hash_ipmac6_kadt().
In hash:ip,mac sets, the first dimension is the IP address, and the
second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
in flags while deciding which MAC address to copy, destination or
source.
This way, mixing source and destination matches for the two dimensions
of ip,mac hash type works as expected, also for IPv6. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 2001:db8::1/64 dev veth1
ip -net A addr add 2001:db8::2/64 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_hash hash:ip,mac family inet6
ip netns exec A ipset add test_hash 2001:db8::1,${dst}
ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset now correctly matches a test packet:
# ping -c1 2001:db8::2 >/dev/null
# echo $?
0
Reported-by: Chen, Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jeremy Sowden [Thu, 3 Oct 2019 19:56:03 +0000 (20:56 +0100)]
netfilter: ipset: move ip_set_comment functions from ip_set.h to ip_set_core.c.
Most of the functions are only called from within ip_set_core.c.
The exception is ip_set_init_comment. However, this is too complex to
be a good candidate for a static inline function. Move it to
ip_set_core.c, change its linkage to extern and export it, leaving a
declaration in ip_set.h.
ip_set_comment_free is only used as an extension destructor, so change
its prototype to match and drop cast.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jeremy Sowden [Thu, 3 Oct 2019 19:56:02 +0000 (20:56 +0100)]
netfilter: ipset: remove inline from static functions in .c files.
The inline function-specifier should not be used for static functions
defined in .c files since it bloats the kernel. Instead leave the
compiler to decide which functions to inline.
While a couple of the files affected (ip_set_*_gen.h) are technically
headers, they contain templates for generating the common parts of
particular set-types and so we treat them like .c files.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jeremy Sowden [Wed, 7 Aug 2019 14:16:59 +0000 (15:16 +0100)]
netfilter: added missing includes to a number of header-files.
A number of netfilter header-files used declarations and definitions
from other headers without including them. Added include directives to
make those declarations and definitions available.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Of these the first three were not included anywhere else. The last,
ip_set_timeout.h, was included in a couple of other places, but defined
inline functions which call other inline functions defined in ip_set.h,
so ip_set.h had to be included before it.
Inlined all four into ip_set.h, and updated the other files that
included ip_set_timeout.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Dan Carpenter [Sat, 24 Aug 2019 14:49:55 +0000 (17:49 +0300)]
netfilter: ipset: Fix an error code in ip_set_sockfn_get()
The copy_to_user() function returns the number of bytes remaining to be
copied. In this code, that positive return is checked at the end of the
function and we return zero/success. What we should do instead is
return -EFAULT.
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo <luoshijie1@huawei.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Stefano Brivio [Mon, 24 Jun 2019 13:20:12 +0000 (15:20 +0200)]
ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.
This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.
Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
ip netns exec A ipset create test_hash hash:ip,mac
ip netns exec A ipset add test_hash 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset correctly matches a test packet:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Stefano Brivio [Mon, 24 Jun 2019 13:20:11 +0000 (15:20 +0200)]
ipset: Actually allow destination MAC address for hash:ip,mac sets too
In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.
Reported-by: Chen Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Stefano Brivio [Sun, 26 May 2019 21:14:06 +0000 (23:14 +0200)]
ipset: Fix memory accounting for hash types on resize
If a fresh array block is allocated during resize, the current in-memory
set size should be increased by the size of the block, not replaced by it.
Before the fix, adding entries to a hash set type, leading to a table
resize, caused an inconsistent memory size to be reported. This becomes
more obvious when swapping sets with similar sizes:
# cat hash_ip_size.sh
#!/bin/sh
FAIL_RETRIES=10
tries=0
while [ ${tries} -lt ${FAIL_RETRIES} ]; do
ipset create t1 hash:ip
for i in `seq 1 4345`; do
ipset add t1 1.2.$((i / 255)).$((i % 255))
done
t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"
ipset create t2 hash:ip
for i in `seq 1 4360`; do
ipset add t2 1.2.$((i / 255)).$((i % 255))
done
t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"
ipset swap t1 t2
t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"
t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"
Florent Fourcot [Tue, 8 Jan 2019 19:37:33 +0000 (20:37 +0100)]
netfilter: ipset: remove useless memset() calls
One of the memset call is buggy: it does not erase full array, but only
pointer size.
Moreover, after a check, first step of nla_parse_nested/nla_parse is to
erase tb array as well. We can remove both calls safely.
Qian Cai [Sun, 2 Dec 2018 04:06:01 +0000 (23:06 -0500)]
netfilter/ipset: replace a strncpy() with strscpy()
To make overflows as obvious as possible and to prevent code from blithely
proceeding with a truncated string. This also has a side-effect to fix a
compilation warning when using GCC 8.2.1.
net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
net/netfilter/ipset/ip_set_core.c:2027:3: warning: 'strncpy' writing 32
bytes into a region of size 2 overflows the destination
[-Wstringop-overflow=]
Signed-off-by: Qian Cai <cai@gmx.us> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Pan Bian [Mon, 26 Nov 2018 10:42:10 +0000 (18:42 +0800)]
netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel
In the error handling block, nla_nest_cancel(skb, atd) is called to
cancel the nest operation. But then, ipset_nest_end(skb, atd) is
unexpected called to end the nest operation. This patch calls the
ipset_nest_end only on the branch that nla_nest_cancel is
not called.
Fixes: 45040978c89("netfilter: ipset: Fix set:list type crash when
flush/dump set in parallel")
Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Serhey Popovych [Sun, 18 Nov 2018 19:08:23 +0000 (21:08 +0200)]
configure.ac: Fix build regression on RHEL/CentOS/SL
This was introduced with commit 0f82228387ae ("Use more robust awk
patterns to check for backward compatibility") on RHEL 7.3+ because
it's kernel contains backported upstream commit 633c9a840d0b
("netfilter: nfnetlink: avoid recurrent netns lookups in call_batch")
that introduces @net of @struct net type parameter matched with $GREP
after $AWK returns whole @struct nfnl_callback.
This causes incorrect IPSET_CBFN() prototype choose for ->call()
of @struct nfnl_callback producing following warnings during the build:
.../ipset/ip_set_core.c:2007:3: warning: initialization from
incompatible pointer type [enabled by default]
.call = ip_set_destroy,
^
../ipset/ip_set_core.c:2007:3: warning: (near initialization
for ‘ip_set_netlink_subsys_cb[3].call’) [enabled by default]
Fix by matching pattern to the end of first function pointer in
@struct nfnl_callback instead of end of struct.
Jozsef Kadlecsik [Tue, 30 Oct 2018 21:30:30 +0000 (22:30 +0100)]
Correct workaround in patch "Fix calling ip_set() macro at dumping"
As Pablo pointed out, in order to fix the bogus warnings, there's
no need for the non-useful rcu_read_lock/unlock dancing. Call
rcu_dereference_raw() instead, the ref_netlink protects the set.
Jozsef Kadlecsik [Mon, 22 Oct 2018 20:25:09 +0000 (22:25 +0200)]
Introduction of new commands and protocol version 7
Two new commands (IPSET_CMD_GET_BYNAME, IPSET_CMD_GET_BYINDEX) are
introduced. The new commands makes possible to eliminate the getsockopt
operation (in iptables set/SET match/target) and thus use only netlink
communication between userspace and kernel for ipset. With the new
protocol version, userspace can exactly know which functionality is
supported by the running kernel.
Both the kernel and userspace is fully backward compatible.
License cleanup: add SPDX license identifier to uapi header files with no license
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.
By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
otherwise syscall usage would not be possible.
Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Kirill Tkhai [Mon, 22 Oct 2018 18:46:53 +0000 (20:46 +0200)]
net: Convert ip_set_net_ops
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).
ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Lance Roy [Wed, 3 Oct 2018 05:39:00 +0000 (22:39 -0700)]
netfilter: Replace spin_is_locked() with lockdep
lockdep_assert_held() is better suited to checking locking requirements,
since it won't get confused when someone else holds the lock. This is
also a step towards possibly removing spin_is_locked().
Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: <netfilter-devel@vger.kernel.org> Cc: <coreteam@netfilter.org> Cc: <netdev@vger.kernel.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik [Thu, 18 Oct 2018 15:27:49 +0000 (17:27 +0200)]
Library reworked to support embedding ipset completely
The ipset library is rewritten/extended to support embedding
ipset, so that sets can fully be managed without calling the ipset
binary. The ipset binary relies completely on the new library.
The libipset.3 manpage was written about the library functions
and usage.
ip_set_create() and ip_set_net_init() attempt to allocate physically
contiguous memory for ip_set_list. If memory is fragmented, the
allocations could easily fail:
Stefano Brivio [Wed, 29 Aug 2018 17:51:12 +0000 (19:51 +0200)]
manpage: Add comment about matching on destination MAC address
Patch "ipset: Allow matching on destination MAC address for mac
and ipmac sets" allows the user to match on destination MAC
addresses in some selected cases. Add a comment to the manpage
detailing in which cases it makes sense.
Stefano Brivio [Fri, 17 Aug 2018 19:09:48 +0000 (21:09 +0200)]
ipset: Make invalid MAC address checks consistent
Set types bitmap:ipmac and hash:ipmac check that MAC addresses
are not all zeroes.
Introduce one missing check, and make the remaining ones
consistent, using is_zero_ether_addr() instead of comparing
against an array containing zeroes.
This was already done for hash:mac sets in commit 26c97c5d8dac
("netfilter: ipset: Use is_zero_ether_addr instead of static and
memcmp").
Stefano Brivio [Fri, 17 Aug 2018 19:09:47 +0000 (21:09 +0200)]
ipset: Allow matching on destination MAC address for mac and ipmac sets
There doesn't seem to be any reason to restrict MAC address
matching to source MAC addresses in set types bitmap:ipmac,
hash:ipmac and hash:mac. With this patch, and this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
ip netns exec A ipset create test hash:mac
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset add test ${dst}
ip netns exec A iptables -P INPUT DROP
ip netns exec A iptables -I INPUT -m set --match-set test dst -j ACCEPT
ipset will match packets based on destination MAC address:
Eric Westbrook [Tue, 28 Aug 2018 21:14:42 +0000 (15:14 -0600)]
netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net
Allow /0 as advertised for hash:net,port,net sets.
For "hash:net,port,net", ipset(8) says that "either subnet
is permitted to be a /0 should you wish to match port
between all destinations."
Make that statement true.
Before:
# ipset create cidrzero hash:net,port,net
# ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
ipset v6.34: The value of the CIDR parameter of the IP address is invalid
# ipset create cidrzero6 hash:net,port,net family inet6
# ipset add cidrzero6 ::/0,12345,::/0
ipset v6.34: The value of the CIDR parameter of the IP address is invalid
After:
# ipset create cidrzero hash:net,port,net
# ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
# ipset test cidrzero 192.168.205.129,12345,172.16.205.129
192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero.
# ipset create cidrzero6 hash:net,port,net family inet6
# ipset add cidrzero6 ::/0,12345,::/0
# ipset test cidrzero6 fe80::1,12345,ff00::1
fe80::1,tcp:12345,ff00::1 is in set cidrzero6.
Stefano Brivio [Wed, 22 Aug 2018 09:22:53 +0000 (11:22 +0200)]
Fix use-after-free in ipset_parse_name_compat()
When check_setname is used in ipset_parse_name_compat(), the
'str' and 'saved' macro arguments point in fact to the same
buffer. Free the 'saved' argument only after using it.
While at it, remove a useless NULL check on 'saved'.
Stefano Brivio [Wed, 22 Aug 2018 09:22:54 +0000 (11:22 +0200)]
Simplify return statement in ipset_mnl_query()
As we loop as long as 'ret' is greater than zero, and break only
if we get an error in mnl_cb_run2 (with ret <= 0), we can just
return ret without checking once more if it's greater than zero.
ipset: list:set: Decrease refcount synchronously on deletion and replace
Commit 45040978c899 ("netfilter: ipset: Fix set:list type crash
when flush/dump set in parallel") postponed decreasing set
reference counters to the RCU callback.
An 'ipset del' command can terminate before the RCU grace period
is elapsed, and if sets are listed before then, the reference
counter shown in userspace will be wrong:
# ipset create h hash:ip; ipset create l list:set; ipset add l
# ipset del l h; ipset list h
Name: h
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 88
References: 1
Number of entries: 0
Members:
# sleep 1; ipset list h
Name: h
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 88
References: 0
Number of entries: 0
Members:
Fix this by making the reference count update synchronous again.
As a result, when sets are listed, ip_set_name_byindex() might
now fetch a set whose reference count is already zero. Instead
of relying on the reference count to protect against concurrent
set renaming, grab ip_set_ref_lock as reader and copy the name,
while holding the same lock in ip_set_rename() as writer
instead.
Reported-by: Li Shuang <shuali@redhat.com> Fixes: 45040978c899 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Florent Fourcot [Mon, 4 Jun 2018 14:51:19 +0000 (16:51 +0200)]
netfilter: ipset: forbid family for hash:mac sets
Userspace `ipset` command forbids family option for hash:mac type:
ipset create test hash:mac family inet4
ipset v6.30: Unknown argument: `family'
However, this check is not done in kernel itself. When someone use
external netlink applications (pyroute2 python library for example), one
can create hash:mac with invalid family and inconsistant results from
userspace (`ipset` command cannot read set content anymore).
This patch enforce the logic in kernel, and forbids insertion of
hash:mac with a family set.
Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no
impact on other hash:* sets
Jozsef Kadlecsik [Thu, 31 May 2018 16:45:21 +0000 (18:45 +0200)]
List timing out entries with "timeout 1" instead of zero timeout value
When listing sets with timeout support, there's a probability that
just timing out entries with "0" timeout value is listed/saved.
However when restoring the saved list, the zero timeout value means
permanent elelements.
The new behaviour is that timing out entries are listed with "timeout 1"
instead of zero.
Stefano Brivio [Tue, 8 May 2018 15:43:30 +0000 (17:43 +0200)]
tests/check_klog.sh: Try dmesg too, don't let shell terminate script
Some hosts might not use /var/log/kern.log for kernel messages,
so if we can't find a match there, try dmesg next.
If no matches are found, don't let the shell terminate the
script, so that we have a chance to try dmesg and actually echo
"no match!" if no matches are found: set +e before the setname
loop.
Inserting rule before one with SET target we get error with warning in
dmesg(1) output:
# iptables -A FORWARD -t mangle -j SET --map-set test src --map-prio
# iptables -I FORWARD 1 -t mangle -j ACCEPT
iptables: Invalid argument. Run `dmesg' for more information.
# dmesg |tail -n1
[268578.026643] mapping of prio or/and queue is allowed only from \
OUTPUT/FORWARD/POSTROUTING chains
Rather than checking for supported hook bits for SET target check for
unsupported one as done in all rest of matches and targets.
Parsing is attempted both for numbers and service names and
the temporary stored error message triggered to reset the state
parameters about the set. Reported by Yuri D'Elia.