netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
Kyle Zeng reported that there is a race between IPSET_CMD_ADD and IPSET_CMD_SWAP
in netfilter/ip_set, which can lead to the invocation of `__ip_set_put` on a wrong
`set`, triggering the `BUG_ON(set->ref == 0);` check in it.
The race is caused by using the wrong reference counter, i.e. the ref counter instead
of ref_netlink.
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.
Azeem Shaikh [Tue, 13 Jun 2023 00:34:37 +0000 (00:34 +0000)]
netfilter: ipset: Replace strlcpy with strscpy
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
Direct replacement is safe here since return value from all
callers of STRLCPY macro were ignored.
netfilter: ipset: Add schedule point in call_ad().
syzkaller found a repro that causes Hung Task [0] with ipset. The repro
first creates an ipset and then tries to delete a large number of IPs
from the ipset concurrently:
The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET)
held, and other threads wait for it to be released.
Previously, the same issue existed in set->variant->uadt() that could run
so long under ip_set_lock(set). Commit 5e29dc36bd5e ("netfilter: ipset:
Rework long task execution when adding/deleting entries") tried to fix it,
but the issue still exists in the caller with another mutex.
While adding/deleting many IPs, we should release the CPU periodically to
prevent someone from abusing ipset to hang the system.
Note we need to increment the ipset's refcnt to prevent the ipset from
being destroyed while rescheduling.
Nowadays, carp seems to be the preferred name for protocol 112. Simply
change the expected output for lack of idea for a backwards compatible
change which is not simply using another protocol.
Phil Sutter [Sun, 5 Mar 2023 11:46:49 +0000 (12:46 +0100)]
tests: xlate: Make test input valid
Make sure ipset at least accepts the test input by running it against
plain ipset once for sanity. This exposed two issues:
* Set 'hip5' doesn't have comment support, so add the commented elements
to 'hip6' instead (likely a typo).
* Set 'bip1' range 2.0.0.1-2.1.0.1 exceeds the max allowed for bitmap
sets. Reduce it accordingly.
Fixes: 7587d1c4b5465 ("tests: add tests ipset to nftables") Signed-off-by: Phil Sutter <phil@nwl.cc>
Phil Sutter [Sun, 5 Mar 2023 11:43:23 +0000 (12:43 +0100)]
tests: xlate: Test built binary by default
Testing the host's iptables-translate by default is unintuitive. Since
the ipset-translate symlink is created upon 'make install', add a local
symlink to the repository pointing at a built binary in src/. Using this
by default is consistent with the regular testsuite.
Sam James [Sat, 28 Jan 2023 18:25:33 +0000 (19:25 +0100)]
configure.ac: fix bashisms
configure scripts need to be runnable with a POSIX-compliant /bin/sh.
On many (but not all!) systems, /bin/sh is provided by Bash, so errors
like this aren't spotted. Notably Debian defaults to /bin/sh provided
by dash which doesn't tolerate such bashisms as '=='.
This retains compatibility with bash.
Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Gavrilov Ilia [Sat, 28 Jan 2023 18:09:52 +0000 (19:09 +0100)]
netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function.
When first_ip is 0, last_ip is 0xFFFFFFFF, and netmask is 31, the value of
an arithmetic expression 2 << (netmask - mask_bits - 1) is subject
to overflow due to a failure casting operands to a larger data type
before performing the arithmetic.
Note that it's harmless since the value will be checked at the next step.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Jozsef Kadlecsik [Fri, 30 Dec 2022 11:32:37 +0000 (12:32 +0100)]
netfilter: ipset: Rework long task execution when adding/deleting entries
When adding/deleting large number of elements in one step in ipset, it can
take a reasonable amount of time and can result in soft lockup errors. The
patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of
consecutive elements to add/delete") tried to fix it by limiting the max
elements to process at all. However it was not enough, it is still possible
that we get hung tasks. Lowering the limit is not reasonable, so the
approach in this patch is as follows: rely on the method used at resizing
sets and save the state when we reach a smaller internal batch limit,
unlock/lock and proceed from the saved state. Thus we can avoid long
continuous tasks and at the same time removed the limit to add/delete large
number of elements in one step.
The nfnl mutex is held during the whole operation which prevents one to issue
other ipset commands in parallel.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org> Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete")
Jozsef Kadlecsik [Thu, 29 Dec 2022 14:00:21 +0000 (15:00 +0100)]
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
The hash:net,port,net set type supports /0 subnets. However, the patch
commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
of consecutive elements to add/delete" did not take into account it and
resulted in an endless loop. The bug is actually older but the patch 5f7b51bf09baca8e brings it out earlier.
Handle /0 subnets properly in hash:net,port,net set types.
Reported-by: Марк Коренберг <socketpair@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jozsef Kadlecsik [Mon, 21 Nov 2022 12:16:56 +0000 (13:16 +0100)]
netfilter: ipset: restore allowing 64 clashing elements in hash:net,iface
The patch "netfilter: ipset: enforce documented limit to prevent allocating
huge memory" was too strict and prevented to add up to 64 clashing elements
to a hash:net,iface type of set. This patch fixes the issue and now the type
behaves as documented.
Vishwanath Pai [Thu, 10 Nov 2022 21:31:26 +0000 (16:31 -0500)]
netfilter: ipset: Add support for new bitmask parameter
Add a new parameter to complement the existing 'netmask' option. The
main difference between netmask and bitmask is that bitmask takes any
arbitrary ip address as input, it does not have to be a valid netmask.
The name of the new parameter is 'bitmask'. This lets us mask out
arbitrary bits in the ip address, for example:
ipset create set1 hash:ip bitmask 255.128.255.0
ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80
Vishwanath Pai [Thu, 10 Nov 2022 21:30:26 +0000 (16:30 -0500)]
netfilter: ipset: Add support for new bitmask parameter
Add a new parameter to complement the existing 'netmask' option. The
main difference between netmask and bitmask is that bitmask takes any
arbitrary ip address as input, it does not have to be a valid netmask.
The name of the new parameter is 'bitmask'. This lets us mask out
arbitrary bits in the ip address, for example:
ipset create set1 hash:ip bitmask 255.128.255.0
ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80
Vishwanath Pai [Wed, 28 Sep 2022 18:26:50 +0000 (14:26 -0400)]
netfilter: ipset: regression in ip_set_hash_ip.c
This patch introduced a regression: commit 48596a8ddc46 ("netfilter:
ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
The variable e.ip is passed to adtfn() function which finally adds the
ip address to the set. The patch above refactored the for loop and moved
e.ip = htonl(ip) to the end of the for loop.
What this means is that if the value of "ip" changes between the first
assignement of e.ip and the forloop, then e.ip is pointing to a
different ip address than "ip".
Test case:
$ ipset create jdtest_tmp hash:ip family inet hashsize 2048 maxelem 100000
$ ipset add jdtest_tmp 10.0.1.1/31
ipset v6.21.1: Element cannot be added to the set: it's already added
The value of ip gets updated inside the "else if (tb[IPSET_ATTR_CIDR])"
block but e.ip is still pointing to the old value.
Wolfram Sang [Mon, 7 Nov 2022 21:09:04 +0000 (22:09 +0100)]
netfilter: move from strlcpy with unused retval to strscpy
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Kees Cook [Mon, 7 Nov 2022 20:58:52 +0000 (21:58 +0100)]
netlink: Bounds-check struct nlmsgerr creation
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:
memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)
Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net>
sched: consistently handle layer3 header accesses in the presence of VLANs
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.
However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).
To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.
To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.
v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
bpf_skb_ecn_set_ce()
v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
calling the helper twice
Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: ipset: enforce documented limit to prevent allocating huge memory
Daniel Xu reported that the hash:net,iface type of the ipset subsystem does
not limit adding the same network with different interfaces to a set, which
can lead to huge memory usage or allocation failure.
The quick reproducer is
$ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0
$ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
The backtrace when vmalloc fails:
[Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages
<...>
[Tue Oct 25 00:13:08 2022] Call Trace:
[Tue Oct 25 00:13:08 2022] <TASK>
[Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60
[Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180
[Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20
[Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90
[Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0
[Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710
<...>
The fix is to enforce the limit documented in the ipset(8) manpage:
> The internal restriction of the hash:net,iface set type is that the same
> network prefix cannot be stored with more than 64 different interfaces
> in a single set.
Reported-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
The parser assumes the set is an IPv4 ipset because IPSET_OPT_FAMILY is
not set.
# ipset-translate restore < ./ipset-mwan3_set_connected_ipv6.dump
add table inet global
add set inet global mwan3_connected_v6 { type ipv6_addr; flags interval; }
flush set inet global mwan3_connected_v6
ipset v7.15: Error in line 4: Syntax error: '64' is out of range 0-32
Remove ipset_xlate_type_get(), call ipset_xlate_set_get() instead to
obtain the set type and family.
Reported-by: Florian Eckert <fe@dev.tdt.de> Fixes: 325af556cd3a ("add ipset to nftables translation infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
originally reported in
https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/thread/ZIXKNQHSSCQ4ZLEGYYKLAXQ4PQ5EYFGZ/
by Larry Len Rainey
Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ipset: Fix maximal range check in hash_ipportnet4_uadt()
Clang warns:
net/netfilter/ipset/ip_set_hash_ipportnet.c:249:29: warning: variable
'port_to' is uninitialized when used here [-Wuninitialized]
if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE)
^~~~~~~
net/netfilter/ipset/ip_set_hash_ipportnet.c:167:45: note: initialize the
variable 'port_to' to silence this warning
u32 ip = 0, ip_to = 0, p = 0, port, port_to;
^
= 0
net/netfilter/ipset/ip_set_hash_ipportnet.c:249:39: warning: variable
'port' is uninitialized when used here [-Wuninitialized]
if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE)
^~~~
net/netfilter/ipset/ip_set_hash_ipportnet.c:167:36: note: initialize the
variable 'port' to silence this warning
u32 ip = 0, ip_to = 0, p = 0, port, port_to;
^
= 0
2 warnings generated.
The range check was added before port and port_to are initialized.
Shuffle the check after the initialization so that the check works
properly.
Fixes: 7fb6c63025ff ("netfilter: ipset: Limit the maximal range of consecutive elements to
add/delete")
Limit the maximal range of consecutive elements to add/delete fix
Avoid possible number overflows when calculating the number of
consecutive elements. Also, compute properly the consecutive
elements in the case of hash:net* types.
Limit the maximal range of consecutive elements to add/delete
The range size of consecutive elements were not limited. Thus one
could define a huge range which may result soft lockup errors due
to the long execution time. Now the range size is limited to 2^20
entries. Reported by Brad Spengler.
ipset_parse_argv() parses, builds and send the netlink messages to the
kernel. This patch extracts the parser and wrap it around the new
ipset_parser() function.
This patch comes is preparation for the ipset to nftables translation
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Jozsef Kadlecsik [Sat, 26 Jun 2021 20:14:38 +0000 (22:14 +0200)]
Fix patch "Parse port before trying by service name"
The patch broke parsing service names: number parsing failures
are hard errors which erase data, thus making impossible to
parse input as a string. Fix it by enabling soft (warning)
failures in the case of port number parsing.
Jozsef Kadlecsik [Wed, 17 Feb 2021 09:07:27 +0000 (10:07 +0100)]
Silence unused-but-set-variable warnings
When ipset is compiled in non-debug mode, in some environments warnings
treated as errors emitted:
session.c: In function 'build_msg':
session.c:1985:28: warning: variable 'type' set but not used
[-Wunused-but-set-variable]
const struct ipset_type *type;
^
session.c:2030:28: warning: variable 'type' set but not used
[-Wunused-but-set-variable]
const struct ipset_type *type;
^
Fix it by hiding the unused variable definitions/settings in non-debug mode.
Neutron Soutmun [Mon, 18 Jan 2021 04:58:30 +0000 (11:58 +0700)]
ipset: fix print format warning
* Use PRIx64 for portablility over various architectures.
* The format string for the 64bit number printing is incorrect,
the `%` sign is missing.
* The force types casting over the uint32_t and uint64_t are unnecessary
which warned by the compiler on different architecture.
Vasily Averin [Sun, 20 Dec 2020 12:21:13 +0000 (13:21 +0100)]
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds
UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 0 PID: 8498 Comm: syz-executor519
Not tainted 5.10.0-rc7-next-20201208-syzkaller #0
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline]
hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524
ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115
nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2345
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2432
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This patch replaces htable_bits() by simple fls(hashsize - 1) call:
it alone returns valid nbits both for round and non-round hashsizes.
It is normal to set any nbits here because it is validated inside
following htable_size() call which returns 0 for nbits>31.
Vasily Averin [Sun, 20 Dec 2020 12:17:01 +0000 (13:17 +0100)]
netfilter: ipset: fixes possible oops in mtype_resize
currently mtype_resize() can cause oops
t = ip_set_alloc(htable_size(htable_bits));
if (!t) {
ret = -ENOMEM;
goto out;
}
t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits));
Increased htable_bits can force htable_size() to return 0.
In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR,
so follwoing access to t->hregion should trigger an OOPS.