Several parts of network stack rely on iph->ihl validation
done by network stack before PRE_ROUTING.
Disable this feature for user namespaces for now.
tcp option handling is likely safe even for LOCAL_IN, so this
this leaves tcp option mangling via nft_exthdr.c as-is.
I don't think these are the only means to alter packets, but these
appear to be relatively prominent.
This could be relaxed later. Example:
- allow userns for ingress hook.
- allow userns if base is transport header.
Also, we should revalidate or restrict generally:
- Don't allow linklayer writes to spill into network header
- restrict ipv4 and ipv6 to 'known safe' writes, e.g.
saddr/daddr/check/tos
Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Tong Liu <lyutoon@gmail.com>
Tested-by: Qi Tang <tpluszz77@gmail.com>
Link: https://lore.kernel.org/netfilter-devel/20260515100411.3141-1-fw@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
{
struct sk_buff *nskb;
+ if (e->state.net->user_ns != &init_user_ns)
+ return -EPERM;
+
if (diff < 0) {
unsigned int min_len = skb_transport_offset(e->skb);
if (nfqnl_mangle(nla_data(nfqa[NFQA_PAYLOAD]),
payload_len, entry, diff) < 0)
verdict = NF_DROP;
-
- if (ct && diff)
+ else if (ct && diff)
nfnl_ct->seq_adjust(entry->skb, ct, ctinfo, diff);
}
struct nft_payload_set *priv = nft_expr_priv(expr);
int err;
+ if (ctx->net->user_ns != &init_user_ns)
+ return -EPERM;
+
priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE]));
priv->len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN]));