Add a 2-bit per-skb tc depth field to track packet loops across the stack.
The previous per-CPU loop counters like MIRRED_NEST_LIMIT
assume a single call stack and lose state in two cases:
1) When a packet is queued and reprocessed later (e.g., egress->ingress
via backlog), the per-cpu state is gone by the time it is dequeued.
2) With XPS/RPS a packet may arrive on one CPU and be processed on
another.
A per-skb field solves both by travelling with the packet itself.
The field fits in existing padding, using 2 bits that were previously a
hole:
pahole before(-) and after (+) diff looks like:
__u8 slow_gro:1; /* 132: 3 1 */
__u8 csum_not_inet:1; /* 132: 4 1 */
__u8 unreadable:1; /* 132: 5 1 */
+ __u8 tc_depth:2; /* 132: 6 1 */
- /* XXX 2 bits hole, try to pack */
/* XXX 1 byte hole, try to pack */
__u16 tc_index; /* 134 2 */
There used to be a ttl field which was removed as part of tc_verd in commit
aec745e2c520 ("net-tc: remove unused tc_verd fields"). It was already
unused by that time, due to remove earlier in commit
c19ae86a510c ("tc: remove
unused redirect ttl").
The first user of this field is netem, which increments tc_depth on
duplicated packets before re-enqueueing them at the root qdisc. On
re-entry, netem skips duplication for any skb with tc_depth already set,
bounding recursion to a single level regardless of tree topology.
The other user is mirred which increments it on each pass
and limits to depth to MIRRED_DEFER_LIMIT (3).
The new field was called ttl in earlier versions of this patch
but renamed to tc_depth to avoid confusion with IP ttl.
Note (looking at you Sashiko! Dont ignore me and continue bringing this up):
1. Since both mirred and netem utilize the same 2-bit tc_depth field it is
possible when netem and mirred are used together that netem qdisc to skip
the duplication step. This is a known trade-off, as a 2-bit field cannot
independently track both features' recursion depths and it is not considered
sane to have a setup that addresses both features on at the same time.
2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history
is preserved even across namespaces. While this might be restrictive for
some topologies, it is also design intent to provide robustness against loops
across namespaces.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260525122556.973584-2-jhs@mojatatu.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>