netfilter: flowtable: dedicated slab for flow entry
The size of `struct flow_offload` has grown beyond 256 bytes on 64-bit
kernels (currently 280 bytes) because of the `flow_offload_tunnel`
member added recently. So kmalloc() allocates from the kmalloc-512 slab,
causing significant memory waste per entry.
Introduce a dedicated slab cache for flow entries to reduce memory
footprint. Results in a reduction from 512 bytes to 320 bytes per entry
on x86_64 kernels.
Signed-off-by: Qingfang Deng <dqfext@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>