BUG/MINOR: sock-inet: ignore conntrack for transparent sockets on Linux

As reported in GitHub issue #3192, in certain situations with transparent
listeners, it is possible to get the incoming connection's destination
wrong via SO_ORIGINAL_DST. Two cases have been identified so far:
  - incorrect conntrack configuration where NOTRACK is used only on
    incoming packets, resulting in reverse connections being created
    from response packets. It's then mostly a matter of timing, i.e.
    whether or not the connection is confirmed before the source is
    retrieved, but in this case the connection's destination address
    as retrieved by SO_ORIGINAL_DST is the client's address.

  - a late outgoing retransmit that recreates a just-expired conntrack
    entry, also in the reverse direction. Combinations of RST or FIN
    may play a role here by speeding up conntrack eviction, as may the
    rollover of source ports on the client, whose new connection then
    matches an older entry and simply refreshes it because
    nf_conntrack_tcp_loose is set by default.

TPROXY doesn't require conntrack; only REDIRECT, DNAT etc. do. However,
the system offers no way to know how a conntrack entry was created
(i.e. normally or via a response packet), which would tell us that it's
pointless to check the original destination, nor does it give access
to the local vs peer addresses as opposed to src/dst, which can be
wrong in this case.

One alternative approach would be to check SO_ORIGINAL_DST only for
listening sockets not configured with the "transparent" option, but
our low-level API only works with FDs without knowing their purpose,
so at that level it is unknown that the FD corresponds to a listener,
let alone one in transparent mode.

A (slightly more expensive) variant of this approach consists in
checking on the socket itself, using IP_TRANSPARENT, that it was
accepted in transparent mode, and skipping SO_ORIGINAL_DST if this is
the case. This does the job well enough (no more client addresses
appearing in the dst field) and remains a good compromise. A future
improvement of the API could allow the transparent flag to be passed
down the stack to that function.
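
As a rough illustration only (this is not the actual patch), the check
described above could be sketched as follows. The helper names
sock_is_transparent() and get_dst_ipv4() are hypothetical, only IPv4 is
handled, and the fallback values for IP_TRANSPARENT and SO_ORIGINAL_DST
are assumed to match the Linux headers:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks for older libc headers; values assumed from the Linux UAPI. */
#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19
#endif
#ifndef SO_ORIGINAL_DST
#define SO_ORIGINAL_DST 80
#endif

/* Hypothetical helper: returns 1 if IP_TRANSPARENT is set on <fd>,
 * 0 if it is not set or if the option cannot be read.
 */
static int sock_is_transparent(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &val, &len) == 0 && val)
		return 1;
	return 0;
}

/* Hypothetical helper: fills <dst> with the destination address of the
 * IPv4 connection on <fd>. Conntrack is only consulted for sockets that
 * are not transparent; otherwise, or if conntrack has no answer, the
 * local address from getsockname() is used. Returns 0 on success, -1 on
 * error.
 */
static int get_dst_ipv4(int fd, struct sockaddr_in *dst)
{
	socklen_t len = sizeof(*dst);

	memset(dst, 0, sizeof(*dst));

	if (!sock_is_transparent(fd)) {
		/* REDIRECT/DNAT case: ask conntrack for the original dst */
		if (getsockopt(fd, IPPROTO_IP, SO_ORIGINAL_DST, dst, &len) == 0)
			return 0;
		len = sizeof(*dst);
	}

	/* TPROXY case or no conntrack entry: the local address is the dst */
	return getsockname(fd, (struct sockaddr *)dst, &len);
}
```

The extra getsockopt() per lookup is the additional cost mentioned
above. It would go away if the caller could pass the transparent flag
down instead.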

This should be backported to stable versions after some observation
in the latest -dev.

For reference, here are some links to older conversations on that topic
that Lukas found during this analysis: