net/rds: rds_tcp_accept_one ought to not discard messages
RDS/TCP differs from RDS/RDMA in that message acknowledgment
is done based on TCP sequence numbers:
As soon as the last byte of a message has been acknowledged
by the TCP stack of a peer, "rds_tcp_write_space()" goes on
to discard prior messages from the send queue.
Which is fine, for as long as the receiver never throws any messages away.
Unfortunately, that is *not* the case since the introduction of MPRDS:
commit
1a0e100fb2c96 "RDS: TCP: Enable multipath RDS for TCP"
A new function "rds_tcp_accept_one_path" was introduced,
which is entitled to return "NULL", if no connection path is currently
available.
Unfortunately, this happens after the "->accept()" call, and the new socket
often already contains messages, since the peer already transitioned
to "RDS_CONN_UP" on behalf of "TCP_ESTABLISHED".
That's also the case after this [1]:
commit
1a0e100fb2c96 "RDS: TCP: Force every connection to be initiated by
numerically smaller IP address"
which tried to address the situation of pending data by only transitioning
connections from a smaller IP address to "RDS_CONN_UP".
But even in those cases, and in particular if the "RDS_EXTHDR_NPATHS"
handshake has not occurred yet, and therefore we're working with
"c_npaths <= 1", "c_conn[0]" may be in a state distinct from
"RDS_CONN_DOWN", and therefore all messages on the just accepted socket
will be tossed away.
This fix changes "rds_tcp_accept_one":
* If connected from a peer with a larger IP address, the new socket
will continue to get closed right away.
With commit [1] above, there should not be any messages
in the socket receive buffer, since the peer never transitioned
to "RDS_CONN_UP".
Therefore it should be okay to not make any efforts to dispatch
the socket receive buffer.
* If connected from a peer with a smaller IP address,
we call "rds_tcp_accept_one_path" to find a free slot/"path".
If found, business goes on as usual.
If none was found, we save/stash the newly accepted socket
into "rds_tcp_accepted_sock", in order to not lose any
messages that may have arrived already.
We then return from "rds_tcp_accept_one" with "-ENOBUFS".
Later on, when a slot/"path" does become available again
(e.g. state transitioned to "RDS_CONN_DOWN",
or HS extension header was received with "c_npaths > 1")
we call "rds_tcp_conn_slots_available" that simply re-issues
a "rds_tcp_accept_one_path" worker-callback and picks
up the new socket from "rds_tcp_accepted_sock", and thereby
continuing where it left with "-ENOBUFS" last time.
Since a new slot has become available, those messages
won't be lost, since processing proceeds as if that slot
had been available the first time around.
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Link: https://patch.msgid.link/20260122055213.83608-3-achender@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>