TX Packetiser
=============

This module creates frames from data obtained from the application. It also
receives CRYPTO frames from the TLS Handshake Record Layer and ACK frames from
the ACK Handling And Loss Detector subsystem.

The packetiser also deals with the flow and congestion controllers.

Creation & Destruction
----------------------

```c
typedef struct quic_tx_packetiser_args_st {
    /* Configuration Settings */
    QUIC_CONN_ID cur_scid; /* Current Source Connection ID we use. */
    QUIC_CONN_ID cur_dcid; /* Current Destination Connection ID we use. */
    BIO_ADDR peer; /* Current destination L4 address we use. */
    /* ACK delay exponent used when encoding. */
    uint32_t ack_delay_exponent;

    /* Injected Dependencies */
    OSSL_QTX *qtx; /* QUIC Record Layer TX we are using */
    QUIC_TXPIM *txpim; /* QUIC TX'd Packet Information Manager */
    QUIC_CFQ *cfq; /* QUIC Control Frame Queue */
    OSSL_ACKM *ackm; /* QUIC Acknowledgement Manager */
    QUIC_STREAM_MAP *qsm; /* QUIC Streams Map */
    QUIC_TXFC *conn_txfc; /* QUIC Connection-Level TX Flow Controller */
    QUIC_RXFC *conn_rxfc; /* QUIC Connection-Level RX Flow Controller */
    const OSSL_CC_METHOD *cc_method; /* QUIC Congestion Controller */
    OSSL_CC_DATA *cc_data; /* QUIC Congestion Controller Instance */
    OSSL_TIME (*now)(void *arg); /* Callback to get current time. */
    void *now_arg;

    /*
     * Injected dependencies - crypto streams.
     *
     * Note: There is no crypto stream for the 0-RTT EL.
     *       crypto[QUIC_PN_SPACE_APP] is the 1-RTT crypto stream.
     */
    QUIC_SSTREAM *crypto[QUIC_PN_SPACE_NUM];
} QUIC_TX_PACKETISER_ARGS;

typedef struct ossl_quic_tx_packetiser_st OSSL_QUIC_TX_PACKETISER;

/* The result must be checked: NULL is returned on failure. */
_owur OSSL_QUIC_TX_PACKETISER *ossl_quic_tx_packetiser_new(QUIC_TX_PACKETISER_ARGS *args);
void ossl_quic_tx_packetiser_free(OSSL_QUIC_TX_PACKETISER *tx);
```

Structures
----------

### Connection

Represented by a QUIC_CONNECTION object.

### Stream

Represented by a QUIC_STREAM object.

As per [RFC 9000 2.3 Stream Prioritization], streams should contain a priority
provided by the calling application. For MVP, this is not required to be
implemented because only one stream is supported. However, packets being
retransmitted should be preferentially sent as noted in
[RFC 9000 13.3 Retransmission of Information].

```c
void SSL_set_priority(SSL *stream, uint32_t priority);
uint32_t SSL_get_priority(SSL *stream);
```

For protocols where priority is not meaningful, the set function is a no-op and
the get function returns a constant value.

Interactions
------------

The packetiser interacts with the following components, the APIs for which
can be found in their respective design documents and header files:

- SSTREAM: manages application stream data for transmission.
- QUIC_STREAM_MAP: maps stream IDs to QUIC_STREAM objects and tracks which
  streams are active (i.e., need servicing by the TX packetiser).
- Crypto streams for each EL other than 0-RTT (each is one SSTREAM).
- CFQ: queried for generic control frames.
- QTX: record layer to which completed packets are written.
- TXPIM: logs information about transmitted packets, provides information to
  FIFD.
- FIFD: notified of transmitted packets.
- ACKM: loss detector.
- Connection and stream-level TXFC and RXFC instances.
- Congestion controller (not needed for MVP).

### SSTREAM

Each application or crypto stream has an SSTREAM object for the sending part.
This manages the buffering of data written to the stream, frees that data when
the packet it was sent in is acknowledged, and can return the data for
retransmission on loss. It receives loss and acknowledgement notifications from
the FIFD without direct TX packetiser involvement.

### QUIC Stream Map

The TX packetiser queries the QUIC stream map for a list of active streams
(QUIC_STREAM), which are iterated on a rotating round robin basis. Each
QUIC_STREAM provides access to its various components, such as a QUIC_SSTREAM
instance (for streams with a send part). Streams are marked inactive when
they no longer need to generate frames at the present time.

### Crypto Streams

The crypto streams for each EL (other than 0-RTT, which does not have a crypto
stream) are represented by SSTREAM instances. The TX packetiser queries the
SSTREAM instances provided to it as needed when generating packets.

### CFQ

Many control frames do not require special handling and are handled by the
generic CFQ mechanism. The TX packetiser queries the CFQ for any frames to be
sent and schedules them into a packet.

### QUIC Write Record Layer

Coalesced frames are passed to the QUIC record layer for encryption and sending.
To send accumulated frames as packets to the QUIC Write Record Layer:

```c
int ossl_qtx_write_pkt(OSSL_QTX *qtx, const OSSL_QTX_PKT *pkt);
```

The packetiser will attempt to maximise the number of bytes in a packet.
It will also attempt to create multiple packets to send simultaneously.

The packetiser should also implement a wait time to allow more data to
accumulate before exhausting its supply of data. The length of the wait
will depend on how much data is queued already and how much space remains in
the packet being filled. Once the wait is finished, the packets will be sent
by calling:

```c
void ossl_qtx_flush_net(OSSL_QTX *qtx);
```

The write record layer is responsible for coalescing multiple QUIC packets
into datagrams.

### TXPIM, FIFD, ACK Handling and Loss Detector

ACK handling and loss detection are provided by the ACKM and FIFD. The FIFD
uses the per-packet information recorded by the TXPIM to track which frames
are contained within a packet which was lost or acknowledged, and generates
callbacks to the TX packetiser, SSTREAM instances and CFQ to allow them to
release or regenerate those frames as needed.

1. When a packet is sent, the packetiser informs the FIFD, which also informs
   the ACK Manager.
2. When a packet is ACKed, the FIFD notifies applicable SSTREAMs and the CFQ
   as appropriate.
3. When a packet is lost, the FIFD notifies the TX packetiser of any frames
   which were in the lost packet for which the Regenerate strategy is
   applicable.
4. Currently, no notifications to the TX packetiser are needed when packets
   are discarded (e.g. due to an EL being discarded).
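
The notification routing in the steps above can be modelled with a small sketch. It is hypothetical and is not the FIFD API: the `sketch_*` names, the frame classification and the counters (standing in for the real callbacks) are all assumptions for illustration, and it assumes that frames using the GCR strategy are owned by the CFQ.

```c
#include <stddef.h>

/* Hypothetical classification of a frame recorded for a sent packet. */
typedef enum {
    SKETCH_FRAME_STREAM_DATA, /* STREAM/CRYPTO data held by an SSTREAM */
    SKETCH_FRAME_GCR,         /* generic control frame owned by the CFQ */
    SKETCH_FRAME_REGEN,       /* regenerated by the TX packetiser (e.g. MAX_DATA) */
    SKETCH_FRAME_NONE         /* nothing to track (e.g. PING, PADDING) */
} sketch_frame_kind;

/* Counters standing in for the callbacks the FIFD would invoke. */
typedef struct {
    size_t sstream; /* notifications to SSTREAM instances */
    size_t cfq;     /* notifications to the CFQ */
    size_t txp;     /* notifications to the TX packetiser */
} sketch_notify;

/* Step 2: on ACK, SSTREAMs may free data and the CFQ may release frames. */
static void sketch_fifd_on_acked(const sketch_frame_kind *frames, size_t n,
                                 sketch_notify *out)
{
    for (size_t i = 0; i < n; i++) {
        if (frames[i] == SKETCH_FRAME_STREAM_DATA)
            out->sstream++;
        else if (frames[i] == SKETCH_FRAME_GCR)
            out->cfq++;
    }
}

/*
 * Step 3: on loss, REGEN frames are reported to the TX packetiser, while
 * stream data and GCR frames go back to their owners for retransmission.
 */
static void sketch_fifd_on_lost(const sketch_frame_kind *frames, size_t n,
                                sketch_notify *out)
{
    for (size_t i = 0; i < n; i++) {
        switch (frames[i]) {
        case SKETCH_FRAME_STREAM_DATA: out->sstream++; break;
        case SKETCH_FRAME_GCR:         out->cfq++;     break;
        case SKETCH_FRAME_REGEN:       out->txp++;     break;
        default:                                       break;
        }
    }
}
```

Step 4 needs no code here, since discarded packets currently produce no notifications to the TX packetiser.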

### Flow Control

The packetiser interacts with connection and stream-level TXFC and RXFC
instances. It interacts with RXFC instances to know when to generate flow
control frames, and with TXFC instances to know how much stream data it is
allowed to send in a packet.
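
As an illustration of the TXFC side, the amount of new stream data that can go into a packet is bounded by both flow control levels and by the space left in the packet. The sketch below assumes a hypothetical `sketch_txfc` holding a credit watermark (`cwm`, the limit granted by the peer) and a sent watermark (`swm`); these names are illustrative and do not describe the real `QUIC_TXFC` interface. Only new data is modelled, since retransmitted data does not consume flow control credit a second time.

```c
#include <stdint.h>

/* Hypothetical TX flow controller state (one per stream, one per connection). */
typedef struct {
    uint64_t cwm; /* credit watermark: highest offset the peer allows */
    uint64_t swm; /* sent watermark: highest offset sent so far */
} sketch_txfc;

/* Credit still available for new data. */
static uint64_t sketch_txfc_credit(const sketch_txfc *fc)
{
    return fc->cwm > fc->swm ? fc->cwm - fc->swm : 0;
}

static uint64_t sketch_min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * New stream bytes that may go into the packet being built: limited by the
 * stream-level credit, the connection-level credit and the payload space
 * left (frame header overhead is ignored here).
 */
static uint64_t sketch_allowed_stream_bytes(const sketch_txfc *stream_fc,
                                            const sketch_txfc *conn_fc,
                                            uint64_t space_left)
{
    return sketch_min_u64(sketch_min_u64(sketch_txfc_credit(stream_fc),
                                         sketch_txfc_credit(conn_fc)),
                          space_left);
}

/* Consume credit at both levels once n new bytes have been sent. */
static void sketch_txfc_consume(sketch_txfc *stream_fc, sketch_txfc *conn_fc,
                                uint64_t n)
{
    stream_fc->swm += n;
    conn_fc->swm += n;
}
```

When either credit reaches zero while data is still waiting, that is the point at which a `STREAM_DATA_BLOCKED` or `DATA_BLOCKED` frame becomes relevant.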

### Congestion Control

The packetiser is likely to interact with the congestion controller in the
future. Currently, congestion control is a no-op.

Packets
-------

Packet formats are defined in [RFC 9000 17.1 Packet Formats].

### Packet types

QUIC supports a number of different packet types. Coalescing packets of
different encryption levels, as per [RFC 9000 12.2 Coalescing Packets], is done
by the record layer. Non-encrypted packets are not handled by the TX packetiser
and callers may send them by direct calls to the record layer.

#### Initial Packet

Refer to [RFC 9000 17.2.2 Initial Packet].

#### Handshake Packet

Refer to [RFC 9000 17.2.4 Handshake Packet].

#### App Data 0-RTT Packet

Refer to [RFC 9000 17.2.3 0-RTT].

#### App Data 1-RTT Packet

Refer to [RFC 9000 17.3.1 1-RTT].

Packetisation and Processing
----------------------------

### Definitions

 - Maximum Datagram Payload Length (MDPL): The maximum number of UDP payload
   bytes we can put in a UDP datagram. This is derived from the applicable PMTU.
   This is also the maximum size of a single QUIC packet if we place only one
   packet in a datagram. The MDPL may vary based on both local source IP and
   destination IP due to different path MTUs.

 - Maximum Packet Length (MPL): The maximum size of a fully encrypted
   and serialized QUIC packet in bytes in some given context. Typically
   equal to the MDPL and never greater than it.

 - Maximum Plaintext Payload Length (MPPL): The maximum number of plaintext
   bytes we can put in the payload of a QUIC packet. This is related to
   the MDPL by the size of the encoded header and the size of any AEAD
   authentication tag which will be attached to the ciphertext.

 - Coalescing MPL (CMPL): The maximum number of bytes left to serialize
   another QUIC packet into the same datagram as one or more previous
   packets. This is just the MDPL minus the total size of all previous
   packets already serialized into the same datagram.

 - Coalescing MPPL (CMPPL): The maximum number of payload bytes we can put in
   the payload of another QUIC packet which is to be coalesced with one or
   more previous QUIC packets and placed into the same datagram. Essentially,
   this is the room we have left for another packet payload.

 - Remaining CMPPL (RCMPPL): The number of bytes left in a packet whose payload
   we are currently forming. This is the CMPPL minus any bytes we have already
   put into the payload.

 - Minimum Datagram Length (MinDPL): In some cases we must ensure a datagram
   has a minimum size of a certain number of bytes. This does not need to be
   accomplished with a single packet, but we may need to add PADDING frames
   to the final packet added to a datagram in this case.

 - Minimum Packet Length (MinPL): The minimum serialized packet length we
   are using while serializing a given packet. May often be 0. Used to meet
   MinDPL requirements, and thus equal to MinDPL minus the length of any packets
   we have already encoded into the datagram.

 - Minimum Plaintext Payload Length (MinPPL): The minimum number of bytes
   which must be placed into a packet payload in order to meet the MinPL
   minimum size when the packet is encoded.

 - Active Stream: A stream which has data or flow control frames ready for
   transmission.
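
The relationships between these quantities amount to simple arithmetic. The following sketch computes CMPL, CMPPL, RCMPPL, MinPL and MinPPL for a datagram under construction. It assumes, for simplicity, that the header length and AEAD tag length are known in advance (in practice the encoded header length depends on the packet number length and on the encoded Length field); `sketch_dgram` and the helper names are hypothetical.

```c
#include <stddef.h>

/* Saturating subtraction so that budgets never wrap below zero. */
static size_t sketch_sat_sub(size_t a, size_t b)
{
    return a > b ? a - b : 0;
}

/* Hypothetical snapshot of a datagram under construction. */
typedef struct {
    size_t mdpl;       /* Maximum Datagram Payload Length (from the PMTU) */
    size_t min_dpl;    /* Minimum Datagram Length (MinDPL); 0 if none */
    size_t dgram_used; /* bytes of packets already serialized into it */
} sketch_dgram;

/* CMPL: room left in the datagram for another serialized packet. */
static size_t sketch_cmpl(const sketch_dgram *d)
{
    return sketch_sat_sub(d->mdpl, d->dgram_used);
}

/* CMPPL: payload room once header and AEAD tag overheads are removed. */
static size_t sketch_cmppl(const sketch_dgram *d, size_t hdr_len, size_t tag_len)
{
    return sketch_sat_sub(sketch_cmpl(d), hdr_len + tag_len);
}

/* RCMPPL: payload room left after payload_used bytes have been placed. */
static size_t sketch_rcmppl(const sketch_dgram *d, size_t hdr_len,
                            size_t tag_len, size_t payload_used)
{
    return sketch_sat_sub(sketch_cmppl(d, hdr_len, tag_len), payload_used);
}

/* MinPL: how long this packet must be for the datagram to reach MinDPL. */
static size_t sketch_min_pl(const sketch_dgram *d)
{
    return sketch_sat_sub(d->min_dpl, d->dgram_used);
}

/* MinPPL: smallest plaintext payload which yields a packet of MinPL bytes. */
static size_t sketch_min_ppl(const sketch_dgram *d, size_t hdr_len, size_t tag_len)
{
    return sketch_sat_sub(sketch_min_pl(d), hdr_len + tag_len);
}
```

For example, with an MDPL of 1472, a MinDPL of 1200, 400 bytes already used by earlier packets, a 30-byte header and a 16-byte tag, this gives a CMPL of 1072, a CMPPL of 1026, a MinPL of 800 and a MinPPL of 754. With no earlier packets in the datagram, the CMPPL is simply the MPPL.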

### Frames

Frames are taken from [RFC 9000 12.4 Frames and Frame Types].

| Type | Name | I | H | 0 | 1 | N | C | P | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0x00 | padding | ✓ | ✓ | ✓ | ✓ | ✓ | | ✓ | |
| 0x01 | ping | ✓ | ✓ | ✓ | ✓ | | | | |
| 0x02 | ack 0x02 | ✓ | ✓ | | ✓ | ✓ | ✓ | | |
| 0x03 | ack 0x03 | ✓ | ✓ | | ✓ | ✓ | ✓ | | |
| 0x04 | reset_stream | | | ✓ | ✓ | | | | |
| 0x05 | stop_sending | | | ✓ | ✓ | | | | |
| 0x06 | crypto | ✓ | ✓ | | ✓ | | | | |
| 0x07 | new_token | | | | ✓ | | | | |
| 0x08 | stream 0x08 | | | ✓ | ✓ | | | | ✓ |
| 0x09 | stream 0x09 | | | ✓ | ✓ | | | | ✓ |
| 0x0A | stream 0x0A | | | ✓ | ✓ | | | | ✓ |
| 0x0B | stream 0x0B | | | ✓ | ✓ | | | | ✓ |
| 0x0C | stream 0x0C | | | ✓ | ✓ | | | | ✓ |
| 0x0D | stream 0x0D | | | ✓ | ✓ | | | | ✓ |
| 0x0E | stream 0x0E | | | ✓ | ✓ | | | | ✓ |
| 0x0F | stream 0x0F | | | ✓ | ✓ | | | | ✓ |
| 0x10 | max_data | | | ✓ | ✓ | | | | |
| 0x11 | max_stream_data | | | ✓ | ✓ | | | | |
| 0x12 | max_streams 0x12 | | | ✓ | ✓ | | | | |
| 0x13 | max_streams 0x13 | | | ✓ | ✓ | | | | |
| 0x14 | data_blocked | | | ✓ | ✓ | | | | |
| 0x15 | stream_data_blocked | | | ✓ | ✓ | | | | |
| 0x16 | streams_blocked 0x16 | | | ✓ | ✓ | | | | |
| 0x17 | streams_blocked 0x17 | | | ✓ | ✓ | | | | |
| 0x18 | new_connection_id | | | ✓ | ✓ | | | ✓ | |
| 0x19 | retire_connection_id | | | ✓ | ✓ | | | | |
| 0x1A | path_challenge | | | ✓ | ✓ | | | ✓ | |
| 0x1B | path_response | | | | ✓ | | | ✓ | |
| 0x1C | connection_close 0x1C | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| 0x1D | connection_close 0x1D | | | ✓ | ✓ | ✓ | | | |
| 0x1E | handshake_done | | | | ✓ | | | | |

The various fields are as defined in RFC 9000. The I, H, 0 and 1 columns are
the _Pkts_ columns and the N, C, P and F columns are the _Spec_ columns, both
described below.

#### Pkts

_Pkts_ are defined as:

| Pkts | Description |
| :---: | --- |
| I | Valid in Initial packets |
| H | Valid in Handshake packets |
| 0 | Valid in 0-RTT packets |
| 1 | Valid in 1-RTT packets |

#### Spec

_Spec_ is defined as:

| Spec | Description |
| :---: | --- |
| N | Not ack-eliciting. |
| C | Does not count toward bytes in flight for congestion control purposes. |
| P | Can be used to probe new network paths during connection migration. |
| F | The contents of frames with this marking are flow controlled. |

For `C`, `N` and `P`, the entire packet must consist only of frames with the
marking for the packet to qualify for it. For example, a packet with an ACK
frame and a _stream_ frame would qualify for neither the `C` nor the `N`
marking.
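
The whole-packet rule for `C`, `N` and `P` is a bitwise AND over the markings of every frame in the packet. The sketch below encodes the `N`, `C` and `P` columns of the frame table above; the `SKETCH_SPEC_*` flags and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags mirroring the N, C and P columns of the frame table. */
#define SKETCH_SPEC_N 0x1u /* not ack-eliciting */
#define SKETCH_SPEC_C 0x2u /* not counted toward bytes in flight */
#define SKETCH_SPEC_P 0x4u /* usable for path probing */

/* N/C/P markings of a single frame type, taken from the table above. */
static uint32_t sketch_frame_spec(uint64_t type)
{
    switch (type) {
    case 0x00:                       /* PADDING */
        return SKETCH_SPEC_N | SKETCH_SPEC_P;
    case 0x02: case 0x03:            /* ACK */
        return SKETCH_SPEC_N | SKETCH_SPEC_C;
    case 0x18: case 0x1A: case 0x1B: /* NEW_CONNECTION_ID, PATH_CHALLENGE/RESPONSE */
        return SKETCH_SPEC_P;
    case 0x1C: case 0x1D:            /* CONNECTION_CLOSE */
        return SKETCH_SPEC_N;
    default:                         /* e.g. PING, CRYPTO, STREAM */
        return 0;
    }
}

/* A packet qualifies for a marking only if every one of its frames has it. */
static uint32_t sketch_packet_spec(const uint64_t *types, size_t n)
{
    uint32_t spec = SKETCH_SPEC_N | SKETCH_SPEC_C | SKETCH_SPEC_P;

    for (size_t i = 0; i < n; i++)
        spec &= sketch_frame_spec(types[i]);
    return spec;
}

static bool sketch_is_ack_eliciting(const uint64_t *types, size_t n)
{
    return (sketch_packet_spec(types, n) & SKETCH_SPEC_N) == 0;
}
```

A packet is ACK-eliciting exactly when it does not qualify for `N`, which is why a single `PING` or `STREAM` frame is enough to make a packet ACK-eliciting.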

#### Notes

- Do we need the distinction between 0-RTT and 1-RTT when both are in
  the Application Data packet number space?
- 0-RTT packets can morph into 1-RTT packets and this needs to be handled by
  the packetiser.

### Frame Type Prioritisation

The frame types listed above are reordered below in the order of priority with
which we want to serialize them. We discuss the motivations for this priority
ordering below. Items without a line between them have the same priority.

```plain
HANDSHAKE_DONE          GCR / REGEN
----------------------------
MAX_DATA                REGEN
DATA_BLOCKED            REGEN
MAX_STREAMS             REGEN
STREAMS_BLOCKED         REGEN
----------------------------

NEW_CONNECTION_ID       GCR
RETIRE_CONNECTION_ID    GCR
----------------------------
PATH_CHALLENGE          -
PATH_RESPONSE           -
----------------------------
ACK                     -         (non-ACK-eliciting)
----------------------------
CONNECTION_CLOSE        ***       (non-ACK-eliciting)
----------------------------
NEW_TOKEN               GCR

----------------------------
CRYPTO                  GCR/*q

============================  ] priority group, repeats per stream
RESET_STREAM            GCR*  ]
STOP_SENDING            GCR*  ]
----------------------------  ]
MAX_STREAM_DATA         REGEN ]
STREAM_DATA_BLOCKED     REGEN ]
----------------------------  ]
STREAM                  *q    ]
============================  ]

----------------------------
PING                    -
----------------------------
PADDING                 -         (non-ACK-eliciting)
```

(See [Frame in Flight Manager](quic-fifm.md) for information on the meaning of
the second column, which specifies the retransmission strategy for each frame
type.)

- `PADDING`: For obvious reasons, this frame type is the lowest priority. We only
  add `PADDING` frames at the very end after serializing all other frames if we
  have been asked to ensure a non-zero MinPL but have not yet met that minimum.

- `PING`: The `PING` frame is encoded as a single byte. It is used to make a packet
  ACK-eliciting if it would not otherwise be ACK-eliciting. Therefore we only
  need to send it if

  a. we have been asked to ensure the packet is ACK-eliciting, and
  b. we do not have any other ACK-eliciting frames in the packet.

  Thus we wait until the end before adding the `PING` frame, as we may end up
  adding other ACK-eliciting frames and not need to add it. There is never
  a need to add more than one `PING` frame. If we have been asked to ensure
  the packet is ACK-eliciting and we do not know for sure up front whether we
  will add any other ACK-eliciting frame, we must reserve one byte of our CMPPL
  to ensure we have room for this. We can cancel this reservation if we
  add an ACK-eliciting frame earlier. For example:

  - We have been asked to ensure a packet is ACK-eliciting and the CMPPL is
    1000 (we are coalescing with another packet).
  - We allocate 999 bytes for non-`PING` frames.
  - While adding non-`PING` frames, we add a `STREAM` frame, which is
    ACK-eliciting, therefore the `PING` frame reservation is cancelled
    and we increase our allocation for non-`PING` frames to 1000 bytes.

- `HANDSHAKE_DONE`: This is a single-byte frame with no data which is used to
  indicate handshake completion. It is only ever sent once. As such, it can be
  implemented as a single flag, and there is no risk of it outcompeting other
  frames. It is therefore trivially given the highest priority.

- `MAX_DATA`, `DATA_BLOCKED`: These manage connection-level flow control. They
  consist of a single integer argument, and, as such, take up little space, but
  are also critical to ensuring the timely expansion of the connection-level
  flow control window. Thus there is a performance reason to include them in
  packets with high priority, and due to their small size and the fact that
  there will only ever be at most one per packet, there is no risk of them
  outcompeting other frames.

- `MAX_STREAMS`, `STREAMS_BLOCKED`: Similar to the frames above for
  connection-level flow control, but these control the rate at which new
  streams are opened. The same arguments apply here, so they are prioritised
  equally.

- `STREAM`: This is the bread and butter of a QUIC packet, and contains
  application-level stream data. As such these frames can usually be expected to
  consume most of our packet's payload budget. We must generally assume that

  - there are many streams, and
  - several of those streams have much more data waiting to be sent than
    can be sent in a single packet.

  Therefore we must ensure some level of balance between multiple competing
  streams. We refer to this as stream scheduling. There are many strategies that
  can be used for this, and in the future we might even support
  application-signalled prioritisation of specific streams. We discuss
  stream scheduling further below.

  Because these frames are expected to make up the bulk of most packets, we
  consider them low priority, higher only than `PING` and `PADDING` frames.
  Moreover, we give priority to control frames because, unlike `STREAM` frames,
  they are vital to the maintenance of the health of the connection itself. Once
  we have serialized all other frame types, we can reserve the rest of the
  packet for any `STREAM` frames. Since all `STREAM` frames are ACK-eliciting,
  if we have any `STREAM` frame to send at all, it cancels any need for any
  `PING` frame, and may be able to partially or wholly obviate our need for any
  `PADDING` frames which we might otherwise have needed. Thus once we start
  serializing `STREAM` frames, we are limited only by the remaining CMPPL.

- `MAX_STREAM_DATA`, `STREAM_DATA_BLOCKED`: Stream-level flow control. These
  contain only a stream ID and an integer value used for flow control, so they
  are not large. Since they are critical to the management and health of a
  specific stream, and because they are small and have no risk of stealing too
  many bytes from the `STREAM` frames they follow, we always serialize these
  before any corresponding `STREAM` frames for a given stream ID.

- `RESET_STREAM`, `STOP_SENDING`: These terminate a given stream ID and thus are
  also associated with a stream. They are also small. As such, we consider these
  higher priority than both `STREAM` frames and the stream-level flow control
  frames.

- `NEW_CONNECTION_ID`, `RETIRE_CONNECTION_ID`: These are critical for connection
  management and are not particularly large, therefore they are given a high
  priority.

- `PATH_CHALLENGE`, `PATH_RESPONSE`: Used during connection migration, these
  are small and are given a high priority.

- `CRYPTO`: These frames carry the crypto stream, a logical bidirectional
  bytestream used to transport TLS handshake messages for connection handshake
  and management purposes. As such, the crypto stream is viewed as similar to
  application streams but of a higher priority. We are willing to let `CRYPTO`
  frames outcompete all application stream-related frames if need be, as
  `CRYPTO` frames are more important to the maintenance of the connection and
  the handshake layer should not generate an excessive amount of data.

- `CONNECTION_CLOSE`, `NEW_TOKEN`: The `CONNECTION_CLOSE` frame can contain a
  user-specified reason string. The `NEW_TOKEN` frame contains an opaque token
  blob. Both can be arbitrarily large but for the fact that they must fit in a
  single packet and are thus ultimately limited by the MPPL. However, these
  frames are important to connection maintenance and thus are given a priority
  just above that of `CRYPTO` frames. The `CONNECTION_CLOSE` frame has higher
  priority than `NEW_TOKEN`.

- `ACK`: `ACK` frames are critical to avoid needless retransmissions by our peer.
  They can also potentially become large if a large number of ACK ranges needs
  to be transmitted. Thus `ACK` frames are given a fairly high priority;
  specifically, their priority is higher than all frames which have the
  potential to be large but below all frames which contain only limited data,
  such as connection-level flow control. However, we reserve the right to adapt
  the size of the ACK frames we transmit by chopping off some of the PN ranges
  to limit the size of the ACK frame if its size would be otherwise excessive.
  This ensures that the high priority of the ACK frame does not starve the
  packet of room for stream data.
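
The one-byte `PING` reservation described under `PING` can be sketched as a small budget tracker. This is a minimal illustration under assumptions: `sketch_budget` and its helpers are hypothetical names, frames are reduced to a length and an ACK-eliciting flag, and the caller is expected to call `sketch_finish()` after all other frames have been added.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical payload budget for one packet under construction. */
typedef struct {
    size_t cmppl;       /* payload room for this packet (CMPPL) */
    size_t used;        /* payload bytes placed so far */
    bool need_ack_elic; /* caller asked for an ACK-eliciting packet */
    bool have_ack_elic; /* an ACK-eliciting frame has already been added */
} sketch_budget;

/* True while one byte must still be held back for a PING frame. */
static bool sketch_ping_reserved(const sketch_budget *b)
{
    return b->need_ack_elic && !b->have_ack_elic;
}

/*
 * Room for the next frame. An ACK-eliciting frame cancels the PING
 * reservation, so it may also use the reserved byte.
 */
static size_t sketch_room(const sketch_budget *b, bool ack_eliciting)
{
    size_t reserve = (sketch_ping_reserved(b) && !ack_eliciting) ? 1 : 0;
    size_t left = b->cmppl - b->used; /* used never exceeds cmppl */

    return left > reserve ? left - reserve : 0;
}

static bool sketch_add_frame(sketch_budget *b, size_t len, bool ack_eliciting)
{
    if (len > sketch_room(b, ack_eliciting))
        return false;
    b->used += len;
    if (ack_eliciting)
        b->have_ack_elic = true;
    return true;
}

/*
 * Called last. Returns 1 if a PING was added, 0 if none was needed, and -1
 * if there was no room at all (only possible when cmppl is 0).
 */
static int sketch_finish(sketch_budget *b)
{
    if (!sketch_ping_reserved(b))
        return 0;
    if (b->used >= b->cmppl)
        return -1;
    b->used += 1; /* the PING frame is a single byte */
    b->have_ack_elic = true;
    return 1;
}
```

Replaying the example from the `PING` entry: with a CMPPL of 1000, only 999 bytes are initially available to frames which are not ACK-eliciting, but adding a 1000-byte `STREAM` frame succeeds because it cancels the reservation, and `sketch_finish()` then reports that no `PING` is needed.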

### Stream Scheduling

**Stream budgeting.** When it is time to add `STREAM` frames to a packet under
construction, we take our Remaining CMPPL and call this value the Streams
Budget. There are many ways we could make use of this Streams Budget.

For the purposes of stream budgeting, we consider all bytes of `STREAM` frames,
stream-level flow control frames, `RESET_STREAM` and `STOP_SENDING` frames to
“belong” to their respective streams, and the encoded sizes of these frames are
accounted to those streams for budgeting purposes. If the total number of bytes
of frames necessary to serialize all pending data from all active streams is
less than our Streams Budget, there is no need for any prioritisation.
Otherwise, there are a number of strategies we could employ. We can categorise
the possible strategies into two groups to begin with:

 - **Intrapacket muxing (IRPM).** When the data available to send across all
   streams exceeds the Streams Budget for the packet, allocate an equal
   portion of the packet to each stream.

 - **Interpacket muxing (IXPM).** When the data available to send across all
   streams exceeds the Streams Budget for the packet, try to fill the packet
   using as few streams as possible, and multiplex by using different
   streams in different packets.

Though obvious, IRPM does not appear to be a widely used strategy [1] [2],
probably due to a clear downside: if a packet is lost and it contains data for
multiple streams, all of those streams will be held up. This undermines a key
advantage of QUIC, namely the ability of streams to function independently of
one another for the purposes of head-of-line blocking. By contrast, with IXPM,
if a packet is lost, typically only a single stream is held up.

Suppose we choose IXPM. We must now choose a strategy for deciding when to
schedule streams on packets. [1] establishes that there are two basic
strategies found in use:

 - A round robin (RR) strategy in which the frame scheduler switches to
   the next active stream every `n` packets (where `n` ≥ 1).

 - A sequential (SEQ) strategy in which a stream keeps being transmitted
   until it is no longer active.

The SEQ strategy does not appear to be suitable for general-purpose
applications as it presumably starves other streams of bandwidth. It appears
that this strategy may be chosen in some implementations because it can offer
greater efficiency with HTTP/3, where there are performance benefits to
completing transmission of one stream before beginning the next. However, it
does not seem like a suitable choice for an application-agnostic QUIC
implementation. Thus the RR strategy is the better choice, and it is also the
popular choice in a survey of implementations.

The choice of `n` for the RR strategy is most trivially 1, but there are
suggestions [1] that a higher value of `n` may lead to greater performance,
because packet loss in typical networks tends to occur in short bursts
affecting small numbers of consecutive packets. Thus, if `n` is greater than 1,
fewer streams will be affected by packet loss and held up on average. However,
supporting different values of `n` is straightforward to implement, so it is
not a major concern for discussion here. Such a parameter can easily be made
configurable.

Thus, we choose which active stream to use to fill a packet on a
revolving round robin basis, moving to the next stream in the round robin
every `n` packets. If the available data in the active stream is not enough to
fill a packet, we also move on to the next stream, so IRPM can still occur in
this case.
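
The round robin policy described above, with rotation every `n` packets and fallthrough to later streams when the leading stream runs dry, can be sketched as follows. This is illustrative only: the `sketch_sched` structure is hypothetical, a fixed array stands in for the QUIC stream map's list of active streams, and frame overhead and the per-stream control frames (`RESET_STREAM`, `STOP_SENDING`, flow control) are ignored, so each stream is reduced to a count of pending bytes.

```c
#include <stddef.h>

#define SKETCH_MAX_STREAMS 8

/* Hypothetical scheduler state over a fixed set of streams. */
typedef struct {
    size_t pending[SKETCH_MAX_STREAMS]; /* bytes waiting on each stream */
    size_t num;          /* number of streams in use (1..SKETCH_MAX_STREAMS) */
    size_t cur;          /* stream that leads the next packet */
    unsigned int n;      /* packets led by one stream before rotating (>= 1) */
    unsigned int served; /* packets led by `cur` so far */
} sketch_sched;

/* Rotate to the next stream (in round robin order) that still has data. */
static void sketch_rotate(sketch_sched *s)
{
    for (size_t i = 1; i <= s->num; i++) {
        size_t j = (s->cur + i) % s->num;

        if (s->pending[j] > 0) {
            s->cur = j;
            break;
        }
    }
    s->served = 0;
}

/*
 * Fill one packet with up to `budget` bytes of stream data. The leading
 * stream goes first; if it runs dry, later streams fill the remainder. The
 * indices of the streams used are written to order[], their count to *norder.
 */
static size_t sketch_fill_packet(sketch_sched *s, size_t budget,
                                 size_t order[], size_t *norder)
{
    size_t used = 0, lead;

    *norder = 0;
    if (s->pending[s->cur] == 0)
        sketch_rotate(s);
    lead = s->cur;

    for (size_t k = 0; k < s->num && used < budget; k++) {
        size_t j = (lead + k) % s->num;
        size_t room = budget - used;
        size_t take = s->pending[j] < room ? s->pending[j] : room;

        if (take == 0)
            continue;
        s->pending[j] -= take;
        used += take;
        order[(*norder)++] = j;
    }

    /* Move on after n packets, or earlier if the leading stream is drained. */
    if (++s->served >= s->n || s->pending[lead] == 0)
        sketch_rotate(s);
    return used;
}
```

With three streams holding 3000, 3000 and 500 bytes, `n` = 1 and a 1000-byte budget, the first three packets are led by streams 0, 1 and 2 in turn; the third packet is topped up with 500 bytes from stream 0 because stream 2 runs dry, which is the IRPM fallback described above.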

When we fill a packet with a stream, we start with any applicable `RESET_STREAM`
or `STOP_SENDING` frames, followed by stream-level flow control frames if
needed, followed by `STREAM` frames.

(This means that `RESET_STREAM`, `STOP_SENDING`, `MAX_STREAM_DATA`,
`STREAM_DATA_BLOCKED` and `STREAM` frames are interleaved rather than occurring
in a fixed priority order; i.e., first there could be a `STOP_SENDING` frame
for one stream, then a `STREAM` frame for another, then another `STOP_SENDING`
frame for another stream, etc.)

[1] [Same Standards; Different Decisions: A Study of QUIC and HTTP/3
Implementation Diversity (Marx et al. 2020)](https://qlog.edm.uhasselt.be/epiq/files/QUICImplementationDiversity_Marx_final_11jun2020.pdf)

[2] [Resource Multiplexing and Prioritization in HTTP/2 over TCP versus HTTP/3
over QUIC (Marx et al. 2020)](https://h3.edm.uhasselt.be/files/ResourceMultiplexing_H2andH3_Marx2020.pdf)

### Packets with Special Requirements

Some packets have special requirements which the TX packetiser must meet:

- **Padded Initial Datagrams.**
  A datagram must always be padded to at least 1200 bytes if it contains an
  Initial packet. (If there are multiple packets in the datagram, the padding
  does not necessarily need to be part of the Initial packet itself.) This
  serves to confirm that the QUIC minimum MTU is met.

- **Token in Initial Packets.**
  Initial packets may need to contain a token. If used, the token is included
  in all further Initial packets sent by the client, not just the first Initial
  packet.

- **Anti-amplification Limit.** Sometimes a lower MDPL may be imposed due to
  anti-amplification limits. (Only a concern for servers, so not relevant to
  MVP.)

  Note: It has been observed that many implementations are not fastidious
  about enforcing the amplification limit in terms of precise packet sizes.
  Rather, they just use it to determine if they can send another packet, but not
  to determine what size that packet must be. Implementations with 'precise'
  anti-amplification enforcement appear to be rare.

- **MTU Probes.** These packets have a precisely crafted size for the purposes
  of probing a path MTU. Unlike ordinary packets, they are routinely expected to
  be lost and this loss should not be taken as a signal for congestion control
  purposes. (Not relevant for MVP.)

- **Path/Migration Probes.** These packets are sent to verify a new path
  for the purposes of connection migration.

- **ACK Manager Probes.** Packets produced because the ACK manager has
  requested a probe be sent. These MUST be made ACK-eliciting (using a `PING`
  frame if necessary). However, these packets need not be reserved exclusively
  for ACK Manager purposes; they SHOULD contain new data if available, and MAY
  contain old data.
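
The anti-amplification limit comes from RFC 9000 Section 8.1: until the client's address is validated, a server may send at most three times as many bytes as it has received. The sketch below shows how this could lower the effective MDPL for the next datagram; the function name is hypothetical, and it applies the limit precisely in terms of bytes, i.e. the less common 'precise' style mentioned in the note above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * RFC 9000 Section 8.1: at most three times the bytes received may be sent
 * to an unvalidated address.
 */
#define SKETCH_AMP_FACTOR 3

/* Hypothetical helper: effective MDPL for the next datagram from a server. */
static uint64_t sketch_amp_limited_mdpl(uint64_t mdpl, uint64_t bytes_received,
                                        uint64_t bytes_sent, bool addr_validated)
{
    uint64_t allowance, left;

    if (addr_validated)
        return mdpl; /* no anti-amplification limit applies */

    allowance = SKETCH_AMP_FACTOR * bytes_received;
    left = allowance > bytes_sent ? allowance - bytes_sent : 0;
    return left < mdpl ? left : mdpl;
}
```

An implementation in the more common style would instead only check whether `left` is non-zero before sending another full-sized packet.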

We handle the need for different kinds of packets via a notion of “archetypes”.
The TX packetiser is requested to generate a datagram via the following call:

```c
/* Generate normal packets containing most frame types. */
#define TX_PACKETISER_ARCHETYPE_NORMAL 0
/* Generate ACKs only. */
#define TX_PACKETISER_ARCHETYPE_ACK_ONLY 1

int ossl_quic_tx_packetiser_generate(OSSL_QUIC_TX_PACKETISER *txp,
                                     uint32_t archetype);
```

More archetypes can be added in the future as required. The archetype limits
what frames can be placed into the packets of a datagram.
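
One way an archetype could limit frame selection is sketched below, combining it with the per-packet-type validity from the _Pkts_ columns of the frame table. The packet-type bits and helper names are hypothetical. Two further assumptions are labelled in the code: the `NORMAL` archetype is treated as permitting every frame type (the description above says "most frame types" without listing exclusions), and `ACK_ONLY` still permits `PADDING` so that a minimum datagram length can be met.

```c
#include <stdbool.h>
#include <stdint.h>

/* Archetype values from the declaration above. */
#define TX_PACKETISER_ARCHETYPE_NORMAL   0
#define TX_PACKETISER_ARCHETYPE_ACK_ONLY 1

/* Packet-type bits mirroring the I, H, 0 and 1 columns of the frame table. */
#define SKETCH_PKT_I 0x1u
#define SKETCH_PKT_H 0x2u
#define SKETCH_PKT_0 0x4u
#define SKETCH_PKT_1 0x8u

/* Packet types in which a frame type may appear, per the frame table. */
static uint32_t sketch_frame_pkts(uint64_t type)
{
    const uint32_t all = SKETCH_PKT_I | SKETCH_PKT_H | SKETCH_PKT_0 | SKETCH_PKT_1;

    switch (type) {
    case 0x00: case 0x01: case 0x1C: /* PADDING, PING, CONNECTION_CLOSE 0x1C */
        return all;
    case 0x02: case 0x03: case 0x06: /* ACK, CRYPTO */
        return SKETCH_PKT_I | SKETCH_PKT_H | SKETCH_PKT_1;
    case 0x07: case 0x1B: case 0x1E: /* NEW_TOKEN, PATH_RESPONSE, HANDSHAKE_DONE */
        return SKETCH_PKT_1;
    default:                         /* all other types from 0x04 to 0x1D */
        return (type >= 0x04 && type <= 0x1D) ? SKETCH_PKT_0 | SKETCH_PKT_1 : 0;
    }
}

/* Whether a frame type may be placed in a packet of the given type. */
static bool sketch_frame_permitted(uint32_t archetype, uint32_t pkt_type,
                                   uint64_t frame_type)
{
    if ((sketch_frame_pkts(frame_type) & pkt_type) == 0)
        return false; /* not valid in this packet type at all */

    switch (archetype) {
    case TX_PACKETISER_ARCHETYPE_NORMAL:
        return true; /* assumption: no further restriction */
    case TX_PACKETISER_ARCHETYPE_ACK_ONLY:
        /* Assumption: PADDING remains allowed so MinDPL can still be met. */
        return frame_type == 0x00 || frame_type == 0x02 || frame_type == 0x03;
    default:
        return false;
    }
}
```

Keeping the packet-type check separate from the archetype check means new archetypes only need to describe their own restriction.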

### Encryption Levels

A QUIC connection progresses through Initial, Handshake, 0-RTT and 1-RTT
encryption levels (ELs). The TX packetiser decides what EL to use to send a
packet; or rather, it would be more accurate to say that the TX packetiser
decides which ELs need a packet to be generated. Many resources are
instantiated per EL and can only be managed using a packet of that EL;
therefore a datagram will frequently need to contain multiple packets to manage
the resources of different ELs. We can thus view datagram construction as a
process of determining, for each EL, whether it needs to produce a packet, and
concatenating the resulting packets.

The following EL-specific resources exist:

- The crypto stream, a bidirectional byte stream abstraction provided
  to the handshake layer. There is one crypto stream for each of the Initial,
  Handshake and 1-RTT ELs. (`CRYPTO` frames are prohibited in 0-RTT packets,
  which is to say the 0-RTT EL has no crypto stream of its own.)

- Packet number spaces and acknowledgements. The 0-RTT and 1-RTT ELs
  share a PN space, but Initial and Handshake ELs both have their own
  PN spaces. Thus, Initial packets can only be acknowledged using an `ACK`
  frame sent in an Initial packet, etc.

Thus, a fully generalised datagram construction methodology looks like this:

- Let E be the set of ELs which are not discarded and for which `pending(el)` is
  true, where `pending()` is a predicate function determining if the EL has data
  to send.

- Determine if we are limited by anti-amplification restrictions.
  (Not relevant for MVP since this is only needed on the server side.)

- For each EL in E, construct a packet bearing in mind the Remaining CMPPL
  and append it to the datagram.

  For the Initial EL, we attach a token if we have been given one.

  If Initial is in E, the total length of the resulting datagram must be at
  least 1200 bytes, but we are free to choose which packets of which ELs in E
  receive the padding.

- Send the datagram.
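
The construction steps above can be sketched as a loop over ELs. This is a simplified model, not the packetiser's implementation: each EL is reduced to a hypothetical `sketch_el` giving the size of the packet it would like to send, a packet is truncated to the remaining CMPL rather than built within it, the token and anti-amplification steps are omitted, and padding for the 1200-byte Initial rule is placed in the last packet, which is only one of the permitted choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_EL_INITIAL, SKETCH_EL_HANDSHAKE, SKETCH_EL_0RTT, SKETCH_EL_1RTT,
       SKETCH_EL_NUM };

#define SKETCH_MIN_INITIAL_DGRAM_LEN 1200

/*
 * Hypothetical per-EL input: whether the EL is discarded, and the size of the
 * packet it would like to send (0 means pending(el) is false).
 */
typedef struct {
    bool discarded;
    size_t want;
} sketch_el;

/*
 * Build one datagram. Returns its length; pkt_len[el] receives the size of
 * the packet generated for each EL (0 if none).
 */
static size_t sketch_build_dgram(const sketch_el els[], size_t mdpl,
                                 size_t pkt_len[])
{
    size_t used = 0;
    int last = -1;
    bool has_initial = false;

    /* QUIC requires paths to support datagrams of at least 1200 bytes. */
    assert(mdpl >= SKETCH_MIN_INITIAL_DGRAM_LEN);

    for (int el = 0; el < SKETCH_EL_NUM; el++) {
        size_t room = mdpl - used; /* CMPL for this packet */

        pkt_len[el] = 0;
        if (els[el].discarded || els[el].want == 0 || room == 0)
            continue; /* EL not in E, or the datagram is already full */

        pkt_len[el] = els[el].want < room ? els[el].want : room;
        used += pkt_len[el];
        last = el;
        if (el == SKETCH_EL_INITIAL)
            has_initial = true;
    }

    /* Pad up to 1200 bytes; here the padding goes in the last packet. */
    if (has_initial && used < SKETCH_MIN_INITIAL_DGRAM_LEN) {
        pkt_len[last] += SKETCH_MIN_INITIAL_DGRAM_LEN - used;
        used = SKETCH_MIN_INITIAL_DGRAM_LEN;
    }
    return used;
}
```

For instance, an Initial packet of 300 bytes followed by a Handshake packet of 200 bytes produces a 1200-byte datagram in which the Handshake packet is padded to 900 bytes.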

### TX Key Update

The TX packetiser decides when to tell the QRL to initiate a TX-side key update.
It decides this using information provided by the QRL.

### Restricting packet sizes

Two factors impact the size of packets that can be sent:

* The maximum datagram payload length (MDPL)
* Congestion control

The MDPL limits the size of an entire datagram, whereas congestion control
limits how much data can be in flight at any given time, which may cause a lower
limit to be imposed on a given packet.
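
A minimal sketch of how these two factors could combine into a per-packet limit follows. The `sketch_cc` structure is hypothetical; it assumes a window-based congestion controller in the style of RFC 9002 and ignores the cases where RFC 9002 allows a packet to exceed the congestion window (such as probes sent on PTO expiry). Packets which do not count toward bytes in flight (the `C` marking) are limited by the MDPL alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of congestion controller state. */
typedef struct {
    uint64_t cwnd;            /* congestion window in bytes */
    uint64_t bytes_in_flight; /* sent but not yet acknowledged or lost */
} sketch_cc;

/*
 * Upper bound on the size of the next packet. Packets that do not count
 * toward bytes in flight (e.g. ACK-only) are bounded by the MDPL alone.
 */
static uint64_t sketch_packet_limit(uint64_t mdpl, const sketch_cc *cc,
                                    bool counts_in_flight)
{
    uint64_t cc_room;

    if (!counts_in_flight)
        return mdpl;
    cc_room = cc->cwnd > cc->bytes_in_flight ? cc->cwnd - cc->bytes_in_flight : 0;
    return mdpl < cc_room ? mdpl : cc_room;
}
```

Since congestion control is currently a no-op, the window can be thought of as unbounded for now, in which case the MDPL is the only limit.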

### Stateless Reset

Refer to [RFC 9000 10.3 Stateless Reset]. It's entirely reasonable for
the state machine to send this directly and immediately if required.

[RFC 9000 2.3 Stream Prioritization]: https://datatracker.ietf.org/doc/html/rfc9000#section-2.3
[RFC 9000 4.1 Data Flow Control]: https://datatracker.ietf.org/doc/html/rfc9000#section-4.1
[RFC 9000 10.3 Stateless Reset]: https://datatracker.ietf.org/doc/html/rfc9000#section-10.3
[RFC 9000 12.2 Coalescing Packets]: https://datatracker.ietf.org/doc/html/rfc9000#section-12.2
[RFC 9000 12.4 Frames and Frame Types]: https://datatracker.ietf.org/doc/html/rfc9000#section-12.4
[RFC 9000 13.3 Retransmission of Information]: https://datatracker.ietf.org/doc/html/rfc9000#section-13.3
[RFC 9000 17.1 Packet Formats]: https://datatracker.ietf.org/doc/html/rfc9000#section-17
[RFC 9000 17.2.1 Version Negotiation Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.1
[RFC 9000 17.2.2 Initial Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2
[RFC 9000 17.2.3 0-RTT]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.3
[RFC 9000 17.2.4 Handshake Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.4
[RFC 9000 17.2.5 Retry Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.5
[RFC 9000 17.3.1 1-RTT]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.3.1
[RFC 9002]: https://datatracker.ietf.org/doc/html/rfc9002