TX Packetiser
=============

This module creates frames from the application data obtained from the
application. It also receives CRYPTO frames from the TLS Handshake Record
Layer and ACK frames from the ACK Handling and Loss Detector subsystem.

The packetiser also deals with the flow and congestion controllers.

Creation & Destruction
----------------------

```c
typedef struct quic_tx_packetiser_args_st {
    /* Configuration Settings */
    QUIC_CONN_ID cur_scid;  /* Current Source Connection ID we use. */
    QUIC_CONN_ID cur_dcid;  /* Current Destination Connection ID we use. */
    BIO_ADDR     peer;      /* Current destination L4 address we use. */
    /* ACK delay exponent used when encoding. */
    uint32_t     ack_delay_exponent;

    /* Injected Dependencies */
    OSSL_QTX        *qtx;        /* QUIC Record Layer TX we are using */
    QUIC_TXPIM      *txpim;      /* QUIC TX'd Packet Information Manager */
    QUIC_CFQ        *cfq;        /* QUIC Control Frame Queue */
    OSSL_ACKM       *ackm;       /* QUIC Acknowledgement Manager */
    QUIC_STREAM_MAP *qsm;        /* QUIC Streams Map */
    QUIC_TXFC       *conn_txfc;  /* QUIC Connection-Level TX Flow Controller */
    QUIC_RXFC       *conn_rxfc;  /* QUIC Connection-Level RX Flow Controller */
    const OSSL_CC_METHOD *cc_method; /* QUIC Congestion Controller */
    OSSL_CC_DATA         *cc_data;   /* QUIC Congestion Controller Instance */
    OSSL_TIME (*now)(void *arg);     /* Callback to get current time. */
    void *now_arg;

    /*
     * Injected dependencies - crypto streams.
     *
     * Note: There is no crypto stream for the 0-RTT EL.
     * crypto[QUIC_PN_SPACE_APP] is the 1-RTT crypto stream.
     */
    QUIC_SSTREAM *crypto[QUIC_PN_SPACE_NUM];
} QUIC_TX_PACKETISER_ARGS;

typedef struct ossl_quic_tx_packetiser_st OSSL_QUIC_TX_PACKETISER;

_owur OSSL_QUIC_TX_PACKETISER *ossl_quic_tx_packetiser_new(QUIC_TX_PACKETISER_ARGS *args);
void ossl_quic_tx_packetiser_free(OSSL_QUIC_TX_PACKETISER *tx);
```

Structures
----------

### Connection

Represented by a QUIC_CONNECTION object.

### Stream

Represented by a QUIC_STREAM object.

As per [RFC 9000 2.3 Stream Prioritization], streams should carry a priority
provided by the calling application. For MVP, this is not required to be
implemented because only one stream is supported. However, packets being
retransmitted should be sent preferentially, as noted in
[RFC 9000 13.3 Retransmission of Information].

```c
void SSL_set_priority(SSL *stream, uint32_t priority);
uint32_t SSL_get_priority(SSL *stream);
```

For protocols where priority is not meaningful, the set function is a no-op and
the get function returns a constant value.

Interactions
------------

The packetiser interacts with the following components, the APIs for which
can be found in their respective design documents and header files:

- SSTREAM: manages application stream data for transmission.
- QUIC_STREAM_MAP: maps stream IDs to QUIC_STREAM objects and tracks which
  streams are active (i.e., need servicing by the TX packetiser).
- Crypto streams for each EL other than 0-RTT (each is one SSTREAM).
- CFQ: queried for generic control frames.
- QTX: record layer which completed packets are written to.
- TXPIM: logs information about transmitted packets and provides information
  to the FIFD.
- FIFD: notified of transmitted packets.
- ACKM: loss detector.
- Connection and stream-level TXFC and RXFC instances.
- Congestion controller (not needed for MVP).

### SSTREAM

Each application or crypto stream has an SSTREAM object for its sending part.
This manages the buffering of data written to the stream, frees that data once
the packet it was sent in is acknowledged, and can return the data for
retransmission on loss. It receives loss and acknowledgement notifications from
the FIFD without direct TX packetiser involvement.

### QUIC Stream Map

The TX packetiser queries the QUIC stream map for a list of active streams
(QUIC_STREAM), which are iterated on a rotating round robin basis. Each
QUIC_STREAM provides access to the various components, such as a QUIC_SSTREAM
instance (for streams with a send part). Streams are marked inactive when
they no longer have any need to generate frames at the present time.

### Crypto Streams

The crypto streams for each EL (other than 0-RTT, which does not have a crypto
stream) are represented by SSTREAM instances. The TX packetiser queries the
SSTREAM instances provided to it as needed when generating packets.

### CFQ

Many control frames do not require special handling and are handled by the
generic CFQ mechanism. The TX packetiser queries the CFQ for any frames to be
sent and schedules them into a packet.

### QUIC Write Record Layer

Coalesced frames are passed to the QUIC record layer for encryption and
sending. To send accumulated frames as packets to the QUIC Write Record Layer:

```c
int ossl_qtx_write_pkt(OSSL_QTX *qtx, const OSSL_QTX_PKT *pkt);
```

The packetiser will attempt to maximise the number of bytes in a packet.
It will also attempt to create multiple packets to send simultaneously.

The packetiser should also implement a wait time to allow more data to
accumulate before exhausting its supply of data. The length of the wait
will depend on how much data is queued already and how much space remains in
the packet being filled. Once the wait is finished, the packets will be sent
by calling:

```c
void ossl_qtx_flush_net(OSSL_QTX *qtx);
```

The write record layer is responsible for coalescing multiple QUIC packets
into datagrams.

### TXPIM, FIFD, ACK Handling and Loss Detector

ACK handling and loss detection are provided by the ACKM and FIFD. The FIFD
uses the per-packet information recorded by the TXPIM to track which frames
are contained within a packet which was lost or acknowledged, and generates
callbacks to the TX packetiser, SSTREAM instances and CFQ to allow them to
regenerate those frames as needed.

1. When a packet is sent, the packetiser informs the FIFD, which in turn
   informs the ACK Manager.
2. When a packet is ACKed, the FIFD notifies the applicable SSTREAMs and the
   CFQ as appropriate.
3. When a packet is lost, the FIFD notifies the TX packetiser of any frames
   which were in the lost packet for which the Regenerate strategy is
   applicable.
4. Currently, no notifications to the TX packetiser are needed when packets
   are discarded (e.g. due to an EL being discarded).

### Flow Control

The packetiser interacts with the connection-level and stream-level TXFC and
RXFC instances. It interacts with RXFC instances to know when to generate flow
control frames, and with TXFC instances to know how much stream data it is
allowed to send in a packet.

### Congestion Control

The packetiser is likely to interact with the congestion controller in the
future. Currently, congestion control is a no-op.

Packets
-------

Packet formats are defined in [RFC 9000 17.1 Packet Formats].

### Packet Types

QUIC supports a number of different packet types. The coalescing of packets of
different encryption levels, as per [RFC 9000 12.2 Coalescing Packets], is done
by the record layer. Non-encrypted packets are not handled by the TX packetiser
and callers may send them by direct calls to the record layer.

#### Initial Packet

Refer to [RFC 9000 17.2.2 Initial Packet].

#### Handshake Packet

Refer to [RFC 9000 17.2.4 Handshake Packet].

#### App Data 0-RTT Packet

Refer to [RFC 9000 17.2.3 0-RTT].

#### App Data 1-RTT Packet

Refer to [RFC 9000 17.3.1 1-RTT].

Packetisation and Processing
----------------------------

### Definitions

- Maximum Datagram Payload Length (MDPL): The maximum number of UDP payload
  bytes we can put in a UDP packet. This is derived from the applicable PMTU.
  This is also the maximum size of a single QUIC packet if we place only one
  packet in a datagram. The MDPL may vary based on both local source IP and
  destination IP due to different path MTUs.

- Maximum Packet Length (MPL): The maximum size of a fully encrypted and
  serialized QUIC packet in bytes in some given context. Typically equal to
  the MDPL and never greater than it.

- Maximum Plaintext Payload Length (MPPL): The maximum number of plaintext
  bytes we can put in the payload of a QUIC packet. This is related to the
  MDPL by the size of the encoded header and the size of any AEAD
  authentication tag which will be attached to the ciphertext.

- Coalescing MPL (CMPL): The maximum number of bytes left to serialize
  another QUIC packet into the same datagram as one or more previous packets.
  This is simply the MDPL minus the total size of all packets already
  serialized into the same datagram.

- Coalescing MPPL (CMPPL): The maximum number of bytes we can put in the
  payload of another QUIC packet which is to be coalesced with one or more
  previous QUIC packets and placed into the same datagram. Essentially, this
  is the room we have left for another packet payload.

- Remaining CMPPL (RCMPPL): The number of bytes left in a packet whose
  payload we are currently forming. This is the CMPPL minus any bytes we have
  already put into the payload.

- Minimum Datagram Length (MinDPL): In some cases we must ensure a datagram
  has a minimum size of a certain number of bytes. This does not need to be
  accomplished with a single packet, but we may need to add PADDING frames to
  the final packet added to a datagram in this case.

- Minimum Packet Length (MinPL): The minimum serialized packet length we are
  using while serializing a given packet. May often be 0. Used to meet MinDPL
  requirements, and thus equal to MinDPL minus the length of any packets we
  have already encoded into the datagram.

- Minimum Plaintext Payload Length (MinPPL): The minimum number of bytes
  which must be placed into a packet payload in order to meet the MinPL
  minimum size when the packet is encoded.

- Active Stream: A stream which has data or flow control frames ready for
  transmission.
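
The relationships between these quantities reduce to simple arithmetic. The
following sketch is purely illustrative; the function names are ours and the
header/tag sizes passed in would come from the record layer in practice:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch of the length relationships defined above.
 * All names here are hypothetical, not part of the implementation.
 */

/* CMPL: MDPL minus the bytes already serialized into the datagram. */
static size_t cmpl(size_t mdpl, size_t bytes_already_in_datagram)
{
    return mdpl - bytes_already_in_datagram;
}

/* CMPPL: CMPL minus the encoded header and AEAD tag overhead. */
static size_t cmppl(size_t cmpl_, size_t hdr_len, size_t tag_len)
{
    size_t overhead = hdr_len + tag_len;

    /* No room for a payload at all if overhead exceeds the CMPL. */
    return cmpl_ > overhead ? cmpl_ - overhead : 0;
}

/* RCMPPL: CMPPL minus the payload bytes already written. */
static size_t rcmppl(size_t cmppl_, size_t payload_bytes_written)
{
    return cmppl_ - payload_bytes_written;
}
```

For example, with an MDPL of 1200 and 300 bytes already serialized, the CMPL
is 900; a hypothetical 25-byte header and 16-byte AEAD tag would leave a CMPPL
of 859.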
### Frames

Frames are taken from [RFC 9000 12.4 Frames and Frame Types].

| Type | Name                  | I | H | 0 | 1 | N | C | P | F |
|------|-----------------------|---|---|---|---|---|---|---|---|
| 0x00 | padding               | ✓ | ✓ | ✓ | ✓ | ✓ |   | ✓ |   |
| 0x01 | ping                  | ✓ | ✓ | ✓ | ✓ |   |   |   |   |
| 0x02 | ack 0x02              | ✓ | ✓ |   | ✓ | ✓ | ✓ |   |   |
| 0x03 | ack 0x03              | ✓ | ✓ |   | ✓ | ✓ | ✓ |   |   |
| 0x04 | reset_stream          |   |   | ✓ | ✓ |   |   |   |   |
| 0x05 | stop_sending          |   |   | ✓ | ✓ |   |   |   |   |
| 0x06 | crypto                | ✓ | ✓ |   | ✓ |   |   |   |   |
| 0x07 | new_token             |   |   |   | ✓ |   |   |   |   |
| 0x08 | stream 0x08           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x09 | stream 0x09           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0A | stream 0x0A           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0B | stream 0x0B           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0C | stream 0x0C           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0D | stream 0x0D           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0E | stream 0x0E           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x0F | stream 0x0F           |   |   | ✓ | ✓ |   |   |   | ✓ |
| 0x10 | max_data              |   |   | ✓ | ✓ |   |   |   |   |
| 0x11 | max_stream_data       |   |   | ✓ | ✓ |   |   |   |   |
| 0x12 | max_streams 0x12      |   |   | ✓ | ✓ |   |   |   |   |
| 0x13 | max_streams 0x13      |   |   | ✓ | ✓ |   |   |   |   |
| 0x14 | data_blocked          |   |   | ✓ | ✓ |   |   |   |   |
| 0x15 | stream_data_blocked   |   |   | ✓ | ✓ |   |   |   |   |
| 0x16 | streams_blocked 0x16  |   |   | ✓ | ✓ |   |   |   |   |
| 0x17 | streams_blocked 0x17  |   |   | ✓ | ✓ |   |   |   |   |
| 0x18 | new_connection_id     |   |   | ✓ | ✓ |   |   | ✓ |   |
| 0x19 | retire_connection_id  |   |   | ✓ | ✓ |   |   |   |   |
| 0x1A | path_challenge        |   |   | ✓ | ✓ |   |   | ✓ |   |
| 0x1B | path_response         |   |   |   | ✓ |   |   | ✓ |   |
| 0x1C | connection_close 0x1C | ✓ | ✓ | ✓ | ✓ | ✓ |   |   |   |
| 0x1D | connection_close 0x1D |   |   | ✓ | ✓ | ✓ |   |   |   |
| 0x1E | handshake_done        |   |   |   | ✓ |   |   |   |   |

The various fields are as defined in RFC 9000.

#### Pkts

_Pkts_ are defined as:

| Pkts  | Description                 |
| :---: | --------------------------- |
|   I   | Valid in Initial packets.   |
|   H   | Valid in Handshake packets. |
|   0   | Valid in 0-RTT packets.     |
|   1   | Valid in 1-RTT packets.     |

#### Spec

_Spec_ is defined as:

| Spec  | Description                                                            |
| :---: | ---------------------------------------------------------------------- |
|   N   | Not ack-eliciting.                                                     |
|   C   | Does not count toward bytes in flight for congestion control purposes. |
|   P   | Can be used to probe new network paths during connection migration.    |
|   F   | The contents of frames with this marking are flow controlled.          |

For `C`, `N` and `P`, the entire packet must consist of only frames with the
marking for the packet to qualify for it. For example, a packet with an ACK
frame and a _stream_ frame would qualify for neither the `C` nor the `N`
marking.

#### Notes

- Do we need the distinction between 0-RTT and 1-RTT when both are in the
  Application Data number space?
- 0-RTT packets can morph into 1-RTT packets and this needs to be handled by
  the packetiser.

329 | ||
a73078b7 | 330 | ### Frame Type Prioritisation |
fabce809 | 331 | |
a73078b7 HL |
332 | The frame types listed above are reordered below in the order of priority with |
333 | which we want to serialize them. We discuss the motivations for this priority | |
334 | ordering below. Items without a line between them have the same priority. | |
fabce809 | 335 | |
a73078b7 HL |
336 | ```plain |
337 | HANDSHAKE_DONE GCR / REGEN | |
338 | ---------------------------- | |
339 | MAX_DATA REGEN | |
340 | DATA_BLOCKED REGEN | |
341 | MAX_STREAMS REGEN | |
342 | STREAMS_BLOCKED REGEN | |
343 | ---------------------------- | |
fabce809 | 344 | |
fabce809 | 345 | |
a73078b7 HL |
346 | NEW_CONNECTION_ID GCR |
347 | RETIRE_CONNECTION_ID GCR | |
348 | ---------------------------- | |
349 | PATH_CHALLENGE - | |
350 | PATH_RESPONSE - | |
351 | ---------------------------- | |
352 | ACK - (non-ACK-eliciting) | |
353 | ---------------------------- | |
354 | CONNECTION_CLOSE *** (non-ACK-eliciting) | |
355 | ---------------------------- | |
356 | NEW_TOKEN GCR | |
fabce809 | 357 | |
a73078b7 HL |
358 | ---------------------------- |
359 | CRYPTO GCR/*q | |
360 | ||
361 | ============================ ] priority group, repeats per stream | |
362 | RESET_STREAM GCR* ] | |
363 | STOP_SENDING GCR* ] | |
364 | ---------------------------- ] | |
365 | MAX_STREAM_DATA REGEN ] | |
366 | STREAM_DATA_BLOCKED REGEN ] | |
367 | ---------------------------- ] | |
368 | STREAM *q ] | |
369 | ============================ ] | |
fabce809 | 370 | |
a73078b7 HL |
371 | ---------------------------- |
372 | PING - | |
373 | ---------------------------- | |
374 | PADDING - (non-ACK-eliciting) | |
fabce809 P |
375 | ``` |

(See [Frame in Flight Manager](quic-fifm.md) for information on the meaning of
the second column, which specifies the retransmission strategy for each frame
type.)

- `PADDING`: For obvious reasons, this frame type is the lowest priority. We
  only add `PADDING` frames at the very end, after serializing all other
  frames, if we have been asked to ensure a non-zero MinPL but have not yet
  met that minimum.

- `PING`: The `PING` frame is encoded as a single byte. It is used to make a
  packet ACK-eliciting if it would not otherwise be ACK-eliciting. Therefore
  we only need to send it if

  a. we have been asked to ensure the packet is ACK-eliciting, and
  b. we do not have any other ACK-eliciting frames in the packet.

  Thus we wait until the end before adding the `PING` frame, as we may end up
  adding other ACK-eliciting frames and not need to add it. There is never a
  need to add more than one `PING` frame. If we have been asked to ensure the
  packet is ACK-eliciting and we do not know for sure up front whether we will
  add any other ACK-eliciting frame, we must reserve one byte of our CMPPL to
  ensure we have room for it. We can cancel this reservation if we add an
  ACK-eliciting frame earlier. For example:

  - We have been asked to ensure a packet is ACK-eliciting and the CMPPL is
    1000 (we are coalescing with another packet).
  - We allocate 999 bytes for non-PING frames.
  - While adding non-PING frames, we add a STREAM frame, which is
    ACK-eliciting; therefore the PING frame reservation is cancelled and we
    increase our allocation for non-PING frames to 1000 bytes.

- `HANDSHAKE_DONE`: This is a single-byte frame with no data which is used to
  indicate handshake completion. It is only ever sent once. As such, it can be
  implemented as a single flag, and there is no risk of it outcompeting other
  frames. It is therefore trivially given the highest priority.

- `MAX_DATA`, `DATA_BLOCKED`: These manage connection-level flow control. They
  consist of a single integer argument and, as such, take up little space, but
  are also critical to ensuring the timely expansion of the connection-level
  flow control window. Thus there is a performance reason to include them in
  packets with high priority, and due to their small size and the fact that
  there will only ever be at most one per packet, there is no risk of them
  outcompeting other frames.

- `MAX_STREAMS`, `STREAMS_BLOCKED`: Similar to the frames above for
  connection-level flow control, but these control the rate at which new
  streams are opened. The same arguments apply here, so they are prioritised
  equally.

- `STREAM`: This is the bread and butter of a QUIC packet, and contains
  application-level stream data. As such, these frames can usually be expected
  to consume most of our packet's payload budget. We must generally assume
  that

  - there are many streams, and
  - several of those streams have much more data waiting to be sent than can
    be sent in a single packet.

  Therefore we must ensure some level of balance between multiple competing
  streams. We refer to this as stream scheduling. There are many strategies
  that can be used for this, and in the future we might even support
  application-signalled prioritisation of specific streams. We discuss stream
  scheduling further below.

  Because these frames are expected to make up the bulk of most packets, we
  consider them low priority, higher only than `PING` and `PADDING` frames.
  Moreover, we give priority to control frames since, unlike `STREAM` frames,
  they are vital to the maintenance of the health of the connection itself.
  Once we have serialized all other frame types, we can reserve the rest of
  the packet for any `STREAM` frames. Since all `STREAM` frames are
  ACK-eliciting, if we have any `STREAM` frame to send at all, it cancels any
  need for a `PING` frame, and may partially or wholly obviate our need for
  any `PADDING` frames which we might otherwise have needed. Thus, once we
  start serializing `STREAM` frames, we are limited only by the remaining
  CMPPL.

- `MAX_STREAM_DATA`, `STREAM_DATA_BLOCKED`: Stream-level flow control. These
  contain only a stream ID and an integer value used for flow control, so they
  are not large. Since they are critical to the management and health of a
  specific stream, and because they are small and pose no risk of stealing too
  many bytes from the `STREAM` frames they accompany, we always serialize
  these before any corresponding `STREAM` frames for a given stream ID.

- `RESET_STREAM`, `STOP_SENDING`: These terminate a given stream ID and thus
  are also associated with a stream. They are also small. As such, we consider
  these higher priority than both `STREAM` frames and the stream-level flow
  control frames.

- `NEW_CONNECTION_ID`, `RETIRE_CONNECTION_ID`: These are critical for
  connection management and are not particularly large, therefore they are
  given a high priority.

- `PATH_CHALLENGE`, `PATH_RESPONSE`: Used during connection migration, these
  are small and are given a high priority.

- `CRYPTO`: These frames carry the logical crypto stream, which is a
  bidirectional bytestream used to transport TLS records for connection
  handshake and management purposes. As such, the crypto stream is viewed as
  similar to application streams but of a higher priority. We are willing to
  let `CRYPTO` frames outcompete all application stream-related frames if need
  be, as `CRYPTO` frames are more important to the maintenance of the
  connection, and the handshake layer should not generate an excessive amount
  of data.

- `CONNECTION_CLOSE`, `NEW_TOKEN`: The `CONNECTION_CLOSE` frame can contain a
  user-specified reason string. The `NEW_TOKEN` frame contains an opaque token
  blob. Both can be arbitrarily large but for the fact that they must fit in a
  single packet and are thus ultimately limited by the MPPL. However, these
  frames are important to connection maintenance and thus are given a priority
  just above that of `CRYPTO` frames. The `CONNECTION_CLOSE` frame has higher
  priority than `NEW_TOKEN`.

- `ACK`: `ACK` frames are critical to avoid needless retransmissions by our
  peer. They can also potentially become large if a large number of ACK ranges
  needs to be transmitted. Thus `ACK` frames are given a fairly high priority;
  specifically, their priority is higher than that of all frames which have
  the potential to be large, but below that of all frames which contain only
  limited data, such as connection-level flow control frames. However, we
  reserve the right to adapt the size of the `ACK` frames we transmit by
  chopping off some of the PN ranges to limit the size of the `ACK` frame if
  it would otherwise be excessive. This ensures that the high priority of the
  `ACK` frame does not starve the packet of room for stream data.
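
The `PING` reservation rule described above can be sketched as follows. The
structure and function names are illustrative only, not part of this design:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch of the one-byte PING reservation rule.
 * All names here are hypothetical.
 */
struct pkt_budget {
    size_t cmppl;              /* payload budget for this packet */
    bool   must_be_ack_eliciting;
    bool   have_ack_eliciting; /* set once any ACK-eliciting frame is added */
};

/* Bytes currently available for non-PING frames. */
static size_t non_ping_budget(const struct pkt_budget *b)
{
    /*
     * Reserve one byte for a possible trailing PING frame until some
     * ACK-eliciting frame cancels the reservation.
     */
    if (b->must_be_ack_eliciting && !b->have_ack_eliciting)
        return b->cmppl - 1;

    return b->cmppl;
}

/* Must a PING frame be appended when finishing the packet? */
static bool need_ping(const struct pkt_budget *b)
{
    return b->must_be_ack_eliciting && !b->have_ack_eliciting;
}
```

With a CMPPL of 1000, this reproduces the worked example above: 999 bytes are
initially available for non-PING frames, rising to 1000 once an ACK-eliciting
frame such as a STREAM frame is added.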

### Stream Scheduling

**Stream budgeting.** When it is time to add `STREAM` frames to a packet under
construction, we take our Remaining CMPPL and call this value the Streams
Budget. There are many ways we could make use of this Streams Budget.

For the purposes of stream budgeting, we consider all bytes of `STREAM`
frames, stream-level flow control frames, and `RESET_STREAM` and
`STOP_SENDING` frames to “belong” to their respective streams, and the encoded
sizes of these frames are accounted to those streams for budgeting purposes.
If the total number of bytes of frames necessary to serialize all pending data
from all active streams is less than our Streams Budget, there is no need for
any prioritisation. Otherwise, there are a number of strategies we could
employ. We can categorise the possible strategies into two groups to begin
with:

- **Intrapacket muxing (IRPM):** When the data available to send across all
  streams exceeds the Streams Budget for the packet, allocate an equal portion
  of the packet to each stream.

- **Interpacket muxing (IXPM):** When the data available to send across all
  streams exceeds the Streams Budget for the packet, try to fill the packet
  using as few streams as possible, and multiplex by using different streams
  in different packets.

Though obvious, IRPM does not appear to be a widely used strategy [1] [2],
probably due to a clear downside: if a packet is lost and it contains data for
multiple streams, all of those streams will be held up. This undermines a key
advantage of QUIC, namely the ability of streams to function independently of
one another for the purposes of head-of-line blocking. By contrast, with IXPM,
if a packet is lost, typically only a single stream is held up.

Suppose we choose IXPM. We must now choose a strategy for deciding when to
schedule streams on packets. [1] establishes that there are two basic
strategies found in use:

- A round robin (RR) strategy, in which the frame scheduler switches to the
  next active stream every n packets (where n ≥ 1).

- A sequential (SEQ) strategy, in which a stream keeps being transmitted until
  it is no longer active.

The SEQ strategy does not appear to be suitable for general-purpose
applications as it presumably starves other streams of bandwidth. It appears
that this strategy may be chosen in some implementations because it can offer
greater efficiency with HTTP/3, where there are performance benefits to
completing transmission of one stream before beginning the next. However, it
does not seem like a suitable choice for an application-agnostic QUIC
implementation. Thus the RR strategy is the better choice, and the popular
choice in a survey of implementations.

The choice of `n` for the RR strategy is most trivially 1, but there are
suggestions [1] that a higher value of `n` may lead to greater performance,
because packet loss in typical networks tends to occur in short bursts
affecting small numbers of consecutive packets. Thus, if `n` is greater
than 1, fewer streams will be affected by packet loss and held up on average.
However, implementing different values of `n` poses no non-trivial
implementation concerns, so it is not a major concern for discussion here.
Such a parameter can easily be made configurable.

Thus, we select the active stream used to fill a packet on a revolving round
robin basis, moving to the next stream in the round robin every `n` packets.
If the available data in the active stream is not enough to fill a packet, we
also move to the next stream, so IRPM can still occur in this case.
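
A minimal sketch of this rotation follows. The names are hypothetical and a
fixed stream index stands in for the QUIC stream map's list of active streams:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative round robin stream scheduler. Hypothetical names; the real
 * implementation iterates the QUIC stream map rather than an index range.
 */
struct rr_sched {
    size_t num_streams;  /* number of active streams */
    size_t cur;          /* index of the stream currently being scheduled */
    size_t pkts_on_cur;  /* packets filled from the current stream so far */
    size_t n;            /* switch to the next stream every n packets */
};

/* Return the stream index to fill the next packet from, then advance. */
static size_t rr_next(struct rr_sched *s)
{
    size_t idx = s->cur;

    if (++s->pkts_on_cur >= s->n) {
        /* Rotate to the next active stream after n packets. */
        s->cur = (s->cur + 1) % s->num_streams;
        s->pkts_on_cur = 0;
    }
    return idx;
}
```

With `n = 1` and three active streams this yields the packet-to-stream
sequence 0, 1, 2, 0, …; with `n = 2` each stream is used for two consecutive
packets before rotating.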

When we fill a packet with a stream, we start with any applicable
`RESET_STREAM` or `STOP_SENDING` frames, followed by stream-level flow control
frames if needed, followed by `STREAM` frames.

(This means that `RESET_STREAM`, `STOP_SENDING`, `MAX_STREAM_DATA`,
`STREAM_DATA_BLOCKED` and `STREAM` frames are interleaved rather than
occurring in a fixed priority order; i.e., first there could be a
`STOP_SENDING` frame for one stream, then a `STREAM` frame for another, then
another `STOP_SENDING` frame for a third stream, etc.)

[1] [Same Standards; Different Decisions: A Study of QUIC and HTTP/3
Implementation Diversity (Marx et al. 2020)](https://qlog.edm.uhasselt.be/epiq/files/QUICImplementationDiversity_Marx_final_11jun2020.pdf)

[2] [Resource Multiplexing and Prioritization in HTTP/2 over TCP versus
HTTP/3 over QUIC (Marx et al. 2020)](https://h3.edm.uhasselt.be/files/ResourceMultiplexing_H2andH3_Marx2020.pdf)
574 | ||
575 | ### Packets with Special Requirements | |
576 | ||
577 | Some packets have special requirements which the TX packetiser must meet: | |
578 | ||
579 | - **Padded Initial Datagrams.** | |
580 | A datagram must always be padded to at least 1200 bytes if it contains an | |
581 | Initial packet. (If there are multiple packets in the datagram, the padding | |
582 | does not necessarily need to be part of the Initial packet itself.) This | |
583 | serves to confirm that the QUIC minimum MTU is met. | |
584 | ||
585 | - **Token in Initial Packets.** | |
586 | Initial packets may need to contain a token. If used, token is contained in | |
587 | all further Initial packets sent by the client, not just the first Initial | |
588 | packet. | |
589 | ||
590 | - **Anti-amplification Limit.** Sometimes a lower MDPL may be imposed due to | |
591 | anti-amplification limits. (Only a concern for servers, so not relevant to | |
592 | MVP.) | |
593 | ||
594 | Note: It has been observed that a lot of implementations are not fastidious | |
595 | about enforcing the amplification limit in terms of precise packet sizes. | |
596 | Rather, they just use it to determine if they can send another packet, but not | |
597 | to determine what size that packet must be. Implementations with 'precise' | |
598 | anti-amplification implementations appear to be rare. | |

- **MTU Probes.** These packets have a precisely crafted size for the purposes
  of probing a path MTU. Unlike ordinary packets, they are routinely expected
  to be lost and this loss should not be taken as a signal for congestion
  control purposes. (Not relevant for MVP.)

- **Path/Migration Probes.** These packets are sent to verify a new path
  for the purposes of connection migration.

- **ACK Manager Probes.** Packets produced because the ACK manager has
  requested a probe be sent. These MUST be made ACK-eliciting (using a PING
  frame if necessary). However, these packets need not be reserved exclusively
  for ACK Manager purposes; they SHOULD contain new data if available, and MAY
  contain old data.

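The Initial padding rule above can be sketched as a small helper. This is an
illustrative function, with a hypothetical name and constant, not the actual
OpenSSL implementation:

```c
#include <stddef.h>

/* QUIC requires a datagram containing an Initial packet to be >= 1200 bytes. */
#define MIN_INITIAL_DGRAM_LEN 1200

/*
 * Given the total size of the packets coalesced into a datagram so far,
 * return how many padding bytes must still be added if the datagram contains
 * an Initial packet. The padding may be placed in any of the datagram's
 * packets, not necessarily the Initial packet itself.
 */
static size_t initial_dgram_padding(size_t dgram_len, int contains_initial)
{
    if (!contains_initial || dgram_len >= MIN_INITIAL_DGRAM_LEN)
        return 0;

    return MIN_INITIAL_DGRAM_LEN - dgram_len;
}
```
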
We handle the need for different kinds of packet via a notion of “archetypes”.
The TX packetiser is requested to generate a datagram via the following call:

```c
/* Generate normal packets containing most frame types. */
#define TX_PACKETISER_ARCHETYPE_NORMAL   0
/* Generate ACKs only. */
#define TX_PACKETISER_ARCHETYPE_ACK_ONLY 1

int ossl_quic_tx_packetiser_generate(OSSL_QUIC_TX_PACKETISER *txp,
                                     uint32_t archetype);
```

More archetypes can be added in the future as required. The archetype limits
which frames can be placed into the packets of a datagram.
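
The effect of an archetype can be illustrated as a simple frame-admission
check. The helper below is a hypothetical sketch, not the actual packetiser
logic:

```c
#include <stdint.h>

#define TX_PACKETISER_ARCHETYPE_NORMAL   0
#define TX_PACKETISER_ARCHETYPE_ACK_ONLY 1

/*
 * Illustrative sketch: returns nonzero if a frame may be placed in a packet
 * generated under the given archetype. ACK_ONLY datagrams admit only ACK
 * frames (plus any needed padding); NORMAL admits most frame types.
 */
static int archetype_permits_frame(uint32_t archetype, int is_ack_frame)
{
    switch (archetype) {
    case TX_PACKETISER_ARCHETYPE_ACK_ONLY:
        return is_ack_frame;
    case TX_PACKETISER_ARCHETYPE_NORMAL:
    default:
        return 1;
    }
}
```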

### Encryption Levels

A QUIC connection progresses through the Initial, Handshake, 0-RTT and 1-RTT
encryption levels (ELs). The TX packetiser decides which EL to use to send a
packet; more precisely, it decides which ELs need a packet generated. Many
resources are instantiated per EL, and can only be managed using a packet of
that EL, so a datagram will frequently need to contain multiple packets to
manage the resources of different ELs. We can thus view datagram construction
as a process of determining, for each EL, whether it needs to produce a packet,
and concatenating the resulting packets.

The following EL-specific resources exist:

- The crypto stream, a bidirectional byte stream abstraction provided
  to the handshake layer. There is one crypto stream for each of the Initial,
  Handshake and 1-RTT ELs. (`CRYPTO` frames are prohibited in 0-RTT packets,
  which is to say the 0-RTT EL has no crypto stream of its own.)

- Packet number spaces and acknowledgements. The 0-RTT and 1-RTT ELs
  share a PN space, but the Initial and Handshake ELs each have their own
  PN space. Thus, Initial packets can only be acknowledged using an `ACK`
  frame sent in an Initial packet, and so on.
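
The PN space sharing described above amounts to a simple mapping. The enum
values below are illustrative stand-ins, not the actual OpenSSL constants:

```c
/* Illustrative EL and PN space identifiers (hypothetical names). */
enum { EL_INITIAL, EL_HANDSHAKE, EL_0RTT, EL_1RTT };
enum { PN_SPACE_INITIAL, PN_SPACE_HANDSHAKE, PN_SPACE_APP };

/* Initial and Handshake have their own PN spaces; 0-RTT and 1-RTT share one. */
static int el_to_pn_space(int el)
{
    switch (el) {
    case EL_INITIAL:
        return PN_SPACE_INITIAL;
    case EL_HANDSHAKE:
        return PN_SPACE_HANDSHAKE;
    default:
        return PN_SPACE_APP;
    }
}
```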

Thus, a fully generalised datagram construction methodology looks like this:

- Let E be the set of ELs which have not been discarded and for which
  `pending(el)` is true, where `pending()` is a predicate function determining
  whether the EL has data to send.

- Determine if we are limited by anti-amplification restrictions.
  (Not relevant for MVP since this is only needed on the server side.)

- For each EL in E, construct a packet, bearing in mind the Remaining CMPPL,
  and append it to the datagram.

  For the Initial EL, we attach a token if we have been given one.

  If Initial is in E, the total length of the resulting datagram must be at
  least 1200 bytes, but it is up to us which packets of which ELs in E we add
  the padding to.

- Send the datagram.

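The steps above can be sketched as follows. Each argument is the size of the
packet the corresponding EL would contribute, with 0 meaning the EL is
discarded or has nothing to send; the simplified signature and names are
hypothetical, and anti-amplification and CMPPL accounting are omitted:

```c
#include <stddef.h>

/*
 * Illustrative sketch of the generalised construction loop. Returns the
 * final datagram length after applying Initial padding, or 0 if no EL has
 * anything to send.
 */
static size_t build_datagram(size_t initial_len, size_t handshake_len,
                             size_t zero_rtt_len, size_t one_rtt_len)
{
    size_t total = initial_len + handshake_len + zero_rtt_len + one_rtt_len;

    if (total == 0)
        return 0;

    /*
     * A datagram containing an Initial packet must be at least 1200 bytes;
     * the padding may be added to any of its packets.
     */
    if (initial_len > 0 && total < 1200)
        total = 1200;

    return total;
}
```
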
### TX Key Update

The TX packetiser decides when to tell the QRL to initiate a TX-side key
update. It decides this using information provided by the QRL.

### Restricting packet sizes

Two factors limit the size of the packets we can send:

* The maximum datagram payload length (MDPL)
* Congestion control

The MDPL limits the size of an entire datagram, whereas congestion control
limits how much data can be in flight at any given time, which may impose a
lower limit on the size of a given packet.
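
In other words, the usable limit for the next packet is the minimum of the
two. A trivial sketch with hypothetical names:

```c
#include <stddef.h>

/*
 * Illustrative: the size limit for the next packet is the smaller of the
 * MDPL and the congestion controller's remaining send allowance.
 */
static size_t next_packet_limit(size_t mdpl, size_t cc_allowance)
{
    return cc_allowance < mdpl ? cc_allowance : mdpl;
}
```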

### Stateless Reset

Refer to [RFC 9000 10.3 Stateless Reset]. It's entirely reasonable for
the state machine to send this directly and immediately if required.

[RFC 9000 2.3 Stream Prioritization]: https://datatracker.ietf.org/doc/html/rfc9000#section-2.3
[RFC 9000 4.1 Data Flow Control]: https://datatracker.ietf.org/doc/html/rfc9000#section-4.1
[RFC 9000 10.3 Stateless Reset]: https://datatracker.ietf.org/doc/html/rfc9000#section-10.3
[RFC 9000 12.2 Coalescing Packets]: https://datatracker.ietf.org/doc/html/rfc9000#section-12.2
[RFC 9000 12.4 Frames and Frame Types]: https://datatracker.ietf.org/doc/html/rfc9000#section-12.4
[RFC 9000 13.3 Retransmission of Information]: https://datatracker.ietf.org/doc/html/rfc9000#section-13.3
[RFC 9000 17.1 Packet Formats]: https://datatracker.ietf.org/doc/html/rfc9000#section-17
[RFC 9000 17.2.1 Version Negotiation Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.1
[RFC 9000 17.2.2 Initial Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2
[RFC 9000 17.2.3 0-RTT]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.3
[RFC 9000 17.2.4 Handshake Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.4
[RFC 9000 17.2.5 Retry Packet]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.5
[RFC 9000 17.3.1 1-RTT]: https://datatracker.ietf.org/doc/html/rfc9000#section-17.3.1
[RFC 9002]: https://datatracker.ietf.org/doc/html/rfc9002