From: Alan T. DeKok
Date: Tue, 5 Mar 2024 12:54:33 +0000 (-0500)
Subject: documentation updates
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=839b0adbc3195316c97c1ca9112adf24f1494536;p=thirdparty%2Ffreeradius-server.git

documentation updates
---
diff --git a/src/lib/bio/README.md b/src/lib/bio/README.md
index d5ef1e0d5d2..89708c4e822 100644
--- a/src/lib/bio/README.md
+++ b/src/lib/bio/README.md
@@ -25,61 +25,49 @@ handles all of these issues simultaneously.
 
 The issues addressed by bios include the following items:
 
-* abstracting TCP versus UDP socket IO
-
-* allowing packet-based reads and writes, instead of byte-based
-  * i.e. so that application protocol state machines do not have to
-  * deal with partial packets.
-
-* Use protocol-agnostic memory buffers to track partial reads and
-  partial writes.
-
-* allowing "written" data to be cancelled or "unwritten". Packets
-  which have been written to the bio, but not yet to the network can
-  be cancelled at any time. The data then disappears from the bio,
-  and is never written to the network.
-
-* allowing chaining, so that an application can write RADIUS packets
-  to a bio, and then have those packets go through a TLS
-  transformation, and then out a TCP socket.
-
-* Chaining also allows applications to selectively add per-chain
-  functionality, without affecting the producer or consumer of data.
-
-* allowing unchaining, so that we can have a bio say "I'm done, and no
-  longer needed". This happens for example when we have a connection
-  from haproxy. The first ~128 bytes of a TCP connection are the
-  original src/dst ip/port. The data after that is just the TLS
-  transport. The haproxy layer needs to be able to intercept and read
-  that data, and then remove itself from the chain of bios.
-
-* abstraction, so that the application can be handed a bio, and use
-  it. The underlying bio might be UDP, TCP, TLS, etc. The
-  application does not know, and can behave identically for all
-  situations. There are some limitations, of course. Something has
-  to create the bios and their respective chains. But once a "RADIUS"
-  bio, has been created, the RADIUS application can read and write
-  packets to it without worrying about underlying issues of UDP vs
-  TCP, TLS vs clear-text, dedup, etc.
-
-* simplicity. Any transport-specific function knows only about that
-  transport, and it's own bio. It does not need to know about other
-  bios (unless it needs them, as with TLS -> TCP). The function does
-  not know about packets or protocols. We should be able to use the
-  same basic UDP/TCP network bios for most protocols. Or if we
-  cannot, the duplicated code should be trivial, and little more than
-  `read()` and some checks for error conditions (EOF, blocked, etc.)
-
-* If the caller needs to do something with a particular bio, that bio
-  will expose an API specific to that bio. There is no reason to copy
-  that status back up the bio chain. This also means that the caller
-  often needs to cache the multiple bios, which is fine.
-
-* asynchronous at its core. Anything can block at any time. There
+* be based on _independent blocks_ (e.g. file IO, memory buffers,
+  etc.).
+
+* be composable, so that blocks can be _chained_ or _unchained_ to
+  obtain complex functionality.
+
+* be _abstracted_ so that the application using the bio has little
+  need to understand the difference between the individual blocks.
+
+* be _declarative_ where possible, so an application can declare a
+  data structure saying "this is the kind of bio I want", and the
+  underlying bio "alloc" or "create" API does the right thing.
+
+* be _callback_ oriented, so that the bio calls the application to do
+  application-specific things, and the bio handles the abstractions.
+
+* be _state machine_ oriented, so that inside of the bio, the
+  functionality is broken into a series of small functions. If the
+  bio is in a different state, it changes its internal function
+  pointers to manage that state. This approach is better than large
+  functions with masses of if / then / else statements.
+
+* be _exposed_ so that the individual blocks do not hide everything
+  from the application. Each block exports an info / state structure.
+  The application can examine the internal state of the block. This
+  approach stops the M*N API explosion typically seen when every block
+  has to implement all of the get/set functionality of every other
+  block.
+
+* be _modifiable_ so that blocks can be chained / unchained on the
+  fly. This capability allows applications to add / delete things
+  like dynamic filters (haproxy or dynamic client) on the fly.
+
+* allow for a _separation_ of application issues (basic bio
+  read/write) from protocol state machine issues (packet retransmit,
+  etc.). The application largely just calls read / write on the bio;
+  any bio modifications are done by the protocol state machine.
+
+* be _asynchronous_ where possible. Anything can block at any time. There
   are callbacks if necessary.
 
-* no run-time memory allocations for bio operations. Everything
-  operates on pre-allocated structures
+* _avoid_ run-time memory allocations for bio operations. Everything
+  should operate on pre-allocated structures.
 
 * O(1) operations where possible.
 
@@ -87,9 +75,9 @@ The issues addressed by bios include the following items:
   it needs to do. It exposes APIs for the caller (who must know what
   it is). It has its own callbacks to modify its operation.
 
-* not thread-safe. Use locks, people.
+* the bios do _not_ need to be thread-safe.
 
-There are explicit _non-goals_ for the bio API. These non-goals are
+There are some explicit _non-goals_ for the bio API. These non-goals are
 issues which are outside of the scope of bios, such as:
 
 * As an outcome of simplicity, there are no bio-specific wrappers for
@@ -107,7 +95,7 @@ issues which are outside of the scope of bios, such as:
   state of the bio.
 
 * eventing and timers. The bios can allow an underlying file
-  descriptor to be used, but the bio layer itself runs nothing more
+  descriptor to be used, but the bio layers usually run nothing more
   than state-specific callbacks, defined on a per-bio basis.
 
 * decoding / encoding packet contents. This is handled by dbuffs,
@@ -115,3 +103,4 @@ issues which are outside of the scope of bios, such as:
   and enforce nested bounds on packets, nested attributes, etc. But
   dbuffs have no concept of multiple packets, deduplication, file
   descriptors, etc.
+
diff --git a/src/lib/bio/fd.md b/src/lib/bio/fd.md
new file mode 100644
index 00000000000..56935e1d413
--- /dev/null
+++ b/src/lib/bio/fd.md
@@ -0,0 +1,49 @@
+# The FD bio
+
+The file descriptor bio abstracts reads / writes over file
+descriptors. The goal is for applications to be able to get a file
+descriptor bio, and then just call raw read / write routines, similar
+to the POSIX `read()` and `write()` functions. The difference is that
+the bio routines _abstract_ all of the issues with file descriptors.
+
+For example, file descriptors can refer to files, sockets (stream or
+datagram), IPv4, IPv6, Unix domain sockets, etc. Each of those file
+descriptor types has a different set of requirements for
+initialization, and even for reading and writing.
+
+The simplest and perhaps most frustrating difference between the types
+of sockets is what it means when the POSIX `read()` function returns `0`.
+That value has _different meanings_ for stream and datagram sockets.
+For stream sockets, it means "EOF", and the socket should be closed.
+For datagram sockets, it means "read() returned no data".
+
+In our bio implementation, a `read()` returning `0` always means "no data".
+Signaling EOF is an error path, where the `read()` function returns an
+EOF error.
+
+Similarly, initializing a socket requires a number of steps, which
+differ for IPv4 and IPv6, and depend on whether the socket is connected
+or unconnected. Perhaps the underlying socket is a connected stream
+socket, in which case IO is essentially just `read()` and `write()`.
+Or maybe the socket is an unconnected datagram socket, in which case
+IO has to use the `udpfromto` path to obtain the src/dst IP/port
+information for each packet.
+
+All of these differences are abstracted away by the bio API. The
+caller simply declares a `fr_bio_fd_config_t` data structure, fills it
+in with the appropriate data, and calls `fr_bio_fd_alloc()`. The FD
+bio code determines what kind of file descriptor to open, and then
+initializes it.
+
+The caller is then returned a bio which can be used for basic read /
+write operations, independent of the underlying file descriptor type.
+
+There is no API to query the state of the FD bio. Instead, the caller
+can get a copy of the internal `fr_bio_fd_info_t` data structure,
+which contains all of the "raw" data needed by the application. The
+caller can see whether or not the bio is at EOF, or if it is blocked
+for read / write operations.
+
+The file descriptor bio does _not_ manage packets. If there is a
+partial write, it returns a partial write. It is up to the
+application (or another bio) to manage packet-oriented data.
diff --git a/src/lib/bio/mem.md b/src/lib/bio/mem.md
new file mode 100644
index 00000000000..31717f28844
--- /dev/null
+++ b/src/lib/bio/mem.md
@@ -0,0 +1,36 @@
+# The Memory Bio
+
+The memory bio does read / write buffering for a "next" bio. Where
+the file descriptor bio does no buffering, the memory bio adds that
+capability.
+
+In its simplest incarnation, the memory bio allows for controllable
+buffering on read and write.
+
+The read functions try to fill the memory buffer on each read. The
+application can then read smaller bits of data from the memory buffer,
+which avoids extra system calls.
+
+The write functions try to bypass the memory buffer as much as
+possible. If the memory buffers are empty, the `write()` call writes
+directly to the next bio block, and avoids memory copies. The data is
+cached only if the next block returns a partial write. In that case,
+the unwritten remainder is cached, and is written _before_ any data from
+subsequent calls to `write()`.
+
+Data in the buffers can always be flushed by passing a `NULL` pointer to the
+`write()` routine.
+
+## Packet-based reads
+
+The memory bio supports a function, `fr_bio_mem_set_verify()`, which
+sets a "verification" function. When the application calls `read()`,
+the memory bio reads the data into an intermediate buffer, and then
+calls the verify function. That function can return the size of the
+packet to read, or other options like "discard data", or "want more
+data", or "have full packet". That way the application only sees
+whole packets.
+
+The application then calls the main bio `read()` routines, which
+(eventually) read raw data from somewhere. When that data is at
+least a full packet, it is returned to the application.
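
The _state machine_ goal in the README above says that a bio changes its internal function pointers when its state changes, rather than branching inside one large function. The following C sketch illustrates that pattern under stated assumptions. `sketch_bio_t`, `sketch_read_open()`, and `sketch_read_eof()` are hypothetical names and are not part of the FreeRADIUS bio API. A constant string stands in for the network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bio: the current state is encoded in the read function pointer. */
typedef struct sketch_bio_s sketch_bio_t;
typedef long (*sketch_read_t)(sketch_bio_t *bio, void *buf, size_t len);

struct sketch_bio_s {
	sketch_read_t	read;		/* handler for the current state */
	char const	*data;		/* stands in for the network */
	size_t		offset;		/* how much of data has been consumed */
};

/* EOF state: every read fails, with no checks needed. */
static long sketch_read_eof(sketch_bio_t *bio, void *buf, size_t len)
{
	(void) bio;
	(void) buf;
	(void) len;
	return -1;
}

/* Open state: copy out available data, and switch state when none is left. */
static long sketch_read_open(sketch_bio_t *bio, void *buf, size_t len)
{
	size_t avail;

	assert(bio->offset <= strlen(bio->data));
	avail = strlen(bio->data) - bio->offset;

	if (avail == 0) {
		bio->read = sketch_read_eof;	/* state change: swap the handler */
		return bio->read(bio, buf, len);
	}

	if (len > avail) len = avail;
	memcpy(buf, bio->data + bio->offset, len);
	bio->offset += len;
	return (long) len;
}
```

Once the data is exhausted, the bio switches to `sketch_read_eof()`. Later calls then fail immediately, and the open-state handler never has to test for EOF again.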
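
The FD bio notes describe normalizing POSIX `read()` so that `0` always means "no data" and EOF is reported as an error. Below is a minimal sketch of that mapping. It assumes a hypothetical `SKETCH_BIO_ERROR_EOF` code and the hypothetical helpers `sketch_fd_read()` and `sketch_fd_nonblock()`. The real FD bio has its own error values and function signatures.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical error code; the real FD bio defines its own error values. */
#define SKETCH_BIO_ERROR_EOF	(-2)

/* Put fd into non-blocking mode, as an asynchronous bio would.  Returns 0 or -1. */
static int sketch_fd_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0) return -1;
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/*
 * Read with the convention described for the FD bio:
 *   > 0  number of bytes read
 *     0  no data right now (would block, or an empty datagram)
 *   < 0  error; SKETCH_BIO_ERROR_EOF means a stream peer closed the connection
 */
static ssize_t sketch_fd_read(int fd, bool is_stream, void *buf, size_t len)
{
	assert(fd >= 0);
	assert(len > 0);	/* a zero-length read would be indistinguishable from EOF */

	for (;;) {
		ssize_t rcode = read(fd, buf, len);

		if (rcode > 0) return rcode;

		/* POSIX 0 is EOF for streams, but a zero-length datagram otherwise. */
		if (rcode == 0) return is_stream ? SKETCH_BIO_ERROR_EOF : 0;

		if (errno == EINTR) continue;				/* interrupted: retry */
		if ((errno == EAGAIN) || (errno == EWOULDBLOCK)) return 0;	/* blocked: no data */

		return -1;	/* any other error: the caller can inspect errno */
	}
}
```

The sketch takes `is_stream` as a parameter only because it has no equivalent of `fr_bio_fd_info_t`. In the design described above, the bio presumably already knows its socket type, because it opened the file descriptor itself.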
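
`mem.md` describes the memory bio's write path as follows. It writes directly to the next bio when nothing is cached, caches only the unwritten remainder of a partial write, and flushes when `write()` is passed a `NULL` pointer. The sketch below models that behaviour with hypothetical types (`next_bio_t`, `mem_tx_t`) and a fake next bio that accepts at most `budget` bytes per call. It is an illustration, not the FreeRADIUS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "next" bio that accepts at most `budget` bytes per call,
 * standing in for a socket that returns partial writes. */
typedef struct {
	char	sink[256];	/* everything that reached the "network" */
	size_t	written;
	size_t	budget;
} next_bio_t;

static size_t next_write(next_bio_t *next, char const *data, size_t len)
{
	size_t n = (len < next->budget) ? len : next->budget;

	if (n > sizeof(next->sink) - next->written) n = sizeof(next->sink) - next->written;
	memcpy(next->sink + next->written, data, n);
	next->written += n;
	return n;
}

/* Hypothetical memory bio: only holds what the next bio refused. */
typedef struct {
	next_bio_t	*next;
	char		pending[128];
	size_t		used;
} mem_tx_t;

/* Push cached data to the next bio.  Returns the number of bytes still cached. */
static size_t mem_tx_flush(mem_tx_t *tx)
{
	size_t n;

	assert(tx->used <= sizeof(tx->pending));
	if (tx->used == 0) return 0;

	n = next_write(tx->next, tx->pending, tx->used);
	memmove(tx->pending, tx->pending + n, tx->used - n);
	tx->used -= n;
	return tx->used;
}

/*
 * data == NULL: flush, and return the number of bytes still cached.
 * Otherwise: return the number of bytes accepted (written or cached).
 */
static size_t mem_tx_write(mem_tx_t *tx, char const *data, size_t len)
{
	size_t n = 0, left, room;

	if (!data) return mem_tx_flush(tx);

	/* Older cached data must go out first.  If it all drains, write directly (no copy). */
	if (mem_tx_flush(tx) == 0) {
		n = next_write(tx->next, data, len);
		if (n == len) return len;
	}

	/* Cache the unwritten remainder; it is sent before any later data. */
	left = len - n;
	room = sizeof(tx->pending) - tx->used;
	if (left > room) left = room;
	memcpy(tx->pending + tx->used, data + n, left);
	tx->used += left;
	return n + left;
}
```

The sketch drains the cache before touching new data, which keeps the output in order. New bytes bypass the buffer only after everything written earlier has gone out.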
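
The packet-based read described for `fr_bio_mem_set_verify()` can be sketched as a verify callback plus an intermediate buffer. The sketch assumes RADIUS framing, per RFC 2865: a 16-bit big-endian Length field in bytes 2 and 3 (zero-based) of the header, with valid values from 20 to 4096. The `verify_action_t` values and `mem_rx_read()` are hypothetical and do not reproduce the real callback signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verify results, modelled on the options described in mem.md. */
typedef enum {
	VERIFY_WANT_MORE = 0,	/* not enough data for a whole packet yet */
	VERIFY_HAVE_PACKET,	/* *packet_len is the size of the first packet */
	VERIFY_DISCARD		/* the buffered data is garbage */
} verify_action_t;

/* RADIUS framing: bytes 2-3 are the big-endian packet length, 20..4096. */
static verify_action_t radius_verify(uint8_t const *data, size_t len, size_t *packet_len)
{
	size_t want;

	if (len < 4) return VERIFY_WANT_MORE;	/* the length field is not here yet */

	want = ((size_t) data[2] << 8) | data[3];
	if ((want < 20) || (want > 4096)) return VERIFY_DISCARD;
	if (len < want) return VERIFY_WANT_MORE;

	*packet_len = want;
	return VERIFY_HAVE_PACKET;
}

/* Hypothetical intermediate buffer, large enough for two maximum-size packets. */
typedef struct {
	uint8_t	buf[8192];
	size_t	used;
} mem_rx_t;

/*
 * Append newly read bytes (data may be NULL when len is 0), then try to return
 * one whole packet in out.  Returns the packet length, 0 if more data is needed,
 * or -1 if the data was discarded or out is too small.
 */
static long mem_rx_read(mem_rx_t *rx, uint8_t const *data, size_t len,
			uint8_t *out, size_t outlen)
{
	size_t pkt_len = 0;

	assert(len <= sizeof(rx->buf) - rx->used);	/* sketch: no overflow handling */
	if (len) memcpy(rx->buf + rx->used, data, len);
	rx->used += len;

	switch (radius_verify(rx->buf, rx->used, &pkt_len)) {
	case VERIFY_WANT_MORE:
		return 0;

	case VERIFY_DISCARD:
		rx->used = 0;		/* drop everything buffered */
		return -1;

	case VERIFY_HAVE_PACKET:
		break;
	}

	if (pkt_len > outlen) return -1;

	memcpy(out, rx->buf, pkt_len);
	memmove(rx->buf, rx->buf + pkt_len, rx->used - pkt_len);	/* keep any following data */
	rx->used -= pkt_len;
	return (long) pkt_len;
}
```

Two packets may arrive in the same read. Calling `mem_rx_read()` again with zero new bytes then returns the second packet, so the application still sees one whole packet per call.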