The issues addressed by bios include the following items:
-* abstracting TCP versus UDP socket IO
-
-* allowing packet-based reads and writes, instead of byte-based
- * i.e. so that application protocol state machines do not have to
- * deal with partial packets.
-
-* Use protocol-agnostic memory buffers to track partial reads and
- partial writes.
-
-* allowing "written" data to be cancelled or "unwritten". Packets
- which have been written to the bio, but not yet to the network can
- be cancelled at any time. The data then disappears from the bio,
- and is never written to the network.
-
-* allowing chaining, so that an application can write RADIUS packets
- to a bio, and then have those packets go through a TLS
- transformation, and then out a TCP socket.
-
-* Chaining also allows applications to selectively add per-chain
- functionality, without affecting the producer or consumer of data.
-
-* allowing unchaining, so that we can have a bio say "I'm done, and no
- longer needed". This happens for example when we have a connection
- from haproxy. The first ~128 bytes of a TCP connection are the
- original src/dst ip/port. The data after that is just the TLS
- transport. The haproxy layer needs to be able to intercept and read
- that data, and then remove itself from the chain of bios.
-
-* abstraction, so that the application can be handed a bio, and use
- it. The underlying bio might be UDP, TCP, TLS, etc. The
- application does not know, and can behave identically for all
- situations. There are some limitations, of course. Something has
- to create the bios and their respective chains. But once a "RADIUS"
- bio, has been created, the RADIUS application can read and write
- packets to it without worrying about underlying issues of UDP vs
- TCP, TLS vs clear-text, dedup, etc.
-
-* simplicity. Any transport-specific function knows only about that
- transport, and it's own bio. It does not need to know about other
- bios (unless it needs them, as with TLS -> TCP). The function does
- not know about packets or protocols. We should be able to use the
- same basic UDP/TCP network bios for most protocols. Or if we
- cannot, the duplicated code should be trivial, and little more than
- `read()` and some checks for error conditions (EOF, blocked, etc.)
-
-* If the caller needs to do something with a particular bio, that bio
- will expose an API specific to that bio. There is no reason to copy
- that status back up the bio chain. This also means that the caller
- often needs to cache the multiple bios, which is fine.
-
-* asynchronous at its core. Anything can block at any time. There
+* be based on _independent blocks_ (e.g. file IO, memory buffers,
+ etc).
+
+* be composable, so that blocks can be _chained_ or _unchained_ to
+ obtain complex functionality.
+
+* be _abstracted_ so that the application using the bio has little
+ need to understand the difference between the individual blocks.
+
+* be _declarative_ where possible, so an application can declare a
+ data structure saying "this is the kind of bio I want", and the
+ underlying bio "alloc" or "create" API does the right thing.
+
+* be _callback_ oriented, so that the bio calls the application to do
+ application-specific things, and the bio handles the abstractions.
+
+* be _state machine_ oriented, so that inside of the bio, the
+ functionality is broken into a series of small functions. If the
+ bio is in a different state, it changes its internal function
+ pointers to manage that state. This approach is better than large
+ functions with masses of if / then / else.
+
+* be _exposed_ so that the individual blocks do not hide everything
+ from the application. Each block exports an info / state structure.
+ The application can examine the internal state of the block. This
+ approach stops the M*N API explosion typically seen when every block
+ has to implement all of the get/set functionality of every other
+ block.
+
+* be _modifiable_ so that blocks can be chained / unchained on the
+ fly. This capability allows applications to add / delete things
+ like dynamic filters (haproxy or dynamic client) at run time.
+
+* allow for a _separation_ of application issues (basic bio
+ read/write) from protocol state machine issues (packet retransmit,
+ etc.). The application largely just calls read / write on the bio;
+ any bio modifications are done by the protocol state machine.
+
+* be _asynchronous_ where possible. Anything can block at any time. There
are callbacks if necessary.
-* no run-time memory allocations for bio operations. Everything
- operates on pre-allocated structures
+* _avoid_ run-time memory allocations for bio operations. Everything
+ should operate on pre-allocated structures
* O(1) operations where possible.
it needs to do. It exposes APIs for the caller (who must know what
it is). It has its own callbacks to modify its operation.
-* not thread-safe. Use locks, people.
+* the bios do _not_ need to be thread-safe.
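
The _state machine_ goal in the list above can be illustrated with a minimal sketch. Every name here (`sketch_bio_t`, `sketch_write_ready()`, and so on) is hypothetical and is not part of the bio API; the "socket" is faked with a simple byte counter. The point is only that a state change swaps a function pointer, instead of adding another branch to one large function.

```c
typedef struct sketch_bio_s sketch_bio_t;
typedef int (*sketch_write_t)(sketch_bio_t *bio, int bytes);

struct sketch_bio_s {
	sketch_write_t	write;	/* changes whenever the state changes */
	int		room;	/* bytes the fake "socket" still accepts */
};

int sketch_write_blocked(sketch_bio_t *bio, int bytes);

/* Ready state: accept what fits, and switch state once full. */
int sketch_write_ready(sketch_bio_t *bio, int bytes)
{
	if (bytes > bio->room) bytes = bio->room;

	bio->room -= bytes;
	if (bio->room == 0) bio->write = sketch_write_blocked;

	return bytes;
}

/* Blocked state: nothing can be written until the bio is resumed. */
int sketch_write_blocked(sketch_bio_t *bio, int bytes)
{
	(void) bio;
	(void) bytes;

	return 0;
}

/* Called when the fake "socket" becomes writable again. */
void sketch_resume(sketch_bio_t *bio, int room)
{
	bio->room = room;
	bio->write = (room > 0) ? sketch_write_ready : sketch_write_blocked;
}
```

The caller always invokes `bio->write(bio, bytes)`, and never checks which state the bio is in.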
-There are explicit _non-goals_ for the bio API. These non-goals are
+There are some explicit _non-goals_ for the bio API. These non-goals are
issues which are outside of the scope of bios, such as:
* As an outcome of simplicity, there are no bio-specific wrappers for
state of the bio.
* eventing and timers. The bios can allow an underlying file
- descriptor to be used, but the bio layer itself runs nothing more
+ descriptor to be used, but the bio layers usually run nothing more
than state-specific callbacks, defined on a per-bio basis.
* decoding / encoding packet contents. This is handled by dbuffs,
and enforce nested bounds on packets, nested attributes, etc. But
dbuffs have no concept of multiple packets, deduplication, file
descriptors, etc.
+
--- /dev/null
+# The FD bio
+
+The file descriptor bio abstracts reads / writes over file
+descriptors. The goal is for applications to be able to get a file
+descriptor bio, and then just call raw read / write routines, similar
+to the POSIX `read()` and `write()` functions. The difference is that
+the bio routines _abstract_ all of the issues with file descriptors.
+
+For example, file descriptors can refer to files, sockets (stream or
+datagram), IPv4, IPv6, Unix domain sockets, etc. Each of those file
+descriptor types has a different set of requirements for
+initialization, and even for reading and writing.
+
+The simplest and perhaps most frustrating difference between the types
+of sockets is the meaning of a `0` return from the POSIX `read()`
+function. That value has _different meanings_ for stream and datagram
+sockets. For stream sockets, it means "EOF", and the socket should be
+closed. For datagram sockets, it means "read() returned no data".
+
+In our bio implementation, `read()` of `0` always means "no data".
+Signalling EOF is an error path, where the `read()` function returns an
+EOF error.
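
As a rough illustration of that convention, the sketch below wraps POSIX `read()` so that `0` always means "no data", and EOF on a stream is reported as a negative error. The function name, the `is_stream` flag, and the error codes are assumptions made for this example; they are not the real bio API.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical error codes, chosen only for this sketch. */
enum {
	SKETCH_ERROR_IO  = -1,	/* unrecoverable read() failure */
	SKETCH_ERROR_EOF = -2	/* the stream has been closed */
};

/*
 * Returns > 0 for data, 0 for "no data", and < 0 for errors.
 * The caller says whether the descriptor is a stream.
 */
ssize_t sketch_fd_read(int fd, int is_stream, void *buffer, size_t size)
{
	ssize_t rcode;

	/* A zero-length read() also returns 0, which is not EOF. */
	if (size == 0) return 0;

	do {
		rcode = read(fd, buffer, size);
	} while ((rcode < 0) && (errno == EINTR));

	if (rcode > 0) return rcode;

	/* Only a stream treats 0 as EOF. */
	if (rcode == 0) return is_stream ? SKETCH_ERROR_EOF : 0;

	/* A blocked non-blocking descriptor simply has no data yet. */
	if ((errno == EAGAIN) || (errno == EWOULDBLOCK)) return 0;

	return SKETCH_ERROR_IO;
}
```

With this shape, a caller can treat every negative return as "stop using this descriptor", and every `0` as "try again later".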
+
+Similarly, initializing a socket requires a number of steps, which
+differ between IPv4 and IPv6, and between connected and unconnected
+sockets. Perhaps the underlying socket is a connected stream
+socket, in which case IO is essentially just `read()` and `write()`.
+Or maybe the socket is an unconnected datagram socket, in which case
+IO has to use the `udpfromto` path to obtain the src/dst IP/port
+information for each packet.
+
+All of these differences are abstracted away with the bio API. The
+caller simply declares a `fr_bio_fd_config_t` data structure, fills it
+in with the appropriate data, and calls `fr_bio_fd_alloc()`. The FD
+bio code determines what kind of file descriptor to open, and then
+initializes it.
+
+The caller is handed back a bio which can then be used for basic read /
+write operations, independent of the underlying file descriptor type.
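
The declarative idea can be sketched in a few lines. The structure and function below are illustrative stand-ins with made-up fields; they are not the contents of `fr_bio_fd_config_t`, and nothing is actually opened. The sketch only shows the decision described above: the caller states what it wants, and the library picks the IO path.

```c
#include <stdbool.h>

/* Hypothetical configuration, filled in by the caller. */
typedef enum {
	SKETCH_FD_FILE,
	SKETCH_FD_STREAM,
	SKETCH_FD_DATAGRAM
} sketch_fd_type_t;

typedef struct {
	sketch_fd_type_t	type;
	bool			connected;
} sketch_fd_config_t;

/* The IO path which the library would select internally. */
typedef enum {
	SKETCH_IO_READ_WRITE,	/* plain read() / write() */
	SKETCH_IO_FROM_TO	/* per-packet src/dst IP/port lookup */
} sketch_io_path_t;

sketch_io_path_t sketch_fd_io_path(sketch_fd_config_t const *cfg)
{
	/* Only unconnected datagram sockets need per-packet addresses. */
	if ((cfg->type == SKETCH_FD_DATAGRAM) && !cfg->connected) {
		return SKETCH_IO_FROM_TO;
	}

	return SKETCH_IO_READ_WRITE;
}
```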
+
+There is no API to query the state of the FD bio. Instead, the caller
+can get a copy of the internal `fr_bio_fd_info_t` data structure,
+which contains all of the "raw" data needed by the application. The
+caller can see whether or not the bio is at EOF, or if it is blocked
+for read / write operations.
+
+The file descriptor bio does _not_ manage packets. If there is a
+partial write, it returns a partial write. It is up to the
+application (or another bio) to manage packet-oriented data.
--- /dev/null
+# The Memory Bio
+
+The memory bio does read / write buffering for a "next" bio. Where
+the file descriptor bio does no buffering, the memory bio adds that
+capability.
+
+In its simplest incarnation, the memory bio allows for controllable
+buffering on read and write.
+
+The read functions try to fill the memory buffer on each read. The
+application can then read smaller bits of data from the memory buffer,
+which avoids extra system calls.
+
+The write functions try to bypass the memory buffer as much as
+possible. If the memory buffers are empty, the `write()` call writes
+directly to the next bio block, and avoids memory copies. The data is
+cached only if the next block returns a partial write, in which case
+the unwritten remainder is cached, and is written _before_ any data from
+subsequent calls to `write()`.
+
+Data in the buffers can always be flushed by passing a `NULL` pointer
+to the `write()` routine.
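
The write path described above might look roughly like the following sketch. All of the names are hypothetical, the buffer size is arbitrary, and the return-value convention (bytes accepted, whether written or cached) is an assumption. The `sketch_sink_t` type is a fake "next" block which accepts a limited number of bytes per call, so that partial writes can be reproduced.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef ssize_t (*sketch_write_t)(void *uctx, void const *data, size_t size);

typedef struct {
	unsigned char	pending[1024];	/* data the next block refused */
	size_t		used;
	sketch_write_t	next;		/* write function of the next block */
	void		*uctx;
} sketch_mem_t;

/* Returns the number of bytes accepted (written or cached), or < 0. */
ssize_t sketch_mem_write(sketch_mem_t *m, void const *data, size_t size)
{
	ssize_t rcode;
	size_t written = 0, left, room;

	/* Cached data always goes out before any new data. */
	if (m->used > 0) {
		rcode = m->next(m->uctx, m->pending, m->used);
		if (rcode < 0) return rcode;

		m->used -= (size_t) rcode;
		memmove(m->pending, m->pending + rcode, m->used);
	}

	/* A NULL pointer means "flush only". */
	if (!data || (size == 0)) return 0;

	/* Nothing is cached, so bypass the buffer entirely. */
	if (m->used == 0) {
		rcode = m->next(m->uctx, data, size);
		if (rcode < 0) return rcode;

		written = (size_t) rcode;
		if (written == size) return rcode;
	}

	/* Cache whatever was not written, as far as it fits. */
	left = size - written;
	room = sizeof(m->pending) - m->used;
	if (left > room) left = room;

	memcpy(m->pending + m->used, (unsigned char const *) data + written, left);
	m->used += left;

	return (ssize_t) (written + left);
}

/* Fake next block: accepts at most 'limit' bytes per call. */
typedef struct {
	unsigned char	out[64];
	size_t		len;
	size_t		limit;
} sketch_sink_t;

ssize_t sketch_sink_write(void *uctx, void const *data, size_t size)
{
	sketch_sink_t *s = uctx;

	if (size > s->limit) size = s->limit;
	if (size > sizeof(s->out) - s->len) size = sizeof(s->out) - s->len;

	memcpy(s->out + s->len, data, size);
	s->len += size;

	return (ssize_t) size;
}
```

In the common case where the next block accepts everything, the data is never copied into `pending`.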
+
+## Packet-based reads
+
+The memory bio supports a function `fr_bio_mem_set_verify()`, which
+sets a "verification" function. When the application calls `read()`,
+the memory bio reads the data into an intermediate buffer, and then
+calls the verify function. That function can return the size of the
+packet to read, or other options like "discard data", or "want more
+data", or "have full packet". That way the application only sees
+whole packets.
+
+The application then calls the main bio `read()` routine, which
+(eventually) reads raw data from somewhere. When that data is at
+least a full packet, it is returned to the application.
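
A minimal sketch of packet-based reads follows. The types and function names are invented for illustration, and the callback signature is not the one taken by `fr_bio_mem_set_verify()`. The verify function uses the RADIUS header as its example: the packet length is a 16-bit big-endian value in the third and fourth octets, and valid lengths run from 20 to 4096.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef enum {
	SKETCH_VERIFY_OK,		/* have a full packet, *size is its length */
	SKETCH_VERIFY_WANT_MORE,	/* need more data before deciding */
	SKETCH_VERIFY_DISCARD		/* the data cannot be a packet */
} sketch_verify_t;

typedef sketch_verify_t (*sketch_verify_fn)(unsigned char const *data, size_t *size);

/* Example verify function, based on the RADIUS header layout. */
sketch_verify_t sketch_verify_radius(unsigned char const *data, size_t *size)
{
	size_t want;

	if (*size < 4) return SKETCH_VERIFY_WANT_MORE;

	want = ((size_t) data[2] << 8) | data[3];
	if ((want < 20) || (want > 4096)) return SKETCH_VERIFY_DISCARD;
	if (*size < want) return SKETCH_VERIFY_WANT_MORE;

	*size = want;
	return SKETCH_VERIFY_OK;
}

/* Intermediate buffer which holds raw data from the next block. */
typedef struct {
	unsigned char	buf[8192];
	size_t		used;
} sketch_rbuf_t;

/* Stands in for "raw data arrived from somewhere". */
size_t sketch_feed(sketch_rbuf_t *rb, void const *data, size_t size)
{
	size_t room = sizeof(rb->buf) - rb->used;

	if (size > room) size = room;
	memcpy(rb->buf + rb->used, data, size);
	rb->used += size;

	return size;
}

/* Returns the packet length, 0 for "no full packet yet", or < 0. */
ssize_t sketch_read_packet(sketch_rbuf_t *rb, sketch_verify_fn verify,
			   void *out, size_t out_size)
{
	size_t size = rb->used;

	if (size == 0) return 0;

	switch (verify(rb->buf, &size)) {
	case SKETCH_VERIFY_WANT_MORE:
		return 0;

	case SKETCH_VERIFY_DISCARD:
		rb->used = 0;
		return 0;

	case SKETCH_VERIFY_OK:
		break;
	}

	if (size > out_size) return -1;

	/* Hand over exactly one packet, and keep the remainder. */
	memcpy(out, rb->buf, size);
	rb->used -= size;
	memmove(rb->buf, rb->buf + size, rb->used);

	return (ssize_t) size;
}
```

The application never sees the first half of a packet: `sketch_read_packet()` keeps returning `0` until the verify function reports a whole one.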