documentation updates

author Alan T. DeKok <aland@freeradius.org>

Tue, 5 Mar 2024 12:54:33 +0000 (07:54 -0500)

committer Alan T. DeKok <aland@freeradius.org>

Tue, 5 Mar 2024 12:54:58 +0000 (07:54 -0500)
diff --git a/src/lib/bio/README.md b/src/lib/bio/README.md
index d5ef1e0d5d2695e2205cb6e0a802946829a79897..89708c4e8222656bf7cfa4113e5595adaa5f1a5e 100644
--- a/src/lib/bio/README.md
+++ b/src/lib/bio/README.md
@@ -25,61 +25,49 @@ handles all of these issues simultaneously.
  
  The issues addressed by bios include the following items:
  
-* abstracting TCP versus UDP socket IO
-
-* allowing packet-based reads and writes, instead of byte-based
-  * i.e. so that application protocol state machines do not have to
-  * deal with partial packets.
-
-* Use protocol-agnostic memory buffers to track partial reads and
-  partial writes.
-
-* allowing "written" data to be cancelled or "unwritten".  Packets
-  which have been written to the bio, but not yet to the network can
-  be cancelled at any time.  The data then disappears from the bio,
-  and is never written to the network.
-
-* allowing chaining, so that an application can write RADIUS packets
-  to a bio, and then have those packets go through a TLS
-  transformation, and then out a TCP socket.
-
-* Chaining also allows applications to selectively add per-chain
-  functionality, without affecting the producer or consumer of data.
-
-* allowing unchaining, so that we can have a bio say "I'm done, and no
-  longer needed".  This happens for example when we have a connection
-  from haproxy.  The first ~128 bytes of a TCP connection are the
-  original src/dst ip/port.  The data after that is just the TLS
-  transport.  The haproxy layer needs to be able to intercept and read
-  that data, and then remove itself from the chain of bios.
-
-* abstraction, so that the application can be handed a bio, and use
-  it.  The underlying bio might be UDP, TCP, TLS, etc.  The
-  application does not know, and can behave identically for all
-  situations.  There are some limitations, of course.  Something has
-  to create the bios and their respective chains.  But once a "RADIUS"
-  bio, has been created, the RADIUS application can read and write
-  packets to it without worrying about underlying issues of UDP vs
-  TCP, TLS vs clear-text, dedup, etc.
-
-* simplicity.  Any transport-specific function knows only about that
-  transport, and it's own bio.  It does not need to know about other
-  bios (unless it needs them, as with TLS -> TCP).  The function does
-  not know about packets or protocols.  We should be able to use the
-  same basic UDP/TCP network bios for most protocols.  Or if we
-  cannot, the duplicated code should be trivial, and little more than
-  `read()` and some checks for error conditions (EOF, blocked, etc.)
-
-* If the caller needs to do something with a particular bio, that bio
-  will expose an API specific to that bio.  There is no reason to copy
-  that status back up the bio chain.  This also means that the caller
-  often needs to cache the multiple bios, which is fine.
-
-* asynchronous at its core.  Anything can block at any time.  There
+* be based on _independent blocks_ (e.g. file IO, memory buffers,
+  etc.).
+
+* be composable, so that blocks can be _chained_ or _unchained_ to
+  obtain complex functionality.
+
+* be _abstracted_ so that the application using the bio has little
+  need to understand the differences between the individual blocks.
+
+* be _declarative_ where possible, so an application can declare a
+  data structure saying "this is the kind of bio I want", and the
+  underlying bio "alloc" or "create" API does the right thing.
+
+* be _callback_ oriented, so that the bio calls the application to do
+  application-specific things, and the bio handles the abstractions.
+
+* be _state machine_ oriented, so that inside of the bio, the
+  functionality is broken into a series of small functions.  If the
+  bio is in a different state, it changes its internal function
+  pointers to manage that state.  This approach is better than large
+  functions with masses of if / then / else statements.
+
+* be _exposed_ so that the individual blocks do not hide everything
+  from the application.  Each block exports an info / state structure.
+  The application can examine the internal state of the block.  This
+  approach stops the M*N API explosion typically seen when every block
+  has to implement all of the get/set functionality of every other
+  block.
+
+* be _modifiable_ so that blocks can be chained / unchained on the
+  fly.  This capability allows applications to add / delete things
+  like dynamic filters (haproxy or dynamic client) on the fly.
+
+* allow for a _separation_ of application issues (basic bio
+  read/write) from protocol state machine issues (packet retransmit,
+  etc.)  The application largely just calls read / write on the bio;
+  any bio modifications are done by the protocol state machine.
+
+* be _asynchronous_ where possible.  Anything can block at any time.  There
    are callbacks if necessary.
  
-* no run-time memory allocations for bio operations.  Everything
-  operates on pre-allocated structures
+* _avoid_ run-time memory allocations for bio operations.  Everything
+  should operate on pre-allocated structures.
  
  * O(1) operations where possible.
  
@@ -87,9 +75,9 @@ The issues addressed by bios include the following items:
    it needs to do.  It exposes APIs for the caller (who must know what
    it is).  It has its own callbacks to modify its operation.
  
-* not thread-safe.  Use locks, people.
+* the bios do _not_ need to be thread-safe.
  
-There are explicit _non-goals_ for the bio API.  These non-goals are
+There are some explicit _non-goals_ for the bio API.  These non-goals are
  issues which are outside of the scope of bios, such as:
  
  * As an outcome of simplicity, there are no bio-specific wrappers for
@@ -107,7 +95,7 @@ issues which are outside of the scope of bios, such as:
    state of the bio.
  
  * eventing and timers.  The bios can allow an underlying file
-  descriptor to be used, but the bio layer itself runs nothing more
+  descriptor to be used, but the bio layers usually run nothing more
    than state-specific callbacks, defined on a per-bio basis.
  
  * decoding / encoding packet contents.  This is handled by dbuffs,
@@ -115,3 +103,4 @@ issues which are outside of the scope of bios, such as:
    and enforce nested bounds on packets, nested attributes, etc.  But
    dbuffs have no concept of multiple packets, deduplication, file
    descriptors, etc.
+
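The _state machine_ and _composable_ goals above can be illustrated with a small, self-contained sketch. Everything in it (`sketch_bio_t`, `source_read()`, `prefix_consume()`, `prefix_passthrough()`, the fixed-length header) is hypothetical and is not the FreeRADIUS bio API. It shows a bio that swallows a short header from the next bio in its chain (loosely in the spirit of the haproxy filter mentioned above), and then swaps its own `read` function pointer instead of re-testing its state with if / then / else on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical bio type for illustration only; NOT the FreeRADIUS fr_bio_t API. */
typedef struct sketch_bio_s sketch_bio_t;
typedef ssize_t (*sketch_read_t)(sketch_bio_t *bio, void *buf, size_t size);

struct sketch_bio_s {
	sketch_read_t		read;		/* handler for the current state */
	sketch_bio_t		*next;		/* next bio in the chain, or NULL */

	/* "source" bio state: a fixed in-memory byte string */
	const unsigned char	*src;
	size_t			src_len;
	size_t			src_off;

	/* "prefix" bio state: a short header consumed before any payload */
	unsigned char		prefix[16];
	size_t			prefix_len;	/* bytes of header read so far */
	size_t			prefix_left;	/* bytes of header still to read */
};

/* Source bio: copy from the fixed buffer.  Returns 0 for "no data". */
static ssize_t source_read(sketch_bio_t *bio, void *buf, size_t size)
{
	size_t avail = bio->src_len - bio->src_off;

	if (size > avail) size = avail;
	memcpy(buf, bio->src + bio->src_off, size);
	bio->src_off += size;
	return (ssize_t) size;
}

/* Second state: the header is gone, so just pass reads to the next bio. */
static ssize_t prefix_passthrough(sketch_bio_t *bio, void *buf, size_t size)
{
	return bio->next->read(bio->next, buf, size);
}

/* First state: consume the header, then swap in the pass-through handler. */
static ssize_t prefix_consume(sketch_bio_t *bio, void *buf, size_t size)
{
	while (bio->prefix_left > 0) {
		ssize_t rcode = bio->next->read(bio->next, bio->prefix + bio->prefix_len,
						bio->prefix_left);

		if (rcode <= 0) return rcode;	/* no data yet, or an error */
		bio->prefix_len += (size_t) rcode;
		bio->prefix_left -= (size_t) rcode;
	}

	bio->read = prefix_passthrough;	/* state change: no if / else on later reads */
	return bio->read(bio, buf, size);
}

static void source_init(sketch_bio_t *bio, const void *data, size_t len)
{
	memset(bio, 0, sizeof(*bio));
	bio->read = source_read;
	bio->src = data;
	bio->src_len = len;
}

static void prefix_init(sketch_bio_t *bio, sketch_bio_t *next, size_t header_len)
{
	assert(header_len <= sizeof(bio->prefix));
	memset(bio, 0, sizeof(*bio));
	bio->read = prefix_consume;
	bio->next = next;
	bio->prefix_left = header_len;
}
```

For example, chaining a prefix bio with a 4-byte header in front of a source bio holding `HDR:hello` makes the first read return the 5 bytes `hello`. The consumed header stays visible in the `prefix` field, in keeping with the _exposed_ goal: the caller inspects the block's state directly rather than through a getter API.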
diff --git a/src/lib/bio/fd.md b/src/lib/bio/fd.md
new file mode 100644
index 0000000..56935e1
--- /dev/null
+++ b/src/lib/bio/fd.md
@@ -0,0 +1,49 @@
+# The FD bio
+
+The file descriptor bio abstracts reads / writes over file
+descriptors.  The goal is for applications to be able to get a file
+descriptor bio, and then just call raw read / write routines, similar
+to the POSIX `read()` and `write()` functions.  The difference is that
+the bio routines _abstract_ all of the issues with file descriptors.
+
+For example, file descriptors can refer to files, sockets (stream or
+datagram), IPv4, IPv6, Unix domain sockets, etc.  Each of those file
+descriptor types has a different set of requirements for
+initialization, and even for reading and writing.
+
+The simplest and perhaps most frustrating difference between the types
+of sockets is what it means when the POSIX `read()` function returns
+`0`.  That value has _different meanings_ for stream and datagram
+sockets.  For stream sockets, it means "EOF", and the socket should be
+closed.  For datagram sockets, it means "no data" (a zero-length datagram).
+
+In our bio implementation, a `read()` return value of `0` always means
+"no data".  Signalling EOF is an error path, where the `read()`
+function returns an EOF error.
+
+Similarly, initializing a socket requires a number of steps, which
+differ between IPv4 and IPv6, and between connected and unconnected
+sockets.  Perhaps the underlying socket is a connected stream
+socket, in which case IO is essentially just `read()` and `write()`.
+Or maybe the socket is an unconnected datagram socket, in which case
+IO has to use the `udpfromto` path to obtain the src/dst IP/port
+information for each packet.
+
+All of these differences are abstracted away with the bio API.  The
+caller simply declares a `fr_bio_fd_config_t` data structure, fills it
+in with the appropriate data, and calls `fr_bio_fd_alloc()`.  The FD
+bio code determines what kind of file descriptor to open, and then
+initializes it.
+
+The caller is given a bio which can then be used for basic read /
+write operations, independent of the underlying file descriptor type.
+
+There is no API to query the state of the FD bio.  Instead, the caller
+can get a copy of the internal `fr_bio_fd_info_t` data structure,
+which contains all of the "raw" data needed by the application.  The
+caller can see whether or not the bio is at EOF, or if it is blocked
+for read / write operations.
+
+The file descriptor bio does _not_ manage packets.  If there is a
+partial write, it returns a partial write.  It is up to the
+application (or another bio) to manage packet-oriented data.
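The `read()` semantics described above for the FD bio can be sketched in C as follows. This is a minimal illustration under assumptions, not the `fr_bio_fd` code: the `sketch_fd_bio_t` structure, its `is_stream` / `eof` / `read_blocked` flags, and the `SKETCH_BIO_ERROR_EOF` value are all hypothetical. It maps a `0` from POSIX `read()` on a stream (or file) to an EOF error, and maps both zero-length datagrams and `EAGAIN` / `EWOULDBLOCK` to a return of `0`, i.e. "no data".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical error value; the real FD bio's error codes may differ. */
#define SKETCH_BIO_ERROR_EOF	(-2)

/* Hypothetical stand-in for an FD bio plus the info a caller could inspect. */
typedef struct {
	int	fd;
	bool	is_stream;	/* stream socket or file, vs. datagram socket */
	bool	eof;		/* set once the peer has closed the stream */
	bool	read_blocked;	/* set when the last read would have blocked */
} sketch_fd_bio_t;

/*
 * Returns > 0 for data, 0 for "no data", SKETCH_BIO_ERROR_EOF at EOF,
 * and -1 (with errno set by read()) for other errors.
 */
static ssize_t sketch_fd_read(sketch_fd_bio_t *bio, void *buf, size_t size)
{
	ssize_t rcode;

	assert(bio != NULL);

	/* A zero-length read would return 0 without meaning EOF. */
	if (size == 0) return 0;

	for (;;) {
		rcode = read(bio->fd, buf, size);
		if (rcode >= 0 || errno != EINTR) break;	/* retry on signals */
	}

	if (rcode > 0) {
		bio->read_blocked = false;
		return rcode;
	}

	if (rcode == 0) {
		/* Datagram sockets: an empty datagram is just "no data". */
		if (!bio->is_stream) return 0;

		/* Stream sockets and files: 0 from read() means EOF. */
		bio->eof = true;
		return SKETCH_BIO_ERROR_EOF;
	}

	if (errno == EAGAIN || errno == EWOULDBLOCK) {
		bio->read_blocked = true;
		return 0;
	}

	return -1;
}
```

With this split, a `0` return never needs to be re-interpreted by the caller; "try again later" and "peer closed the connection" are told apart by the flags, much as the text above describes inspecting `fr_bio_fd_info_t`.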
diff --git a/src/lib/bio/mem.md b/src/lib/bio/mem.md
new file mode 100644
index 0000000..31717f2
--- /dev/null
+++ b/src/lib/bio/mem.md
@@ -0,0 +1,36 @@
+# The Memory Bio
+
+The memory bio does read / write buffering for a "next" bio.  Where
+the file descriptor bio does no buffering, the memory bio adds that
+capability.
+
+In its simplest incarnation, the memory bio allows for controllable
+buffering on read and write.
+
+The read functions try to fill the memory buffer on each read.  The
+application can then read smaller bits of data from the memory buffer,
+which avoids extra system calls.
+
+The write functions try to bypass the memory buffer as much as
+possible.  If the memory buffers are empty, the `write()` call writes
+directly to the next bio block, and avoids memory copies.  Data is
+cached only if the next block returns a partial write, in which case
+the unwritten remainder is written _before_ any data from
+subsequent calls to `write()`.
+
+Data in the buffers can always be flushed by passing a `NULL` pointer to the
+`write()` routine.
+
+## Packet-based reads
+
+The memory bio provides a function, `fr_bio_mem_set_verify()`, which
+sets a "verification" function.  When the application calls `read()`,
+the memory bio reads the data into an intermediate buffer, and then
+calls the verify function.  That function can return the size of the
+packet to read, or other options like "discard data", or "want more
+data", or "have full packet".  That way the application only sees
+whole packets.
+
+The application then calls the main bio `read()` routines, which
+(eventually) read raw data from the next bio.  Once the buffered data
+holds at least one full packet, that packet is returned to the caller.
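Packet-based reads with a verify callback might look roughly like the sketch below. The names (`sketch_mem_bio_t`, `sketch_verify_t`, `sketch_mem_fill()`, `radius_verify()`) and the callback signature are hypothetical; the real `fr_bio_mem_set_verify()` interface is not shown in this commit. `sketch_mem_fill()` stands in for raw data arriving from the next bio. The example verify function uses RADIUS framing from RFC 2865 (a 16-bit length in octets 2-3, valid from 20 to 4096), and the read routine returns `0` until a whole packet is buffered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical verify results; the real fr_bio_mem_set_verify() API may differ. */
typedef enum {
	SKETCH_VERIFY_WANT_MORE,	/* not enough data for a full packet yet */
	SKETCH_VERIFY_OK,		/* *packet_len bytes form one packet */
	SKETCH_VERIFY_DISCARD,		/* data is garbage, throw it away */
} sketch_verify_t;

typedef sketch_verify_t (*sketch_verify_fn_t)(const uint8_t *data, size_t have,
					       size_t *packet_len);

/* Example verify function: RADIUS framing (length in octets 2-3, 20..4096). */
static sketch_verify_t radius_verify(const uint8_t *data, size_t have, size_t *packet_len)
{
	size_t len;

	if (have < 4) return SKETCH_VERIFY_WANT_MORE;
	len = ((size_t) data[2] << 8) | data[3];
	if ((len < 20) || (len > 4096)) return SKETCH_VERIFY_DISCARD;
	if (have < len) return SKETCH_VERIFY_WANT_MORE;

	*packet_len = len;
	return SKETCH_VERIFY_OK;
}

typedef struct {
	uint8_t			buf[8192];
	size_t			used;
	sketch_verify_fn_t	verify;
} sketch_mem_bio_t;

/* Append raw bytes as they arrive from the next bio (e.g. a partial TCP read). */
static size_t sketch_mem_fill(sketch_mem_bio_t *mem, const void *data, size_t len)
{
	size_t room = sizeof(mem->buf) - mem->used;

	if (len > room) len = room;
	memcpy(mem->buf + mem->used, data, len);
	mem->used += len;
	return len;
}

/* Application-facing read: returns one whole packet, or 0 for "no packet yet". */
static ssize_t sketch_mem_read(sketch_mem_bio_t *mem, void *out, size_t size)
{
	size_t packet_len = 0;

	assert(mem->verify != NULL);

	switch (mem->verify(mem->buf, mem->used, &packet_len)) {
	case SKETCH_VERIFY_WANT_MORE:
		return 0;

	case SKETCH_VERIFY_DISCARD:
		mem->used = 0;	/* simplest policy: drop everything buffered */
		return 0;

	case SKETCH_VERIFY_OK:
		break;
	}

	if (packet_len > size) return -1;	/* caller's buffer is too small */

	memcpy(out, mem->buf, packet_len);
	memmove(mem->buf, mem->buf + packet_len, mem->used - packet_len);
	mem->used -= packet_len;
	return (ssize_t) packet_len;
}
```

For brevity the sketch shifts leftover bytes with `memmove()` and drops the whole buffer on "discard"; a production version would more likely use a ring buffer (in keeping with the O(1) goal in the README) and a more careful discard policy.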