From: Alan T. DeKok Date: Mon, 22 Jan 2024 23:44:41 +0000 (-0500) Subject: First pass at bio handlers. X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=61008f95b71180b1aebb1b1c65eb5ceee4823087;p=thirdparty%2Ffreeradius-server.git First pass at bio handlers. The FD bio works. The others are "compile tested" --- diff --git a/src/lib/bio/README.md b/src/lib/bio/README.md new file mode 100644 index 00000000000..d5ef1e0d5d2 --- /dev/null +++ b/src/lib/bio/README.md @@ -0,0 +1,117 @@ +# Binary IO API + +The binary input / output (bio) API is intended to abstract a wide +range of issues related to network IO. Historically (v3) we just +"wrote the code until it worked", which meant that the same piece of +code handled network transport issues (e.g. TCP), protocol issues +(e.g. RADIUS), connection issues (up / down / reconnect), and eventing +issues (socket readable / blocked). + +This style of programming led to complex interconnected state +machines which were difficult to write, to maintain, and to debug. + +v4 is better, with many of these functions split out into separate +APIs, such as connections, trunking, etc. However, the input +listeners and output client modules (e.g. rlm_radius and radclient) +still have the transport and protocol states intermixed. This makes +the read / write routines complex, and difficult to extend. + +For these reasons and more, as of early 2024, v4 does not have input +TLS listeners, or output TCP or TLS for RADIUS proxying. We then have +a horrid mess of dynamic clients, haproxy connections, network source IP +filtering, UDP vs TCP issues, and connected vs unconnected sockets, +and finally TLS. It is essentially impossible to write code which +handles all of these issues simultaneously. + +The issues addressed by bios include the following items: + +* abstracting TCP versus UDP socket IO + +* allowing packet-based reads and writes, instead of byte-based + * i.e. 
so that application protocol state machines do not have to + * deal with partial packets. + +* Use protocol-agnostic memory buffers to track partial reads and + partial writes. + +* allowing "written" data to be cancelled or "unwritten". Packets + which have been written to the bio, but not yet to the network, can + be cancelled at any time. The data then disappears from the bio, + and is never written to the network. + +* allowing chaining, so that an application can write RADIUS packets + to a bio, and then have those packets go through a TLS + transformation, and then out a TCP socket. + +* Chaining also allows applications to selectively add per-chain + functionality, without affecting the producer or consumer of data. + +* allowing unchaining, so that we can have a bio say "I'm done, and no + longer needed". This happens for example when we have a connection + from haproxy. The first ~128 bytes of a TCP connection are the + original src/dst ip/port. The data after that is just the TLS + transport. The haproxy layer needs to be able to intercept and read + that data, and then remove itself from the chain of bios. + +* abstraction, so that the application can be handed a bio, and use + it. The underlying bio might be UDP, TCP, TLS, etc. The + application does not know, and can behave identically for all + situations. There are some limitations, of course. Something has + to create the bios and their respective chains. But once a "RADIUS" + bio has been created, the RADIUS application can read and write + packets to it without worrying about underlying issues of UDP vs + TCP, TLS vs clear-text, dedup, etc. + +* simplicity. Any transport-specific function knows only about that + transport, and its own bio. It does not need to know about other + bios (unless it needs them, as with TLS -> TCP). The function does + not know about packets or protocols. We should be able to use the + same basic UDP/TCP network bios for most protocols. 
Or if we + cannot, the duplicated code should be trivial, and little more than + `read()` and some checks for error conditions (EOF, blocked, etc.) + +* If the caller needs to do something with a particular bio, that bio + will expose an API specific to that bio. There is no reason to copy + that status back up the bio chain. This also means that the caller + often needs to cache the multiple bios, which is fine. + +* asynchronous at its core. Anything can block at any time. There + are callbacks if necessary. + +* no run-time memory allocations for bio operations. Everything + operates on pre-allocated structures + +* O(1) operations where possible. + +* each bio in large part runs as its own state machine. It does what + it needs to do. It exposes APIs for the caller (who must know what + it is). It has its own callbacks to modify its operation. + +* not thread-safe. Use locks, people. + +There are explicit _non-goals_ for the bio API. These non-goals are +issues which are outside of the scope of bios, such as: + +* As an outcome of simplicity, there are no bio-specific wrappers for + modifying file descriptors. An application is free to cache the FD, + associate it with the application layer, and call eventing functions + to get "readable" or "writable" callbacks. The application can also + get / set socket information manually, such as "get IP" or "bind to + particular port". + +* configuration. The bios expose configuration structures (static + input used to create a bio), and run-time informational structures + (dynamic information about the state of the bio). The API is small, + and all uses of get/set member functions should be avoided. We + presume that the caller is smart enough to not muck with the current + state of the bio. + +* eventing and timers. The bios can allow an underlying file + descriptor to be used, but the bio layer itself runs nothing more + than state-specific callbacks, defined on a per-bio basis. 
+ +* decoding / encoding packet contents. This is handled by dbuffs, + which are bounds checkers around memory buffers. i.e. they check + and enforce nested bounds on packets, nested attributes, etc. But + dbuffs have no concept of multiple packets, deduplication, file + descriptors, etc. diff --git a/src/lib/bio/all.mk b/src/lib/bio/all.mk new file mode 100644 index 00000000000..187abf9ee35 --- /dev/null +++ b/src/lib/bio/all.mk @@ -0,0 +1,3 @@ +SUBMAKEFILES := libfreeradius-bio.mk + +# bio_tests.mk diff --git a/src/lib/bio/base.c b/src/lib/bio/base.c new file mode 100644 index 00000000000..0eb9b6200bd --- /dev/null +++ b/src/lib/bio/base.c @@ -0,0 +1,190 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/base.c + * @brief Binary IO abstractions. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include + +/** Free this bio. + * + * The bio can only be freed if it is not in any chain. + */ +int fr_bio_destructor(fr_bio_t *bio) +{ + fr_assert(!fr_bio_prev(bio)); + fr_assert(!fr_bio_next(bio)); + + /* + * It's safe to free this bio. 
+ */ + return 0; +} + +/** Always returns EOF on fr_bio_read() + * + */ +ssize_t fr_bio_eof_read(UNUSED fr_bio_t *bio, UNUSED void *packet_ctx, UNUSED void *buffer, UNUSED size_t size) +{ + return fr_bio_error(EOF); +} + +/** Internal bio function which just reads from the "next" bio. + * + * It is mainly used when the current bio needs to modify the write + * path, but does not need to do anything on the read path. + */ +ssize_t fr_bio_next_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_t *next; + + next = fr_bio_next(bio); + fr_assert(next != NULL); + + rcode = next->read(next, packet_ctx, buffer, size); + if (rcode >= 0) return rcode; + + if (rcode == fr_bio_error(IO_WOULD_BLOCK)) return rcode; + + bio->read = fr_bio_eof_read; + bio->write = fr_bio_null_write; + return rcode; +} + +/** Internal bio function which just writes to the "next" bio. + * + * It is mainly used when the current bio needs to modify the read + * path, but does not need to do anything on the write path. + */ +ssize_t fr_bio_next_write(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_t *next; + + next = fr_bio_next(bio); + fr_assert(next != NULL); + + rcode = next->write(next, packet_ctx, buffer, size); + if (rcode >= 0) return rcode; + + if (rcode == fr_bio_error(IO_WOULD_BLOCK)) return rcode; + + bio->read = fr_bio_eof_read; + bio->write = fr_bio_null_write; + return rcode; +} + +/** Free this bio, and everything it calls. + * + * We unlink the bio chain, and then free it individually. If there's an error, the bio chain is relinked. + * That way the error can be addressed (somehow) and this function can be called again. + * + * Note that we do not support talloc_free() for the bio chain. Each individual bio has to be unlinked from + * the chain before the destructor will allow it to be freed. This functionality is by design. 
+ * + * We want to have an API where bios are created "bottom up", so that it is impossible for an application to + * create an incorrect chain. However, creating the chain bottom up means that the lower bios are not parented + * from the higher bios, and therefore talloc_free() won't free them. As a result, we need an explicit + * fr_bio_free() function. + */ +int fr_bio_free(fr_bio_t *bio) +{ + fr_bio_t *next = fr_bio_next(bio); + + /* + * We cannot free a bio in the middle of a chain. It has to be unlinked first. + */ + if (fr_bio_prev(bio)) return -1; + + /* + * Unlink our bio, and recurse to free the next one. If we can't free it, re-chain it, but reset + * the read/write functions to do nothing. + */ + if (next) { + fr_bio_unchain(bio); + if (fr_bio_free(next) < 0) { + fr_bio_chain(bio, next); + bio->read = fr_bio_eof_read; + bio->write = fr_bio_null_write; + return -1; + } + } + + /* + * It's now safe to free this bio. + */ + return talloc_free(bio); +} + +/** Shut down a set of BIOs + * + * Must be called from the top-most bio. + * + * Will shut down the bios from the bottom-up. + * + * The shutdown function MUST be callable multiple times without breaking. + */ +int fr_bio_shutdown(fr_bio_t *bio) +{ + fr_bio_t *last; + + fr_assert(!fr_bio_prev(bio)); + + /* + * Find the last bio in the chain. + */ + for (last = bio; fr_bio_next(last) != NULL; last = fr_bio_next(last)) { + /* nothing */ + } + + /* + * Walk back up the chain, calling the shutdown functions. + */ + do { + int rcode; + fr_bio_common_t *my = (fr_bio_common_t *) last; + + /* + * Call user shutdown before the bio shutdown. + */ + if (my->cb.shutdown && ((rcode = my->cb.shutdown(last)) < 0)) return rcode; + + last = fr_bio_prev(last); + } while (last); + + return 0; +} + +/** Like fr_bio_shutdown(), but can be called by anyone in the chain. 
+ * + */ +int fr_bio_shutdown_intermediate(fr_bio_t *bio) +{ + fr_bio_common_t *prev = (fr_bio_common_t *) fr_bio_prev(bio); + + while ((prev = (fr_bio_common_t *) fr_bio_prev(bio)) != NULL) { + bio = (fr_bio_t *) prev; + } + + return fr_bio_shutdown(bio); +} diff --git a/src/lib/bio/base.h b/src/lib/bio/base.h new file mode 100644 index 00000000000..d4aa1367dbf --- /dev/null +++ b/src/lib/bio/base.h @@ -0,0 +1,172 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/base.h + * @brief Binary IO abstractions. + * + * Create abstract binary input / output buffers. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_base_h, "$Id$") + +#include +#include + +#ifdef NDEBUG +#define XDEBUG(_x) +#else +#define XDEBUG(fmt, ...) 
fprintf(stderr, fmt, ## __VA_ARGS__) +#endif + +#ifdef _CONST +# error _CONST can only be defined in the local header +#endif +#ifndef _BIO_PRIVATE +# define _CONST const +#else +# define _CONST +#endif + +typedef enum { + FR_BIO_ERROR_NONE = 0, + FR_BIO_ERROR_IO_WOULD_BLOCK, //!< IO would block + + FR_BIO_ERROR_IO, //!< IO error - check errno + FR_BIO_ERROR_GENERIC, //!< generic "failed" error - check fr_strerror() + FR_BIO_ERROR_VERIFY, //!< some packet verification error + FR_BIO_ERROR_BUFFER_FULL, //!< the buffer is full + FR_BIO_ERROR_BUFFER_TOO_SMALL, //!< the output buffer is too small for the data + + FR_BIO_ERROR_EOF, //!< at EOF +} fr_bio_error_type_t; + +typedef struct fr_bio_s fr_bio_t; + +/** Do a raw read from a socket, or other data source + * + * These functions should be careful about packet_ctx. This handling depends on a number of factors. Note + * that the packet_ctx may be NULL! + * + * Stream sockets will generally ignore packet_ctx. + * + * Datagram sockets generally write src/dst IP/port to the packet context. This same packet_ctx is then + * passed to bio->write(), which can use it to send the data to the correct destination. + * + * @param bio the binary IO handler + * @param packet_ctx where the function can store per-packet information, such as src/dst IP/port for datagram sockets + * @param buffer where the function should store data it reads + * @param size the maximum amount of data to read. + * @return + * - <0 for error + * - 0 for "no data available". Note that this does NOT mean EOF! It could mean "we do not have a full packet" + * - >0 for amount of data which was read. 
+ */ +typedef ssize_t (*fr_bio_read_t)(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size); +typedef ssize_t (*fr_bio_write_t)(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t size); + +typedef int (*fr_bio_callback_t)(fr_bio_t *bio); /* activate / shutdown callbacks */ + +typedef struct { + fr_bio_callback_t activate; + fr_bio_callback_t shutdown; +} fr_bio_cb_funcs_t; + +/** Accept a new connection on a bio + * + * @param bio the binary IO handler + * @param ctx the talloc ctx for the new bio. + * @param[out] accepted the accepted bio + * @return + * - <0 on error + * - 0 for "we did nothing, and there is no new bio available" + * - 1 for "the accepted bio is available" + */ +typedef int (*fr_bio_accept_t)(fr_bio_t *bio, TALLOC_CTX *ctx, fr_bio_t **accepted); + +struct fr_bio_s { + void *uctx; //!< user ctx, caller can manually set it. + + fr_bio_read_t _CONST read; //!< read from the underlying bio + fr_bio_write_t _CONST write; //!< write to the underlying bio + + fr_dlist_t _CONST entry; //!< in the linked list of multiple bios +}; + +static inline CC_HINT(nonnull) fr_bio_t *fr_bio_prev(fr_bio_t *bio) +{ + fr_dlist_t *prev = bio->entry.prev; + + if (!prev) return NULL; + + return fr_dlist_entry_to_item(offsetof(fr_bio_t, entry), prev); +} + +static inline CC_HINT(nonnull) fr_bio_t *fr_bio_next(fr_bio_t *bio) +{ + fr_dlist_t *next = bio->entry.next; + + if (!next) return NULL; + + return fr_dlist_entry_to_item(offsetof(fr_bio_t, entry), next); +} + +static inline ssize_t CC_HINT(nonnull(1,3)) fr_bio_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + if (size == 0) return 0; + + /* + * We cannot read from the middle of a chain. + */ + fr_assert(!fr_bio_next(bio)); + + return bio->read(bio, packet_ctx, buffer, size); +} + +static inline ssize_t CC_HINT(nonnull(1)) fr_bio_write(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + if (size == 0) return 0; + + /* + * We cannot write to the middle of a chain. 
+ */ + fr_assert(!fr_bio_prev(bio)); + + return bio->write(bio, packet_ctx, buffer, size); +} + +int fr_bio_shutdown_intermediate(fr_bio_t *bio) CC_HINT(nonnull); + +#ifndef NDEBUG +int fr_bio_destructor(fr_bio_t *bio) CC_HINT(nonnull); +#else +#define fr_bio_destructor (NULL) +#endif + +#define fr_bio_error(_x) (-(FR_BIO_ERROR_ ## _x)) + +ssize_t fr_bio_eof_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size); + +ssize_t fr_bio_next_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size); + +ssize_t fr_bio_next_write(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size); + +int fr_bio_shutdown(fr_bio_t *bio) CC_HINT(nonnull); + +int fr_bio_free(fr_bio_t *bio) CC_HINT(nonnull); diff --git a/src/lib/bio/bio_priv.h b/src/lib/bio/bio_priv.h new file mode 100644 index 00000000000..42859e247c3 --- /dev/null +++ b/src/lib/bio/bio_priv.h @@ -0,0 +1,69 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/bio_priv.h + * @brief Binary IO private functions + * + * Create abstract binary input / output buffers. 
+ * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_bio_priv_h, "$Id$") + +#define _BIO_PRIVATE 1 +#include + +typedef int (*fr_bio_shutdown_t)(fr_bio_t *bio); + +typedef struct fr_bio_common_s fr_bio_common_t; + +/** Common elements at the start of each private #fr_bio_t + * + */ +#define FR_BIO_COMMON \ + fr_bio_t bio; \ + fr_bio_cb_funcs_t cb + +struct fr_bio_common_s { + FR_BIO_COMMON; +}; + +/** Chain one bio after another. + * + * @todo - this likely needs to be public + */ +static inline void CC_HINT(nonnull) fr_bio_chain(fr_bio_t *first, fr_bio_t *second) +{ + fr_dlist_entry_link_after(&first->entry, &second->entry); +} + +/** Remove a bio from a chain + * + * And reset prev/next ptrs to NULL. + * + * @todo - this likely needs to be public + */ +static inline void CC_HINT(nonnull) fr_bio_unchain(fr_bio_t *bio) +{ + fr_assert(fr_bio_prev(bio) != NULL); + fr_assert(fr_bio_next(bio) != NULL); + + fr_dlist_entry_unlink(&bio->entry); + bio->entry.prev = bio->entry.next = NULL; +} diff --git a/src/lib/bio/buf.c b/src/lib/bio/buf.c new file mode 100644 index 00000000000..0d113cd4d4c --- /dev/null +++ b/src/lib/bio/buf.c @@ -0,0 +1,112 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/buf.c + * @brief BIO abstractions for file descriptors + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include + +size_t fr_bio_buf_make_room(fr_bio_buf_t *bio_buf) +{ + size_t used; + + if (bio_buf->read == bio_buf->start) return fr_bio_buf_write_room(bio_buf); + + used = bio_buf->write - bio_buf->read; + if (!used) return fr_bio_buf_write_room(bio_buf); + + memmove(bio_buf->start, bio_buf->read, used); + + bio_buf->read = bio_buf->start; + bio_buf->write = bio_buf->read + used; + + return fr_bio_buf_write_room(bio_buf); +} + +size_t fr_bio_buf_read(fr_bio_buf_t *bio_buf, void *buffer, size_t size) +{ + size_t used; + + fr_bio_buf_verify(bio_buf); + + used = bio_buf->write - bio_buf->read; + if (!used || !size) return 0; + + /* + * Clamp the data to read at how much data is in the buffer. + */ + if (size > used) size = used; + + if (buffer) memcpy(buffer, bio_buf->read, size); + + bio_buf->read += size; + if (bio_buf->read == bio_buf->write) { + fr_bio_buf_reset(bio_buf); + + } else if ((bio_buf->end - bio_buf->read) < (bio_buf->read - bio_buf->start)) { + /* + * The "read" pointer is closer to the end of the + * buffer than to the start. Shift the data + * around to give more room for reading. + * + * @todo - change the check instead to "(end - write) < min_room" + * + * @todo - what about pending packets which point to the buffer? 
+ */ + fr_bio_buf_make_room(bio_buf); + } + + return size; +} + +ssize_t fr_bio_buf_write(fr_bio_buf_t *bio_buf, const void *buffer, size_t size) +{ + size_t room; + + fr_bio_buf_verify(bio_buf); + + room = fr_bio_buf_write_room(bio_buf); + + if (room < size) { + return -room; /* how much more room we would need */ + } + + /* + * The data might already be in the buffer, in which case we can skip the memcpy(). + * + * But the data MUST be at the current "write" position. i.e. we can't have overlapping / + * conflicting writes. + * + * @todo - if it's after the current write position, maybe still allow it? That's so + * fr_bio_mem_write() and friends can write partial packets into the buffer. Maybe add a + * fr_bio_buf_write_partial() API, which takes (packet, already_written, size), and then does the + * right thing. If the packet is not within the buffer, then it devolves to fr_bio_buf_write(), + * otherwise it moves the write ptr in the buffer to after the packet. + */ + if (buffer != bio_buf->write) { + fr_assert(!fr_bio_buf_contains(bio_buf, buffer)); + memcpy(bio_buf->write, buffer, size); + } + bio_buf->write += size; + + return size; +} diff --git a/src/lib/bio/buf.h b/src/lib/bio/buf.h new file mode 100644 index 00000000000..d8e237145b1 --- /dev/null +++ b/src/lib/bio/buf.h @@ -0,0 +1,147 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/buf.h + * @brief Binary IO abstractions for buffers + * + * The #fr_bio_buf_t allows readers and writers to use a shared buffer, without overflow. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_buf_h, "$Id$") + +typedef struct { + uint8_t *start; //!< start of the buffer + uint8_t *end; //!< end of the buffer + + uint8_t *read; //!< where in the buffer reads are taken from + uint8_t *write; //!< where in the buffer writes are sent to +} fr_bio_buf_t; + +static inline void fr_bio_buf_init(fr_bio_buf_t *bio_buf, uint8_t *buffer, size_t size) +{ + bio_buf->start = bio_buf->read = bio_buf->write = buffer; + bio_buf->end = buffer + size; +} + +int fr_bio_buf_alloc(TALLOC_CTX *ctx, fr_bio_buf_t *bio_buf, size_t size) CC_HINT(nonnull); + +int fr_bio_buf_resize(fr_bio_buf_t *bio_buf, uint8_t *buffer, size_t size) CC_HINT(nonnull); + +size_t fr_bio_buf_make_room(fr_bio_buf_t *bio_buf); + +size_t fr_bio_buf_read(fr_bio_buf_t *bio_buf, void *buffer, size_t size) CC_HINT(nonnull(1)); +ssize_t fr_bio_buf_write(fr_bio_buf_t *bio_buf, const void *buffer, size_t size) CC_HINT(nonnull); + + +static inline void CC_HINT(nonnull) fr_bio_buf_verify(fr_bio_buf_t const *bio_buf) +{ + fr_assert(bio_buf->start != NULL); + fr_assert(bio_buf->start <= bio_buf->read); + fr_assert(bio_buf->read <= bio_buf->write); + fr_assert(bio_buf->write <= bio_buf->end); +} + +static inline void CC_HINT(nonnull) fr_bio_buf_reset(fr_bio_buf_t *bio_buf) +{ + fr_bio_buf_verify(bio_buf); + + bio_buf->read = bio_buf->write = bio_buf->start; +} + +static inline bool CC_HINT(nonnull) fr_bio_buf_initialized(fr_bio_buf_t const *bio_buf) +{ + return (bio_buf->start != NULL); +} + +static inline size_t CC_HINT(nonnull) 
fr_bio_buf_used(fr_bio_buf_t const *bio_buf) +{ + if (!fr_bio_buf_initialized(bio_buf)) return 0; + + fr_bio_buf_verify(bio_buf); + + return (bio_buf->write - bio_buf->read); +} + +static inline size_t CC_HINT(nonnull) fr_bio_buf_write_room(fr_bio_buf_t const *bio_buf) +{ + fr_bio_buf_verify(bio_buf); + + return bio_buf->end - bio_buf->write; +} + +static inline uint8_t *CC_HINT(nonnull) fr_bio_buf_write_reserve(fr_bio_buf_t *bio_buf, size_t size) +{ + fr_bio_buf_verify(bio_buf); + + if (fr_bio_buf_write_room(bio_buf) < size) return NULL; + + return bio_buf->write; +} + +static inline int CC_HINT(nonnull) fr_bio_buf_write_alloc(fr_bio_buf_t *bio_buf, size_t size) +{ + fr_bio_buf_verify(bio_buf); + + if (fr_bio_buf_write_room(bio_buf) < size) return -1; + + bio_buf->write += size; + + fr_bio_buf_verify(bio_buf); + + return 0; +} + +static inline void CC_HINT(nonnull) fr_bio_buf_write_undo(fr_bio_buf_t *bio_buf, size_t size) +{ + fr_bio_buf_verify(bio_buf); + + fr_assert(bio_buf->read + size <= bio_buf->write); + + bio_buf->write -= size; + fr_bio_buf_verify(bio_buf); + + if (bio_buf->read == bio_buf->write) { + fr_bio_buf_reset(bio_buf); + } +} + +static inline bool fr_bio_buf_contains(fr_bio_buf_t *bio_buf, void const *buffer) +{ + return ((uint8_t const *) buffer >= bio_buf->start) && ((uint8_t const *) buffer <= bio_buf->end); +} + +static inline void CC_HINT(nonnull) fr_bio_buf_write_update(fr_bio_buf_t *bio_buf, void const *buffer, size_t size, size_t written) +{ + if (!fr_bio_buf_initialized(bio_buf)) return; + + fr_bio_buf_verify(bio_buf); + + if (bio_buf->read == buffer) { + fr_assert(fr_bio_buf_used(bio_buf) >= size); + + (void) fr_bio_buf_read(bio_buf, NULL, written); + } else { + /* + * If we're not writing from the start of write_buffer, then the data to + * be written CANNOT appear anywhere in the buffer. 
+ */ + fr_assert(!fr_bio_buf_contains(bio_buf, buffer)); + } +} diff --git a/src/lib/bio/fd.c b/src/lib/bio/fd.c new file mode 100644 index 00000000000..7f4702dfd0a --- /dev/null +++ b/src/lib/bio/fd.c @@ -0,0 +1,1169 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/fd.c + * @brief BIO abstractions for file descriptors + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#ifdef __linux__ +/* + * for accept4() + */ +#define _GNU_SOURCE +#endif + +#include +#include + +/* + * More portability idiocy + * Mac OSX Lion doesn't define SOL_IP. But IPPROTO_IP works. + */ +#ifndef SOL_IP +# define SOL_IP IPPROTO_IP +#endif + +/* + * glibc 2.4 and uClibc 0.9.29 introduce IPV6_RECVPKTINFO etc. and + * change IPV6_PKTINFO This is only supported in Linux kernel >= + * 2.6.14 + * + * This is only an approximation because the kernel version that libc + * was compiled against could be older or newer than the one being + * run. But this should not be a problem -- we just keep using the + * old kernel interface. 
+ */ +#ifdef __linux__ +# ifdef IPV6_RECVPKTINFO +# include +# if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,14) +# ifdef IPV6_2292PKTINFO +# undef IPV6_RECVPKTINFO +# undef IPV6_PKTINFO +# define IPV6_RECVPKTINFO IPV6_2292PKTINFO +# define IPV6_PKTINFO IPV6_2292PKTINFO +# endif +# endif +/* Fall back to the legacy socket option if IPV6_RECVPKTINFO isn't defined */ +# elif defined(IPV6_2292PKTINFO) +# define IPV6_RECVPKTINFO IPV6_2292PKTINFO +# endif +#else + +/* + * For everything that's not Linux we assume RFC 3542 compliance + * - setsockopt() takes IPV6_RECVPKTINFO + * - cmsg_type is IPV6_PKTINFO (in sendmsg, recvmsg) + * + * If we don't have IPV6_RECVPKTINFO defined but do have IPV6_PKTINFO + * defined, chances are the API is RFC2292 compliant and we need to use + * IPV6_PKTINFO for both. + */ +# if !defined(IPV6_RECVPKTINFO) && defined(IPV6_PKTINFO) +# define IPV6_RECVPKTINFO IPV6_PKTINFO + +/* + * Ensure IPV6_RECVPKTINFO is not defined somehow if we + * don't have IPV6_PKTINFO. + */ +# elif !defined(IPV6_PKTINFO) +# undef IPV6_RECVPKTINFO +# endif +#endif + +#define ADDR_INIT do { \ + addr->when = fr_time(); \ + addr->socket.type = my->info.socket.type; \ + addr->socket.fd = -1; \ + addr->socket.inet.ifindex = my->info.socket.inet.ifindex; \ + } while (0) + +/* + * Close the descriptor and free the bio. + */ +static int fr_bio_fd_destructor(fr_bio_fd_t *my) +{ + /* + * The upstream bio must have unlinked it from the chain before calling talloc_free() on this + * bio. + */ + fr_assert(!fr_bio_prev(&my->bio)); + fr_assert(!fr_bio_next(&my->bio)); + + return fr_bio_fd_close(&my->bio); +} + +/** Stream read. 
+ * + */ +static ssize_t fr_bio_fd_read_stream(fr_bio_t *bio, UNUSED void *packet_ctx, void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + my->info.read_blocked = false; + +retry: + rcode = read(my->info.socket.fd, buffer, size); + if (rcode > 0) return rcode; + + if (rcode == 0) { + /* + * Stream sockets return 0 at EOF. However, we want to distinguish that from the case of datagram + * sockets, which return 0 when there's no data. So we override the 0 value here, and instead + * return an EOF error. + */ + bio->read = fr_bio_eof_read; + bio->write = fr_bio_null_write; + my->info.eof = true; + + return fr_bio_error(EOF); + } + +#undef flag_blocked +#define flag_blocked info.read_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + +/** Connected datagram read. + * + * The difference between this and stream protocols is that for datagrams, a read of zero means "no packets", + * whereas a read of zero on a stream socket means "EOF". 
+ */ +static ssize_t fr_bio_fd_read_connected_datagram(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + my->info.read_blocked = false; + +retry: + rcode = read(my->info.socket.fd, buffer, size); + if (rcode > 0) { + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + ADDR_INIT; + + addr->socket.inet.dst_ipaddr = my->info.socket.inet.src_ipaddr; + addr->socket.inet.dst_port = my->info.socket.inet.src_port; + + addr->socket.inet.src_ipaddr = my->info.socket.inet.dst_ipaddr; + addr->socket.inet.src_port = my->info.socket.inet.dst_port; + return rcode; + } + + if (rcode == 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.read_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + +/** Read from a UDP socket where we know our IP + */ +static ssize_t fr_bio_fd_recvfrom(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + socklen_t salen; + struct sockaddr_storage sockaddr; + + my->info.read_blocked = false; + +retry: + salen = sizeof(sockaddr); + + rcode = recvfrom(my->info.socket.fd, buffer, size, 0, (struct sockaddr *) &sockaddr, &salen); + if (rcode > 0) { + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + ADDR_INIT; + + addr->socket.inet.dst_ipaddr = my->info.socket.inet.src_ipaddr; + addr->socket.inet.dst_port = my->info.socket.inet.src_port; + + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.src_ipaddr, &addr->socket.inet.src_port, + &sockaddr, salen); + return rcode; + } + + if (rcode == 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.read_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + + +/** Write to fd + * + */ +static ssize_t fr_bio_fd_write(fr_bio_t *bio, UNUSED void *packet_ctx, const void *buffer, size_t size) +{ + int tries = 0; + 
ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + /* + * FD bios do nothing on flush. + */ + if (!buffer) return 0; + + my->info.write_blocked = false; + +retry: + /* + * Note that we call send() and not write()! POSIX says: + * + * "A write was attempted on a socket that is shut down for writing, or is no longer + * connected. In the latter case, if the socket is of type SOCK_STREAM, a SIGPIPE signal shall + * also be sent to the thread." + * + * We can override this behavior by calling send(), and passing the special flag which says + * "don't do that!". The system call will then return EPIPE, which indicates that the socket is + * no longer usable. + */ + rcode = send(my->info.socket.fd, buffer, size, MSG_NOSIGNAL); + if (rcode >= 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.write_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + +/** Write to a UDP socket where we know our IP + * + */ +static ssize_t fr_bio_fd_sendto(fr_bio_t *bio, UNUSED void *packet_ctx, const void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + socklen_t salen; + struct sockaddr_storage sockaddr; + + /* + * FD bios do nothing on flush. 
+ */ + if (!buffer) return 0; + + my->info.write_blocked = false; + + // get destination IP + salen = sizeof(sockaddr); + +retry: + rcode = sendto(my->info.socket.fd, buffer, size, 0, (struct sockaddr *) &sockaddr, salen); + if (rcode >= 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.write_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + + +#if defined(IP_PKTINFO) || defined(IP_RECVDSTADDR) || defined(IPV6_PKTINFO) +static ssize_t fd_fd_recvfromto_common(fr_bio_fd_t *my, void *packet_ctx, void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + struct sockaddr_storage from; + socklen_t from_len = sizeof(from); + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + my->info.read_blocked = false; + + memset(&my->cbuf, 0, sizeof(my->cbuf)); + memset(&my->msgh, 0, sizeof(struct msghdr)); + + my->iov = (struct iovec) { + .iov_base = buffer, + .iov_len = size, + }; + + my->msgh = (struct msghdr) { + .msg_control = my->cbuf, + .msg_controllen = sizeof(my->cbuf), + .msg_name = &from, + .msg_namelen = from_len, + .msg_iov = &my->iov, + .msg_iovlen = 1, + .msg_flags = 0, + }; + +retry: + rcode = recvmsg(my->info.socket.fd, &my->msgh, 0); + if (rcode > 0) { + ADDR_INIT; + + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.src_ipaddr, &addr->socket.inet.src_port, + &from, my->msgh.msg_namelen); + + return rcode; + } + + if (rcode == 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.read_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} +#endif + +#if defined(IP_PKTINFO) || defined(IP_RECVDSTADDR) + +/** Read from a UDP socket where we can change our IP, IPv4 version. 
+ */ +static ssize_t fr_bio_fd_recvfromto4(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + struct cmsghdr *cmsg; + fr_time_t when = fr_time_wrap(0); + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + rcode = fd_fd_recvfromto_common(my, packet_ctx, buffer, size); + if (rcode <= 0) return rcode; + +DIAG_OFF(sign-compare) + /* Process auxiliary received data in msgh */ + for (cmsg = CMSG_FIRSTHDR(&my->msgh); + cmsg != NULL; + cmsg = CMSG_NXTHDR(&my->msgh, cmsg)) { +DIAG_ON(sign-compare) + +#ifdef IP_PKTINFO + if ((cmsg->cmsg_level == SOL_IP) && + (cmsg->cmsg_type == IP_PKTINFO)) { + struct in_pktinfo *i = (struct in_pktinfo *) CMSG_DATA(cmsg); + struct sockaddr_in to; + + to.sin_addr = i->ipi_addr; + + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.dst_ipaddr, &addr->socket.inet.dst_port, + (struct sockaddr_storage *) &to, sizeof(struct sockaddr_in)); + addr->socket.inet.ifindex = i->ipi_ifindex; + break; + } +#endif + +#ifdef IP_RECVDSTADDR + if ((cmsg->cmsg_level == IPPROTO_IP) && + (cmsg->cmsg_type == IP_RECVDSTADDR)) { + struct in_addr *i = (struct in_addr *) CMSG_DATA(cmsg); + struct sockaddr_in to; + + to.sin_addr = *i; + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.dst_ipaddr, &addr->socket.inet.dst_port, + (struct sockaddr_storage *) &to, sizeof(struct sockaddr_in)); + break; + } +#endif + +#ifdef SO_TIMESTAMPNS + if ((cmsg->cmsg_level == SOL_IP) && (cmsg->cmsg_type == SO_TIMESTAMPNS)) { + when = fr_time_from_timespec((struct timespec *)CMSG_DATA(cmsg)); + } + +#elif defined(SO_TIMESTAMP) + if ((cmsg->cmsg_level == SOL_IP) && (cmsg->cmsg_type == SO_TIMESTAMP)) { + when = fr_time_from_timeval((struct timeval *)CMSG_DATA(cmsg)); + } +#endif + } + + if fr_time_eq(when, fr_time_wrap(0)) when = fr_time(); + + addr->when = when; + + return rcode; +} + +/** Send to UDP socket where we can change our IP, IPv4 version. 
+ */ +static ssize_t fr_bio_fd_sendfromto4(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + struct cmsghdr *cmsg; + struct sockaddr_storage to; + socklen_t to_len; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + my->info.write_blocked = false; + + memset(&my->cbuf, 0, sizeof(my->cbuf)); + memset(&my->msgh, 0, sizeof(struct msghdr)); + + (void) fr_ipaddr_to_sockaddr(&to, &to_len, &addr->socket.inet.dst_ipaddr, addr->socket.inet.dst_port); + + my->iov = (struct iovec) { + .iov_base = UNCONST(void *, buffer), + .iov_len = size, + }; + + my->msgh = (struct msghdr) { + .msg_control = my->cbuf, + .msg_controllen = sizeof(my->cbuf), // so that CMSG_FIRSTHDR() is not NULL; trimmed below + .msg_name = &to, + .msg_namelen = to_len, + .msg_iov = &my->iov, + .msg_iovlen = 1, + .msg_flags = 0, + }; + + cmsg = CMSG_FIRSTHDR(&my->msgh); + + { +#ifdef IP_PKTINFO + struct in_pktinfo *pkt; + + my->msgh.msg_controllen = CMSG_SPACE(sizeof(*pkt)); + + cmsg->cmsg_level = SOL_IP; + cmsg->cmsg_type = IP_PKTINFO; + cmsg->cmsg_len = CMSG_LEN(sizeof(*pkt)); + + pkt = (struct in_pktinfo *) CMSG_DATA(cmsg); + memset(pkt, 0, sizeof(*pkt)); + pkt->ipi_spec_dst = addr->socket.inet.src_ipaddr.addr.v4; + pkt->ipi_ifindex = addr->socket.inet.ifindex; + +#elif defined(IP_SENDSRCADDR) + struct in_addr *in; + + my->msgh.msg_controllen = CMSG_SPACE(sizeof(*in)); + + cmsg->cmsg_level = IPPROTO_IP; + cmsg->cmsg_type = IP_SENDSRCADDR; + cmsg->cmsg_len = CMSG_LEN(sizeof(*in)); + + in = (struct in_addr *) CMSG_DATA(cmsg); + *in = addr->socket.inet.src_ipaddr.addr.v4; +#endif + } + +retry: + rcode = sendmsg(my->info.socket.fd, &my->msgh, 0); + if (rcode >= 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.write_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + +static inline int fr_bio_fd_udpfromto_init4(int fd) +{ + int proto = 0, flag = 0, opt = 1; + +#ifdef HAVE_IP_PKTINFO + /* + * Linux + */ + 
proto = SOL_IP; + flag = IP_PKTINFO; + +#elif defined(IP_RECVDSTADDR) + /* + * Set the IP_RECVDSTADDR option (BSD). Note: + * IP_RECVDSTADDR == IP_SENDSRCADDR + */ + proto = IPPROTO_IP; + flag = IP_RECVDSTADDR; +#endif + + return setsockopt(fd, proto, flag, &opt, sizeof(opt)); +} +#endif + +#if defined(IPV6_PKTINFO) +/** Read from a UDP socket where we can change our IP, IPv6 version. + */ +static ssize_t fr_bio_fd_recvfromto6(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + struct cmsghdr *cmsg; + fr_time_t when = fr_time_wrap(0); + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + rcode = fd_fd_recvfromto_common(my, packet_ctx, buffer, size); + if (rcode <= 0) return rcode; + +DIAG_OFF(sign-compare) + /* Process auxiliary received data in msgh */ + for (cmsg = CMSG_FIRSTHDR(&my->msgh); + cmsg != NULL; + cmsg = CMSG_NXTHDR(&my->msgh, cmsg)) { +DIAG_ON(sign-compare) + + if ((cmsg->cmsg_level == IPPROTO_IPV6) && + (cmsg->cmsg_type == IPV6_PKTINFO)) { + struct in6_pktinfo *i = (struct in6_pktinfo *) CMSG_DATA(cmsg); + struct sockaddr_in6 to; + + to.sin6_addr = i->ipi6_addr; + + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.dst_ipaddr, &addr->socket.inet.dst_port, + (struct sockaddr_storage *) &to, sizeof(struct sockaddr_in6)); + addr->socket.inet.ifindex = i->ipi6_ifindex; + break; + } + +#ifdef SO_TIMESTAMPNS + if ((cmsg->cmsg_level == SOL_IP) && (cmsg->cmsg_type == SO_TIMESTAMPNS)) { + when = fr_time_from_timespec((struct timespec *)CMSG_DATA(cmsg)); + } + +#elif defined(SO_TIMESTAMP) + if ((cmsg->cmsg_level == SOL_IP) && (cmsg->cmsg_type == SO_TIMESTAMP)) { + when = fr_time_from_timeval((struct timeval *)CMSG_DATA(cmsg)); + } +#endif + } + + if fr_time_eq(when, fr_time_wrap(0)) when = fr_time(); + + addr->when = when; + + return rcode; +} + +/** Send to UDP socket where we can change our IP, IPv6 version. 
+ */ +static ssize_t fr_bio_fd_sendfromto6(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + struct cmsghdr *cmsg; + struct sockaddr_storage to; + socklen_t to_len; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + my->info.write_blocked = false; + + memset(&my->cbuf, 0, sizeof(my->cbuf)); + memset(&my->msgh, 0, sizeof(struct msghdr)); + + (void) fr_ipaddr_to_sockaddr(&to, &to_len, &addr->socket.inet.dst_ipaddr, addr->socket.inet.dst_port); + + my->iov = (struct iovec) { + .iov_base = UNCONST(void *, buffer), + .iov_len = size, + }; + + my->msgh = (struct msghdr) { + .msg_control = my->cbuf, + .msg_controllen = sizeof(my->cbuf), // so that CMSG_FIRSTHDR() is not NULL; trimmed below + .msg_name = &to, + .msg_namelen = to_len, + .msg_iov = &my->iov, + .msg_iovlen = 1, + .msg_flags = 0, + }; + + cmsg = CMSG_FIRSTHDR(&my->msgh); + + { + struct in6_pktinfo *pkt; + + my->msgh.msg_controllen = CMSG_SPACE(sizeof(*pkt)); + + cmsg->cmsg_level = IPPROTO_IPV6; + cmsg->cmsg_type = IPV6_PKTINFO; + cmsg->cmsg_len = CMSG_LEN(sizeof(*pkt)); + + pkt = (struct in6_pktinfo *) CMSG_DATA(cmsg); + memset(pkt, 0, sizeof(*pkt)); + pkt->ipi6_addr = addr->socket.inet.src_ipaddr.addr.v6; + pkt->ipi6_ifindex = addr->socket.inet.ifindex; + } + +retry: + rcode = sendmsg(my->info.socket.fd, &my->msgh, 0); + if (rcode >= 0) return rcode; + +#undef flag_blocked +#define flag_blocked info.write_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + + +static inline int fr_bio_fd_udpfromto_init6(int fd) +{ + int opt = 1; + + return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVPKTINFO, &opt, sizeof(opt)); +} +#endif + +int fr_filename_to_sockaddr(struct sockaddr_un *sun, socklen_t *sunlen, char const *filename) +{ + size_t len; + + len = strlen(filename); + if (len >= sizeof(sun->sun_path)) { + fr_strerror_const("Failed parsing unix domain socket filename: Name is too long"); + return -1; + } + + sun->sun_family = 
AF_UNIX; + memcpy(sun->sun_path, filename, len + 1); /* SUN_LEN will do strlen */ + + *sunlen = SUN_LEN(sun); + + return 0; +} + + +/** Try to connect(). + * + * If connect is blocking, we either succeed or error immediately. Otherwise, the caller has to select the + * socket for writeability, and then call fr_bio_fd_connect() as soon as the socket is writeable. + */ +static ssize_t fr_bio_fd_try_connect(fr_bio_fd_t *my) +{ + int tries = 0; + int rcode; + socklen_t salen; + struct sockaddr_storage sockaddr; + + if (my->info.socket.af != AF_UNIX) { + rcode = fr_ipaddr_to_sockaddr(&sockaddr, &salen, &my->info.socket.inet.dst_ipaddr, my->info.socket.inet.dst_port); + } else { + rcode = fr_filename_to_sockaddr((struct sockaddr_un *) &sockaddr, &salen, my->info.socket.unix.path); + } + + if (rcode < 0) { + fr_bio_shutdown(&my->bio); + return fr_bio_error(GENERIC); + } + + my->info.state = FR_BIO_FD_STATE_CONNECTING; + +retry: + if (connect(my->info.socket.fd, (struct sockaddr *) &sockaddr, salen) == 0) { + my->info.state = FR_BIO_FD_STATE_OPEN; + + if (fr_bio_fd_init_common(my) < 0) goto fail; + + return 0; + } + + switch (errno) { + case EINTR: + tries++; + if (tries <= my->max_tries) goto retry; + FALL_THROUGH; + + /* + * This shouldn't happen, but we'll allow it + */ + case EALREADY: + FALL_THROUGH; + + /* + * Once the socket is writable, it will be active, or in an error state. The caller has + * to call fr_bio_fd_connect() before calling write() + */ + case EINPROGRESS: + my->info.write_blocked = true; + return fr_bio_error(IO_WOULD_BLOCK); + + default: + break; + } + +fail: + fr_bio_shutdown(&my->bio); + return fr_bio_error(IO); +} + +int fr_bio_fd_init_connected(fr_bio_fd_t *my) +{ + /* + * Connected datagrams must have real IPs + */ + if (fr_ipaddr_is_inaddr_any(&my->info.socket.inet.src_ipaddr)) return -1; + if (fr_ipaddr_is_inaddr_any(&my->info.socket.inet.dst_ipaddr)) return -1; + + /* + * Don't do any reads until we're connected. 
+ */ + my->bio.read = fr_bio_null_read; + my->bio.write = fr_bio_null_write; + + my->info.eof = false; + + /* + * The socket shouldn't be selected for read. But it should be selected for write. + */ + my->info.read_blocked = false; + my->info.write_blocked = true; + +#ifdef SO_NOSIGPIPE + /* + * Although the server ignores SIGPIPE, some operating systems like BSD and OSX ignore the + * ignoring. + * + * Fortunately, those operating systems usually support SO_NOSIGPIPE. We set that to prevent + * them raising the signal in the first place. + */ + { + int on = 1; + + setsockopt(my->info.socket.fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on)); + } +#endif + + return fr_bio_fd_try_connect(my); +} + +int fr_bio_fd_init_common(fr_bio_fd_t *my) +{ + if (my->info.socket.type == SOCK_STREAM) { //!< stream socket + my->bio.read = fr_bio_fd_read_stream; + my->bio.write = fr_bio_fd_write; + + } else if (my->info.type == FR_BIO_FD_CONNECTED) { //!< connected datagram + my->bio.read = fr_bio_fd_read_connected_datagram; + my->bio.write = fr_bio_fd_write; + + } else if (!fr_ipaddr_is_inaddr_any(&my->info.socket.inet.src_ipaddr)) { //!< we know our IP address + my->bio.read = fr_bio_fd_recvfrom; + my->bio.write = fr_bio_fd_sendto; + +#if defined(IP_PKTINFO) || defined(IP_RECVDSTADDR) + } else if (my->info.socket.inet.src_ipaddr.af == AF_INET) { //!< we don't know our IPv4 + if (fr_bio_fd_udpfromto_init4(my->info.socket.fd) < 0) return -1; + + my->bio.read = fr_bio_fd_recvfromto4; + my->bio.write = fr_bio_fd_sendfromto4; +#endif + +#if defined(IPV6_PKTINFO) + } else if (my->info.socket.inet.src_ipaddr.af == AF_INET6) { //!< we don't know our IPv6 + + if (fr_bio_fd_udpfromto_init6(my->info.socket.fd) < 0) return -1; + + my->bio.read = fr_bio_fd_recvfromto6; + my->bio.write = fr_bio_fd_sendfromto6; +#endif + + } else { + fr_strerror_const("Failed initializing socket: cannot determine what to do"); + return -1; + } + + my->info.state = FR_BIO_FD_STATE_OPEN; + my->info.eof = false; + 
my->info.read_blocked = false; + my->info.write_blocked = false; + + return 0; +} + +/** Return an fd on read() + * + * With packet_ctx containing information about the socket. + */ +static ssize_t fr_bio_fd_read_accept(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + int fd, tries = 0; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + socklen_t salen; + struct sockaddr_storage sockaddr; + + if (size < sizeof(int)) return fr_bio_error(BUFFER_TOO_SMALL); + + salen = sizeof(sockaddr); + +retry: +#ifdef __linux__ + /* + * Set these flags immediately on the new socket. + */ + fd = accept4(my->info.socket.fd, (struct sockaddr *) &sockaddr, &salen, SOCK_NONBLOCK | SOCK_CLOEXEC); +#else + fd = accept(my->info.socket.fd, (struct sockaddr *) &sockaddr, &salen); +#endif + if (fd >= 0) { + fr_bio_fd_packet_ctx_t *addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + ADDR_INIT; + + (void) fr_ipaddr_from_sockaddr(&addr->socket.inet.src_ipaddr, &addr->socket.inet.src_port, + &sockaddr, salen); + + addr->socket.inet.dst_ipaddr = my->info.socket.inet.src_ipaddr; + addr->socket.inet.dst_port = my->info.socket.inet.src_port; + addr->socket.fd = fd; /* might as well! */ + + *(int *) buffer = fd; + return sizeof(int); + } + + switch (errno) { + case EINTR: + /* + * Try a few times before giving up. + */ + tries++; + if (tries <= my->max_tries) goto retry; + return 0; + + /* + * We can ignore these errors. + */ + case ECONNABORTED: +#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN) + case EWOULDBLOCK: +#endif + case EAGAIN: +#ifdef EPERM + case EPERM: +#endif +#ifdef ETIMEDOUT + case ETIMEDOUT: +#endif + return 0; + + default: + /* + * Some other error, it's fatal. 
+ */ + fr_bio_shutdown(&my->bio); + break; + } + + return fr_bio_error(IO); +} + + +int fr_bio_fd_init_accept(fr_bio_fd_t *my) +{ + my->info.state = FR_BIO_FD_STATE_OPEN; + my->info.eof = false; + my->info.read_blocked = true; + my->info.write_blocked = false; /* don't select() for write */ + + my->bio.read = fr_bio_fd_read_accept; + my->bio.write = fr_bio_null_write; + + if (listen(my->info.socket.fd, 8) < 0) { + fr_strerror_printf("Failed calling listen() on socket: %s", fr_syserror(errno)); + return -1; + } + + return 0; +} + + +/** Allocate a FD bio + * + * The caller is responsible for tracking the FD, and all associated management of it. The bio API is + * intended to be simple, and does not provide wrapper functions for various ioctls. The caller should + * instead do that work. + * + * Once the FD is given to the bio, its lifetime is "owned" by the bio. Calling talloc_free(bio) will close + * the FD. + * + * The caller can still manage the FD for being readable / writeable. However, the caller should not call + * this bio directly (unless it is the only one). Instead, the caller should read from / write to the + * previous bio which will then eventually call this one. + * + * Before updating any event handler readable / writeable callbacks, the caller should check + * fr_bio_fd_at_eof(). If true, then the handlers should not be inserted. The previous bios should still be + * called to process any pending data, until they return EOF. + * + * The main purpose of an FD bio is to wrap the FD in a bio container. That, and handling retries on read / + * write, along with returning EOF as an error instead of zero. + * + * Note that the read / write functions can return partial data. It is the caller's responsibility to ensure + * that any writes continue from where they left off (otherwise data will be missing). And any partial reads + * should go to a memory bio. 
+ * + * If a read returns EOF, then the FD remains open until talloc_free(bio) or fr_bio_fd_close() is called. + * + * @param ctx the talloc ctx + * @param cb callbacks + * @param sock structure holding socket information + * src_ip is always *our* IP. dst_ip is always *their* IP. + * @param type type of the bio + * @param offset for datagram sockets, where #fr_bio_fd_packet_ctx_t is stored + * @return + * - NULL on error, memory allocation failed + * - !NULL the bio + */ +fr_bio_t *fr_bio_fd_alloc(TALLOC_CTX *ctx, fr_bio_cb_funcs_t *cb, fr_socket_t const *sock, fr_bio_fd_type_t type, size_t offset) +{ + fr_bio_fd_t *my; + + my = talloc_zero(ctx, fr_bio_fd_t); + if (!my) return NULL; + + if (cb) my->cb = *cb; + my->max_tries = 4; + my->offset = offset; + + if (sock) { + my->info.type = type; + my->info.state = FR_BIO_FD_STATE_CLOSED; + + if ((my->info.socket.fd >= 0) && + (fr_bio_fd_init(&my->bio, sock) < 0)) { + talloc_free(my); + return NULL; + } + } else { + /* + * We can allocate a "place-holder" FD bio, and then later fill it in with + * fr_bio_fd_init(). + * + * @todo - maybe just use fr_bio_fd_open() all of the time? + */ + my->info = (fr_bio_fd_info_t) { + .socket = { + .af = AF_UNSPEC, + }, + .type = type, + .read_blocked = true, + .write_blocked = true, + .eof = false, + .state = FR_BIO_FD_STATE_CLOSED, + }; + + my->bio.read = fr_bio_eof_read; + my->bio.write = fr_bio_null_write; + } + + talloc_set_destructor(my, fr_bio_fd_destructor); + return (fr_bio_t *) my; +} + +/** Close the FD, but leave the bio allocated and alive. + * + */ +int fr_bio_fd_close(fr_bio_t *bio) +{ + int rcode; + int tries = 0; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + if (my->info.state == FR_BIO_FD_STATE_CLOSED) return 0; + + /* + * Shut the bio down cleanly. + */ + rcode = fr_bio_shutdown(bio); + if (rcode < 0) return rcode; + + my->bio.read = fr_bio_eof_read; + my->bio.write = fr_bio_null_write; + + /* + * Shut down the connected socket. 
The only errors possible here are things we can't do anything + * about. + * + * shutdown() will close ALL versions of this file descriptor, even if it's (somehow) used in + * another process. shutdown() will also tell the kernel to gracefully close the connected + * socket, so that it can signal the other end, instead of having the connection disappear. + * + * This shouldn't strictly be necessary, as no other processes should be sharing this file + * descriptor. But it's the safe (and polite) thing to do. + */ + if (my->info.type == FR_BIO_FD_CONNECTED) { + (void) shutdown(my->info.socket.fd, SHUT_RDWR); + } + +retry: + rcode = close(my->info.socket.fd); + if (rcode < 0) { + switch (errno) { + case EINTR: + case EIO: + tries++; + if (tries < my->max_tries) goto retry; + return -1; + + default: + /* + * EBADF, or other unrecoverable error. We just call it closed, and continue. + */ + break; + } + } + + my->info.state = FR_BIO_FD_STATE_CLOSED; + my->info.read_blocked = true; + my->info.write_blocked = true; + my->info.eof = true; + + return 0; +} + +/** re-open the bio + */ +int fr_bio_fd_init(fr_bio_t *bio, fr_socket_t const *sock) +{ + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + fr_assert(my->info.socket.inet.src_ipaddr.af == my->info.socket.inet.dst_ipaddr.af); + + /* + * The bio can't be open if we're re-initializing it. + */ + if (my->info.state == FR_BIO_FD_STATE_OPEN) return -1; + + my->info.socket = *sock; + + switch (my->info.type) { + case FR_BIO_FD_UNCONNECTED: + return fr_bio_fd_init_common(my); + + case FR_BIO_FD_CONNECTED: + return fr_bio_fd_init_connected(my); + + case FR_BIO_FD_ACCEPT: + return fr_bio_fd_init_accept(my); + } +} + +/** Finalize a connect() + * + * connect() said "come back when the socket is writeable". It's now writeable, so we check if there was a + * connection error. 
+ */ +int fr_bio_fd_connect(fr_bio_t *bio) +{ + int error; + socklen_t socklen = sizeof(error); + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + if (my->info.state == FR_BIO_FD_STATE_OPEN) return 0; + + if (my->info.state != FR_BIO_FD_STATE_CONNECTING) return fr_bio_error(GENERIC); + + /* + * The socket is writeable. Let's see if there's an error. + * + * Unix Network Programming says: + * + * "If so_error is nonzero when the process calls write, -1 is returned with errno set to the + * value of SO_ERROR (p. 495 of TCPv2) and SO_ERROR is reset to 0. We have to check for the + * error, and if there's no error, set the state to "open"." + * + * The same applies to connect(). If a non-blocking connect returns EINPROGRESS, it may later + * become writable. It will be writable even if the connection fails. Rather than writing some + * random application data, we query SO_ERROR, and get the underlying error. + */ + if ((getsockopt(my->info.socket.fd, SOL_SOCKET, SO_ERROR, (void *)&error, &socklen) < 0) || (error != 0)) { + fail: + fr_bio_shutdown(bio); + return fr_bio_error(IO); + } + + my->info.state = FR_BIO_FD_STATE_OPEN; + + /* + * The socket is connected, so initialize the normal IO handlers. + */ + if (fr_bio_fd_init_common(my) < 0) goto fail; + + return 0; +} + +/** Returns a pointer to the bio-specific information. + * + */ +fr_bio_fd_info_t const *fr_bio_fd_info(fr_bio_t *bio) +{ + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + return &my->info; +} + + +/** Discard all reads from a UDP socket. 
+ */ +static ssize_t fr_bio_fd_read_discard(fr_bio_t *bio, UNUSED void *packet_ctx, void *buffer, size_t size) +{ + int tries = 0; + ssize_t rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + my->info.read_blocked = false; + +retry: + rcode = read(my->info.socket.fd, buffer, size); + if (rcode >= 0) return 0; + +#undef flag_blocked +#define flag_blocked info.read_blocked +#include "fd_errno.h" + + return fr_bio_error(IO); +} + +/** Mark up a bio as write-only + * + */ +int fr_bio_fd_write_only(fr_bio_t *bio) +{ + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + switch (my->info.type) { + case FR_BIO_FD_UNCONNECTED: + if (my->info.socket.type != SOCK_DGRAM) { + fr_strerror_const("Only datagram sockets can be marked 'write-only'"); + return -1; + } + break; + + case FR_BIO_FD_CONNECTED: + case FR_BIO_FD_ACCEPT: + fr_strerror_const("Only unconnected sockets can be marked 'write-only'"); + return -1; + } + + my->bio.read = fr_bio_fd_read_discard; + return 0; +} diff --git a/src/lib/bio/fd.h b/src/lib/bio/fd.h new file mode 100644 index 00000000000..ac4de1ac292 --- /dev/null +++ b/src/lib/bio/fd.h @@ -0,0 +1,121 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/fd.h + * @brief Binary IO abstractions for file descriptors + * + * Allow reads and writes from file descriptors. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_fd_h, "$Id$") + +#include +#include + +/** Per-packet context + * + * For reading packets src_ip is *their* IP, and dst_ip is *our* IP. + * + * For writing packets, src_ip is *our* IP, and dst_ip is *their* IP. + * + * This context is returned only for datagram sockets. For stream sockets (TCP and Unix domain), it + * isn't used. The caller can look at the socket information to determine src/dst ip/port. + */ +typedef struct { + fr_time_t when; //!< when the packet was received + fr_socket_t socket; //!< socket information, including FD. +} fr_bio_fd_packet_ctx_t; + +typedef enum { + FR_BIO_FD_STATE_INVALID = 0, + FR_BIO_FD_STATE_CLOSED, + FR_BIO_FD_STATE_OPEN, //!< error states must be before this + FR_BIO_FD_STATE_CONNECTING, +} fr_bio_fd_state_t; + +typedef enum { + FR_BIO_FD_UNCONNECTED, //!< unconnected UDP / datagram only + // updates #fr_bio_fd_packet_ctx_t for reads, + // uses #fr_bio_fd_packet_ctx_t for writes + FR_BIO_FD_CONNECTED, //!< connected client sockets (UDP or TCP) + FR_BIO_FD_ACCEPT, //!< returns new fd in buffer on fr_bio_read() + // updates #fr_bio_fd_packet_ctx_t on successful FD read. +} fr_bio_fd_type_t; + +/** Run-time status of the socket. + * + */ +typedef struct { + fr_socket_t socket; //!< as connected socket + + fr_bio_fd_type_t type; //!< type of the socket + + fr_bio_fd_state_t state; //!< connecting, open, closed, etc. + + bool read_blocked; //!< did we block on read? + bool write_blocked; //!< did we block on write? + bool eof; //!< are we at EOF? 
+ +} fr_bio_fd_info_t; + +/** Configuration for sockets + * + * Each piece of information is broken out into a separate field, so that the configuration file parser can + * parse each field independently. + * + * We also include more information here than we need in an #fr_socket_t. + */ +typedef struct { + fr_bio_fd_type_t type; //!< accept, connected, unconnected, etc. + + int socket_type; //!< SOCK_STREAM or SOCK_DGRAM + + fr_ipaddr_t src_ipaddr; //!< our IP address + fr_ipaddr_t dst_ipaddr; //!< their IP address + + uint16_t src_port; //!< our port + uint16_t dst_port; //!< their port + + char const *interface; //!< for binding to an interface + + uint32_t recv_buff; //!< How big the kernel's receive buffer should be. + uint32_t send_buff; //!< How big the kernel's send buffer should be. + + char const *path; //!< for Unix domain sockets + mode_t perm; //!< permissions for domain sockets + uid_t uid; //!< who owns the socket + gid_t gid; //!< who owns the socket + + bool async; //!< is it async +} fr_bio_fd_config_t; + +fr_bio_t *fr_bio_fd_alloc(TALLOC_CTX *ctx, fr_bio_cb_funcs_t *cb, fr_socket_t const *sock, fr_bio_fd_type_t type, size_t offset) CC_HINT(nonnull(1)); + +int fr_bio_fd_close(fr_bio_t *bio) CC_HINT(nonnull); + +int fr_bio_fd_init(fr_bio_t *bio, fr_socket_t const *sock) CC_HINT(nonnull); + +int fr_bio_fd_connect(fr_bio_t *bio) CC_HINT(nonnull); + +fr_bio_fd_info_t const *fr_bio_fd_info(fr_bio_t *bio) CC_HINT(nonnull); + +int fr_bio_fd_socket_open(fr_bio_t *bio, fr_bio_fd_config_t const *cfg) CC_HINT(nonnull); + +int fr_bio_fd_write_only(fr_bio_t *bio); diff --git a/src/lib/bio/fd_errno.h b/src/lib/bio/fd_errno.h new file mode 100644 index 00000000000..7f2d9e0bdae --- /dev/null +++ b/src/lib/bio/fd_errno.h @@ -0,0 +1,29 @@ +/* + * Code snippet to avoid duplication. + */ +switch (errno) { +case EINTR: + /* + * Try a few times before giving up. 
+ */ + tries++; + if (tries <= my->max_tries) goto retry; + return 0; + +#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN) +case EWOULDBLOCK: +#endif +case EAGAIN: + /* + * The operation would block, return that. + */ + my->flag_blocked = true; + return fr_bio_error(IO_WOULD_BLOCK); + +default: + /* + * Some other error, it's fatal. + */ + fr_bio_shutdown(&my->bio); + break; +} diff --git a/src/lib/bio/fd_open.c b/src/lib/bio/fd_open.c new file mode 100644 index 00000000000..e719c37dbed --- /dev/null +++ b/src/lib/bio/fd_open.c @@ -0,0 +1,883 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/fd_open.c + * @brief BIO abstractions for opening file descriptors + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include + +#include +#include +#include +#include + +/** Initialize common TCP information + * + */ +static int fr_bio_fd_common_tcp(int fd, UNUSED fr_socket_t const *sock, UNUSED fr_bio_fd_config_t const *cfg) +{ + int on = 1; + +#ifdef SO_KEEPALIVE + if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) { + fr_strerror_printf("Failed setting SO_KEEPALIVE: %s", fr_syserror(errno)); + return -1; + } +#endif + + return 0; +} + + +/** Initialize common datagram information + * + */ +static int fr_bio_fd_common_datagram(int fd, UNUSED fr_socket_t const *sock, fr_bio_fd_config_t const *cfg) +{ + int on = 1; + +#ifdef SO_TIMESTAMPNS + /* + * Enable receive timestamps, these should reflect + * when the packet was received, not when it was read + * from the socket. + */ + if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(int)) < 0) { + fr_strerror_printf("Failed setting SO_TIMESTAMPNS: %s", fr_syserror(errno)); + return -1; + } + +#elif defined(SO_TIMESTAMP) + /* + * Enable receive timestamps, these should reflect + * when the packet was received, not when it was read + * from the socket. 
+ */ + if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(int)) < 0) { + fr_strerror_printf("Failed setting SO_TIMESTAMP: %s", fr_syserror(errno)); + return -1; + } +#endif + +#ifdef SO_RCVBUF + if (cfg->recv_buff) { + int opt = cfg->recv_buff; + + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &opt, sizeof(opt)) < 0) { + fr_strerror_printf("Failed setting SO_RCVBUF: %s", fr_syserror(errno)); + return -1; + } + } +#endif + +#ifdef SO_SNDBUF + if (cfg->send_buff) { + int opt = cfg->send_buff; + + if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &opt, sizeof(opt)) < 0) { + fr_strerror_printf("Failed setting SO_SNDBUF: %s", fr_syserror(errno)); + return -1; + } + } +#endif + + return 0; +} + +/** Initialize a UDP server socket. + * + */ +static int fr_bio_fd_server_udp(int fd, fr_socket_t const *sock, fr_bio_fd_config_t const *cfg) +{ +#ifdef SO_REUSEPORT + int on = 1; + + /* + * Set SO_REUSEPORT before bind, so that all sockets can + * listen on the same destination IP address. + */ + if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) < 0) { + fr_strerror_printf("Failed setting SO_REUSEPORT: %s", fr_syserror(errno)); + return -1; + } +#endif + + return fr_bio_fd_common_datagram(fd, sock, cfg); +} + +/** Initialize a TCP server socket. + * + */ +static int fr_bio_fd_server_tcp(int fd, UNUSED fr_socket_t const *sock) +{ + int on = 1; + + if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) { + fr_strerror_printf("Failed setting SO_REUSEADDR: %s", fr_syserror(errno)); + return -1; + } + + return 0; +} + +/** Initialize an IPv4 server socket. + * + */ +static int fr_bio_fd_server_ipv4(int fd, fr_socket_t const *sock, fr_bio_fd_config_t const *cfg) +{ + int flag; + +#if defined(IP_MTU_DISCOVER) && defined(IP_PMTUDISC_DONT) + /* + * Disable PMTU discovery. On Linux, this also makes sure that the "don't + * fragment" flag is zero. 
+ */ + flag = IP_PMTUDISC_DONT; + + if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &flag, sizeof(flag)) < 0) { + fr_strerror_printf("Failed setting IP_MTU_DISCOVER: %s", fr_syserror(errno)); + return -1; + } +#endif + +#if defined(IP_DONTFRAG) + /* + * Ensure that the "don't fragment" flag is zero. + */ + flag = 0; + + if (setsockopt(fd, IPPROTO_IP, IP_DONTFRAG, &flag, sizeof(flag)) < 0) { + fr_strerror_printf("Failed setting IP_DONTFRAG: %s", fr_syserror(errno)); + return -1; + } +#endif + + /* + * And set up any UDP / TCP specific information. + */ + if (sock->type == SOCK_DGRAM) return fr_bio_fd_server_udp(fd, sock, cfg); + + return fr_bio_fd_server_tcp(fd, sock); +} + +/** Initialize an IPv6 server socket. + * + */ +static int fr_bio_fd_server_ipv6(int fd, fr_socket_t const *sock, fr_bio_fd_config_t const *cfg) +{ +#ifdef IPV6_V6ONLY + /* + * Don't allow v4 packets on v6 connections. + */ + if (IN6_IS_ADDR_UNSPECIFIED(UNCONST(struct in6_addr *, &sock->inet.src_ipaddr.addr.v6))) { + int on = 1; + + if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, (char *)&on, sizeof(on)) < 0) { + fr_strerror_printf("Failed setting IPV6_V6ONLY: %s", fr_syserror(errno)); + return -1; + } + } +#endif /* IPV6_V6ONLY */ + + /* + * And set up any UDP / TCP specific information. + */ + if (sock->type == SOCK_DGRAM) return fr_bio_fd_server_udp(fd, sock, cfg); + + return fr_bio_fd_server_tcp(fd, sock); +} + +/** Verify or clean up a pre-existing domain socket. + * + */ +static int fr_bio_fd_socket_unix_verify(int dirfd, char const *filename, fr_bio_fd_config_t const *cfg) +{ + int fd; + struct stat buf; + + /* + * See if the socket exists. If there's an error opening it, that's an issue. + * + * If it doesn't exist, that's fine. + */ + if (fstatat(dirfd, filename, &buf, AT_SYMLINK_NOFOLLOW) < 0) { + if (errno != ENOENT) { + fr_strerror_printf("Failed opening domain socket %s: %s", cfg->path, fr_syserror(errno)); + return -1; + } + + return 0; + } + + /* + * If it exists, it must be a socket.
+ */ + if (!S_ISSOCK(buf.st_mode)) { + fr_strerror_printf("Failed opening domain socket %s: it is not a socket", filename); + return -1; + } + + /* + * Refuse to open sockets not owned by us. This prevents configurations from stomping on each + * other. + */ + if (buf.st_uid != cfg->uid) { + fr_strerror_printf("Failed opening domain socket %s: incorrect UID", cfg->path); + return -1; + } + + /* + * The file exists, and someone is listening. We can't claim it for ourselves. + * + * Note that this function calls connect(), but connect() always returns immediately for domain + * sockets. + * + * @todo - redo that function here, with separate checks for permission errors vs anything else. + */ + fd = fr_socket_client_unix(cfg->path, false); + if (fd >= 0) { + close(fd); + fr_strerror_printf("Failed creating domain socket %s: It is currently active", cfg->path); + return -1; + } + + /* + * It exists, but no one is listening. Delete it so that we can re-bind to it. + */ + if (unlinkat(dirfd, filename, 0) < 0) { + fr_strerror_printf("Failed removing pre-existing domain socket %s: %s", + cfg->path, fr_syserror(errno)); + return -1; + } + + return 0; +} + +/* + * We normally can't call fchmod() or fchown() on sockets, as they don't really exist in the file system. + * Instead, we enforce those permissions on the parent directory of the socket. + */ +static int fr_bio_fd_socket_unix_mkdir(int *dirfd, char const **filename, fr_bio_fd_config_t const *cfg) +{ + mode_t perm; + int parent_fd, fd; + char const *path = cfg->path; + char *dir, *p; + char *slashes[2]; + + perm = S_IREAD | S_IWRITE | S_IEXEC; + perm |= S_IRGRP | S_IWGRP | S_IXGRP; + + /* + * The parent directory exists. Ensure that it has the correct ownership and permissions. + * + * If the parent directory exists, then it enforces access, and we can create the domain socket + * within it.
+ */ + if (fr_dirfd(dirfd, filename, path) == 0) { + struct stat buf; + + if (fstat(*dirfd, &buf) < 0) { + fr_strerror_printf("Failed reading parent directory for file %s: %s", path, fr_syserror(errno)); + close(*dirfd); + return -1; + } + + if (buf.st_uid != cfg->uid) { + fr_strerror_printf("Failed reading parent directory for file %s: Incorrect UID", path); + close(*dirfd); + return -1; + } + + if (buf.st_gid != cfg->gid) { + fr_strerror_printf("Failed reading parent directory for file %s: Incorrect GID", path); + close(*dirfd); + return -1; + } + + /* + * We don't have the correct permissions on the directory, so we fix them. + * + * @todo - allow for "other" to read/write if we do authentication on the socket? + */ + if (fchmod(*dirfd, perm) < 0) { + fr_strerror_printf("Failed setting parent directory permissions for file %s: %s", path, fr_syserror(errno)); + close(*dirfd); + return -1; + } + + return 0; + } + + dir = talloc_strdup(NULL, path); + if (!dir) return -1; + + /* + * Find the last two directory separators. + */ + slashes[0] = slashes[1] = NULL; + for (p = dir; *p != '\0'; p++) { + if (*p == '/') { + slashes[0] = slashes[1]; + slashes[1] = p; + } + } + + /* + * There's only one / in the path, we can't do anything. + * + * Opening 'foo/bar.sock' might be useful, but isn't normally a good idea. + */ + if (!slashes[0]) { + fr_strerror_printf("Failed parsing filename %s: it is not absolute", path); + fail: + talloc_free(dir); + return -1; + } + + /* + * Ensure that the grandparent directory exists. + * + * /var/run/radiusd/foo.sock + * + * slashes[0] points to the slash after 'run'. + * + * slashes[1] points to the slash after 'radiusd', which doesn't exist. + */ + *slashes[0] = '\0'; + + /* + * If the grandparent doesn't exist, then we don't create it. + * + * These checks minimize the possibility that a misconfiguration by user "radiusd" can cause a + * suid-root binary to create a directory in the wrong place.
These checks are only necessary + * if the unix domain socket is opened as root. + */ + parent_fd = open(dir, O_DIRECTORY | O_NOFOLLOW); + if (parent_fd < 0) { + fr_strerror_printf("Failed opening directory %s: %s", dir, fr_syserror(errno)); + goto fail; + } + + /* + * Create the parent directory. + */ + *slashes[0] = '/'; + *slashes[1] = '\0'; + if (mkdirat(parent_fd, dir, 0700) < 0) { + fr_strerror_printf("Failed creating directory %s: %s", dir, fr_syserror(errno)); + close_parent: + close(parent_fd); + goto fail; + } + + fd = openat(parent_fd, dir, O_DIRECTORY); + if (fd < 0) { + fr_strerror_printf("Failed opening directory %s: %s", dir, fr_syserror(errno)); + goto close_parent; + } + + if (fchmod(fd, perm) < 0) { + fr_strerror_printf("Failed changing permission for directory %s: %s", dir, fr_syserror(errno)); + close_fd: + close(fd); + goto close_parent; + } + + /* + * This is a NOOP if we're chowning a file owned by ourselves to our own UID / GID. + * + * Otherwise if we're running as root, it will set ownership to the correct user. + */ + if (fchown(fd, cfg->uid, cfg->gid) < 0) { + fr_strerror_printf("Failed changing ownership for directory %s: %s", dir, fr_syserror(errno)); + goto close_fd; + } + + talloc_free(dir); + close(fd); + close(parent_fd); + + return 0; +} + +static int fr_bio_fd_unix_shutdown(fr_bio_t *bio) +{ + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + /* + * The bio must be open in order to shut it down. + * + * Unix domain sockets are deleted when the bio is closed. + * + * Unix domain sockets are never in the "connecting" state, because connect() always returns + * immediately. + */ + fr_assert(my->info.state == FR_BIO_FD_STATE_OPEN); + + /* + * Run the user shutdown before we run ours. + */ + if (my->user_shutdown) { + if (my->user_shutdown(bio) < 0) return -1; + } + + return unlink(my->info.socket.unix.path); +} + +/** Bind to a Unix domain socket.
+ * + * @todo - this function only does a tiny bit of what fr_server_domain_socket_peercred() and + * fr_server_domain_socket_perm() do. Those functions do a lot more sanity checks. + * + * The main question is whether or not those checks are useful. In many cases, fchmod() and fchown() are not + * possible on Unix sockets, so we shouldn't bother doing them, + * + * Note that the listeners generally call these functions with wrappers of fr_suid_up() and fr_suid_down(). + * So these functions are running as "root", and will create files owned as "root". + */ +static int fr_bio_fd_socket_bind_unix(fr_bio_fd_t *my, fr_bio_fd_config_t const *cfg) +{ + int dirfd, rcode; + char const *filename, *p; + socklen_t sunlen; + struct sockaddr_un sun; + + p = strrchr(my->info.socket.unix.path, '/'); + + /* + * The UID and GID should be taken automatically from the "user" and "group" settings in + * mainconfig. There is no reason to set them to anything else. + */ + if (cfg->uid == (uid_t) -1) { + fr_strerror_printf("Failed opening domain socket %s: no UID specified", my->info.socket.unix.path); + return -1; + } + + if (cfg->gid == (gid_t) -1) { + fr_strerror_printf("Failed opening domain socket %s: no GID specified", my->info.socket.unix.path); + return -1; + } + + if (cfg->uid == 0) { + fr_strerror_printf("Failed opening domain socket %s: refusing to open as UID 0", my->info.socket.unix.path); + return -1; + } + + if (cfg->gid == 0) { + fr_strerror_printf("Failed opening domain socket %s: refusing to open as GID 0", my->info.socket.unix.path); + return -1; + } + + /* + * Opening 'foo.sock' is OK. + */ + if (!p) { + dirfd = AT_FDCWD; + filename = my->info.socket.unix.path; + + } else if (p == my->info.socket.unix.path) { + /* + * Opening '/foo.sock' is dumb. 
+ */ + fr_strerror_printf("Failed opening domain socket %s: cannot exist at file system root", p); + return -1; + + } else if (fr_bio_fd_socket_unix_mkdir(&dirfd, &filename, cfg) < 0) { + return -1; + } + + /* + * Verify and/or clean up the domain socket. + */ + if (fr_bio_fd_socket_unix_verify(dirfd, filename, cfg) < 0) { + fail: + if (dirfd != AT_FDCWD) close(dirfd); + return -1; + } + +#ifdef HAVE_BINDAT + /* + * The best function to use here is bindat(), but only quite recent versions of FreeBSD actually + * have it, and it's definitely not POSIX. + * + * If we use bindat(), we pass a relative pathname. + */ + if (fr_filename_to_sockaddr(&sun, &sunlen, filename) < 0) goto fail; + + rcode = bindat(dirfd, my->info.socket.fd, (struct sockaddr *) &sun, sunlen); +#else + /* + * For bind(), we pass the full path. + */ + if (fr_filename_to_sockaddr(&sun, &sunlen, my->info.socket.unix.path) < 0) goto fail; + + rcode = bind(my->info.socket.fd, (struct sockaddr *) &sun, sunlen); +#endif + if (rcode < 0) { + /* + * @todo - if EADDRINUSE, then the socket exists. Try connect(), and if that fails, + * delete the socket and try again. This may be simpler than the checks above. + */ + fr_strerror_printf("Failed binding to domain socket %s: %s", my->info.socket.unix.path, fr_syserror(errno)); + goto fail; + } + +#ifdef __linux__ + /* + * Linux supports chown && chmod for sockets. + */ + if (fchmod(my->info.socket.fd, S_IREAD | S_IWRITE | S_IEXEC | S_IRGRP | S_IWGRP | S_IXGRP) < 0) { + fr_strerror_printf("Failed changing permission for domain socket %s: %s", my->info.socket.unix.path, fr_syserror(errno)); + goto fail; + } + + /* + * This is a NOOP if we're chowning a file owned by ourselves to our own UID / GID. + * + * Otherwise if we're running as root, it will set ownership to the correct user. 
+ */ + if (fchown(my->info.socket.fd, cfg->uid, cfg->gid) < 0) { + fr_strerror_printf("Failed changing ownership for domain socket %s: %s", my->info.socket.unix.path, fr_syserror(errno)); + goto fail; + } + +#endif + + /* + * Socket is open. We need to clean it up on shutdown. + */ + if (my->cb.shutdown) my->user_shutdown = my->cb.shutdown; + my->cb.shutdown = fr_bio_fd_unix_shutdown; + + return 0; +} + +#ifdef SO_BINDTODEVICE +/** Linux bind to device by name. + * + */ +static int fr_bio_fd_socket_bind_to_device(fr_bio_fd_t *my, fr_bio_fd_config_t const *cfg) +{ + char *ifname; + char buffer[IFNAMSIZ]; + + /* + * ifindex isn't set, do nothing. + */ + if (!my->info.socket.inet.ifindex) return 0; + + /* + * The internet hints that CAP_NET_RAW is required to use SO_BINDTODEVICE. + * + * This function also sets fr_strerror() on failure, which will be seen if the bind fails. If + * the bind succeeds, then we don't really care that the capability change has failed. We must + * already have that capability. + */ +#ifdef HAVE_CAPABILITY_H + (void)fr_cap_enable(CAP_NET_RAW, CAP_EFFECTIVE); +#endif + + if (setsockopt(my->info.socket.fd, SOL_SOCKET, SO_BINDTODEVICE, cfg->interface, strlen(cfg->interface)) < 0) { + fr_strerror_printf("Failed setting SO_BINDTODEVICE for %s: %s", cfg->interface, fr_syserror(errno)); + return -1; + } + + return 0; +} + +#elif defined(IP_BOUND_IF) || defined(IPV6_BOUND_IF) +/** *BSD bind to interface by index.
+ * + */ +static int fr_bio_fd_socket_bind_to_device(fr_bio_fd_t *my, UNUSED fr_bio_fd_config_t const *cfg) +{ + int opt, rcode; + + if (!my->info.socket.inet.ifindex) return 0; + + opt = my->info.socket.inet.ifindex; + + switch (my->info.socket.af) { + case AF_INET: + rcode = setsockopt(my->info.socket.fd, IPPROTO_IP, IP_BOUND_IF, &opt, sizeof(opt)); + break; + + case AF_INET6: + rcode = setsockopt(my->info.socket.fd, IPPROTO_IPV6, IPV6_BOUND_IF, &opt, sizeof(opt)); + break; + + default: + rcode = -1; + errno = EAFNOSUPPORT; + break; + } + + if (rcode < 0) fr_strerror_printf("Failed setting IP_BOUND_IF: %s", fr_syserror(errno)); + return rcode; +} +#else + +#error This system is missing SO_BINDTODEVICE, IP_BOUND_IF, IPV6_BOUND_IF + +/** ??? Who knows? + * + */ +static int fr_bio_fd_socket_bind_to_device(fr_bio_fd_t *my, fr_bio_fd_config_t const *cfg) +{ + /* + * @todo - see fr_socket_bind(). Troll through the interfaces to see which interface has a name + * which matches the named interface. If so, copy over its IP to our src_ip, so long as src_ip + * is INADDR_ANY. + */ + + return -1; +} + +/* bind to device */ +#endif + +static int fr_bio_fd_socket_bind(fr_bio_fd_t *my, fr_bio_fd_config_t const *cfg) +{ + socklen_t salen; + struct sockaddr_storage salocal; + + if (my->info.socket.af == AF_UNIX) { + return fr_bio_fd_socket_bind_unix(my, cfg); + } + +#ifdef HAVE_CAPABILITY_H + /* + * If we're binding to a special port as non-root, then + * check capabilities. If we're root, we already have + * equivalent capabilities so we don't need to check. + */ + if ((my->info.socket.inet.src_port < 1024) && (geteuid() != 0)) { + (void)fr_cap_enable(CAP_NET_BIND_SERVICE, CAP_EFFECTIVE); + } +#endif + + if (fr_bio_fd_socket_bind_to_device(my, cfg) < 0) return -1; + + /* + * Bind to the IP + interface.
+ */ + if (fr_ipaddr_to_sockaddr(&salocal, &salen, &my->info.socket.inet.src_ipaddr, my->info.socket.inet.src_port) < 0) return -1; + + if (bind(my->info.socket.fd, (struct sockaddr *) &salocal, salen) < 0) { + fr_strerror_printf("Failed binding to socket: %s", fr_syserror(errno)); + return -1; + } + + /* + * FreeBSD jail issues. We bind to 0.0.0.0, but the + * kernel instead binds us to 1.2.3.4. So once the + * socket is bound, ask it what its IP address is. + */ + salen = sizeof(salocal); + memset(&salocal, 0, salen); + if (getsockname(my->info.socket.fd, (struct sockaddr *) &salocal, &salen) < 0) { + fr_strerror_printf("Failed getting socket name: %s", fr_syserror(errno)); + return -1; + } + + if (fr_ipaddr_from_sockaddr(&my->info.socket.inet.src_ipaddr, &my->info.socket.inet.src_port, &salocal, salen) < 0) return -1; + + return 0; +} + +/** Opens a socket and updates sock->fd + * + * Note that it does not call connect()! + */ +int fr_bio_fd_socket_open(fr_bio_t *bio, fr_bio_fd_config_t const *cfg) +{ + int fd, protocol; + int rcode; + fr_bio_fd_t *my = talloc_get_type_abort(bio, fr_bio_fd_t); + + fr_strerror_clear(); + + my->info.socket = (fr_socket_t) {}; + + if (cfg->path) { + my->info.socket.af = AF_UNIX; + } else { + my->info.socket.af = cfg->src_ipaddr.af; + } + my->info.socket.type = cfg->socket_type; + + switch (my->info.socket.af) { + case AF_INET: + case AF_INET6: + my->info.socket.inet.src_ipaddr = cfg->src_ipaddr; + my->info.socket.inet.dst_ipaddr = cfg->dst_ipaddr; + my->info.socket.inet.src_port = cfg->src_port; + my->info.socket.inet.dst_port = cfg->dst_port; + + if (cfg->socket_type == SOCK_STREAM) { + protocol = IPPROTO_TCP; + } else { + protocol = IPPROTO_UDP; + } + + if (cfg->interface) { + my->info.socket.inet.ifindex = if_nametoindex(cfg->interface); + + if (!my->info.socket.inet.ifindex) { + fr_strerror_printf_push("Failed finding interface %s: %s", cfg->interface, fr_syserror(errno)); + return -1; + } + } + break; + + case AF_UNIX: +
my->info.socket.unix.path = cfg->path; + my->info.socket.type = SOCK_STREAM; + protocol = 0; + break; + + default: + fr_strerror_const("Failed opening socket: unsupported address family"); + return -1; + } + + /* + * Open the socket. + */ + fd = socket(my->info.socket.af, my->info.socket.type, protocol); + if (fd < 0) { + fr_strerror_printf("Failed opening socket: %s", fr_syserror(errno)); + return -1; + } + + /* + * Set it to be non-blocking if required. + */ + if (cfg->async && (fr_nonblock(fd) < 0)) { + fr_strerror_printf("Failed setting O_NONBLOCK: %s", fr_syserror(errno)); + + fail: + my->info.socket.fd = -1; + my->info.state = FR_BIO_FD_STATE_CLOSED; + close(fd); + return -1; + } + +#ifdef FD_CLOEXEC + /* + * We don't want child processes inheriting these file descriptors. + */ + rcode = fcntl(fd, F_GETFD); + if (rcode >= 0) { + if (fcntl(fd, F_SETFD, rcode | FD_CLOEXEC) < 0) { + fr_strerror_printf("Failed setting FD_CLOEXEC: %s", fr_syserror(errno)); + goto fail; + } + } +#endif + + /* + * Initialize the bio information before calling the various setup functions. + */ + my->info.state = (cfg->type == FR_BIO_FD_CONNECTED) ? FR_BIO_FD_STATE_CONNECTING : FR_BIO_FD_STATE_OPEN; + + /* + * Set the FD so that the subsequent calls can use it. + */ + my->info.socket.fd = fd; + + /* + * Do sanity checks, bootstrap common socket options, bind to the socket, and initialize the read + * / write functions. + */ + switch (cfg->type) { + /* + * Unconnected UDP or datagram AF_UNIX server sockets.
+ */ + case FR_BIO_FD_UNCONNECTED: + if (my->info.socket.type != SOCK_DGRAM) { + fr_strerror_const("Failed configuring socket: unconnected sockets must be UDP"); + goto fail; + } + + if (my->info.socket.af == AF_UNIX) { + rcode = fr_bio_fd_common_datagram(fd, &my->info.socket, cfg); + } else { + rcode = fr_bio_fd_server_udp(fd, &my->info.socket, cfg); /* sets SO_REUSEPORT, too */ + } + if (rcode < 0) goto fail; + + if (fr_bio_fd_socket_bind(my, cfg) < 0) goto fail; + + if (fr_bio_fd_init_common(my) < 0) goto fail; + break; + + /* + * A connected client: UDP, TCP, or AF_UNIX. + */ + case FR_BIO_FD_CONNECTED: + if (my->info.socket.type == SOCK_DGRAM) { + rcode = fr_bio_fd_common_datagram(fd, &my->info.socket, cfg); /* we don't use SO_REUSEPORT for clients */ + if (rcode < 0) goto fail; + + } else if (my->info.socket.af != AF_UNIX) { + rcode = fr_bio_fd_common_tcp(fd, &my->info.socket, cfg); + if (rcode < 0) goto fail; + } + + if (fr_bio_fd_socket_bind(my, cfg) < 0) goto fail; + + if (fr_bio_fd_init_connected(my) < 0) goto fail; + break; + + /* + * Server socket which listens for new stream connections. + */ + case FR_BIO_FD_ACCEPT: + fr_assert(my->info.socket.type == SOCK_STREAM); + + switch (my->info.socket.af) { + case AF_INET: + rcode = fr_bio_fd_server_ipv4(fd, &my->info.socket, cfg); + break; + + case AF_INET6: + rcode = fr_bio_fd_server_ipv6(fd, &my->info.socket, cfg); + break; + + case AF_UNIX: + rcode = 0; + break; + + default: + rcode = -1; + errno = EAFNOSUPPORT; + break; + } + if (rcode < 0) goto fail; + + if (fr_bio_fd_socket_bind(my, cfg) < 0) goto fail; + + if (fr_bio_fd_init_accept(my) < 0) goto fail; + break; + } + return 0; +} diff --git a/src/lib/bio/fd_priv.h b/src/lib/bio/fd_priv.h new file mode 100644 index 00000000000..1097d8d1a18 --- /dev/null +++ b/src/lib/bio/fd_priv.h @@ -0,0 +1,59 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published
by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/fd_priv.h + * @brief Private binary IO abstractions for file descriptors + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_fd_privh, "$Id$") + +#include + +#include +#include + +/** Our FD bio structure. + * + */ +typedef struct fr_bio_fd_s { + FR_BIO_COMMON; + fr_bio_callback_t user_shutdown; //!< user shutdown + + fr_bio_fd_info_t info; + + int max_tries; //!< how many times we retry on EINTR + size_t offset; //!< where #fr_bio_fd_packet_ctx_t is stored + +#if defined(IP_PKTINFO) || defined(IP_RECVDSTADDR) || defined(IPV6_PKTINFO) + struct iovec iov; //!< for recvfromto + struct msghdr msgh; //!< for recvfromto + uint8_t cbuf[256]; //!< for recvfromto +#endif +} fr_bio_fd_t; + +#define fr_bio_fd_packet_ctx(_my, _packet_ctx) ((fr_bio_fd_packet_ctx_t *) (((uint8_t *) _packet_ctx) + _my->offset)) + +int fr_filename_to_sockaddr(struct sockaddr_un *sun, socklen_t *sunlen, char const *filename) CC_HINT(nonnull); + +int fr_bio_fd_init_common(fr_bio_fd_t *my); + +int fr_bio_fd_init_connected(fr_bio_fd_t *my); + +int fr_bio_fd_init_accept(fr_bio_fd_t *my); diff --git a/src/lib/bio/haproxy.c b/src/lib/bio/haproxy.c new file mode 100644 index 00000000000..d5583fb4e39 --- /dev/null +++ b/src/lib/bio/haproxy.c @@ -0,0 +1,253 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the 
terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/haproxy.c + * @brief BIO abstractions for HA proxy protocol interceptors + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include +#include + +#include + +#define HAPROXY_HEADER_V1_SIZE (108) + +/** The haproxy bio + * + */ +typedef struct { + FR_BIO_COMMON; + + fr_bio_haproxy_info_t info; //!< Information about the "real" client which has connected. + // @todo - for v2 of the haproxy protocol, add TLS parameters! + + fr_bio_buf_t buffer; //!< intermediate buffer to read the haproxy header + + bool available; //!< is the haproxy header available and done +} fr_bio_haproxy_t; + +/** Parse the haproxy header, version 1. + * + */ +static ssize_t fr_bio_haproxy_v1(fr_bio_haproxy_t *my) +{ + int af, argc, port; + ssize_t rcode; + uint8_t *p, *end; + char *eos, *argv[5]; + + p = my->buffer.read; + end = my->buffer.write; + + /* + * We only support v1, and only TCP.
+ */ + if (memcmp(my->buffer.read, "PROXY TCP", 9) != 0) { + fail: + fr_bio_shutdown(&my->bio); + return fr_bio_error(VERIFY); + } + p += 9; + + if (*p == '4') { + af = AF_INET; + + } else if (*p == '6') { + af = AF_INET6; + + } else { + goto fail; + } + p++; + + if (*(p++) != ' ') goto fail; + + argc = 0; + rcode = -1; + while (p < end) { + if (*p > ' ') { + if (argc > 4) goto fail; + + argv[argc++] = (char *) p; + + /* + * Check the bounds before dereferencing the pointer. + */ + while ((p < end) && (*p > ' ')) p++; + continue; + } + + if (*p < ' ') { + if ((end - p) < 3) goto fail; + + if (memcmp(p, "\r\n", 3) != 0) goto fail; + + *p = '\0'; + end = p + 3; + rcode = 0; + break; + } + + if (*p != ' ') goto fail; + + *(p++) = '\0'; + } + + /* + * Didn't end with CRLF and zero. + */ + if (rcode < 0) goto fail; + + /* + * We need the source IP, destination IP, source port, and destination port. + */ + if (argc < 4) goto fail; + + if (fr_inet_pton(&my->info.socket.inet.src_ipaddr, argv[0], -1, af, false, false) < 0) goto fail; + if (fr_inet_pton(&my->info.socket.inet.dst_ipaddr, argv[1], -1, af, false, false) < 0) goto fail; + + port = strtoul(argv[2], &eos, 10); + if (port > 65535) goto fail; + if (*eos) goto fail; + my->info.socket.inet.src_port = port; + + port = strtoul(argv[3], &eos, 10); + if (port > 65535) goto fail; + if (*eos) goto fail; + my->info.socket.inet.dst_port = port; + + /* + * Return how many bytes we read. The remainder are for the application. + */ + return (end - my->buffer.read); +} + +/** Satisfy reads from the "next" bio + * + * The caveat is that there may be data left in our buffer which is needed for the application. We can't + * unchain ourselves until we've returned that data to the application, and emptied our buffer. + */ +static ssize_t fr_bio_haproxy_read_next(fr_bio_t *bio, UNUSED void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + size_t used; + fr_bio_haproxy_t *my = talloc_get_type_abort(bio, fr_bio_haproxy_t); + + my->available = true; + + used = fr_bio_buf_used(&my->buffer); + + /* + * Somehow (magically) we can satisfy the read from our buffer. Do so.
Note that we do NOT run + * the activation callback, as there is still data in our buffer. + */ + if (size < used) { + (void) fr_bio_buf_read(&my->buffer, buffer, size); + return size; + } + + /* + * We are asked to empty the buffer. Copy the data to the caller. + */ + (void) fr_bio_buf_read(&my->buffer, buffer, used); + + /* + * Call the user's activation function, which might remove us from the proxy chain. + */ + if (my->cb.activate) { + rcode = my->cb.activate(bio); + if (rcode < 0) return rcode; + } + + return used; +} + +/** Read from the next bio, and determine if we have an haproxy header. + * + */ +static ssize_t fr_bio_haproxy_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_haproxy_t *my = talloc_get_type_abort(bio, fr_bio_haproxy_t); + fr_bio_t *next; + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + fr_assert(fr_bio_buf_write_room(&my->buffer) > 0); + + rcode = next->read(next, NULL, my->buffer.write, fr_bio_buf_write_room(&my->buffer)); + if (rcode <= 0) return rcode; + + /* + * Mark the newly read data as used in our buffer. + */ + (void) fr_bio_buf_write_alloc(&my->buffer, (size_t) rcode); + + /* + * Not enough data for a full v1 header, tell the caller + * that no data was read. The caller should call us + * again when the underlying FD is readable. + */ + if (fr_bio_buf_used(&my->buffer) < 16) return 0; + + /* + * Process haproxy protocol v1 header. + */ + rcode = fr_bio_haproxy_v1(my); + if (rcode <= 0) return rcode; + + /* + * We've read a number of bytes from our buffer. The remaining ones are for the application. + */ + (void) fr_bio_buf_read(&my->buffer, NULL, rcode); + my->bio.read = fr_bio_haproxy_read_next; + + return fr_bio_haproxy_read_next(bio, packet_ctx, buffer, size); +} + +/** Allocate an haproxy bio.
+ * + */ +fr_bio_t *fr_bio_haproxy_alloc(TALLOC_CTX *ctx, fr_bio_cb_funcs_t *cb, fr_bio_t *next) +{ + fr_bio_haproxy_t *my; + uint8_t *data; + + my = talloc_zero(ctx, fr_bio_haproxy_t); + if (!my) return NULL; + + data = talloc_array(my, uint8_t, HAPROXY_HEADER_V1_SIZE); + if (!data) { + talloc_free(my); + return NULL; + } + + fr_bio_buf_init(&my->buffer, data, HAPROXY_HEADER_V1_SIZE); + + my->bio.read = fr_bio_haproxy_read; + my->bio.write = fr_bio_null_write; /* can't write to this bio */ + my->cb = *cb; + + fr_bio_chain(&my->bio, next); + + talloc_set_destructor((fr_bio_t *) my, fr_bio_destructor); + return (fr_bio_t *) my; +} + +/** Get client information from the haproxy bio. + * + */ +fr_bio_haproxy_info_t const *fr_bio_haproxy_info(fr_bio_t *bio) +{ + fr_bio_haproxy_t *my = talloc_get_type_abort(bio, fr_bio_haproxy_t); + + if (!my->available) return NULL; + + return &my->info; +} diff --git a/src/lib/bio/haproxy.h b/src/lib/bio/haproxy.h new file mode 100644 index 00000000000..aab42738db7 --- /dev/null +++ b/src/lib/bio/haproxy.h @@ -0,0 +1,47 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/haproxy.h + * @brief Binary IO abstractions for HA proxy protocol interceptors + * + * The haproxy bio should be inserted before an FD bio. 
The caller + * can then read from it until the "activation" function is called. + * The activate callback should unchain the haproxy bio, and add the + * real top-level bio. Or, just use the FD bio as-is. + * + * This process means that the caller should manually cache pointers + * to the individual bios, so that they can be tracked and queried as + * necessary. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_fd_h, "$Id$") + +#include + +/** Data structure which describes the "real" client connection. + * + */ +typedef struct { + fr_socket_t socket; +} fr_bio_haproxy_info_t; + +fr_bio_t *fr_bio_haproxy_alloc(TALLOC_CTX *ctx, fr_bio_cb_funcs_t *cb, fr_bio_t *next) CC_HINT(nonnull); + +fr_bio_haproxy_info_t const *fr_bio_haproxy_info(fr_bio_t *bio) CC_HINT(nonnull); diff --git a/src/lib/bio/libfreeradius-bio.mk b/src/lib/bio/libfreeradius-bio.mk new file mode 100644 index 00000000000..7e4473abef4 --- /dev/null +++ b/src/lib/bio/libfreeradius-bio.mk @@ -0,0 +1,15 @@ +TARGET := libfreeradius-bio$(L) + +SOURCES := \ + base.c \ + buf.c \ + fd.c \ + fd_open.c \ + haproxy.c \ + mem.c \ + network.c \ + null.c \ + packet.c \ + pipe.c + +TGT_PREREQS := libfreeradius-util$(L) diff --git a/src/lib/bio/mem.c b/src/lib/bio/mem.c new file mode 100644 index 00000000000..e30b5f9bca0 --- /dev/null +++ b/src/lib/bio/mem.c @@ -0,0 +1,727 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/mem.c + * @brief BIO abstractions for memory buffers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include +#include + +#include + +/** The memory buffer bio + * + * It is used to buffer reads / writes to a streaming socket. + */ +typedef struct fr_bio_mem_s { + FR_BIO_COMMON; + + fr_bio_verify_t verify; //!< verify data to see if we have a packet. + + fr_bio_buf_t read_buffer; //!< buffering for reads + fr_bio_buf_t write_buffer; //!< buffering for writes +} fr_bio_mem_t; + +static ssize_t fr_bio_mem_write_buffer(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size); + +static int fr_bio_mem_verify_packet(fr_bio_t *bio, void *packet_ctx, size_t *size) CC_HINT(nonnull(1,3)); + +/** At EOF, read data from the buffer until it is empty. + * + * When "next" bio returns EOF, there may still be pending data in the memory buffer. Return that until it's + * empty, and then EOF from then on. + */ +static ssize_t fr_bio_mem_read_eof(fr_bio_t *bio, UNUSED void *packet_ctx, void *buffer, size_t size) +{ + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + /* + * No more data: return EOF from now on. + */ + if (fr_bio_buf_used(&my->read_buffer) == 0) { + my->bio.read = fr_bio_eof_read; + return fr_bio_error(EOF); + } + + /* + * Return whatever data we have available. Once the buffer is empty, the next read will get EOF. + */ + return fr_bio_buf_read(&my->read_buffer, buffer, size); +} + +/** Read from a memory BIO + * + * This bio reads as much data as possible into the memory buffer, on the theory that a few memcpy() or + * memmove() calls are much cheaper than a system call. + * + * If the read buffer has enough data to satisfy the read, then it is returned.
+ * + * Otherwise the next bio is called to re-fill the buffer. The next read call will try to get as much data + * as possible into the buffer, even if that results in reading more than "size" bytes. + * + * Once the next read has been done, then the data from the buffer is returned, even if it is less than + * "size". + */ +static ssize_t fr_bio_mem_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + size_t used, room; + uint8_t *p; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + fr_bio_t *next; + + /* + * We can satisfy the read from the memory buffer: do so. + */ + used = fr_bio_buf_used(&my->read_buffer); + if (size <= used) { + return fr_bio_buf_read(&my->read_buffer, buffer, size); + } + + /* + * There must be a next bio. + */ + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * If there's no room to store more data in the buffer. Just return whatever data we have in the + * buffer. + */ + room = fr_bio_buf_write_room(&my->read_buffer); + if (!room) return fr_bio_buf_read(&my->read_buffer, buffer, size); + + /* + * We try to fill the buffer as much as possible from the network, even if that means reading + * more than "size" amount of data. + */ + p = fr_bio_buf_write_reserve(&my->read_buffer, room); + fr_assert(p != NULL); /* otherwise room would be zero */ + + rcode = next->read(next, packet_ctx, p, room); + + /* + * Ensure that whatever data we have read is marked as "used" in the buffer, and then return + * whatever data is available back to the caller. + */ + if (rcode >= 0) { + if (rcode > 0) (void) fr_bio_buf_write_alloc(&my->read_buffer, (size_t) rcode); + + return fr_bio_buf_read(&my->read_buffer, buffer, size); + } + + /* + * The next bio returned an error. Whatever it is, it's fatal. We can read from the memory + * buffer until it's empty, but we can no longer write to the memory buffer. Any data written to + * the buffer is lost. 
+ */ + bio->read = fr_bio_mem_read_eof; + bio->write = fr_bio_null_write; + return rcode; +} + +/** Return data only if we have a complete packet. + * + */ +static ssize_t fr_bio_mem_read_packet(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + size_t used, room, want; + uint8_t *p; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + fr_bio_t *next; + + /* + * We may be able to satisfy the read from the memory buffer. + */ + used = fr_bio_buf_used(&my->read_buffer); + if (used) { + /* + * See if there are valid packets in the buffer. + */ + rcode = fr_bio_mem_verify_packet(bio, packet_ctx, &want); + if (rcode < 0) { + rcode = fr_bio_error(VERIFY); + goto fail; + } + + /* + * There's at least one valid packet, return it. + */ + if (rcode == 1) { + /* + * This isn't a fatal error. The caller should check how much room is needed by calling + * fr_bio_mem_verify_packet(), and retry. + * + * But in general, the caller should make sure that the output buffer has enough + * room for at least one packet. The verify() function should also ensure that + * the packet is no larger than our application maximum, even if the protocol + * allows for it to be larger. + */ + if (want > size) return fr_bio_error(BUFFER_TOO_SMALL); + + return fr_bio_buf_read(&my->read_buffer, buffer, want); + } + + /* + * Else we need to read more data to have a complete packet. + */ + } + + /* + * There must be a next bio. + */ + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * If there's no room to store more data in the buffer, try to make some room. + */ + room = fr_bio_buf_write_room(&my->read_buffer); + if (!room) { + room = fr_bio_buf_make_room(&my->read_buffer); + + /* + * We've tried to make room and failed. Which means that the buffer is full, AND there + * still isn't a complete packet in the buffer. This is therefore a fatal error.
The + * application has not supplied us with enough read_buffer space to store a complete + * packet. + */ + if (!room) { + rcode = fr_bio_error(BUFFER_FULL); + goto fail; + } + } + + /* + * We try to fill the buffer as much as possible from the network. The theory is that a few + * extra memcpy() or memmove()s are cheaper than a system call for reading each packet. + */ + p = fr_bio_buf_write_reserve(&my->read_buffer, room); + fr_assert(p != NULL); /* otherwise room would be zero */ + + rcode = next->read(next, packet_ctx, p, room); + + /* + * The next bio returned some data. See if it's a valid packet. + */ + if (rcode > 0) { + (void) fr_bio_buf_write_alloc(&my->read_buffer, (size_t) rcode); + + want = fr_bio_buf_used(&my->read_buffer); + if (size <= want) want = size; + + /* + * See if there are valid packets in the buffer. + */ + rcode = fr_bio_mem_verify_packet(bio, packet_ctx, &want); + if (rcode < 0) { + rcode = fr_bio_error(VERIFY); + goto fail; + } + + /* + * There's at least one valid packet, return it. + */ + if (rcode == 1) return fr_bio_buf_read(&my->read_buffer, buffer, want); + + /* + * No valid packets. The next call to read will call verify again, which will return a + * partial packet. And then it will try to fill the buffer from the next bio. + */ + return 0; + } + + /* + * No data was read from the next bio, we still don't have a packet. Return nothing. + */ + if (rcode == 0) return 0; + + /* + * The next bio returned an error. Whatever it is, it's fatal. We can read from the memory + * buffer until it's empty, but we can no longer write to the memory buffer. Any data written to + * the buffer is lost. + */ +fail: + bio->read = fr_bio_mem_read_eof; + bio->write = fr_bio_null_write; + return rcode; +} + +/** Pass writes to the next BIO + * + * For speed, we try to bypass the memory buffer and write directly to the next bio. However, if the next + * bio returns EWOULDBLOCK, we write the data to the memory buffer, even if it is partial data. 
+ */ +static ssize_t fr_bio_mem_write_next(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + ssize_t rcode; + size_t room, leftover; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + fr_bio_t *next; + + /* + * We can't call the next bio if there's still cached data to flush. + * + * There must be a next bio. + */ + fr_assert(fr_bio_buf_used(&my->write_buffer) == 0); + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * The next bio may write all of the data. If so, we return that, + */ + rcode = next->write(next, packet_ctx, buffer, size); + if ((size_t) rcode == size) return rcode; + + /* + * The next bio returned an error. Anything other than WOULD BLOCK is fatal. We can read from + * the memory buffer until it's empty, but we can no longer write to the memory buffer. + */ + if ((rcode < 0) && (rcode != fr_bio_error(IO_WOULD_BLOCK))) { + bio->read = fr_bio_mem_read_eof; + bio->write = fr_bio_null_write; + return rcode; + } + + /* + * We were flushing the buffer, return however much data we managed to write. + * + * Note that flushes can never block. + */ + if (!buffer) { + fr_assert(rcode != fr_bio_error(IO_WOULD_BLOCK)); + return rcode; + } + + /* + * We had WOULD BLOCK, or wrote partial bytes. Save the data to the memory buffer, and ensure + * that future writes are ordered. i.e. they write to the memory buffer before writing to the + * next bio. + */ + bio->write = fr_bio_mem_write_buffer; + + /* + * Clamp the write to however much data is available in the buffer. + */ + leftover = size - rcode; + room = fr_bio_buf_write_room(&my->write_buffer); + + /* + * If we have "used == 0" above, then we must also have "room > 0". + */ + fr_assert(room > 0); + + if (room < leftover) leftover = room; + + /* + * Since we've clamped the write, this call can never fail. 
+ */ + (void) fr_bio_buf_write(&my->write_buffer, ((uint8_t const *) buffer) + rcode, leftover); + + /* + * Some of the data has been written to the next bio, and some to our cache. The caller has to + * ensure that the first subsequent write will send over the rest of the data. + */ + return rcode + leftover; +} + +/** Flush the memory buffer. + * + */ +static ssize_t fr_bio_mem_write_flush(fr_bio_mem_t *my, size_t size) +{ + ssize_t rcode; + size_t used; + fr_bio_t *next; + + /* + * Nothing to flush, don't do any writes. + * + * Instead, set the write function to write next, where data will be sent directly to the next + * bio, and will bypass the write buffer. + */ + used = fr_bio_buf_used(&my->write_buffer); + if (!used) { + my->bio.write = fr_bio_mem_write_next; + return 0; + } + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * Clamp the amount of data written. If the caller wants to write everything, it should + * pass SIZE_MAX. + */ + if (size < used) used = size; + + /* + * Flush the buffer to the next bio in line. That function will write as much data as possible, + * but may return a partial write. The pending data starts at the buffer's read pointer. + */ + rcode = next->write(next, NULL, my->write_buffer.read, used); + + /* + * The next bio returned an error. Anything other than WOULD BLOCK is fatal. We can read from + * the memory buffer until it's empty, but we can no longer write to the memory buffer. + */ + if ((rcode < 0) && (rcode != fr_bio_error(IO_WOULD_BLOCK))) { + my->bio.read = fr_bio_mem_read_eof; + my->bio.write = fr_bio_null_write; + return rcode; + } + + /* + * We didn't write anything, return that. + */ + if ((rcode == 0) || (rcode == fr_bio_error(IO_WOULD_BLOCK))) return rcode; + + /* + * Tell the buffer that we've read a certain amount of data from it. + */ + (void) fr_bio_buf_read(&my->write_buffer, NULL, (size_t) rcode); + + /* + * We haven't emptied the buffer, return the partial write.
+ */ + if ((size_t) rcode < used) return rcode; + + /* + * We've flushed all of the buffer. Revert back to "pass through" writing. + */ + fr_assert(fr_bio_buf_used(&my->write_buffer) == 0); + my->bio.write = fr_bio_mem_write_next; + return rcode; +} + +/** Write to the memory buffer. + * + * The special buffer pointer of NULL means flush(). On flush, we call next->write(), and if that succeeds, + * go back to "pass through" mode for the buffers. + */ +static ssize_t fr_bio_mem_write_buffer(fr_bio_t *bio, UNUSED void *packet_ctx, void const *buffer, size_t size) +{ + size_t room; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + /* + * Flush the output buffer. + */ + if (unlikely(!buffer)) return fr_bio_mem_write_flush(my, size); + + /* + * Clamp the write to however much room is available in the buffer. + */ + room = fr_bio_buf_write_room(&my->write_buffer); + + /* + * The buffer is full. We're now blocked. + */ + if (!room) return fr_bio_error(IO_WOULD_BLOCK); + + if (room < size) size = room; + + /* + * As we have clamped the write, we know that this call must succeed. + */ + return fr_bio_buf_write(&my->write_buffer, buffer, size); +} + +/** Peek at the data in the read buffer + * + * Peeking at the data allows us to avoid many memory copies. + */ +uint8_t const *fr_bio_mem_read_peek(fr_bio_t *bio, size_t *size) +{ + size_t used; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + used = fr_bio_buf_used(&my->read_buffer); + + if (!used) return NULL; + + *size = used; + return my->read_buffer.read; +} + +/** Discard data from the read buffer. + * + * Discarding allows the caller to silently omit packets, so that + * they are not passed up to previous bios. + */ +void fr_bio_mem_read_discard(fr_bio_t *bio, size_t size) +{ + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + (void) fr_bio_buf_read(&my->read_buffer, NULL, size); +} + +/** Verify that a packet is OK.
+ * + * @todo - have this as a parameter to the read routines, so that they only return complete packets? + * + * @param bio the #fr_bio_mem_t + * @param packet_ctx the packet ctx + * @param[out] size how big the verified packet is + * @return + * - <0 on error, the caller should close the bio. + * - 0 for "we have a partial packet", the size to read is in *size + * - 1 for "we have at least one good packet", the size of it is in *size + */ +static int fr_bio_mem_verify_packet(fr_bio_t *bio, void *packet_ctx, size_t *size) +{ + uint8_t *packet, *end; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + packet = my->read_buffer.read; + end = my->read_buffer.write; + + while (packet < end) { + size_t want; +#ifndef NDEBUG + size_t used; + + used = end - packet; +#endif + + want = end - packet; + + switch (my->verify((fr_bio_t *) my, packet_ctx, packet, &want)) { + /* + * The data in the buffer is exactly a packet. Return that. + * + * @todo - if there are multiple packets, return the total size of packets? + */ + case FR_BIO_VERIFY_OK: + fr_assert(want <= used); + *size = want; + return 1; + + /* + * The packet needs more data. Return how much data we need for one packet. + */ + case FR_BIO_VERIFY_WANT_MORE: + fr_assert(want > used); + *size = want; + return 0; + + case FR_BIO_VERIFY_DISCARD: + /* + * We don't call fr_bio_buf_read(), because that will move the memory around, and + * we want to avoid that if at all possible. + */ + fr_assert(want <= used); + fr_assert(packet == my->read_buffer.read); + my->read_buffer.read += want; + continue; + + /* + * Some kind of fatal validation error. + */ + case FR_BIO_VERIFY_ERROR_CLOSE: + break; + } + } + + return -1; +} + +/** Allocate a memory buffer bio for either reading or writing. 
+ */ +static bool fr_bio_mem_buf_alloc(fr_bio_mem_t *my, fr_bio_buf_t *buf, size_t size) +{ + uint8_t *data; + + if (size < 1024) size = 1024; + if (size > (1 << 20)) size = 1 << 20; + + data = talloc_array(my, uint8_t, size); + if (!data) { + talloc_free(my); + return false; + } + + fr_bio_buf_init(buf, data, size); + return true; +} + +/** Allocate a memory buffer bio + * + * The "read buffer" will cache reads from the next bio in the chain. If the next bio returns more data than + * the caller asked for, the extra data is cached in the read buffer. + * + * The "write buffer" will buffer writes to the next bio in the chain. If the caller writes more data than + * the next bio can process, the extra data is cached in the write buffer. + * + * When the bio is closed (or freed) any pending data in the buffers is lost. The same happens if the next + * bio returns a fatal error. + * + * At some point during a read, the next bio may return EOF. When that happens, the caller should not rely + * on the next FD being readable or writable. Instead, it should keep reading from the memory bio until it + * returns EOF. See fr_bio_fd_eof() for details. + * + * @param ctx the talloc ctx + * @param read_size size of the read buffer. Clamped to the range 1024..2^20 + * @param write_size size of the write buffer. Clamped to the range 1024..2^20 + * @param next the next bio which will perform the underlying reads and writes. + * @return + * - NULL on error, memory allocation failed + * - !NULL the bio + */ +fr_bio_t *fr_bio_mem_alloc(TALLOC_CTX *ctx, size_t read_size, size_t write_size, fr_bio_t *next) +{ + fr_bio_mem_t *my; + + my = talloc_zero(ctx, fr_bio_mem_t); + if (!my) return NULL; + + /* + * The caller has to state that the API is caching data both ways.
+ */ + if (!read_size || !write_size) return NULL; + + if (!fr_bio_mem_buf_alloc(my, &my->read_buffer, read_size)) return NULL; + if (!fr_bio_mem_buf_alloc(my, &my->write_buffer, write_size)) return NULL; + + my->bio.read = fr_bio_mem_read; + my->bio.write = fr_bio_mem_write_next; + + fr_bio_chain(&my->bio, next); + + talloc_set_destructor((fr_bio_t *) my, fr_bio_destructor); + return (fr_bio_t *) my; +} + +/** Only return verified packets. + * + * Like fr_bio_mem_alloc(), but only returns packets. + * + * Writes pass straight through to the next bio. + */ +fr_bio_t *fr_bio_mem_packet_alloc(TALLOC_CTX *ctx, size_t read_size, fr_bio_t *next, + fr_bio_verify_t verify, void *uctx) +{ + fr_bio_mem_t *my; + + my = (fr_bio_mem_t *) fr_bio_mem_sink_alloc(ctx, read_size); + if (!my) return NULL; + + my->verify = verify; + my->bio.read = fr_bio_mem_read_packet; + my->bio.write = fr_bio_next_write; + + fr_bio_chain(&my->bio, next); + + return (fr_bio_t *) my; +} + +/** Allocate a memory buffer which sources data from the callers application into the bio system. + * + * The caller writes data to the buffer, but never reads from it. This bio will call the "next" bio to sink + * the data. + */ +fr_bio_t *fr_bio_mem_source_alloc(TALLOC_CTX *ctx, size_t write_size, fr_bio_t *next) +{ + fr_bio_mem_t *my; + + my = talloc_zero(ctx, fr_bio_mem_t); + if (!my) return NULL; + + /* + * The caller has to state that the API is caching data. + */ + if (!write_size) return NULL; + + if (!fr_bio_mem_buf_alloc(my, &my->write_buffer, write_size)) return NULL; + + my->bio.read = fr_bio_null_read; /* reading FROM this bio is not possible */ + my->bio.write = fr_bio_mem_write_next; + + fr_bio_chain(&my->bio, next); + + talloc_set_destructor((fr_bio_t *) my, fr_bio_destructor); + return (fr_bio_t *) my; +} + +/** Read from a buffer which a previous bio has filled. + * + * This function is called by the application which wants to read from a sink. 
+ */ +static ssize_t fr_bio_mem_read_buffer(fr_bio_t *bio, UNUSED void *packet_ctx, void *buffer, size_t size) +{ + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + return fr_bio_buf_read(&my->read_buffer, buffer, size); +} + +/** Write to the read buffer. + * + * This function is called by an upstream function which writes into our local buffer. + */ +static ssize_t fr_bio_mem_write_read_buffer(fr_bio_t *bio, UNUSED void *packet_ctx, void const *buffer, size_t size) +{ + size_t room; + fr_bio_mem_t *my = talloc_get_type_abort(bio, fr_bio_mem_t); + + /* + * Clamp the write to however much data is available in the buffer. + */ + room = fr_bio_buf_write_room(&my->read_buffer); + + /* + * The buffer is full. We're now blocked. + */ + if (!room) return fr_bio_error(IO_WOULD_BLOCK); + + if (room < size) size = room; + + /* + * As we have clamped the write, we know that this call must succeed. + */ + return fr_bio_buf_write(&my->read_buffer, buffer, size); +} + +/** Allocate a memory buffer which sinks data from a bio system into the callers application. + * + * The caller reads data from this bio, but never writes to it. Upstream bios will source the data. + */ +fr_bio_t *fr_bio_mem_sink_alloc(TALLOC_CTX *ctx, size_t read_size) +{ + fr_bio_mem_t *my; + + my = talloc_zero(ctx, fr_bio_mem_t); + if (!my) return NULL; + + /* + * The caller has to state that the API is caching data. 
+ */ + if (!read_size) return NULL; + + if (!fr_bio_mem_buf_alloc(my, &my->read_buffer, read_size)) return NULL; + my->bio.read = fr_bio_mem_read_buffer; + my->bio.write = fr_bio_mem_write_read_buffer; /* the upstream will write to our read buffer */ + + talloc_set_destructor((fr_bio_t *) my, fr_bio_destructor); + return (fr_bio_t *) my; +} diff --git a/src/lib/bio/mem.h b/src/lib/bio/mem.h new file mode 100644 index 00000000000..338d9c03474 --- /dev/null +++ b/src/lib/bio/mem.h @@ -0,0 +1,63 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/mem.h + * @brief Binary IO abstractions for memory buffers + * + * Allow reads and writes from memory buffers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_mem_h, "$Id$") + +/** Status returned by the verification callback. + * + */ +typedef enum { + FR_BIO_VERIFY_OK = 0, //!< packet is OK + FR_BIO_VERIFY_DISCARD, //!< the packet should be discarded + FR_BIO_VERIFY_WANT_MORE, //!< not enough data for one packet + FR_BIO_VERIFY_ERROR_CLOSE, //!< fatal error, the bio should be closed. 
+} fr_bio_verify_action_t; + +/** Verifies the packet + * + * If the packet is a dup, then this function can return DISCARD, or + * update the packet_ctx to say "dup", and then return OK. + * + * @param bio the bio to read + * @param packet_ctx as passed in to fr_bio_read() + * @param buffer pointer to the raw data + * @param[in,out] size in: size of data in the buffer. out: size of the packet to return, or data to discard. + * @return action to take + */ +typedef fr_bio_verify_action_t (*fr_bio_verify_t)(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t *size); + +fr_bio_t *fr_bio_mem_alloc(TALLOC_CTX *ctx, size_t read_size, size_t write_size, fr_bio_t *next) CC_HINT(nonnull); + +fr_bio_t *fr_bio_mem_packet_alloc(TALLOC_CTX *ctx, size_t read_size, fr_bio_t *next, + fr_bio_verify_t verify, void *uctx) CC_HINT(nonnull(1,3,4)); + +fr_bio_t *fr_bio_mem_source_alloc(TALLOC_CTX *ctx, size_t buffer_size, fr_bio_t *next) CC_HINT(nonnull); + +fr_bio_t *fr_bio_mem_sink_alloc(TALLOC_CTX *ctx, size_t buffer_size) CC_HINT(nonnull); + +uint8_t const *fr_bio_mem_read_peek(fr_bio_t *bio, size_t *size) CC_HINT(nonnull); + +void fr_bio_mem_read_discard(fr_bio_t *bio, size_t size) CC_HINT(nonnull); diff --git a/src/lib/bio/network.c b/src/lib/bio/network.c new file mode 100644 index 00000000000..faa6df28ea2 --- /dev/null +++ b/src/lib/bio/network.c @@ -0,0 +1,281 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/network.c + * @brief BIO patricia trie filtering handlers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include + +#include +#include + +#include + +/** The network filtering bio + */ +typedef struct { + FR_BIO_COMMON; + + fr_bio_read_t discard; //!< callback to run when discarding a packet due to filtering + + size_t offset; //!< where #fr_bio_fd_packet_ctx_t is stored + + fr_trie_t const *trie; //!< patricia trie for filtering +} fr_bio_network_t; + +/** Read a UDP packet, and only return packets from allowed sources. + * + */ +static ssize_t fr_bio_network_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + bool *value; + fr_bio_network_t *my = talloc_get_type_abort(bio, fr_bio_network_t); + fr_bio_fd_packet_ctx_t *addr; + fr_bio_t *next; + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + rcode = next->read(next, packet_ctx, buffer, size); + if (rcode <= 0) return rcode; + + if (!packet_ctx) return rcode; + + addr = fr_bio_fd_packet_ctx(my, packet_ctx); + + /* + * Look up this particular source. If it's not found, then we suppress this packet. + */ + value = fr_trie_lookup_by_key(my->trie, + &addr->socket.inet.src_ipaddr.addr, addr->socket.inet.src_ipaddr.prefix); + if (value != FR_BIO_NETWORK_ALLOW) { + if (my->discard) return my->discard(bio, packet_ctx, buffer, rcode); + return 0; + } + + return rcode; +} + + +/** Allocate a bio for filtering IP addresses + * + * This is used for unconnected UDP bios, where we filter packets based on source IP address. + * + * It is also used for accept bios, where we filter new connections based on source IP address. 
The caller + * should chain this bio to the next FD bio, and then fr_bio_read() from the top-level bio. The result will + * be filtered or "clean" FDs. + * + * A patricia trie (but not the bio) could also be used in an haproxy "activate" callback, where the callback + * gets the haproxy socket info, and then checks if the source is allowed. However, that patricia trie is a + * property of the main "accept" bio, and should be managed by the activate() callback for the haproxy bio. + */ +fr_bio_t *fr_bio_network_alloc(TALLOC_CTX *ctx, fr_ipaddr_t const *allow, fr_ipaddr_t const *deny, + fr_bio_read_t discard, fr_bio_t *next) +{ + fr_bio_network_t *my; + fr_bio_t *fd; + fr_bio_fd_info_t const *info; + + /* + * We are only usable for FD bios. We need to get "offset" into the packet_ctx, and we don't + * want to have an API which allows for two different "offset" values to be passed to two + * different bios. + */ + fd = next; + + /* + * Walk the chain until we find the FD bio. + * + * @todo - add an internal "type" to the bio? + */ + while (fd && (strcmp(talloc_get_name(fd), "fr_bio_fd_t") != 0)) { + fd = fr_bio_next(fd); + } + + if (!fd) return NULL; + + info = fr_bio_fd_info(fd); + fr_assert(info != NULL); + + /* + * We can only filter connections for IP address families. + * + * Unix domain sockets have to use a different method for filtering input connections. + */ + if (!((info->socket.af == AF_INET) || (info->socket.af == AF_INET6))) return NULL; + + /* + * We can only be used for accept() sockets, or unconnected UDP sockets.
+ */ + switch (info->type) { + case FR_BIO_FD_UNCONNECTED: + break; + + case FR_BIO_FD_CONNECTED: + return NULL; + + case FR_BIO_FD_ACCEPT: + break; + } + + my = talloc_zero(ctx, fr_bio_network_t); + if (!my) return NULL; + + my->offset = ((fr_bio_fd_t *) fd)->offset; + my->discard = discard; + + my->bio.write = fr_bio_next_write; + my->bio.read = fr_bio_network_read; + + my->trie = fr_bio_network_trie_alloc(my, info->socket.af, allow, deny); + if (!my->trie) { + talloc_free(my); + return NULL; + } + + fr_bio_chain(&my->bio, next); + + return (fr_bio_t *) my; +} + +/** Create a patricia trie for doing network filtering. + * + */ +fr_trie_t *fr_bio_network_trie_alloc(TALLOC_CTX *ctx, int af, fr_ipaddr_t const *allow, fr_ipaddr_t const *deny) +{ + size_t i, num; + fr_trie_t *trie; + + trie = fr_trie_alloc(ctx, NULL, NULL); + if (!trie) return NULL; + + num = talloc_array_length(allow); + fr_assert(num > 0); + + for (i = 0; i < num; i++) { + bool *value; + + /* + * Can't add v4 networks to a v6 socket, or vice versa. + */ + if (allow[i].af != af) { + fr_strerror_printf("Address family in entry %zd - 'allow = %pV' " + "does not match 'ipaddr'", i + 1, fr_box_ipaddr(allow[i])); + fail: + talloc_free(trie); + return NULL; + } + + /* + * Duplicates are bad. + */ + value = fr_trie_match_by_key(trie, &allow[i].addr, allow[i].prefix); + if (value) { + fr_strerror_printf("Cannot add duplicate entry 'allow = %pV'", + fr_box_ipaddr(allow[i])); + goto fail; + } + +#if 0 + /* + * Look for overlapping entries. i.e. the networks MUST be disjoint. + * + * Note that this catches 192.168.1/24 followed by 192.168/16, but NOT the other way + * around.
The best fix is likely to add a flag to fr_trie_alloc() saying "we can only + * have terminal fr_trie_user_t nodes" + */ + value = fr_trie_lookup_by_key(trie, &allow[i].addr, allow[i].prefix); + if (network && (network->prefix <= allow[i].prefix)) { + fr_strerror_printf("Cannot add overlapping entry 'allow = %pV'", fr_box_ipaddr(allow[i])); + fr_strerror_const("Entry is completely enclosed inside of a previously defined network."); + goto fail; + } +#endif + + /* + * Insert the network into the trie. Lookups will return a bool ptr of allow / deny. + */ + if (fr_trie_insert_by_key(trie, &allow[i].addr, allow[i].prefix, FR_BIO_NETWORK_ALLOW) < 0) { + fr_strerror_printf("Failed adding 'allow = %pV' to filtering rules", fr_box_ipaddr(allow[i])); + return NULL; + } + } + + /* + * And now check denied networks. + */ + num = talloc_array_length(deny); + if (!num) return trie; + + /* + * Since the default is to deny, you can only add a "deny" inside of a previous "allow". + */ + for (i = 0; i < num; i++) { + bool *value; + + /* + * Can't add v4 networks to a v6 socket, or vice versa. + */ + if (deny[i].af != af) { + fr_strerror_printf("Address family in entry %zd - 'deny = %pV' " + "does not match 'ipaddr'", i + 1, fr_box_ipaddr(deny[i])); + goto fail; + } + + /* + * Exact duplicates are forbidden. + */ + value = fr_trie_match_by_key(trie, &deny[i].addr, deny[i].prefix); + if (value) { + fr_strerror_printf("Cannot add duplicate entry 'deny = %pV'", fr_box_ipaddr(deny[i])); + goto fail; + } + + /* + * A "deny" can only be within a previous "allow". + */ + value = fr_trie_lookup_by_key(trie, &deny[i].addr, deny[i].prefix); + if (!value) { + fr_strerror_printf("The network in entry %zd - 'deny = %pV' is not " + "contained within a previous 'allow'", i + 1, fr_box_ipaddr(deny[i])); + goto fail; + } + + /* + * A "deny" cannot be within a previous "deny". 
+ */ + if (value == FR_BIO_NETWORK_DENY) { + fr_strerror_printf("The network in entry %zd - 'deny = %pV' overlaps " + "with another 'deny' rule", i + 1, fr_box_ipaddr(deny[i])); + goto fail; + } + + /* + * Insert the rule into the trie. + */ + if (fr_trie_insert_by_key(trie, &deny[i].addr, deny[i].prefix, FR_BIO_NETWORK_DENY) < 0) { + fr_strerror_printf("Failed adding 'deny = %pV' to filtering rules", fr_box_ipaddr(deny[i])); + return NULL; + } + } + + return trie; +} diff --git a/src/lib/bio/network.h b/src/lib/bio/network.h new file mode 100644 index 00000000000..0ae084b3f0e --- /dev/null +++ b/src/lib/bio/network.h @@ -0,0 +1,44 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/network.h + * @brief BIO patricia trie filtering handlers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_network_h, "$Id$") + +#include + +fr_bio_t *fr_bio_network_alloc(TALLOC_CTX *ctx, fr_ipaddr_t const *allow, fr_ipaddr_t const *deny, + fr_bio_read_t discard, fr_bio_t *next) CC_HINT(nonnull(1,3,5)); + +fr_trie_t *fr_bio_network_trie_alloc(TALLOC_CTX *ctx, int af, fr_ipaddr_t const *allow, fr_ipaddr_t const *deny); + +/* + * IP address lookups return one of these two magic pointers.
+ * + * NULL means "nothing matches", which should also be interpreted as "deny". + * + * The difference between "NULL" and "deny" is that NULL is an IP address which was never inserted into + * the trie, whereas "deny" means that there is a parent "allow" range, and we are carving out a "deny" + * in the middle of that range. + */ +#define FR_BIO_NETWORK_ALLOW ((void *) (-1)) +#define FR_BIO_NETWORK_DENY ((void *) (-2)) diff --git a/src/lib/bio/null.c b/src/lib/bio/null.c new file mode 100644 index 00000000000..4f34b7ef8b1 --- /dev/null +++ b/src/lib/bio/null.c @@ -0,0 +1,42 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/null.c + * @brief BIO NULL handlers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include + +/** Always return 0 on read. + * + */ +ssize_t fr_bio_null_read(UNUSED fr_bio_t *bio, UNUSED void *packet_ctx, UNUSED void *buffer, UNUSED size_t size) +{ + return 0; +} + +/** Always return 0 on write.
+ * + */ +ssize_t fr_bio_null_write(UNUSED fr_bio_t *bio, UNUSED void *packet_ctx, UNUSED void const *buffer, UNUSED size_t size) +{ + return 0; +} diff --git a/src/lib/bio/null.h b/src/lib/bio/null.h new file mode 100644 index 00000000000..1a0e72a1017 --- /dev/null +++ b/src/lib/bio/null.h @@ -0,0 +1,28 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/null.h + * @brief BIO null handlers. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_null_h, "$Id$") + +ssize_t fr_bio_null_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size); +ssize_t fr_bio_null_write(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size); diff --git a/src/lib/bio/packet.c b/src/lib/bio/packet.c new file mode 100644 index 00000000000..11d00b2015b --- /dev/null +++ b/src/lib/bio/packet.c @@ -0,0 +1,517 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/packet.c + * @brief Binary IO abstractions for packets in buffers + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ + +#include +#include +#include +#include + +typedef struct fr_bio_packet_entry_s fr_bio_packet_entry_t; +typedef struct fr_bio_packet_list_s fr_bio_packet_list_t; +typedef struct fr_bio_packet_s fr_bio_packet_t; + +/* + * Define type-safe wrappers for head and entry definitions. + */ +FR_DLIST_TYPES(fr_bio_packet_list) + +/* + * For delayed writes. + * + * @todo - we can remove the "cancelled" field by setting packet_ctx == my? + */ +struct fr_bio_packet_entry_s { + void *packet_ctx; + void const *buffer; + size_t size; + size_t already_written; + bool cancelled; + + fr_bio_packet_t *my; + + FR_DLIST_ENTRY(fr_bio_packet_list) entry; //!< List entry. +}; + +struct fr_bio_packet_list_s { + FR_DLIST_HEAD(fr_bio_packet_list) saved; + FR_DLIST_HEAD(fr_bio_packet_list) free; +}; + +FR_DLIST_FUNCS(fr_bio_packet_list, fr_bio_packet_entry_t, entry) + + +typedef struct fr_bio_packet_s { + FR_BIO_COMMON; + + size_t max_saved; + + fr_bio_packet_saved_t saved; + fr_bio_packet_callback_t sent; + fr_bio_packet_callback_t cancel; + + FR_DLIST_HEAD(fr_bio_packet_list) pending; + FR_DLIST_HEAD(fr_bio_packet_list) free; + + fr_bio_packet_entry_t array[]; +} fr_bio_packet_t; + +static ssize_t fr_bio_packet_write_buffer(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size); + +/** Forcibly cancel all outstanding packets.
+ * + * Even partially written ones. This function is called from + * shutdown(), when the destructor is called, or on fatal read / write + * errors. + */ +static void fr_bio_packet_list_cancel(fr_bio_packet_t *my) +{ + fr_bio_packet_entry_t *item; + + if (!my->cancel) return; + + if (fr_bio_packet_list_num_elements(&my->pending) == 0) return; + + /* + * Cancel any remaining saved items. + */ + while ((item = fr_bio_packet_list_pop_head(&my->pending)) != NULL) { + my->cancel(&my->bio, item->packet_ctx, item->buffer, item->size); + item->cancelled = true; + fr_bio_packet_list_insert_head(&my->free, item); + } +} + +static int fr_bio_packet_destructor(fr_bio_packet_t *my) +{ + fr_assert(my->cancel); /* otherwise it would be fr_bio_destructor */ + + my->bio.write = fr_bio_null_write; + fr_bio_packet_list_cancel(my); + + return fr_bio_destructor(&my->bio); +} + +/** Push a packet onto a list. + * + */ +static ssize_t fr_bio_packet_list_push(fr_bio_packet_t *my, void *packet_ctx, const void *buffer, size_t size, size_t already_written) +{ + fr_bio_packet_entry_t *item; + + item = fr_bio_packet_list_pop_head(&my->free); + if (!item) return fr_bio_error(IO_WOULD_BLOCK); + + /* + * If we're the first entry in the saved list, we can have a partially written packet. + * + * Otherwise, we're a subsequent entry, and we cannot have any data which is partially written. + */ + fr_assert((fr_bio_packet_list_num_elements(&my->pending) == 0) || + (already_written == 0)); + + item->packet_ctx = packet_ctx; + item->buffer = buffer; + item->size = size; + item->already_written = already_written; + item->cancelled = false; + + fr_bio_packet_list_insert_tail(&my->pending, item); + + if (my->saved) my->saved(&my->bio, packet_ctx, buffer, size, item); + + return size; +} + +/** Write one packet to the next bio. + * + * If it blocks, save the packet and return OK to the caller. 
+ */ +static ssize_t fr_bio_packet_write_next(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_packet_t *my = talloc_get_type_abort(bio, fr_bio_packet_t); + fr_bio_t *next; + + /* + * We can't call the next bio if there's still cached data to flush. + */ + fr_assert(fr_bio_packet_list_num_elements(&my->pending) == 0); + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * Write the data out. If we write all of it, we're done. + */ + rcode = next->write(next, packet_ctx, buffer, size); + if ((size_t) rcode == size) return rcode; + + if (rcode < 0) { + /* + * The next bio would block, which is not fatal: return it back up the chain. + */ + if (rcode == fr_bio_error(IO_WOULD_BLOCK)) return rcode; + + /* + * All other errors are fatal. + */ + my->bio.read = fr_bio_eof_read; + my->bio.write = fr_bio_null_write; + + fr_bio_packet_list_cancel(my); + return rcode; + } + + /* + * We were flushing the next buffer, return any data which was written. + */ + if (!buffer) return rcode; + + /* + * The next bio wrote a partial packet. Save the entire packet, and swap the write function to + * save all future packets in the saved list. + */ + bio->write = fr_bio_packet_write_buffer; + + fr_assert(fr_bio_packet_list_num_elements(&my->free) > 0); + + /* + * This can only error out if the free list has no more entries. + */ + return fr_bio_packet_list_push(my, packet_ctx, buffer, size, (size_t) rcode); +} + +/** Flush the packet list. + * + */ +static ssize_t fr_bio_packet_write_flush(fr_bio_packet_t *my, size_t size) +{ + size_t written; + fr_bio_t *next; + + if (fr_bio_packet_list_num_elements(&my->pending) == 0) { + my->bio.write = fr_bio_packet_write_next; + return 0; + } + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + /* + * Loop over the saved packets, flushing them to the next bio. + */ + written = 0; + while (written < size) { + ssize_t rcode; + fr_bio_packet_entry_t *item; + + /* + * No more saved packets to write: stop.
+ */ + item = fr_bio_packet_list_head(&my->pending); + if (!item) break; + + /* + * A cancelled item must be partially written. A cancelled item which has zero bytes + * written should not be in the saved list. + */ + fr_assert(!item->cancelled || (item->already_written > 0)); + + /* + * Push out however much data we can to the next bio. + */ + rcode = next->write(next, item->packet_ctx, ((uint8_t const *) item->buffer) + item->already_written, item->size - item->already_written); + if (rcode == 0) break; + + if (rcode < 0) { + if (rcode == fr_bio_error(IO_WOULD_BLOCK)) break; + + return rcode; + } + + /* + * Update the written count. + */ + written += rcode; + item->already_written += rcode; + + if (item->already_written < item->size) break; + + /* + * We don't run "sent" callbacks for cancelled items. + */ + if (item->cancelled) { + if (my->cancel) my->cancel(&my->bio, item->packet_ctx, item->buffer, item->size); + } else { + if (my->sent) my->sent(&my->bio, item->packet_ctx, item->buffer, item->size); + } + + (void) fr_bio_packet_list_pop_head(&my->pending); +#ifndef NDEBUG + item->buffer = NULL; + item->packet_ctx = NULL; + item->size = 0; + item->already_written = 0; +#endif + item->cancelled = true; + + fr_bio_packet_list_insert_head(&my->free, item); + } + + /* + * If we've written all of the saved packets, go back to writing to the "next" bio. + */ + if (!fr_bio_packet_list_head(&my->pending)) my->bio.write = fr_bio_packet_write_next; + + return written; +} + +/** Write to the packet list buffer. + * + * The special buffer pointer of NULL means flush(). On flush, we call next->write(), and if that succeeds, + * go back to "pass through" mode for the buffers.
+ */ +static ssize_t fr_bio_packet_write_buffer(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + fr_bio_packet_t *my = talloc_get_type_abort(bio, fr_bio_packet_t); + + if (!buffer) return fr_bio_packet_write_flush(my, size); + + /* + * This can only error out if the free list has no more entries. + */ + return fr_bio_packet_list_push(my, packet_ctx, buffer, size, 0); +} + +/** Read one packet from next bio. + * + * This function does NOT respect packet boundaries. The caller should use other APIs to determine how big + * the "next" packet is. + * + * The caller may buffer the output data itself, or it may use other APIs to do checking. + * + * The main + */ +static ssize_t fr_bio_packet_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_packet_t *my = talloc_get_type_abort(bio, fr_bio_packet_t); + fr_bio_t *next; + + next = fr_bio_next(&my->bio); + fr_assert(next != NULL); + + rcode = next->read(next, packet_ctx, buffer, size); + if (rcode >= 0) return rcode; + + /* + * We didn't read anything, return that. + */ + if (rcode == fr_bio_error(IO_WOULD_BLOCK)) return rcode; + + /* + * Error reading, which means that we can't write to it, either. We don't care if the error is + * EOF or anything else. We just cancel the outstanding packets, and shut ourselves down. + */ + my->bio.read = fr_bio_eof_read; + my->bio.write = fr_bio_null_write; + + fr_bio_packet_list_cancel(my); + return rcode; +} + +/** Shutdown + * + * Cancel / close has to be called before re-init. + */ +static int fr_bio_packet_shutdown(fr_bio_t *bio) +{ + fr_bio_packet_t *my = talloc_get_type_abort(bio, fr_bio_packet_t); + + fr_bio_packet_list_cancel(my); + + my->bio.read = fr_bio_packet_read; + my->bio.write = fr_bio_packet_write_next; + + return 0; +} + + +/** Allocate a packet-based bio. + * + * This bio assumes that each call to fr_bio_write() is for one packet, and only one packet.
If the next bio + * returns a partial write, or WOULD BLOCK, then information about the packet is cached. Subsequent writes + * will write the partial data first, and then continue with subsequent writes. + * + * The caller is responsible for not freeing the packet ctx or the packet buffer until either the write has + * been performed, or the write has been cancelled. + * + * The read() API makes no provisions for reading complete packets. It simply returns whatever the next bio + * allows. If instead there is a need to read only complete packets, then the next bio should be + * fr_bio_mem_packet_alloc(). + * + * The read() API may return 0. There may have been data read from an underlying FD, but that data did not + * make it through the filters of the "next" bios. e.g. Any underlying FD should be put into a "wait for + * readable" state. + * + * The write() API will return a full write, even if the next layer is blocked. Any underlying FD + * should be put into a "wait for writeable" state. The packet which was supposed to be written has been + * cached, and cannot be cancelled as it is partially written. The caller should likely start using another + * bio for writes. If the caller continues to use the bio, then any subsequent writes will *always* cache + * the packets. @todo - we need to mark up the bio as "blocked", and then have a write_blocked() API? ugh. + * or just add `bool blocked` and `bool eof` to both read/write bios + * + * Once the underlying FD has become writeable, the caller should call fr_bio_write(bio, NULL, NULL, SIZE_MAX); + * That will cause the pending packets to be flushed. + * + * The write() API may return that it's written a full packet, in which case it's either completely written to + * the next bio, or to the pending queue. + * + * The read / write APIs can return WOULD_BLOCK, in which case nothing was done. Any underlying FD should be + * put into a "wait for writeable" state. 
Other errors from bios "further down" the chain can also be + * returned. + * + * @param ctx the talloc ctx + * @param max_saved Maximum number of packets to cache. Must be 1..2^17 + * @param saved callback to run when a packet is saved in the pending queue + * @param sent callback to run when a packet is sent. + * @param cancel callback to run when a packet is cancelled. + * @param next the next bio which will perform the underlying reads and writes. + * @return NULL on error (memory allocation failed), + * otherwise the newly allocated bio + */ +fr_bio_t *fr_bio_packet_alloc(TALLOC_CTX *ctx, size_t max_saved, + fr_bio_packet_saved_t saved, + fr_bio_packet_callback_t sent, + fr_bio_packet_callback_t cancel, + fr_bio_t *next) +{ + size_t i; + fr_bio_packet_t *my; + + if (!max_saved) max_saved = 1; + if (max_saved > (1 << 17)) max_saved = 1 << 17; + + my = (fr_bio_packet_t *) talloc_zero_array(ctx, uint8_t, sizeof(fr_bio_packet_t) + + sizeof(fr_bio_packet_entry_t) * max_saved); + if (!my) return NULL; + + talloc_set_type(my, fr_bio_packet_t); + + my->max_saved = max_saved; + + fr_bio_packet_list_init(&my->pending); + fr_bio_packet_list_init(&my->free); + + my->saved = saved; + my->sent = sent; + my->cancel = cancel; + + for (i = 0; i < max_saved; i++) { + my->array[i].my = my; + my->array[i].cancelled = true; + fr_bio_packet_list_insert_tail(&my->free, &my->array[i]); + } + + my->bio.read = fr_bio_packet_read; + my->bio.write = fr_bio_packet_write_next; + my->cb.shutdown = fr_bio_packet_shutdown; + + fr_bio_chain(&my->bio, next); + + if (my->cancel) { + talloc_set_destructor(my, fr_bio_packet_destructor); + } else { + talloc_set_destructor((fr_bio_t *) my, fr_bio_destructor); + } + + return (fr_bio_t *) my; +} + +/** Cancel the write for a packet. + * + * Cancel one of the saved packets, and call the cancel() routine if it exists. + * + * There is no way to cancel all packets. The caller must find the lowest bio in the chain, and shut it down. + * e.g. by closing the socket via fr_bio_fd_close().
That function will take care of walking back up the + * chain, and shutting down each bio. + * + * @param bio the #fr_bio_packet_t + * @param ctx The context returned from #fr_bio_packet_saved_t + * @return + * - <0 no such packet was found in the list of saved packets, OR the packet cannot be cancelled. + * - 0 the packet was cancelled. + */ +int fr_bio_packet_cancel(fr_bio_t *bio, void *ctx) +{ + fr_bio_packet_t *my = talloc_get_type_abort(bio, fr_bio_packet_t); + fr_bio_packet_entry_t *item = ctx; + + if ((item < &my->array[0]) || (item >= &my->array[my->max_saved])) { + return -1; + } + + /* + * Already cancelled, that's a NOOP. + */ + if (item->cancelled) return 0; + + /* + * If the item has been partially written, AND we have a working write function, see if we can + * cancel it. + */ + if (item->already_written && (my->bio.write != fr_bio_null_write)) { + ssize_t rcode; + fr_bio_t *next; + + next = fr_bio_next(bio); + fr_assert(next != NULL); + + /* + * If the write fails or returns nothing, the item can't be cancelled. + */ + rcode = next->write(next, item->packet_ctx, ((uint8_t const *) item->buffer) + item->already_written, item->size - item->already_written); + if (rcode <= 0) return -1; + + /* + * If we haven't written the full item, it can't be cancelled. + */ + item->already_written += rcode; + if (item->already_written < item->size) return -1; + + /* + * Else the item has been fully written, it can be safely cancelled. + */ + } + + + /* + * Remove it from the saved list, and run the cancellation callback.
+ */ + (void) fr_bio_packet_list_remove(&my->pending, item); + fr_bio_packet_list_insert_head(&my->free, item); + item->cancelled = true; /* entries on the free list are always marked cancelled */ + if (my->cancel) my->cancel(bio, item->packet_ctx, item->buffer, item->size); + return 0; +} diff --git a/src/lib/bio/packet.h b/src/lib/bio/packet.h new file mode 100644 index 00000000000..c14172a9e3e --- /dev/null +++ b/src/lib/bio/packet.h @@ -0,0 +1,42 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/packet.h + * @brief Binary IO abstractions for packets in buffers + * + * Write packets of data to bios. If a packet is partially + * read/written, it is cached for later processing. + * + * @todo - Not quite done yet. It still needs to be integrated into the bio framework, + * and be managed through a bio of its own.
+ * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_packet_h, "$Id$") + +typedef void (*fr_bio_packet_callback_t)(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t size); +typedef void (*fr_bio_packet_saved_t)(fr_bio_t *bio, void *packet_ctx, const void *buffer, size_t size, void *ctx); + +fr_bio_t *fr_bio_packet_alloc(TALLOC_CTX *ctx, size_t max_saved, + fr_bio_packet_saved_t saved, + fr_bio_packet_callback_t sent, + fr_bio_packet_callback_t cancel, + fr_bio_t *next) CC_HINT(nonnull(1,6)); + +int fr_bio_packet_cancel(fr_bio_t *bio, void *ctx) CC_HINT(nonnull); diff --git a/src/lib/bio/pipe.c b/src/lib/bio/pipe.c new file mode 100644 index 00000000000..296cbcd9f11 --- /dev/null +++ b/src/lib/bio/pipe.c @@ -0,0 +1,189 @@ +/* + * This program is is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or (at + * your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/pipe.c + * @brief BIO abstractions for in-memory pipes + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +#include +#include + +#include + +#include + +/** The pipe bio + * + */ +typedef struct { + FR_BIO_COMMON; + + fr_bio_t *next; + + bool eof; //!< are we at EOF? 
+ + fr_bio_pipe_cb_funcs_t signal; //!< inform us that the pipe is readable + + pthread_mutex_t mutex; +} fr_bio_pipe_t; + + +static int fr_bio_pipe_destructor(fr_bio_pipe_t *my) +{ + pthread_mutex_destroy(&my->mutex); + + return 0; +} + +/** Read from the pipe. + * + * Once EOF is set, any pending data is read, and then EOF is returned. + */ +static ssize_t fr_bio_pipe_read(fr_bio_t *bio, void *packet_ctx, void *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_pipe_t *my = talloc_get_type_abort(bio, fr_bio_pipe_t); + + fr_assert(my->next != NULL); + + pthread_mutex_lock(&my->mutex); + rcode = my->next->read(my->next, packet_ctx, buffer, size); + if ((rcode == 0) && my->eof) { + rcode = fr_bio_error(EOF); + + } else if (rcode > 0) { + /* + * There is room to write more data. + * + * @todo - only signal when we transition from BLOCKED to unblocked. + */ + my->signal.writeable(&my->bio); + } + pthread_mutex_unlock(&my->mutex); + + return rcode; +} + + +/** Write to the pipe. + * + * Once EOF is set, no further writes are possible. + */ +static ssize_t fr_bio_pipe_write(fr_bio_t *bio, void *packet_ctx, void const *buffer, size_t size) +{ + ssize_t rcode; + fr_bio_pipe_t *my = talloc_get_type_abort(bio, fr_bio_pipe_t); + + fr_assert(my->next != NULL); + + pthread_mutex_lock(&my->mutex); + if (!my->eof) { + rcode = my->next->write(my->next, packet_ctx, buffer, size); + + /* + * There is more data to read. + * + * @todo - only signal when we transition from no data to data. + */ + if (rcode > 0) { + my->signal.readable(&my->bio); + } + + } else { + rcode = fr_bio_error(EOF); + } + pthread_mutex_unlock(&my->mutex); + + return rcode; +} + +/** Shutdown callback. 
+ * + */ +static int fr_bio_pipe_shutdown(fr_bio_t *bio) +{ + int rcode; + fr_bio_pipe_t *my = talloc_get_type_abort(bio, fr_bio_pipe_t); + + fr_assert(my->next != NULL); + + pthread_mutex_lock(&my->mutex); + rcode = fr_bio_shutdown(my->next); + pthread_mutex_unlock(&my->mutex); + + return rcode; +} + +/** Allocate a thread-safe pipe which can be used for both reads and writes. + * + * Due to talloc issues with multiple threads, if the caller wants a bi-directional pipe, this function will + * need to be called twice. That way a free in each context won't result in a race condition on two mutex + * locks. + * + * For now, it's too difficult to emulate the pipe[2] behavior, where two identical "connected" things are + * returned, and either can be used for reading or for writing. + * + * i.e. a pipe is really a mutex-protected memory buffer. One side should call write (and never read). The + * other side should call read (and never write). + * + * The pipe should be freed only after both ends have set EOF. + */ +fr_bio_t *fr_bio_pipe_alloc(TALLOC_CTX *ctx, fr_bio_pipe_cb_funcs_t *cb, size_t buffer_size) +{ + fr_bio_pipe_t *my; + + if (!cb->readable || !cb->writeable) return NULL; + + if (buffer_size < 1024) buffer_size = 1024; + if (buffer_size > (1 << 20)) buffer_size = (1 << 20); + + my = talloc_zero(ctx, fr_bio_pipe_t); + if (!my) return NULL; + + my->next = fr_bio_mem_sink_alloc(my, buffer_size); + if (!my->next) { + talloc_free(my); + return NULL; + } + + my->signal = *cb; + + pthread_mutex_init(&my->mutex, NULL); + + my->bio.read = fr_bio_pipe_read; + my->bio.write = fr_bio_pipe_write; + my->cb.shutdown = fr_bio_pipe_shutdown; + + talloc_set_destructor(my, fr_bio_pipe_destructor); + return (fr_bio_t *) my; +} + +/** Set EOF. + * + * Either side can set EOF, in which case pending reads are still processed. Writes return EOF immediately. + * Readers return pending data, and then EOF.
+ */ +void fr_bio_pipe_set_eof(fr_bio_t *bio) +{ + fr_bio_pipe_t *my = talloc_get_type_abort(bio, fr_bio_pipe_t); + + pthread_mutex_lock(&my->mutex); + my->eof = true; + pthread_mutex_unlock(&my->mutex); +} diff --git a/src/lib/bio/pipe.h b/src/lib/bio/pipe.h new file mode 100644 index 00000000000..7c3d9cb0b83 --- /dev/null +++ b/src/lib/bio/pipe.h @@ -0,0 +1,34 @@ +#pragma once +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA + */ + +/** + * $Id$ + * @file lib/bio/pipe.h + * @brief BIO pipe handlers. + * + * @copyright 2024 Network RADIUS SAS (legal@networkradius.com) + */ +RCSIDH(lib_bio_pipe_h, "$Id$") + +typedef struct { + fr_bio_callback_t readable; + fr_bio_callback_t writeable; +} fr_bio_pipe_cb_funcs_t; + +fr_bio_t *fr_bio_pipe_alloc(TALLOC_CTX *ctx, fr_bio_pipe_cb_funcs_t *cb, size_t buffer_size) CC_HINT(nonnull); + +void fr_bio_pipe_set_eof(fr_bio_t *bio);
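Reviewer note on packet.c: the write path described in the fr_bio_packet_alloc() comments (pass-through write, fall back to a pending queue on a partial write, flush later with a NULL buffer) can be hard to follow from the diff alone. The following is a minimal standalone sketch of that idea, not the code from this patch. Every name in it (sketch_sink_t, sketch_queue_t, sketch_write, sketch_flush, SKETCH_WOULD_BLOCK) is hypothetical, the "next bio" is replaced by a fake sink which accepts a limited number of bytes, and callbacks, cancellation, and talloc are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>	/* ssize_t */

#define SKETCH_MAX_SAVED   4
#define SKETCH_WOULD_BLOCK (-1)	/* hypothetical stand-in for fr_bio_error(IO_WOULD_BLOCK) */

/* Hypothetical "next" layer: accepts at most `room` more bytes, then blocks. */
typedef struct {
	size_t room;	/* bytes the fake socket will still accept */
	size_t total;	/* bytes accepted so far */
} sketch_sink_t;

static ssize_t sketch_sink_write(sketch_sink_t *sink, void const *buffer, size_t size)
{
	(void) buffer;
	if (sink->room == 0) return SKETCH_WOULD_BLOCK;
	if (size > sink->room) size = sink->room;	/* partial write */
	sink->room -= size;
	sink->total += size;
	return (ssize_t) size;
}

/* One saved packet. The caller keeps ownership of the buffer memory. */
typedef struct {
	void const *buffer;
	size_t size;
	size_t already_written;
} sketch_entry_t;

/* Fixed-size ring of pending packets, oldest at `head`. */
typedef struct {
	sketch_sink_t *next;
	size_t head;
	size_t count;
	sketch_entry_t pending[SKETCH_MAX_SAVED];
} sketch_queue_t;

/* Save a packet at the tail of the pending ring. */
static ssize_t sketch_push(sketch_queue_t *q, void const *buffer, size_t size, size_t already_written)
{
	sketch_entry_t *item;

	if (q->count == SKETCH_MAX_SAVED) return SKETCH_WOULD_BLOCK;

	item = &q->pending[(q->head + q->count) % SKETCH_MAX_SAVED];
	q->count++;

	item->buffer = buffer;
	item->size = size;
	item->already_written = already_written;

	return (ssize_t) size;	/* report a full write: the queue now tracks the packet */
}

/* Write one packet: pass it through if nothing is pending, otherwise queue it. */
static ssize_t sketch_write(sketch_queue_t *q, void const *buffer, size_t size)
{
	ssize_t rcode;

	if (q->count > 0) return sketch_push(q, buffer, size, 0);

	rcode = sketch_sink_write(q->next, buffer, size);
	if (rcode < 0) return rcode;			/* would block: nothing was done */
	if ((size_t) rcode == size) return rcode;	/* fully written */

	return sketch_push(q, buffer, size, (size_t) rcode);	/* partial: save the remainder */
}

/* Flush pending packets in order until the sink blocks or the ring is empty. */
static ssize_t sketch_flush(sketch_queue_t *q)
{
	size_t written = 0;

	while (q->count > 0) {
		sketch_entry_t *item = &q->pending[q->head];
		ssize_t rcode;

		rcode = sketch_sink_write(q->next, (uint8_t const *) item->buffer + item->already_written,
					  item->size - item->already_written);
		if (rcode <= 0) break;			/* blocked again, retry later */

		written += (size_t) rcode;
		item->already_written += (size_t) rcode;
		if (item->already_written < item->size) break;

		q->head = (q->head + 1) % SKETCH_MAX_SAVED;	/* packet done, release the slot */
		q->count--;
	}

	return (ssize_t) written;
}
```

In this sketch a caller would invoke sketch_write() once per packet and call sketch_flush() when the underlying descriptor becomes writeable, which mirrors the documented fr_bio_write(bio, NULL, NULL, SIZE_MAX) convention. The fixed ring is only a simplification for illustration; the patch itself keeps pending and free entries on dlists.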