From: Michael Tremer
Date: Sun, 26 Oct 2025 17:00:43 +0000 (+0000)
Subject: DESIGN: Add some thoughts on the design of this project
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=eb37782b6c03520f74e1b1cd8a56d8b2083b62a3;p=oddments%2Fcollecty.git

DESIGN: Add some thoughts on the design of this project

Signed-off-by: Michael Tremer
---

diff --git a/DESIGN.md b/DESIGN.md
new file mode 100644
index 0000000..49a0d19
--- /dev/null
+++ b/DESIGN.md
@@ -0,0 +1,107 @@
+# IPFire Telemetry Design Ideas
+
+## Motivation
+
+This project has been created to collect metrics, stats and analytics from Linux-based
+systems. It is meant to be lightweight and suited to modern needs.
+
+In comparison to existing solutions like collectd, this project brings data collection
+up to date by creating a modern, extensible architecture and tightly couples
+graph creation with a D-Bus interface. That way, it becomes a one-stop shop for not only
+data collection, but also presentation of the collected data.
+
+## Design Goals
+
+This project has been written in C on top of libsystemd and comes with a number of bundled
+sources which will collect said system metrics. Those are usually based on other libraries
+that implement fetching the data, as that is out of scope for this project. Should certain
+non-essential libraries not be available, the project will be built without support
+for the sources that require them.
+
+## High-Level Architecture
+
+The core of this project is telemetryd, which:
+
+ * Collects all the metrics
+ * Provides the D-Bus interface in order to:
+   * Generate graphs
+
+A command line tool called telemetry-graph implements the client side of the
+D-Bus interface to quickly generate graphs with a single command. However, if suitable,
+the D-Bus interface of the daemon can be accessed independently.
+
+Internally, the daemon works as a single-threaded event loop which the sources and
+other objects register with. At regular intervals, the event loop calls a heartbeat function
+of the sources to collect another metric. It will also flush any collected data to disk.
+
+Due to the single-threaded design, no source may block the event loop at any time.
+The execution time of the heartbeat function is tracked and if it exceeds a certain
+threshold, the source will be de-prioritised so that other sources have the chance to collect
+their metrics in time. See below for more details.
+
+Collected metrics won't be written to the RRD files immediately in order to save on I/O.
+Updating an RRD can be quite write-intensive and since we are looking to collect detailed data
+from a lot of sources, we don't want to decrease the life of any flash storage too much.
+Since the daemon is also generating the graphs, any collected but not yet written data
+will be flushed to the RRD files whenever needed to generate a specific graph.
+
+## Zero-Configuration
+
+In contrast to existing solutions, the daemon does not have a configuration file. The only
+configuration possible is through the command line interface, which allows enabling a
+debugging mode.
+
+All sources will automatically discover what metrics to collect and might
+disable themselves if they are not needed on a host.
+
+Historically, IPFire used to have a script that generated a configuration file for
+collectd, but instead of maintaining the separate script, we might as well perform
+the detection inside the daemon.
+
+## Data Model
+
+Each source holds the structure for the RRD files it is generating. It may also have a
+number of functions that get called to set up the source to collect data, or to trigger
+the source to collect data. Those functions are:
+
+ * init - To set up any data structures
+ * free - To clean up any resources that have been allocated by init
+ * heartbeat - A function called at regular intervals which triggers data collection
+
+Once some data has been collected, it will have to be submitted using the
+td_source_submit* class of functions.
+
+Instead of storing each individual metric in its own RRD file, a source writes all
+collected data into the same RRD file. That should help us to reduce I/O and also
+keep the files easier to manage as there are fewer of them.
+
+In case the structure of the RRD file changes, the daemon automatically migrates
+to the new schema whenever an RRD file is being updated.
+
+## Extensibility
+
+Sources and graphs have their own files with their implementations. One source collects
+one specific thing and a graph renders exactly one presentation of the data that has
+been collected by one or multiple sources.
+
+However, it is very easy to add more sources which will all run independently of each
+other. That way, we will keep this project a living thing and adapt to any new developments,
+features and so on.
+
+We are grateful for any contribution.
+
+## Shell Command Execution
+
+Some processes offer sockets or some other way to communicate with a master process.
+Fetching metrics over these is a great way to access detailed information. However, it is
+not always feasible to implement any custom protocols to communicate over such sockets.
+Instead, we aim to use any companion utilities that are being shipped with those daemons
+to access these metrics. That allows us to keep this project as lean as possible and
+we will have to worry less about keeping up with any changes.
+
+Since calling a subprocess takes a lot of time and would block the event loop, a special
+command tool has been built into the daemon which is able to asynchronously call any
+commands in the background, ingest any data that is being printed to standard output
+and then call a parsing function to process the data. This way, we will be able to
+call many commands simultaneously without interfering with each other and we will
+completely eliminate any danger of blocking the event loop.
diff --git a/Makefile.am b/Makefile.am
index 5830365..bf16b36 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -269,6 +269,11 @@ CLEANFILES += \
 
 # ------------------------------------------------------------------------------
 
+dist_doc_DATA = \
+	DESIGN.md
+
+# ------------------------------------------------------------------------------
+
 .PHONY: man
 man: $(MANPAGES) $(MANPAGES_HTML)