From 1c4933d8807ff61ccd310b66bf305fe3c449ce28 Mon Sep 17 00:00:00 2001
From: =?utf8?q?Petr=20=C5=A0pa=C4=8Dek?=
Date: Thu, 2 Jan 2020 20:40:48 +0100
Subject: [PATCH] doc: restructure docs

Content from the proposed Operator's guide was split into sections and
main chapters were restructured. New structure follows "purposes":
- quick start - installation, startup
- configuration
- operation - performance, monitoring, etc.
- documentation for developers
---
 doc/config.rst          |  92 ++++++++++++++++++++++++++
 doc/index.rst           |  16 ++---
 doc/monitoring.rst      |  15 +++++
 doc/operators-guide.rst | 143 ----------------------------------------
 doc/performance.rst     |  77 ++++++++++++++++++++++
 doc/startguide.rst      | 112 ++-----------------------------
 6 files changed, 197 insertions(+), 258 deletions(-)
 create mode 100644 doc/config.rst
 create mode 100644 doc/monitoring.rst
 delete mode 100644 doc/operators-guide.rst
 create mode 100644 doc/performance.rst

diff --git a/doc/config.rst b/doc/config.rst
new file mode 100644
index 000000000..b1c5e7a4a
--- /dev/null
+++ b/doc/config.rst
@@ -0,0 +1,92 @@
+Multiple instances
+==================
+
+Knot Resolver can utilize multiple CPUs by running multiple independent instances (processes), where each process utilizes at most a single CPU core on your machine. If your machine handles a lot of DNS traffic, run multiple instances.
+
+All instances typically share the same configuration and cache, and incoming queries are automatically distributed by the operating system among all instances.
+
+An advantage of using multiple instances is that a problem in a single instance will not affect the others, so a single instance crash will not bring the whole DNS resolver service down.
+
+.. tip:: For maximum performance, there should be as many kresd processes as
+   there are available CPU threads.
+
+To run multiple instances, use a different identifier after the ``@`` sign for each instance, for
+example:
+
+.. code-block:: bash
+
+   $ systemctl start kresd@1.service
+   $ systemctl start kresd@2.service
+   $ systemctl start kresd@3.service
+   $ systemctl start kresd@4.service
+
+With the use of brace expansion in bash, the equivalent command looks like this:
+
+.. code-block:: bash
+
+   $ systemctl start kresd@{1..4}.service
+
+For more details see ``kresd.systemd(7)``.
+
+
+Zero-downtime restarts
+----------------------
+When using `multiple instances`_, it is also possible to do a rolling
+restart with zero downtime of the service. This can be done by restarting
+only a subset of the processes at a time.
+
+On a system with 4 instances we can restart them one by one:
+
+.. code-block:: bash
+
+   $ systemctl restart kresd@1.service
+   $ systemctl restart kresd@2.service
+   $ systemctl restart kresd@3.service
+   $ systemctl restart kresd@4.service
+
+At any given time only a single instance is stopped and restarted, so the remaining three instances continue to serve clients.
+
+
+.. _instance-specific-configuration:
+
+Instance-specific configuration
+-------------------------------
+
+Instances can use arbitrary identifiers, so we can for example name instances `dns1`, `tls` and so on.
+
+.. code-block:: bash
+
+   $ systemctl start kresd@dns1
+   $ systemctl start kresd@dns2
+   $ systemctl start kresd@tls
+   $ systemctl start kresd@doh
+
+The instance name is subsequently exposed to kresd via the environment variable
+``SYSTEMD_INSTANCE``. This can be used to tell the instances apart, e.g. when
+using the :ref:`mod-nsid` module with per-instance configuration:
+
+.. code-block:: lua
+
+   local systemd_instance = os.getenv("SYSTEMD_INSTANCE")
+
+   modules.load('nsid')
+   nsid.name(systemd_instance)
+
+More arcane set-ups are also possible. The following example isolates the
+individual services for classic DNS, DoT and DoH from each other.
+
+.. code-block:: lua
+
+   local systemd_instance = os.getenv("SYSTEMD_INSTANCE")
+
+   if string.match(systemd_instance, '^dns') then
+      net.listen('127.0.0.1', 53, { kind = 'dns' })
+   elseif string.match(systemd_instance, '^tls') then
+      net.listen('127.0.0.1', 853, { kind = 'tls' })
+   elseif string.match(systemd_instance, '^doh') then
+      net.listen('127.0.0.1', 443, { kind = 'doh' })
+   else
+      panic("Use kresd@dns*, kresd@tls* or kresd@doh* instance names")
+   end
diff --git a/doc/index.rst b/doc/index.rst
index 428272119..b82ae0a33 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -14,26 +14,26 @@ and it provides a state-machine like API for extensions.
    startguide
 
 .. toctree::
-   :caption: Users
+   :caption: Configuration
    :name: users
    :maxdepth: 2
 
+   config
    daemon
    modules
+
+.. _operation:
+
 .. toctree::
+   :caption: Operation
    :maxdepth: 1
 
+   performance
+   monitoring
    upgrading
    NEWS
 
-.. toctree::
-   :caption: Experts
-   :name: experts
-   :maxdepth: 2
-
-   operators-guide
-
 .. toctree::
    :caption: Developers
    :name: developers
diff --git a/doc/monitoring.rst b/doc/monitoring.rst
new file mode 100644
index 000000000..e6964e60e
--- /dev/null
+++ b/doc/monitoring.rst
@@ -0,0 +1,15 @@
+**********
+Monitoring
+**********
+
+Statistics for monitoring purposes are available in the :ref:`mod-stats` module. If you want to export these statistics to a central system like Graphite, Metronome, InfluxDB or any other compatible storage, see :ref:`mod-graphite`. Statistics can also be made available over HTTP in Prometheus format; see the :ref:`mod-http` module, whose ``webmgmt`` endpoint supports Prometheus.
+
+.. note::
+
+   Please remember that each Knot Resolver instance keeps its own statistics, and instances can be started and stopped dynamically. This might affect your data postprocessing procedures.
+
+More extensive logging can be enabled using the :ref:`mod-bogus_log` module.
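The Graphite export mentioned above can be illustrated with a short kresd configuration snippet. This is only a sketch: the server address below is a documentation placeholder, and the exact option names should be checked against the :ref:`mod-graphite` module documentation for your version.

```lua
-- Sketch: export statistics to a Graphite/InfluxDB-compatible collector.
-- The host address is a placeholder (RFC 5737 range); adjust to your setup.
modules = {
    graphite = {
        prefix = hostname(),  -- optional prefix for metric names
        host = '192.0.2.10',  -- address of the metrics collector
        port = 2003,          -- collector port (Graphite plaintext protocol)
        interval = 5 * sec,   -- publishing interval
    }
}
```

Because each instance reports its own statistics, including something instance-specific (e.g. ``SYSTEMD_INSTANCE``) in the prefix helps keep metrics from multiple instances apart.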
+
+The resolver watchdog is a tool to detect and recover from potential bugs that cause the resolver to stop responding properly to queries. See :ref:`mod-watchdog` for more information about this functionality.
+
+If none of these options fits your deployment or if you have special needs, you can configure your own checks and exports using :ref:`async-events`.
diff --git a/doc/operators-guide.rst b/doc/operators-guide.rst
deleted file mode 100644
index 232971c75..000000000
--- a/doc/operators-guide.rst
+++ /dev/null
@@ -1,143 +0,0 @@
-.. _operator-guide:
-
-****************
-Operator's Guide
-****************
-
-The out-of-the box configuration of the upstream Knot Resolver packages is
-intended for personal or small-scale use. Any deployment with traffic over 100
-queries per second will likely benefit from the recommendations in this guide.
-
-Examples in this guide assume systemd is used as a supervisor. However, the
-same logic applies for other supervisors (e.g. supervisord_) or when running
-kresd without any supervisor.
-
-
-Multiple instances
-==================
-
-The resolver can run in multiple independent processes. All of them can share
-the same socket (utilizing ``SO_REUSEPORT``) and cache.
-
-.. tip:: For maximum performance, there should be as many kresd processes as
-   there are available CPU threads.
-
-To run multiple daemons, use a different identifier for each instance, for
-example:
-
-.. code-block:: bash
-
-   $ systemctl start kresd@1.service
-   $ systemctl start kresd@2.service
-   $ systemctl start kresd@3.service
-   $ systemctl start kresd@4.service
-
-With the use of brace expansion in bash, the equivalent command looks like:
-
-.. code-block:: bash
-
-   $ systemctl start kresd@{1..4}.service
-
-For more details, see ``kresd.systemd(7)``.
-
-.. note:: When using multiple process, it is also possible to do a rolling
-   restart with zero downtime of the service. This can be done by restarting
-   only a subset of the processes at a time.
-
-
-.. _instance-specific-configuration:
-
-Instance-specific configuration
--------------------------------
-
-It is possible to use arbitraty identifiers for the instances.
-
-.. code-block:: bash
-
-   $ systemctl start kresd@dns1
-   $ systemctl start kresd@dns2
-   $ systemctl start kresd@tls
-   $ systemctl start kresd@doh
-
-The instance name is subsequently exposed to kresd via the environment variable
-``SYSTEMD_INSTANCE``. This can be used to tell the instances apart, e.g. when
-using the :ref:`mod-nsid` module.
-
-.. code-block:: lua
-
-   local systemd_instance = os.getenv("SYSTEMD_INSTANCE")
-
-   modules.load('nsid')
-   nsid.name(systemd_instance)
-
-More arcane set-ups are also possible. The following example isolates the
-individual services for classic DNS, DoT and DoH from each other.
-
-.. code-block:: lua
-
-   local systemd_instance = os.getenv("SYSTEMD_INSTANCE")
-
-   if string.match(systemd_instance, '^dns') then
-      net.listen('127.0.0.1', 53, { kind = 'dns' })
-   elseif string.match(systemd_instance, '^tls') then
-      net.listen('127.0.0.1', 853, { kind = 'tls' })
-   elseif string.match(systemd_instance, '^doh') then
-      net.listen('127.0.0.1', 443, { kind = 'doh' })
-   else
-      panic("Use kresd@dns*, kresd@tls* or kresd@doh* instance names")
-   end
-
-
-Cache
-=====
-
-Cache size
-----------
-
-Increasing the cache size is suitable for larger deployments. Values of 1 GB or
-larger should be considered.
-
-.. code-block:: lua
-
-   cache.size = 1 * GB
-
-Ensure sufficient space is available on the filesystem, otherwise a runtime
-error of ``SIGBUS`` will be raised when Knot Resolver tries to allocate more
-space.
-
-Cache in tmpfs
---------------
-
-.. tip:: Using tmpfs for cache improves performance and reduces disk I/O.
-
-Mounting the cache directory as tmpfs is recommended for larger deployments.
-Make sure to use appropriate ``size=`` option and don't forget to adjust the
-size in the config file as well.
-
-.. code-block::
-
-   # /etc/fstab
-   tmpfs   /var/cache/knot-resolver   tmpfs   rw,size=1G,uid=knot-resolver,gid=knot-resolver,nosuid,nodev,noexec,mode=0700 0 0
-
-.. note:: While it is technically possible to move the cache to an existing
-   tmpfs filesystem, it is *not* recommended. The path to cache is specified in
-   multiple systemd units. Also, a shared tmpfs space could be used up by other
-   applications, leading to ``SIGBUS`` errors during runtime.
-
-
-Watchdog
-========
-
-.. tip:: Configuring a query watchdog may improve the resilience of the
-   service.
-
-While Knot Resolver is automatically restarted by systemd if it crashes,
-activating a query watchdog is useful for potential edge-case bugs that might
-cause the resolver to stop properly responding to queries while it is still
-running.
-
-See :ref:`mod-watchdog` for more information about this functionality and its
-configuration.
-
-
-.. _`supervisord`: http://supervisord.org/
diff --git a/doc/performance.rst b/doc/performance.rst
new file mode 100644
index 000000000..29fed5bd7
--- /dev/null
+++ b/doc/performance.rst
@@ -0,0 +1,77 @@
+.. _performance:
+
+Performance tuning
+==================
+
+.. _cache_sizing:
+
+Cache sizing
+------------
+
+For personal use-cases and small deployments a cache size of around 100 MB is more than enough.
+
+For large deployments we recommend running Knot Resolver on a dedicated machine and allocating 90% of the machine's free memory for the resolver's cache.
+
+For example, imagine you have a machine with 16 GB of memory.
+After a machine restart, use the command ``free -m`` to determine the amount of free memory (without swap):
+
+.. code-block:: bash
+
+   $ free -m
+                 total        used        free
+   Mem:          15907          979       14928
+
+Now you can configure the cache size to be 90% of the free memory, 14 928 MB, i.e. roughly 13 435 MB:
+
+.. code-block:: lua
+
+   -- 90 % of free memory after machine restart
+   cache.size = 13435 * MB
+
+.. _cache_persistence:
+
+Cache persistence
+-----------------
+.. tip:: Using tmpfs for cache improves performance and reduces disk I/O.
+
+By default the cache is saved on a persistent storage device,
+so the content of the cache persists across system reboots.
+This usually leads to lower latency after a restart,
+however in certain situations a non-persistent cache storage might be preferred, e.g.:
+
+  - Resolver handles a high volume of queries and I/O performance to disk is too low.
+  - Threat model includes an attacker getting access to disk content in powered-off state.
+  - Disk has a limited number of writes (e.g. flash memory in routers).
+
+If a non-persistent cache is desired, configure the cache directory to be on a
+tmpfs_ filesystem, a temporary in-memory file storage.
+The cache content will then be kept in memory, and thus have faster access,
+but will be lost on power-off or reboot.
+
+
+.. note:: On most Unix-like systems ``/tmp`` and ``/var/run`` are commonly mounted to tmpfs.
+   While it is technically possible to move the cache to an existing
+   tmpfs filesystem, it is *not recommended*: the path to the cache is specified in
+   multiple systemd units, and a shared tmpfs space could be used up by other
+   applications, leading to ``SIGBUS`` errors during runtime.
+
+Mounting the cache directory as tmpfs_ is the recommended approach.
+Make sure to use an appropriate ``size=`` option and don't forget to adjust the
+size in the config file as well.
+
+.. code-block::
+
+   # /etc/fstab
+   tmpfs   /var/cache/knot-resolver   tmpfs   rw,size=2G,uid=knot-resolver,gid=knot-resolver,nosuid,nodev,noexec,mode=0700 0 0
+
+.. code-block:: lua
+
+   -- /etc/knot-resolver/config
+   cache.size = 2 * GB
+
+.. _tmpfs: https://en.wikipedia.org/wiki/Tmpfs
+
+
+Utilizing all CPUs
+------------------
+
+To utilize all CPUs available on your machine, run multiple kresd instances, as described in the Multiple instances section of the configuration chapter.
diff --git a/doc/startguide.rst b/doc/startguide.rst
index 5da19315b..27d7e6182 100644
--- a/doc/startguide.rst
+++ b/doc/startguide.rst
@@ -68,13 +68,6 @@ Add the `OBS `_ package repository
 Startup
 *******
 
-Knot Resolver can run in multiple independent instances (processes), where each `single instance`_ of Knot Resolver will utilize at most single CPU core on your machine. If your machine handles a lot of DNS traffic, run `multiple instances`_.
-
-Advantage of using multiple instances is that a problem in a single instance will not affect others, so a single instance crash will not bring whole DNS resolver service down.
-
-Single instance
-===============
-
 The simplest way to run single instance of Knot Resolver is to use provided
 Knot Resolver's Systemd integration:
 
@@ -89,30 +82,8 @@ See logs and status of running instance with ``systemctl status kresd@1.service`
 ``kresd@*.service`` is not enabled by default, thus Knot Resolver won't start
 automatically after reboot. To start and enable service in one command use
 ``systemctl enable --now kresd@1.service``
 
-
-Multiple instances
-==================
-
-Knot Resolver can run in multiple independent processes, all sharing the same configuration and cache. Incomming queries will be distributed among all instances automatically.
-
-To use up all resources, for instance of 4 CPUs system, the best way is to run four instances at a time.
-
-.. code-block:: bash
-
-   $ sudo systemctl start kresd@1.service
-   $ sudo systemctl start kresd@2.service
-   $ sudo systemctl start kresd@3.service
-   $ sudo systemctl start kresd@4.service
-
-or simpler way
-
-.. code-block:: bash
-
-   $ sudo systemctl start kresd@{1..4}.service
-
-
-Testing
-=======
+First DNS query
+===============
 
 After installation and first startup, Knot Resolver's default configuration accepts queries on loopback interface.
 This allows you to test that the installation and service startup were successful before continuing with configuration.
 
 For instance, you can use DNS lookup utility ``kdig`` to send DNS queries. The ``kdig`` command is provided by following packages:
@@ -187,61 +158,6 @@ The following configuration instructs Knot Resolver to receive standard unencryp
 Knot Resolver could answer from different IP addresses if the network address ranges
 overlap, and clients would refuse such a response.
 
-Cache configuration
-===================
-
-Sizing
-^^^^^^
-
-For personal use-cases and small deployments cache size around 100 MB is more than enough.
-
-For large deployments we recommend to run Knot Resolver on a dedicated machine, and to allocate 90% of machine's free memory for resolver's cache.
-
-For example, imagine you have a machine with 16 GB of memory.
-After machine restart you use command ``free -m`` to determine amount of free memory (without swap):
-
-.. code-block:: bash
-
-   $ free -m
-                 total        used        free
-   Mem:          15907          979       14928
-
-Now you can configure cache size to be 90% of the free memory 14 928 MB, i.e. 13 453 MB:
-
-.. code-block:: lua
-
-   -- 90 % of free memory after machine restart
-   cache.size = 13453 * MB
-
-.. _quick-cache_persistence:
-
-Cache persistence
-^^^^^^^^^^^^^^^^^
-By default the cache is saved on a persistent storage device
-so the content of the cache is persisted during system reboot.
-This usually leads to smaller latency after restart etc.,
-however in certain situations a non-persistent cache storage might be preferred, e.g.:
-
-  - Resolver handles high volume of queries and I/O performance to disk is too low.
-  - Threat model includes attacker getting access to disk content in power-off state.
-  - Disk has limited number of writes (e.g. flash memory in routers).
-
-If non-persistent cache is desired configure cache directory to be on
-tmpfs_ filesystem, a temporary in-memory file storage.
-The cache content will be saved in memory, and thus have faster access
-and will be lost on power-off or reboot.
-
-In most of the Unix-like systems ``/tmp`` and ``/var/run`` are commonly mounted to *tmpfs*.
-This allows us to move cache e.g. to directory ``/tmp/knot-resolver``:
-
-.. code-block:: lua
-
-   -- do not forget the lmdb:// prefix before absolute path
-   cache.storage = 'lmdb:///tmp/knot-resolver'
-
-If the temporary directory doesn't exist it will be created automatically with access only
-for ``knot-resolver`` user and group.
-
 Scenario: Internal Resolver
 ===========================
 
@@ -310,10 +226,6 @@ signed by a trusted CA. Once the certificate was obtained a path to certificate
 
    net.tls("/etc/knot-resolver/server-cert.pem", "/etc/knot-resolver/server-key.pem")
 
-Performance tunning
-^^^^^^^^^^^^^^^^^^^
-For very high-volume traffic do not forget to run `multiple instances`_ and consider using :ref:`non-persistent cache storage `.
-
 Mandatory domain blocking
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -389,27 +301,13 @@ Knot Resolver's cache contains data clients queried for.
 If you are concerned about attackers who are able to get access to your
 computer system in power-off state and your storage device is not secured by
 encryption you can move the cache to tmpfs_.
-See previous chapter :ref:`quick-cache_persistence`.
+See chapter :ref:`cache_persistence`.
 
 **********
-Monitoring
+Next steps
 **********
-
-Statistics for monitoring purposes are available in :ref:`mod-stats` module. If you want to export these statistics to a central system like Graphite, Metronome, InfluxDB or any other compatible storage see :ref:`mod-graphite`. Statistics can also be made available over HTTP protocol in Prometheus format, see module providing :ref:`mod-http`, Prometheus is supported by ``webmgmt`` endpoint.
-
-More extensive logging can be enabled using :ref:`mod-bogus_log` module.
+
-
-If none of these options fits your deployment or if you have special needs you can configure your own checks and exports using :ref:`async-events`.
-
-.. note::
-
-   Please remember that each Knot Resolver instance keeps its own statistics, and instances can be started and stopped dynamically. This might affect your data postprocessing procedures.
-
-*********
-Upgrading
-*********
-Before upgrade please see :ref:`upgrading` guide for each respective version.
+Congratulations! Your resolver is now up and running and ready for queries. For serious deployments do not forget to read the :ref:`operation` chapter.
-- 
2.47.3