--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-logging-bogus:
+
+DNSSEC validation failure logging
+=================================
+
+This logs a message for each DNSSEC validation failure (on ``notice`` logging level).
+It is meant to give operators a hint about which queries should be
+investigated using diagnostic tools like DNSViz_.
+
+Add the following lines to your configuration file to enable it:
+
+.. code-block:: yaml
+
+ logging:
+   dnssec-bogus: true
+
+Example of a logged message:
+
+.. code-block:: none
+
+ [dnssec] validation failure: dnssec-failed.org. DNSKEY
+
+.. _DNSViz: http://dnsviz.net/
+
+.. List of most frequent queries which fail as DNSSEC bogus can be obtained at run-time:
+
+.. .. code-block:: lua
+
+.. > bogus_log.frequent()
+.. {
+.. {
+.. ['count'] = 1,
+.. ['name'] = 'dnssec-failed.org.',
+.. ['type'] = 'DNSKEY',
+.. },
+.. {
+.. ['count'] = 13,
+.. ['name'] = 'rhybar.cz.',
+.. ['type'] = 'DNSKEY',
+.. },
+.. }
+
+.. Please note that in future this might be replaced
+.. with some other way to log this information.
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+Debugging options
+=================
+
+In case the resolver crashes, it is often helpful to collect a coredump from
+the crashed process. Configuring the system to collect coredumps from crashed
+processes is out of the scope of this documentation, but some tips can be found
+`here <https://lists.nic.cz/hyperkitty/list/knot-resolver-users@lists.nic.cz/message/GUHW4JSDXZ6SZUAYYQ3U2WWOZEIVVF2S/>`_.
+
+Kresd uses its own mechanism for assertions. They are checks that should always
+pass and indicate some weird or unexpected state if they don't. In such cases,
+they show up in the log as errors. By default, the process recovers from those
+states if possible, but the behaviour can be changed with the following options
+to aid further debugging.
+
+.. option:: logging/debugging:
+
+ .. option:: assertion-abort: true|false
+
+ :default: false
+
+ Allow the process to be aborted in case it encounters a failed assertion.
+ (Some critical conditions always lead to an abort, regardless of settings.)
+
+ .. option:: assertion-fork: <time ms|s|m|h|d>
+
+ :default: 5m
+
+ If a process should be aborted, it can be done in two ways. When this is
+ set to nonzero (default), a child is forked and aborted to obtain a coredump,
+ while the parent process recovers and keeps running. This can be useful to
+ debug a rare issue that occurs in production, since it doesn't affect the
+ main process.
+
+ As dumping can be costly, the value is a lower bound on the delay between
+ consecutive coredumps of each process. The delay is randomized by ±25% each time.
+
+.. code-block:: yaml
+
+ logging:
+   debugging:
+     assertion-abort: true
+     assertion-fork: 10m
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-logging-dnstap:
+
+Dnstap (traffic collection)
+===========================
+
+Knot Resolver supports logging DNS requests and responses to a unix
+socket in `dnstap format <https://dnstap.info>`_ using the fstrm framing library.
+This is useful if you need to log all DNS traffic efficiently.
+
+The unix socket and the socket reader must be present before starting resolver instances.
+The socket also needs appropriate filesystem permissions;
+the typical user and group for the resolver are called ``knot-resolver``.
+
+Tunables:
+
+* ``unix-socket``: the unix socket file where dnstap messages will be sent
+* ``log-queries``: if ``true`` queries from downstream in wire format will be logged
+* ``log-responses``: if ``true`` responses to downstream in wire format will be logged
+
+.. Very non-standard and it seems unlikely that others want to collect the RTT.
+.. * ``log-tcp-rtt``: if ``true`` and on Linux,
+ add "extra" field with "rtt=12345\n",
+ signifying kernel's current estimate of RTT micro-seconds for the non-UDP connection
+ (alongside every arrived DNS message).
+
+.. code-block:: yaml
+
+ logging:
+   dnstap:
+     unix-socket: /tmp/dnstap.sock
+     # both of the following are enabled by default
+     log-queries: true
+     log-responses: true
********************************
To read service logs use commands usual for your distribution.
-E.g. on distributions using systemd-journald use command ``journalctl -u kresd@* -f``.
+E.g. on distributions using systemd-journald use command ``journalctl -eu knot-resolver``.
-Knot Resolver supports 6 logging levels - ``crit``, ``err``, ``warning``,
-``notice``, ``info``, ``debug``. All levels with the same meaning as is defined
-in ``syslog.h``. It is possible change logging level using
-:func:`log_level` function.
+.. option:: logging:
-.. code-block:: lua
+ .. option:: level: crit|err|warning|notice|info|debug
- log_level('debug') -- too verbose for normal usage
+ :default: notice
-Logging level ``notice`` is set after start by default,
-so logs from Knot Resolver should contain only couple lines a day.
-For debugging purposes it is possible to use the very verbose ``debug`` level,
-but that is generally not usable unless restricted in some way (see below).
+ Logging level ``notice`` is set after start by default,
+ so logs from Knot Resolver should contain only a couple of lines a day.
+ For debugging purposes it is possible to use the very verbose ``debug`` level,
+ but that is generally not usable unless restricted in some way (see below).
-In addition to levels, logging is also divided into the
-:ref:`groups <config_log_groups>`. All groups
-are logged by default, but you can enable ``debug`` level for selected groups using
-:func:`log_groups` function. Other groups are logged to the log level
-set by :func:`log_level`.
+ Toggle between ``debug`` and ``notice`` log level. Use only for debugging purposes.
+ On busy systems verbose logging can produce several MB of logs per
+ second and will slow down operation.
-It is also possible to enable ``debug`` logging level for particular requests,
-with :ref:`policies <mod-policy-logging>` or as :ref:`an HTTP service <mod-http-trace>`.
+ In addition to levels, logging is also divided into groups.
-Less verbose logging for DNSSEC validation errors can be enabled by using :ref:`mod-bogus_log` module.
+ .. option:: groups: <list of logging groups>
-.. py:function:: log_level([level])
+ Turns on ``debug`` logging for the selected :ref:`groups <config_log_groups>` regardless of the global log level.
+ Other groups are logged according to the ``level`` option.
- :param: string ``'crit'``, ``'err'``, ``'warning'``, ``'notice'``,
- ``'info'`` or ``'debug'``
- :return: string Current logging level.
+ .. code-block:: yaml
- Pass a string to set the global logging level.
+ logging:
+   level: notice # other groups are logged based on this level
+   groups: [manager, cache] # enable debug logging for the manager and cache groups
- .. py:function:: verbose([true | false])
+ .. It is also possible to enable ``debug`` logging level for particular requests,
+ .. with :ref:`policies <mod-policy-logging>` or as :ref:`an HTTP service <mod-http-trace>`.
- .. deprecated:: 5.4.0
- Use :func:`log_level` instead.
+ Less verbose logging for DNSSEC validation errors can be enabled as described in :ref:`config-logging-bogus`.
- :param: ``true`` enable ``debug`` level, ``false`` switch to default level (``notice``).
- :return: boolean ``true`` when ``debug`` level is enabled.
+ .. option:: target: syslog|stderr|stdout
- Toggle between ``debug`` and ``notice`` log level. Use only for debugging purposes.
- On busy systems verbose logging can produce several MB of logs per
- second and will slow down operation.
+ Knot Resolver logs to standard error stream by default, but typical systemd units change that to ``'syslog'``.
+ That setting logs directly through systemd's facilities (if available) to preserve more meta-data.
+ Do not change this setting unless you know what you are doing.
-.. py:function:: log_target(target)
-
- :param: string ``'syslog'``, ``'stderr'``, ``'stdout'``
- :return: string Current logging target.
-
- Knot Resolver logs to standard error stream by default,
- but typical systemd units change that to ``'syslog'``.
- That setting logs directly through systemd's facilities
- (if available) to preserve more meta-data.
-
-.. py:function:: log_groups([table])
-
- :param: table of string(s) representing :ref:`log groups <config_log_groups>`
- :return: table of string with currently set log groups
-
- Use to turn-on debug logging for the selected groups regardless of the global
- log level. Calling with no argument lists the currently active log groups. To
- remove all log groups, call the function with an empty table.
-
- .. code-block:: lua
-
- log_groups({'io', 'tls'} -- turn on debug logging for io and tls groups
- log_groups() -- list active log groups
- log_groups({}) -- remove all log groups
-
-Various statistics for monitoring purposes are available in :ref:`mod-stats` module, including export to central systems like Graphite, Metronome, InfluxDB, or Prometheus format.
-
-Resolver :ref:`mod-watchdog` is tool to detect and recover from potential bugs that cause the resolver to stop responding properly to queries.
+Various statistics for monitoring purposes are available in :ref:`config-monitoring-stats`, including export to central systems like Graphite, Metronome, InfluxDB, or Prometheus format.
Additional monitoring and debugging methods are described below. If none of these options fits your deployment or if you have special needs you can configure your own checks and exports using :ref:`async-events`.
.. toctree::
:maxdepth: 1
- modules-bogus_log
- modules-stats
- daemon-bindings-worker
- modules-nsid
- modules-http-trace
- modules-watchdog
- modules-dnstap
- modules-ta_sentinel
- modules-ta_signal_query
- modules-detect_time_skew
- modules-detect_time_jump
- config-debugging
- config-logging-header
+ config-logging-bogus
+ config-monitoring-stats
+ config-nsid
+ config-logging-dnstap
+ config-ta-sentinel
+ config-ta-signal-query
+ config-time-skew-detection
+ config-time-jump-detection
+ config-logging-debugging
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-monitoring-stats:
+
+Statistics collector
+====================
+
+Module ``stats`` gathers various counters from the query resolution
+and server internals, and offers them as a key-value storage.
+These metrics can be either exported to :ref:`mod-graphite`,
+exposed as :ref:`mod-http-prometheus`, or processed using user-provided script
+as described in chapter :ref:`async-events`.
+
+.. note:: Please remember that each Knot Resolver instance keeps its own
+ statistics, and instances can be started and stopped dynamically. This might
+ affect your data postprocessing procedures if you are using
+ :ref:`systemd-multiple-instances`.
+
+.. _mod-stats-list:
+
+Built-in statistics
+-------------------
+
+Built-in counters keep track of the number of queries and answers matching specific criteria.
+
++-----------------------------------------------------------------+
+| **Global request counters**                                     |
++------------------+----------------------------------------------+
+| request.total    | total number of DNS requests                 |
+|                  | (including internal client requests)         |
++------------------+----------------------------------------------+
+| request.internal | internal requests generated by Knot Resolver |
+|                  | (e.g. DNSSEC trust anchor updates)           |
++------------------+----------------------------------------------+
+| request.udp      | external requests received over plain UDP    |
+|                  | (:rfc:`1035`)                                |
++------------------+----------------------------------------------+
+| request.tcp      | external requests received over plain TCP    |
+|                  | (:rfc:`1035`)                                |
++------------------+----------------------------------------------+
+| request.dot      | external requests received over              |
+|                  | DNS-over-TLS (:rfc:`7858`)                   |
++------------------+----------------------------------------------+
+| request.doh      | external requests received over              |
+|                  | DNS-over-HTTPS (:rfc:`8484`)                 |
++------------------+----------------------------------------------+
+| request.xdp      | external requests received over plain UDP    |
+|                  | via an AF_XDP socket                         |
++------------------+----------------------------------------------+
+
++----------------------------------------------------+
+| **Global answer counters**                         |
++-----------------+----------------------------------+
+| answer.total    | total number of answered queries |
++-----------------+----------------------------------+
+| answer.cached   | queries answered from cache      |
++-----------------+----------------------------------+
+
++-----------------+----------------------------------+
+| **Answers categorized by RCODE**                   |
++-----------------+----------------------------------+
+| answer.noerror  | NOERROR answers                  |
++-----------------+----------------------------------+
+| answer.nodata   | NOERROR, but empty answers       |
++-----------------+----------------------------------+
+| answer.nxdomain | NXDOMAIN answers                 |
++-----------------+----------------------------------+
+| answer.servfail | SERVFAIL answers                 |
++-----------------+----------------------------------+
+
++-----------------+----------------------------------+
+| **Answer latency**                                 |
++-----------------+----------------------------------+
+| answer.1ms      | completed in 1ms                 |
++-----------------+----------------------------------+
+| answer.10ms     | completed in 10ms                |
++-----------------+----------------------------------+
+| answer.50ms     | completed in 50ms                |
++-----------------+----------------------------------+
+| answer.100ms    | completed in 100ms               |
++-----------------+----------------------------------+
+| answer.250ms    | completed in 250ms               |
++-----------------+----------------------------------+
+| answer.500ms    | completed in 500ms               |
++-----------------+----------------------------------+
+| answer.1000ms   | completed in 1000ms              |
++-----------------+----------------------------------+
+| answer.1500ms   | completed in 1500ms              |
++-----------------+----------------------------------+
+| answer.slow     | completed in more than 1500ms    |
++-----------------+----------------------------------+
+| answer.sum_ms   | sum of all latencies in ms       |
++-----------------+----------------------------------+
+
++-----------------+----------------------------------+
+| **Answer flags**                                   |
++-----------------+----------------------------------+
+| answer.aa       | authoritative answer             |
++-----------------+----------------------------------+
+| answer.tc       | truncated answer                 |
++-----------------+----------------------------------+
+| answer.ra       | recursion available              |
++-----------------+----------------------------------+
+| answer.rd       | recursion desired (in answer!)   |
++-----------------+----------------------------------+
+| answer.ad       | authentic data (DNSSEC)          |
++-----------------+----------------------------------+
+| answer.cd       | checking disabled (DNSSEC)       |
++-----------------+----------------------------------+
+| answer.do       | DNSSEC answer OK                 |
++-----------------+----------------------------------+
+| answer.edns0    | EDNS0 present                    |
++-----------------+----------------------------------+
+
++-----------------+----------------------------------+
+| **Query flags**                                    |
++-----------------+----------------------------------+
+| query.edns      | queries with EDNS present        |
++-----------------+----------------------------------+
+| query.dnssec    | queries with DNSSEC DO=1         |
++-----------------+----------------------------------+
+
+Example:
+
+.. code-block:: none
+
+ modules.load('stats')
+
+ -- Enumerate metrics
+ > stats.list()
+ [answer.cached] => 486178
+ [iterator.tcp] => 490
+ [answer.noerror] => 507367
+ [answer.total] => 618631
+ [iterator.udp] => 102408
+ [query.concurrent] => 149
+
+ -- Query metrics by prefix
+ > stats.list('iter')
+ [iterator.udp] => 105104
+ [iterator.tcp] => 490
+
+ -- Fetch most common queries
+ > stats.frequent()
+ [1] => {
+ [type] => 2
+ [count] => 4
+ [name] => cz.
+ }
+
+ -- Fetch most common queries (sorted by frequency)
+ > table.sort(stats.frequent(), function (a, b) return a.count > b.count end)
+
+ -- Show recently contacted authoritative servers
+ > stats.upstreams()
+ [2a01:618:404::1] => {
+ [1] => 26 -- RTT
+ }
+ [128.241.220.33] => {
+ [1] => 31 -- RTT
+ }
+
+ -- Set custom metrics from modules
+ > stats['filter.match'] = 5
+ > stats['filter.match']
+ 5
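+
+The built-in counters can be combined into derived metrics. A minimal sketch,
+assuming the code runs inside the resolver's Lua environment with the ``stats``
+module loaded, and that ``answer.sum_ms`` covers the same answers as
+``answer.total``; the helper ``ratio`` is hypothetical and not part of the module:
+
+.. code-block:: lua
+
+ -- Hypothetical helper: part/total, guarding against division by zero
+ -- (e.g. right after start, when no answers have been counted yet).
+ local function ratio(part, total)
+     if total == 0 then return 0 end
+     return part / total
+ end
+
+ local total = stats.get('answer.total')
+ -- Fraction of answers served from cache.
+ print('cache hit ratio', ratio(stats.get('answer.cached'), total))
+ -- Mean answer latency in milliseconds.
+ print('mean latency ms', ratio(stats.get('answer.sum_ms'), total))
+
+These values cover the whole time the counters have been collected; for
+a time window, subtract two readings of each counter first.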
+
+Module reference
+----------------
+
+.. function:: stats.get(key)
+
+ :param string key: e.g. ``"answer.total"``
+ :return: ``number``
+
+Return nominal value of given metric.
+
+.. function:: stats.set('key val')
+
+Set nominal value of given metric.
+
+Example:
+
+.. code-block:: lua
+
+ stats.set('answer.total 5')
+ -- or syntactic sugar
+ stats['answer.total'] = 5
+
+
+.. function:: stats.list([prefix])
+
+ :param string prefix: optional metric prefix, e.g. ``"answer"`` shows only metrics beginning with "answer"
+
+Outputs collected metrics as a JSON dictionary.
+
+.. function:: stats.upstreams()
+
+Outputs a list of recent upstreams and their RTT. It is sorted by time and stored in a ring buffer of
+a fixed size. This means it's not aggregated and readable by multiple consumers, but also that
+you may lose entries if you don't read quickly enough. The default ring size is 512 entries, and may be overridden at compile time by ``-DUPSTREAMS_COUNT=X``.
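+
+A minimal sketch of consuming this list, assuming the return value has the
+shape shown in the example above (each address mapped to an array of RTT values):
+
+.. code-block:: lua
+
+ -- Print the mean RTT of the recent samples for each upstream address.
+ for addr, rtts in pairs(stats.upstreams()) do
+     local sum = 0
+     for _, rtt in ipairs(rtts) do sum = sum + rtt end
+     -- Skip addresses without samples to avoid dividing by zero.
+     if #rtts > 0 then print(addr, sum / #rtts) end
+ end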
+
+.. function:: stats.frequent()
+
+Outputs a list of the most frequent iterative queries as a JSON array. The queries are sampled probabilistically,
+and include subrequests. The maximum list size is 5000 entries; make diffs if you want to track it over time.
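+
+Making such a diff could look like the following sketch. It assumes each entry
+has the ``name``, ``type`` and ``count`` fields shown in the example above;
+the helpers ``index`` and ``frequent_diff`` are hypothetical, not part of the module:
+
+.. code-block:: lua
+
+ -- Hypothetical helper: index a snapshot by "name type" for easy lookup.
+ local function index(snapshot)
+     local by_key = {}
+     for _, entry in ipairs(snapshot) do
+         by_key[entry.name .. ' ' .. entry.type] = entry.count
+     end
+     return by_key
+ end
+
+ -- Hypothetical helper: change in count between two snapshots.
+ -- Queries present only in the newer snapshot count from zero;
+ -- queries that dropped out of the list are not reported.
+ local function frequent_diff(old, new)
+     local before, delta = index(old), {}
+     for key, count in pairs(index(new)) do
+         local change = count - (before[key] or 0)
+         if change ~= 0 then delta[key] = change end
+     end
+     return delta
+ end
+
+ local earlier = stats.frequent()
+ -- ... some time later, e.g. from a periodic event ...
+ local delta = frequent_diff(earlier, stats.frequent())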
+
+.. function:: stats.clear_frequent()
+
+Clear the list of most frequent iterative queries.
+
+.. include:: ../modules/graphite/README.rst
+.. include:: ../modules/http/prometheus.rst
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-nsid:
+
+Name Server Identifier (NSID)
+=============================
+
+Knot Resolver provides server-side support for :rfc:`5001`,
+which allows DNS clients to request that the resolver send back its NSID
+along with the reply to a DNS request.
+This is useful for debugging larger resolver farms
+(e.g. when using :ref:`systemd-multiple-instances`, anycast or load balancers).
+
+The NSID value can be configured in the resolver's configuration file:
+
+.. code-block:: yaml
+
+ nsid: kres1
+
+.. note::
+
+ When running with multiple workers, each worker adds its own identifier to the end of the NSID.
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-ta_sentinel:
+
+Sentinel for Detecting Trusted Root Keys
+========================================
+
+Root Key Trust Anchor Sentinel for DNSSEC according to standard :rfc:`8509`.
+
+This feature allows users of a DNSSEC-validating resolver to detect which root keys
+are configured in the resolver's chain of trust. The data from such
+signaling are necessary to monitor the progress of the DNSSEC root key rollover
+and to detect potential breakage before it affects users. One example of research enabled by this module `is available here <https://www.potaroo.net/ispcol/2018-11/kskpm.html>`_.
+
+The sentinel is enabled by default and we urge users not to disable it unless absolutely necessary.
+If you do need to disable it, use:
+
+.. code-block:: yaml
+
+ dnssec:
+   trust-anchor-sentinel: false
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-ta-signal-query:
+
+Signaling Trust Anchor Knowledge in DNSSEC
+==========================================
+
+Signaling Trust Anchor Knowledge in DNSSEC Using Key Tag Query,
+implemented according to :rfc:`8145#section-5`.
+
+This feature allows validating resolvers to signal to authoritative servers
+which keys are referenced in their chain of trust. The data from such
+signaling allow zone administrators to monitor the progress of rollovers
+in a DNSSEC-signed zone.
+
+This mechanism serves to measure the acceptance and use of new DNSSEC
+trust anchors and key signing keys (KSKs). This signaling data can be
+used by zone administrators as a gauge to measure the successful deployment
+of new keys. This is of particular interest for the DNS root zone in the event
+of key and/or algorithm rollovers that rely on :rfc:`5011` to automatically
+update a validating DNS resolver’s trust anchor.
+
+.. attention::
+ Experience from the root zone KSK rollover in 2018 shows that this mechanism
+ by itself is not sufficient to reliably measure acceptance of the new key.
+ Nevertheless, some DNS researchers found it useful in combination
+ with other data, so we left it enabled for now. This default might change
+ once more information is available.
+
+This module is enabled by default. You may disable it in the configuration file.
+
+.. code-block:: yaml
+
+ dnssec:
+   trust-anchor-signal-query: false
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-time-jump-detection:
+
+Detect discontinuous jumps in the system time
+=============================================
+
+Detects discontinuous jumps in the system time while the resolver
+is running. It clears the cache when a significant backward time jump occurs.
+
+Time jumps are usually caused by NTP time changes or by admin intervention.
+Such changes can affect cached records, as they store timestamps and TTLs in real
+time.
+
+If you want to preserve the cache across backward time jumps, disable the detection:
+
+.. code-block:: yaml
+
+ options:
+   time-jump-detection: false
+
+Due to the way monotonic system time works on typical systems,
+suspend-resume cycles will be perceived as forward time jumps,
+but this direction of shift does not have the risk of using records
+beyond their intended TTL, so forward jumps do not cause erasing the cache.
+
--- /dev/null
+.. SPDX-License-Identifier: GPL-3.0-or-later
+
+.. _config-time-skew-detection:
+
+System time skew detector
+=========================
+
+This module compares local system time with inception and expiration time
+bounds in DNSSEC signatures for ``. NS`` records. If the local system time is
+outside of these bounds, it is likely a misconfiguration which will cause
+all DNSSEC validation (and resolution) to fail.
+
+In case of mismatch, a warning message will be logged to help with
+further diagnostics.
+
+.. warning::
+
+ Information printed by this module can be forged by a network attacker!
+ System administrator MUST verify values printed by this module and
+ fix local system time using a trusted source.
+
+This module is useful for debugging purposes. It runs only once during resolver
+start and does not do anything after that. It is enabled by default.
+You may disable it in the configuration file.
+
+.. code-block:: yaml
+
+ dnssec:
+   time-skew-detection: false
config-overview
usecase-network-interfaces
config-policy-new
+ config-logging-monitoring
config-dnssec
config-lua
:name: developers-chapter
:maxdepth: 2
+ manager-dev
architecture
build
lib