1 Metrics and Statistics
2 ======================
3
4 The PowerDNS Recursor collects many statistics about itself.
5
6 Regular Statistics Log
7 ----------------------
8 Every half hour or so (configurable with :ref:`setting-statistics-interval`), the recursor outputs a line with statistics.
9 To force the output of statistics, send the process a SIGUSR1. A line of statistics looks like this::
10
11 Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, 309 negative entries, 84% cache hits, outpacket/query ratio 37%, 12% throttled
12
13 This means that there are 13971 different names cached, each of which may have multiple records attached to it.
14 There are 309 items in the negative cache: items that are known not to exist and that will not do so in the near future.
15 84% of incoming questions could be answered without any additional queries going out to the net.
16
17 The outpacket/query ratio means that on average, 0.37 packets were needed to answer a question.
18 Initially this ratio may be well over 100% as additional queries may be needed to actually recurse the DNS and figure out the addresses of nameservers.
19
20 Finally, 12% of queries were not performed because identical queries had gone out previously and failed, saving load on servers worldwide.
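
As an illustration of the fields described above, the following minimal sketch parses such a line in Python. The regular expression and the dictionary keys are assumptions derived only from the sample line shown here; they are not part of the Recursor.

```python
import re

# Pattern derived from the sample statistics line above (an assumption,
# not a format guarantee made by the Recursor).
STATS_RE = re.compile(
    r"stats: (?P<questions>\d+) questions, "
    r"(?P<cache_entries>\d+) cache entries, "
    r"(?P<negative_entries>\d+) negative entries, "
    r"(?P<cache_hit_pct>\d+)% cache hits, "
    r"outpacket/query ratio (?P<outpacket_ratio_pct>\d+)%, "
    r"(?P<throttled_pct>\d+)% throttled"
)


def parse_stats_line(line):
    """Return the numeric fields of a statistics line, or None if it does not match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


sample = (
    "Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, "
    "309 negative entries, 84% cache hits, outpacket/query ratio 37%, "
    "12% throttled"
)
stats = parse_stats_line(sample)
# Average number of outgoing packets per question.
packets_per_question = stats["outpacket_ratio_pct"] / 100
```

Here ``packets_per_question`` corresponds to the 0.37 packets per question mentioned above.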
21
22 .. _metricscarbon:
23
24 Sending metrics to Graphite/Metronome over Carbon
25 -------------------------------------------------
26 For carbon/graphite/metronome, we use the following namespace.
27 Everything starts with 'pdns.', which is then followed by the local hostname.
28 Thirdly, we add 'recursor' to signify the daemon generating the metrics.
29 This is then rounded off with the actual name of the metric. As an example: 'pdns.ns1.recursor.questions'.
30
31 Care has been taken to make the sending of statistics as unobtrusive as possible: the daemons will not be hindered by an unreachable carbon server, timeouts or connection refused situations.
32
33 To benefit from our carbon/graphite support, either install Graphite, or use our own lightweight statistics daemon, Metronome, currently available on `GitHub <https://github.com/ahupowerdns/metronome/>`_.
34
35 To enable sending metrics, set :ref:`setting-carbon-server`, possibly :ref:`setting-carbon-interval` and possibly :ref:`setting-carbon-ourname` in the configuration.
36
37 .. warning::
38
39 If your hostname includes dots, they will be replaced by underscores so as not to confuse the namespace.
40
41 If you include dots in :ref:`setting-carbon-ourname`, they will **not** be replaced by underscores,
42 as PowerDNS assumes you know what you are doing if you override your hostname.
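
The naming scheme and the dot handling described in this section can be sketched as follows. The function ``carbon_metric_name`` and its parameters are hypothetical and only mirror the rules stated above; the Recursor builds these names internally.

```python
def carbon_metric_name(metric, hostname, ourname=None):
    """Build a Carbon metric path following the rules described above.

    hostname: the local hostname; dots are replaced by underscores.
    ourname:  optional override (as with carbon-ourname); used verbatim.
    """
    if ourname is not None:
        host_part = ourname  # dots are deliberately left untouched
    else:
        host_part = hostname.replace(".", "_")
    return ".".join(["pdns", host_part, "recursor", metric])


default_name = carbon_metric_name("questions", "ns1")
dotted_name = carbon_metric_name("questions", "ns1.example.com")
override_name = carbon_metric_name("questions", "ns1.example.com", ourname="dc1.ns1")
```

Note how the override keeps its dot and therefore adds an extra level to the namespace.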
43
44 Sending metrics over SNMP
45 -------------------------
46 .. versionadded:: 4.1.0
47
48 The recursor can export statistics over SNMP and send traps from :doc:`Lua <lua-scripting/index>`, provided support is compiled into the Recursor and :ref:`setting-snmp-agent` is set.
49
50 MIB
51 ^^^
52
53 .. literalinclude:: ../RECURSOR-MIB.txt
54
55 Getting Metrics from the Recursor
56 ---------------------------------
57
58 Should Carbon not be the preferred way of receiving metrics, several other techniques can be employed to retrieve them.
59
60 Using the Webserver
61 ^^^^^^^^^^^^^^^^^^^
62 The :doc:`API <http-api/index>` exposes a statistics endpoint at :http:get:`/api/v1/servers/:server_id/statistics`.
63 This endpoint exports all statistics in a single JSON document.
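
A minimal sketch of reading this endpoint from Python follows. The base URL, the ``X-API-Key`` header, the default ``server_id`` of ``localhost`` and the assumption that the document is a JSON list of objects with ``name`` and ``value`` keys are not stated on this page; check them against the API documentation of your version.

```python
import json
import urllib.request


def build_statistics_request(base_url, api_key, server_id="localhost"):
    """Create the GET request for the statistics endpoint."""
    url = f"{base_url.rstrip('/')}/api/v1/servers/{server_id}/statistics"
    # Assumption: the API key is passed in an X-API-Key header.
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def statistics_to_dict(document):
    """Turn the JSON document into a {name: value} mapping.

    Assumption: the document is a list of objects carrying "name" and "value".
    """
    return {
        item["name"]: item["value"]
        for item in json.loads(document)
        if "name" in item and "value" in item
    }


def fetch_statistics(base_url, api_key, server_id="localhost"):
    """Fetch and decode all statistics (performs a network request)."""
    request = build_statistics_request(base_url, api_key, server_id)
    with urllib.request.urlopen(request, timeout=5) as response:
        return statistics_to_dict(response.read().decode("utf-8"))
```

For example, ``fetch_statistics("http://127.0.0.1:8082", "secret")`` would return a dictionary keyed by metric name, given a webserver listening on that (hypothetical) address.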
64
65 Using ``rec_control``
66 ^^^^^^^^^^^^^^^^^^^^^
67 Metrics can also be gathered on the system itself by invoking :doc:`rec_control <manpages/rec_control.1>`::
68
69 rec_control get-all
70
71 Single statistics can also be retrieved with the ``get`` command, e.g.::
72
73 rec_control get all-outqueries
74
75 External programs can use this technique to scrape metrics.
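
A scraper along these lines could look like the sketch below. It assumes that ``rec_control get-all`` prints one metric per line as a name followed by whitespace and an integer value; lines that do not fit that shape are skipped. The function names are illustrative.

```python
import subprocess


def parse_get_all(output):
    """Parse 'name value' lines into a dictionary of integers."""
    metrics = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a simple name/value pair
        name, value = parts
        try:
            metrics[name] = int(value)
        except ValueError:
            continue  # skip values that are not plain integers
    return metrics


def scrape_metrics():
    """Run rec_control on the local system and parse its output."""
    result = subprocess.run(
        ["rec_control", "get-all"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_get_all(result.stdout)
```

Keeping the parsing separate from the ``subprocess`` call makes the parser easy to test without a running Recursor.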
76
77 .. _metricnames:
78
79 Gathered Information
80 --------------------
81
82 These statistics are gathered.
83
84 It should be noted that answers0-1 + answers1-10 + answers10-100 + answers100-1000 + answers-slow + packetcache-hits + over-capacity-drops + policy-drops = questions.
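
The identity above can be checked against a set of retrieved metrics with a small helper such as the following sketch. The function name is illustrative, and ``metrics`` is assumed to be a dictionary mapping metric names to integers.

```python
# The counters that, per the identity above, add up to 'questions'.
QUESTION_COMPONENTS = (
    "answers0-1",
    "answers1-10",
    "answers10-100",
    "answers100-1000",
    "answers-slow",
    "packetcache-hits",
    "over-capacity-drops",
    "policy-drops",
)


def questions_discrepancy(metrics):
    """Return questions minus the sum of its components (0 if the identity holds)."""
    return metrics["questions"] - sum(metrics[name] for name in QUESTION_COMPONENTS)


example = {
    "questions": 1000,
    "answers0-1": 500,
    "answers1-10": 200,
    "answers10-100": 100,
    "answers100-1000": 50,
    "answers-slow": 10,
    "packetcache-hits": 130,
    "over-capacity-drops": 7,
    "policy-drops": 3,
}
discrepancy = questions_discrepancy(example)
```

On a live system the counters are not read atomically, so a small non-zero discrepancy between two reads is plausible.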
85
86 Also note that unauthorized-tcp and unauthorized-udp packets do not end up in the 'questions' count.
87
88 all-outqueries
89 ^^^^^^^^^^^^^^
90 counts the number of outgoing UDP queries since starting
91
92 answers-slow
93 ^^^^^^^^^^^^
94 counts the number of queries that took more than 1 second to answer
95
96 answers0-1
97 ^^^^^^^^^^
98 counts the number of queries answered within 1 millisecond
99
100 answers1-10
101 ^^^^^^^^^^^
102 counts the number of queries answered in 1 to 10 milliseconds
103
104 answers10-100
105 ^^^^^^^^^^^^^
106 counts the number of queries answered in 10 to 100 milliseconds
107
108 answers100-1000
109 ^^^^^^^^^^^^^^^
110 counts the number of queries answered in 100 milliseconds to 1 second
111
112 auth4-answers-slow
113 ^^^^^^^^^^^^^^^^^^
114 counts the number of queries answered by auth4s after 1 second (4.0)
115
116 auth4-answers0-1
117 ^^^^^^^^^^^^^^^^
118 counts the number of queries answered by auth4s within 1 millisecond (4.0)
119
120 auth4-answers1-10
121 ^^^^^^^^^^^^^^^^^
122 counts the number of queries answered by auth4s within 10 milliseconds (4.0)
123
124 auth4-answers10-100
125 ^^^^^^^^^^^^^^^^^^^
126 counts the number of queries answered by auth4s within 100 milliseconds (4.0)
127
128 auth4-answers100-1000
129 ^^^^^^^^^^^^^^^^^^^^^
130 counts the number of queries answered by auth4s within 1 second (4.0)
131
132 auth6-answers-slow
133 ^^^^^^^^^^^^^^^^^^
134 counts the number of queries answered by auth6s after 1 second (4.0)
135
136 auth6-answers0-1
137 ^^^^^^^^^^^^^^^^
138 counts the number of queries answered by auth6s within 1 millisecond (4.0)
139
140 auth6-answers1-10
141 ^^^^^^^^^^^^^^^^^
142 counts the number of queries answered by auth6s within 10 milliseconds (4.0)
143
144 auth6-answers10-100
145 ^^^^^^^^^^^^^^^^^^^
146 counts the number of queries answered by auth6s within 100 milliseconds (4.0)
147
148 auth6-answers100-1000
149 ^^^^^^^^^^^^^^^^^^^^^
150 counts the number of queries answered by auth6s within 1 second (4.0)
151
152 auth-zone-queries
153 ^^^^^^^^^^^^^^^^^
154 counts the number of queries to locally hosted authoritative zones (:ref:`setting-auth-zones`) since starting
155
156 cache-bytes
157 ^^^^^^^^^^^
158 size of the cache in bytes
159
160 cache-entries
161 ^^^^^^^^^^^^^
162 shows the number of entries in the cache
163
164 cache-hits
165 ^^^^^^^^^^
166 counts the number of cache hits since starting, this does **not** include hits that got answered from the packet-cache
167
168 cache-misses
169 ^^^^^^^^^^^^
170 counts the number of cache misses since starting
171
172 case-mismatches
173 ^^^^^^^^^^^^^^^
174 counts the number of mismatches in character case since starting
175
176 chain-resends
177 ^^^^^^^^^^^^^
178 number of queries chained to existing outstanding query
179
180 client-parse-errors
181 ^^^^^^^^^^^^^^^^^^^
182 counts number of client packets that could not be parsed
183
184 concurrent-queries
185 ^^^^^^^^^^^^^^^^^^
186 shows the number of MThreads currently running
187
188 cpu-msec-thread-n
189 ^^^^^^^^^^^^^^^^^
190 shows the number of milliseconds spent in thread n. Available since 4.1.12.
191
192 dlg-only-drops
193 ^^^^^^^^^^^^^^
194 number of records dropped because of :ref:`setting-delegation-only` setting
195
196 dnssec-authentic-data-queries
197 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
198 .. versionadded:: 4.2
199
200 number of queries received with the AD bit set
201
202 dnssec-check-disabled-queries
203 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
204 .. versionadded:: 4.2
205
206 number of queries received with the CD bit set
207
208 dnssec-queries
209 ^^^^^^^^^^^^^^
210 number of queries received with the DO bit set
211
212 dnssec-result-bogus
213 ^^^^^^^^^^^^^^^^^^^
214 number of DNSSEC validations that had the Bogus state
215
216 dnssec-result-indeterminate
217 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
218 number of DNSSEC validations that had the Indeterminate state
219
220 dnssec-result-insecure
221 ^^^^^^^^^^^^^^^^^^^^^^
222 number of DNSSEC validations that had the Insecure state
223
224 dnssec-result-nta
225 ^^^^^^^^^^^^^^^^^
226 number of DNSSEC validations that had the NTA (negative trust anchor) state
227
228 dnssec-result-secure
229 ^^^^^^^^^^^^^^^^^^^^
230 number of DNSSEC validations that had the Secure state
231
232 dnssec-validations
233 ^^^^^^^^^^^^^^^^^^
234 number of DNSSEC validations performed
235
236 dont-outqueries
237 ^^^^^^^^^^^^^^^
238 number of outgoing queries dropped because of :ref:`setting-dont-query` setting (since 3.3)
239
240 ecs-queries
241 ^^^^^^^^^^^
242 number of outgoing queries adorned with an EDNS Client Subnet option (since 4.1)
243
244 ecs-responses
245 ^^^^^^^^^^^^^
246 number of responses received from authoritative servers with an EDNS Client Subnet option we used (since 4.1)
247
248 ecs-v4-response-bits-*
249 ^^^^^^^^^^^^^^^^^^^^^^
250 .. versionadded:: 4.2.0
251
252 number of responses received from authoritative servers with an IPv4 EDNS Client Subnet option we used, of this subnet size (1 to 32).
253
254 ecs-v6-response-bits-*
255 ^^^^^^^^^^^^^^^^^^^^^^
256 .. versionadded:: 4.2.0
257
258 number of responses received from authoritative servers with an IPv6 EDNS Client Subnet option we used, of this subnet size (1 to 128).
259
260 edns-ping-matches
261 ^^^^^^^^^^^^^^^^^
262 number of servers that sent a valid EDNS PING response
263
264 edns-ping-mismatches
265 ^^^^^^^^^^^^^^^^^^^^
266 number of servers that sent an invalid EDNS PING response
267
268 failed-host-entries
269 ^^^^^^^^^^^^^^^^^^^
270 number of servers that failed to resolve
271
272 ignored-packets
273 ^^^^^^^^^^^^^^^
274 counts the number of non-query packets received on server sockets that should only get query packets
275
276 ipv6-outqueries
277 ^^^^^^^^^^^^^^^
278 number of outgoing queries over IPv6
279
280 ipv6-questions
281 ^^^^^^^^^^^^^^
282 counts all end-user initiated queries with the RD bit set, received over IPv6 UDP
283
284 malloc-bytes
285 ^^^^^^^^^^^^
286 returns the number of bytes allocated by the process (broken, always returns 0)
287
288 max-cache-entries
289 ^^^^^^^^^^^^^^^^^
290 currently configured maximum number of cache entries
291
292 max-packetcache-entries
293 ^^^^^^^^^^^^^^^^^^^^^^^
294 currently configured maximum number of packet cache entries
295
296 max-mthread-stack
297 ^^^^^^^^^^^^^^^^^
298 maximum amount of thread stack ever used
299
300 negcache-entries
301 ^^^^^^^^^^^^^^^^
302 shows the number of entries in the negative answer cache
303
304 no-packet-error
305 ^^^^^^^^^^^^^^^
306 number of erroneous received packets
307
308 noedns-outqueries
309 ^^^^^^^^^^^^^^^^^
310 number of queries sent out without EDNS
311
312 noerror-answers
313 ^^^^^^^^^^^^^^^
314 counts the number of times it answered NOERROR since starting
315
316 noping-outqueries
317 ^^^^^^^^^^^^^^^^^
318 number of queries sent out without EDNS PING
319
320 nsset-invalidations
321 ^^^^^^^^^^^^^^^^^^^
322 number of times an nsset was dropped because it no longer worked
323
324 nsspeeds-entries
325 ^^^^^^^^^^^^^^^^
326 shows the number of entries in the NS speeds map
327
328 nxdomain-answers
329 ^^^^^^^^^^^^^^^^
330 counts the number of times it answered NXDOMAIN since starting
331
332 outgoing-timeouts
333 ^^^^^^^^^^^^^^^^^
334 counts the number of timeouts on outgoing UDP queries since starting
335
336 outgoing4-timeouts
337 ^^^^^^^^^^^^^^^^^^
338 counts the number of timeouts on outgoing UDP IPv4 queries since starting (since 4.0)
339
340 outgoing6-timeouts
341 ^^^^^^^^^^^^^^^^^^
342 counts the number of timeouts on outgoing UDP IPv6 queries since starting (since 4.0)
343
344 over-capacity-drops
345 ^^^^^^^^^^^^^^^^^^^
346 questions dropped because over maximum concurrent query limit (since 3.2)
347
348 packetcache-bytes
349 ^^^^^^^^^^^^^^^^^
350 size of the packet cache in bytes (since 3.3.1)
351
352 packetcache-entries
353 ^^^^^^^^^^^^^^^^^^^
354 number of entries in the packet cache (since 3.2)
355
356 packetcache-hits
357 ^^^^^^^^^^^^^^^^
358 packet cache hits (since 3.2)
359
360 packetcache-misses
361 ^^^^^^^^^^^^^^^^^^
362 packet cache misses (since 3.2)
363
364 policy-drops
365 ^^^^^^^^^^^^
366 packets dropped because of (Lua) policy decision
367
368 policy-result-noaction
369 ^^^^^^^^^^^^^^^^^^^^^^
370 packets that were not acted upon by the RPZ/filter engine
371
372 policy-result-drop
373 ^^^^^^^^^^^^^^^^^^
374 packets that were dropped by the RPZ/filter engine
375
376 policy-result-nxdomain
377 ^^^^^^^^^^^^^^^^^^^^^^
378 packets that were replied to with NXDOMAIN by the RPZ/filter engine
379
380 policy-result-nodata
381 ^^^^^^^^^^^^^^^^^^^^
382 packets that were replied to with no data by the RPZ/filter engine
383
384 policy-result-truncate
385 ^^^^^^^^^^^^^^^^^^^^^^
386 packets that were forced to TCP by the RPZ/filter engine
387
388 policy-result-custom
389 ^^^^^^^^^^^^^^^^^^^^
390 packets that were sent a custom answer by the RPZ/filter engine
391
392 qa-latency
393 ^^^^^^^^^^
394 shows the current latency average, in microseconds, exponentially weighted over past 'latency-statistic-size' packets
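
To illustrate what an exponentially weighted average over a window of packets means, here is a minimal sketch of one common update rule. It is an assumption that the Recursor uses exactly this form; ``window`` stands in for the value of 'latency-statistic-size'.

```python
def update_latency_average(average_usec, sample_usec, window):
    """Fold one new latency sample into an exponentially weighted average.

    Each new sample contributes 1/window of the result; older samples
    decay by a factor of (1 - 1/window) per update.
    """
    return (1.0 - 1.0 / window) * average_usec + sample_usec / window


average = 100.0
for sample in (200.0, 200.0):
    average = update_latency_average(average, sample, window=4)
```

With a larger window the average reacts more slowly to sudden changes in latency.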
395
396 query-pipe-full-drops
397 ^^^^^^^^^^^^^^^^^^^^^
398 .. versionadded:: 4.2
399
400 questions dropped because the query distribution pipe was full
401
402 questions
403 ^^^^^^^^^
404 counts all end-user initiated queries with the RD bit set
405
406 rebalanced-queries
407 ^^^^^^^^^^^^^^^^^^
408 .. versionadded:: 4.1.12
409
410 number of queries balanced to a different worker thread because the first selected one was above the target load configured with 'distribution-load-factor'
411
412 resource-limits
413 ^^^^^^^^^^^^^^^
414 counts number of queries that could not be performed because of resource limits
415
416 security-status
417 ^^^^^^^^^^^^^^^
418 security status based on :ref:`securitypolling`
419
420 server-parse-errors
421 ^^^^^^^^^^^^^^^^^^^
422 counts number of server replied packets that could not be parsed
423
424 servfail-answers
425 ^^^^^^^^^^^^^^^^
426 counts the number of times it answered SERVFAIL since starting
427
428 spoof-prevents
429 ^^^^^^^^^^^^^^
430 number of times PowerDNS considered itself spoofed, and dropped the data
431
432 sys-msec
433 ^^^^^^^^
434 number of CPU milliseconds spent in 'system' mode
435
436 tcp-client-overflow
437 ^^^^^^^^^^^^^^^^^^^
438 number of times an IP address was denied TCP access because it already had too many connections
439
440 tcp-clients
441 ^^^^^^^^^^^
442 counts the number of currently active TCP/IP clients
443
444 tcp-outqueries
445 ^^^^^^^^^^^^^^
446 counts the number of outgoing TCP queries since starting
447
448 tcp-questions
449 ^^^^^^^^^^^^^
450 counts all incoming TCP queries (since starting)
451
452 throttle-entries
453 ^^^^^^^^^^^^^^^^
454 shows the number of entries in the throttle map
455
456 throttled-out
457 ^^^^^^^^^^^^^
458 counts the number of throttled outgoing UDP queries since starting
459
460 throttled-outqueries
461 ^^^^^^^^^^^^^^^^^^^^
462 same as throttled-out
463
464 too-old-drops
465 ^^^^^^^^^^^^^
466 questions dropped that were too old
467
468 truncated-drops
469 ^^^^^^^^^^^^^^^
470 .. versionadded:: 4.2
471
472 questions dropped because they were larger than 512 bytes
473
474 empty-queries
475 ^^^^^^^^^^^^^
476 .. versionadded:: 4.2
477
478 questions dropped because they had a QD count of 0
479
480 unauthorized-tcp
481 ^^^^^^^^^^^^^^^^
482 number of TCP questions denied because of allow-from restrictions
483
484 unauthorized-udp
485 ^^^^^^^^^^^^^^^^
486 number of UDP questions denied because of allow-from restrictions
487
488 unexpected-packets
489 ^^^^^^^^^^^^^^^^^^
490 number of answers from remote servers that were unexpected (might point to spoofing)
491
492 unreachables
493 ^^^^^^^^^^^^
494 number of times nameservers were unreachable since starting
495
496 uptime
497 ^^^^^^
498 number of seconds process has been running (since 3.1.5)
499
500 user-msec
501 ^^^^^^^^^
502 number of CPU milliseconds spent in 'user' mode
503
504 variable-responses
505 ^^^^^^^^^^^^^^^^^^
506 .. versionadded:: 4.2
507
508 Responses that were marked as 'variable'. This could be because of EDNS
509 Client Subnet or Lua rules that indicate this variable status (dependent on
510 time or who is asking, for example).
511
512 .. _stat-x-our-latency:
513
514 x-our-latency
515 ^^^^^^^^^^^^^
516 .. versionadded:: 4.1
517 Not yet proven to be reliable
518
519 PowerDNS measures per query how much time has been spent waiting on authoritative servers.
520 In addition, the Recursor measures the total amount of time needed to answer a question.
521 The difference between these two durations is a measure of how much time was spent within PowerDNS.
522 This metric is the average of that difference, in microseconds.
523
524 x-ourtime0-1
525 ^^^^^^^^^^^^
526 .. versionadded:: 4.1
527 Not yet proven to be reliable
528
529 Counts responses where between 0 and 1 milliseconds was spent within the Recursor.
530 See :ref:`stat-x-our-latency` for further details.
531
532 x-ourtime1-2
533 ^^^^^^^^^^^^
534 .. versionadded:: 4.1
535 Not yet proven to be reliable
536
537 Counts responses where between 1 and 2 milliseconds was spent within the Recursor.
538 See :ref:`stat-x-our-latency` for further details.
539
540 x-ourtime2-4
541 ^^^^^^^^^^^^
542 .. versionadded:: 4.1
543 Not yet proven to be reliable
544
545 Counts responses where between 2 and 4 milliseconds was spent within the Recursor.
546 See :ref:`stat-x-our-latency` for further details.
547
548 x-ourtime4-8
549 ^^^^^^^^^^^^
550 .. versionadded:: 4.1
551 Not yet proven to be reliable
552
553 Counts responses where between 4 and 8 milliseconds was spent within the Recursor.
554 See :ref:`stat-x-our-latency` for further details.
555
556 x-ourtime8-16
557 ^^^^^^^^^^^^^
558 .. versionadded:: 4.1
559 Not yet proven to be reliable
560
561 Counts responses where between 8 and 16 milliseconds was spent within the Recursor.
562 See :ref:`stat-x-our-latency` for further details.
563
564 x-ourtime16-32
565 ^^^^^^^^^^^^^^
566 .. versionadded:: 4.1
567 Not yet proven to be reliable
568
569 Counts responses where between 16 and 32 milliseconds was spent within the Recursor.
570 See :ref:`stat-x-our-latency` for further details.
571
572 x-ourtime-slow
573 ^^^^^^^^^^^^^^
574 .. versionadded:: 4.1
575 Not yet proven to be reliable
576
577 Counts responses where more than 32 milliseconds was spent within the Recursor.
578 See :ref:`stat-x-our-latency` for further details.
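
The relationship between the two measured durations and the ``x-ourtime`` counters can be sketched as follows. The function name and the handling of exact bucket boundaries (upper bound inclusive) are assumptions; the descriptions above only give the ranges.

```python
# Upper bound in milliseconds for each bucket, in ascending order.
OURTIME_BUCKETS = (
    (1, "x-ourtime0-1"),
    (2, "x-ourtime1-2"),
    (4, "x-ourtime2-4"),
    (8, "x-ourtime4-8"),
    (16, "x-ourtime8-16"),
    (32, "x-ourtime16-32"),
)


def ourtime_bucket(total_usec, auth_wait_usec):
    """Name the counter that a single response would increment.

    total_usec:     total time needed to answer the question
    auth_wait_usec: time spent waiting on authoritative servers
    """
    own_ms = (total_usec - auth_wait_usec) / 1000.0
    for upper_ms, name in OURTIME_BUCKETS:
        if own_ms <= upper_ms:
            return name
    return "x-ourtime-slow"


bucket = ourtime_bucket(total_usec=5000, auth_wait_usec=2500)
```

In this example 2.5 milliseconds were spent within the Recursor, so the response falls in ``x-ourtime2-4``.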