]> git.ipfire.org Git - thirdparty/pdns.git/blame - CODING_GUIDELINES.md
Coding Guidelines: Properly capitalize section names
[thirdparty/pdns.git] / CODING_GUIDELINES.md
CommitLineData
1caeb33c 1Coding Guidelines for Contributing to PowerDNS
8f7dc5e9
RG
2----------------------------------------------
3
b638b4d2 4Thank you for you interest in contributing to the PowerDNS project.
0044dbdb
PD
5This document describes the general coding guidelines to keep in mind when contributing code to our code base.
6It does assume that you have already read the contributing document at [CONTRIBUTING.md](https://github.com/PowerDNS/pdns/blob/master/CONTRIBUTING.md).
8f7dc5e9 7
1caeb33c 8# High-level Guidelines
8f7dc5e9
RG
9
10* Although the codebase does not consistently have them, [docblock](https://www.doxygen.nl/manual/docblocks.html)s on functions and classes are appreciated.
11* Never hesitate to write comments on anything that might not be immediately clear just from reading the code.
0044dbdb
PD
12* When adding whole new things, consider putting them in a `pdns::X` namespace.
13 Look for `namespace pdns` in the codebase for examples.
8f7dc5e9 14
1caeb33c 15# Memory Handling
8f7dc5e9 16
0044dbdb 17The memory model in C++, inherited from the C era, is very powerful but also very error-prone.
b638b4d2 18Several features are available in modern (C++11 and up) C++ to make it possible to avoid most of the pitfalls, while conserving the same level of performance.
8f7dc5e9 19
b638b4d2 20Most of the issues related to memory allocation (memory leaks, use-after-free) can be solved by using standard containers, or taking advantage of RAII and smart pointers, which take care of destructing the object when it is not used anymore.
8f7dc5e9 21
1caeb33c 22## Stack-based Memory Allocation
8f7dc5e9 23
b638b4d2 24Default allocations, when declaring a variable local to a function for example, are done on the stack instead of doing a dynamic allocation on the heap.
0044dbdb 25Allocating objects on the stack is faster, especially in threaded programs, and provides the benefit that objects are automatically destroyed when the function is exited.
8f7dc5e9 26
b638b4d2
PD
27One caveat is that the programmer needs to be wary of is the size of the object in order not to exceed the space available on the stack, which would corrupt other objects in memory and could lead to a crash, or even execution of arbitrary code.
28This is especially true in the Recursor which uses a custom stack-switching in user-space mechanism and thus has a reduced stack size.
8f7dc5e9 29
1caeb33c 30### Variable-Length Arrays (VLAs)
8f7dc5e9 31
b638b4d2 32In order to avoid smashing the stack, special care should be taken to limit the depth of function calls that, for example, can grow quickly with recursion.
0044dbdb 33A second common source of smash stacking is the use of Variable-Length Arrays, whose size is determined at runtime and is therefore very hard to predict.
b638b4d2
PD
34The C++ language does not support VLAs but a lot of compilers inherit such a support from C99, so it is possible to use them by accident.
35PowerDNS strictly forbids the use of VLAs, as does the Linux kernel, and enforces that with the `-Werror=vla` compiler flag.
8f7dc5e9 36
1caeb33c 37### C-style Arrays
8f7dc5e9
RG
38
39While you might still find some in the existing code base, we are actively trying to get rid of C-style arrays like this one:
40
41```C++
42somestruct buffer[12];
43auto bufferSize = sizeof(buffer) / sizeof(*buffer);
44auto& firstElement = buffer[0];
45```
46
0044dbdb 47It is immediately obvious that computing the actual number of elements is error-prone, as `sizeof()` does not return the number of elements but the total memory space used by the array.
b638b4d2 48Another obvious issue is that accesses to the array are not bound-checked.
0044dbdb 49These are not the only drawbacks of these arrays, but are bad enough already to justify getting rid of them.
8f7dc5e9
RG
50
51The modern C++ way is to use `std::array`s:
52
53```C++
54std::array<somestruct, 12> buffer;
55auto bufferSize = buffer.size();
56auto& firstElement = buffer.at(0);
57```
58
1caeb33c 59### `alloca`
8f7dc5e9
RG
60
61The use of `alloca()` is forbidden in the code base as it is much too easy to smash the stack.
62
63## RAII
64
b638b4d2 65Resource acquisition is initialization ([RAII](https://en.cppreference.com/w/cpp/language/raii)) is one of the fundamental concepts in C++.
0044dbdb 66Resources are allocated during the construction of an object and destroyed when the object is itself destructed.
b638b4d2 67It means that if an object is correctly designed, the resources associated with it can not survive its lifetime.
0044dbdb 68Since objects that are allocated on the stack (local variables in a function, for example) are automatically destroyed when a function is exited, be it by reaching the last line, calling return or throwing an exception, it makes it possible to ensure that resources are always properly destroyed by wrapping them into an object.
8f7dc5e9 69
b638b4d2 70We describe the use of smart pointers, containers and other wrappers for that purpose below, but first a few words of caution.
0044dbdb
PD
71Resources stored in a object are only tied to this object if the constructor finished properly.
72If an exception is raised in the constructor body, the object is not created and therefore the destructor will not get called.
73This means that if the object has non-object members holding resources, like naked file descriptors or naked pointers, they need to be explicitly released before raising the exception, otherwise they are lost.
8f7dc5e9
RG
74
75```C++
b638b4d2 76class BadFileDescriptorWrapper
8f7dc5e9 77{
b638b4d2 78 BadFileDescriptorWrapper()
8f7dc5e9
RG
79 {
80 d_fd = open(...);
81 if (something) {
82 throw std::runtime_error(...);
83 }
84 ...
85 }
b638b4d2 86 ~BadFileDescriptorWrapper()
8f7dc5e9
RG
87 {
88 if (d_fd > 0) {
89 close(d_fd);
90 d_fd = -1;
91 }
92 }
93 int getHandle() const
94 {
95 return d_fd;
96 }
97private:
98 int d_fd{-1};
99};
100```
101
102The use of smart pointers can be a solution to most resources leakage, but otherwise the only way is to be careful about exceptions in constructors:
103
104```C++
b638b4d2 105BadFileDescriptorWrapper()
8f7dc5e9
RG
106{
107 d_fd = open(...);
108 if (something) {
109 close(d_fd);
110 throw std::runtime_error(...);
111 }
112 ...
113}
114```
115
1caeb33c 116## Smart Pointers
8f7dc5e9 117
0044dbdb 118There is almost no good reason not to use a smart pointer when doing dynamic memory allocation.
b638b4d2 119Smart pointers will keep track of whether the dynamically allocated object is still used, and destroy it when the last user goes away.
8f7dc5e9 120
b638b4d2 121Using naked pointers quickly results in security issues, ranging from memory leaks to arbitrary code execution.
0044dbdb 122Examples of such issues can be found in the following PowerDNS security advisories:
8f7dc5e9 123
b638b4d2
PD
124* [2017-07: Memory leak in DNSSEC parsing](https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2017-07.html)
125* [2018-04: Crafted answer can cause a denial of service](https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2018-04.html)
8f7dc5e9 126
b638b4d2 127Most allocations should be wrapped in a `std::unique_ptr`, using `make_unique`.
0044dbdb 128There can only be one owner at a given time, as opposed to shared pointers, but the ownership can be passed along using `std::move()` if needed.
8f7dc5e9 129
b638b4d2 130If the dynamically allocated object needs to be referenced in several places, the use of a `std::shared_ptr` is advised instead, via `std::make_shared`.
8f7dc5e9 131
b638b4d2 132The use of the `make_*` methods has three advantages:
8f7dc5e9 133
b638b4d2 134* they result in a single allocation for `shared_ptr`s, instead of two otherwise ;
8f7dc5e9
RG
135* they avoid duplicating the type name twice ;
136* they prevent a possible issue if an exception is raised with temporaries.
137
138They also make is easier to spot naked pointers by looking for "new" and "delete" throughout the code :)
139
0044dbdb 140Please note however that while unique pointers are as cheap as naked pointers, shared pointers are much more expensive.
b638b4d2 141That is because they need to use atomic operations to update their internal counters, so making a copy of a shared pointer is expensive.
0044dbdb 142Passing one by reference is cheap, however.
8f7dc5e9 143
1caeb33c 144### Shared Pointers
8f7dc5e9 145
0044dbdb
PD
146An important thing to be aware of with shared pointers is that taking a new copy of a shared pointer or releasing, thus updating its internal reference counter, is atomic and therefore thread-safe.
147Altering the content of the object pointed to is not, though, and is subject to the usual locking methods.
148The often misunderstood part is that updating the target of the shared pointer is not thread-safe.
149Basically, you can copy the shared pointer from multiple threads at once, and then each thread can assign a new target to its own copy safely, like that:
8f7dc5e9
RG
150
151```C++
152auto ptr = std::make_shared<int>(4);
153for (auto idx = 0; idx < 10 ; idx++){
154 std::thread([ptr]{ auto copy = ptr; }).detach(); //ok, only mutates the control block
155}
156```
157
b638b4d2 158But there is a race if one thread updates the exact same smart pointer that another thread is trying to read:
8f7dc5e9
RG
159
160```c++
161auto ptr = std::make_shared<int>(4);
162
163std::thread threadA([&ptr]{
164 ptr = std::make_shared<int>(10);
165});
166std::thread threadB([&ptr]{
167 ptr = std::make_shared<int>(20);
168});
169```
170
0044dbdb
PD
171That unfortunately means that we still need some locking with shared pointers.
172C++11 defines atomic compare/exchange operations for `std::shared_ptr`, but they are implemented in `libstdc++` by global mutexes and are therefore not lock-free.
8f7dc5e9 173
1caeb33c 174### Wrapping C Pointers
8f7dc5e9
RG
175
176Smart pointers can also be used to wrap C-pointers, such as `FILE*` pointers:
177
178```c++
179auto fp = std::unique_ptr<FILE, int(*)(FILE*)>(fopen(certificateFile.c_str(), "r"), fclose);
180```
181
182It also works with types from external C libraries, like OpenSSL:
183
184```c++
185auto cert = std::unique_ptr<X509, decltype(&X509_free)>(PEM_read_X509_AUX(fp.get(), nullptr, nullptr, nullptr), X509_free);
186```
187
0044dbdb 188Unfortunately there are a few cases where smart pointers cannot be used.
b638b4d2 189In the PowerDNS products, these cases have been mostly reduced to a few select classes, like the `pdns::channel` ones, that are used to pass pointers to a different thread by writing them to a pipe, as is done for example by the query distributors of the auth and the rec.
8f7dc5e9 190
b638b4d2 191When smart pointers cannot be used, special care should be taken to:
8f7dc5e9
RG
192
193* make sure that every exit point frees the allocated memory (early return, goto, exceptions..) ;
194* set the pointer to `nullptr` right after the deallocation, so we can't use it again (use-after-free) ;
b638b4d2 195* do not mix `malloc` with `delete`, or `new` with `free` (destructors are not run, at the very least) ;
8f7dc5e9
RG
196* do not mix array allocations (`new[]`) with a non-array `delete` (vs `delete[]`).
197
1caeb33c 198## Pointer Arithmetic
8f7dc5e9 199
0044dbdb
PD
200It is very common to use pointer arithmetic to calculate a position in a buffer, or to test whether a given offset is outside of a given buffer.
201Unfortunately it is quite easy to trigger undefined behaviour when doing so, as the C++ standard does not allow pointer arithmetic pointing inside an object, except for arrays where it is also permitted to point one element past the end.
202Still that undefined behaviour is mostly harmless, but it might lead to real issue on some platforms.
8f7dc5e9 203
b638b4d2 204One such example occurred in dnsdist: [2017-01: Crafted backend responses can cause a denial of service](https://dnsdist.org/security-advisories/powerdns-advisory-for-dnsdist-2017-01.html)
8f7dc5e9 205
b638b4d2 206In that case, a pointer was set to the start of a buffer plus a given length, to see whether the result would go past another pointer that was set to the end of the buffer.
0044dbdb
PD
207Unfortunately, if the start of the buffer is at a very high virtual address, the result of the addition might overflow and wrap around, causing the check to become true and leading to either a crash or the reading of unrelated memory.
208While very unlikely on a 64 bits platform, it could happen on 16 or 32 bits platform.
8f7dc5e9 209
b638b4d2 210This kind of issue is best avoided by the use of containers to avoid the need for pointer arithmetic, or by being very careful to only add checked offsets to a pointer.
8f7dc5e9
RG
211
212### Containers
213
214The use of containers like `vector`, `map` or `set` has several advantages in term of security:
215
216* memory allocations are handled by the container itself ;
217* it prevents a disconnect between the actual size and the variable tracking that size ;
218* it provides safe (and fast) operations like comparisons, iterators, etc..
219
b638b4d2 220One issue that could have been prevented by the use of a container can be found in the following advisory: [2018-09: Crafted query can cause a denial of service](https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2018-09.html)
8f7dc5e9 221
0044dbdb 222The use of a container and its corresponding `at()` operator would have prevented an out-of-bounds read since calling `at()` on an invalid offset results in an exception being raised.
b638b4d2 223The cost of using `at()` is negligible for most use cases, and can be avoided by using the `[]` operator in the rare case when the cost cannot be afforded.
0044dbdb 224Note that several Linux distributions now build with `-Wp,-D_GLIBCXX_ASSERTIONS` enabled by default, which turns on cheap range checks for C++ arrays, vectors, and strings anyway.
8f7dc5e9 225
0044dbdb
PD
226Regarding performance, it is advised to `reserve()` the needed size in advance when a rough estimate is known to avoid reallocations and copies.
227Resizing in advance is not advised, though, as it makes it harder to know what is exactly in the container in case of early returns or exceptions.
8f7dc5e9 228
b638b4d2 229In C++11, move operators make it possible to cheaply get the contents of a container into a different variable if needed.
8f7dc5e9 230
0044dbdb
PD
231The need to pass a subset of a container without copying it often leads to passing a pointer to an array of chars along with a size.
232Introduced in C++14 but already available in PowerDNS via boost (see views.hh), views provides a nice way to borrow the content of a container to pass it to a function, without any copy or dynamic memory allocation.
8f7dc5e9
RG
233
234The basic `string_view` class provides that feature for a container of chars, but the same feature exists for other types, like `uint8_t`.
235
1caeb33c 236# Threads and Concurrency
8f7dc5e9 237
0044dbdb 238All of our products use threading to be able to take advantage of the increasing number of cores on modern CPUs.
b638b4d2 239This inevitably leads to the question of how to synchronise data accesses between threads.
0044dbdb
PD
240Most objects, like containers, cannot be accessed from more than one thread at once.
241Even `const` methods on containers might not be thread-safe.
242For example getting the `size()` of a container might not be thread-safe if a different thread might be writing to the container.
243Some functions might also not be thread-safe, for example if they have a static non-const variable.
8f7dc5e9 244
0044dbdb 245We currently use three solutions, depending on the use-case.
b638b4d2 246The first one is used when we only need to share some kind of counter or gauge, and involves the use of `std::atomic` which allows atomic operations to be performed from different threads without locking.
0044dbdb 247More on that later.
b638b4d2
PD
248The second one is the "share nothing" approach, where each thread has its own data (using `thread_local`, for example), avoiding the need for data synchronization.
249When a thread needs to communicate with another one, it might use a `pdns::channel` to pass a pointer to that second thread.
0044dbdb 250That works quite well but sometimes sharing data is much more efficient than the alternative.
8f7dc5e9
RG
251
252For these cases, we use the classic locking approach, using either a simple mutex or read-write lock, depending on the use case.
253
b638b4d2 254## Locks
8f7dc5e9 255
b638b4d2 256Locks allow a thread of execution to ensure that no other thread will try to access the code path or data they protect at the same time.
8f7dc5e9 257
b638b4d2 258There are a few pitfalls to avoid when using locks:
8f7dc5e9 259
b638b4d2 260* failing to release the lock, which can be avoided by wrappers like `std::lock_guard`, `std::unique_lock` and our own wrappers: look for `LockGuarded`, `SharedLockGuarded` in lock.hh ;
0044dbdb 261* high contention, where threads are blocked for a long time while waiting to acquire a lock.
b638b4d2 262 This can be solved by carefully examining the portion of code that really needs to hold the lock, making the critical path faster, or by using sharding which basically divides the data protected by the lock into several blocks, each one of them protected by its own lock ;
0044dbdb
PD
263* starvation, which occurs for example when thread 1 acquires lock 1 and wants to acquire lock 2, which is already owned by thread 2, itself currently waiting to acquire lock 1.
264 This can be avoided by a better design of the locking mechanism, and assuring that locks are always acquired in the same order if more than one lock is needed.
8f7dc5e9 265
b638b4d2 266There are several types of locks:
8f7dc5e9 267
b638b4d2 268* spinlocks are very fast but are busy-waiting, meaning that they do not pause, but instead repetitively try to get hold of the lock, using 100% of one core doing so unless preempted by the OS.
0044dbdb
PD
269 So they are only suited for locks that are almost never contented ;
270* a mutex is a very simple lock.
b638b4d2 271 In most implementations it is a very fast lock, implemented in user-space on recent Linux kernels and glibc ;
0044dbdb 272* a read-write lock allows several threads to acquire it in read mode, but only one thread can acquire it in write mode.
b638b4d2 273 This is suited when most accesses are read-only and writes are rare and do not take too long.
0044dbdb 274 Otherwise a mutex might actually be faster ;
8f7dc5e9 275
b638b4d2 276One quick word about condition variables, that allow a thread to notify one or more threads waiting for a condition to happen.
0044dbdb
PD
277A thread should acquire a mutex using a `std::unique_lock` and call the `wait()` method of the condition variable.
278This is a very useful mechanism but one must be careful about two things:
8f7dc5e9 279
0044dbdb
PD
280* the producer thread can either wake only one thread or all threads waiting on the condition variable.
281 Waking up several threads if only one has something to do (known as a "thundering herd") is bad practice, but there are some cases where it makes sense ;
8f7dc5e9
RG
282* a consumer might be waken up spuriously, which can be avoided by passing a predicate (which can be as simple as a small lambda function) to `wait()`.
283
b638b4d2 284Our wrappers, `LockGuarded`, `SharedLockGuarded` in `lock.hh`, should always be preferred over other solutions.
0044dbdb 285They provide a way to wrap any data structure as protected by a lock (mutex or shared mutex), while making it immediately clear which data is protected by that lock, and preventing any access to the data without holding the lock.
8f7dc5e9
RG
286
287For example, to protect a set of integers with a simple mutex:
288
289```c++
290LockGuarded<std::set<int>> d_data;
291```
292
293or with a shared mutex instead:
294
295```c+++
296SharedLockGuarded<std::set<int>> d_data;
297```
298
299Then the only way to access the data is to call the `lock()`, `read_only_lock()` or `try_lock()` methods for the simple case, or the `read_lock()`, `write_lock()`, `try_read_lock()` or `try_write_lock()` for the shared one.
0044dbdb
PD
300Doing so will return a "holder" object, which provides access to the protected data, checking that the lock has really been acquired if needed (`try_` cases).
301The data might be read-only if `read_lock()`, `try_read_lock()` or `read_only_lock()` was called.
302Access is provided by dereferencing the holder object via `*` or `->`, allowing a quick-access syntax:
8f7dc5e9
RG
303
304```c+++
305return d_data.lock()->size();
306```
307
308Or when the lock needs to be kept for a bit longer:
309
310```c++
311{
312 auto data = d_data.lock();
313 data->clear();
314 data->insert(42);
315}
316```
317
1caeb33c 318## Atomics
8f7dc5e9 319
0044dbdb 320`std::atomic` provides a nice way to share a counter or gauge between threads without the need for locking.
b638b4d2 321This is done by implementing operations like reading, increasing, decreasing or writing a value in an atomic way, using memory barriers, making sure that the value cannot be updated from a different core during the operation.
0044dbdb
PD
322The default mode uses a sequentially consistent ordering memory model, which is quite expensive since it requires a full memory fence on all multi-core systems.
323A relaxed model can be used for certain very specific operations, but the default model has the advantage of being safe in all situations.
8f7dc5e9 324
1caeb33c 325## Per-Thread Counters
8f7dc5e9 326
b638b4d2 327For generic per-thread counters, we have a class in `tcounters.hh` that should provide better performances by allowing each thread to independently update its own counter, the costly operation only happening when the counter needs to be read by one thread gathering metrics from all threads.
8f7dc5e9 328
1caeb33c 329# Dealing with Untrusted Data
8f7dc5e9 330
0044dbdb 331As a rule of thumb, any data received from outside the process should be considered as untrusted.
b638b4d2 332That means data received on a socket, loaded from a file, retrieved from a database, etc.
0044dbdb 333Data received from an internal pipe might be excluded from that rule.
8f7dc5e9 334
0044dbdb
PD
335Untrusted data should never be trusted to adhere to the expected format or specifications, and a strict checking of boundaries should be performed.
336It means for example that, after reading the length for a field inside the data, whether that length does not exceed the total length of the data should be checked.
337In the same way, if we expect a numerical type we should check whether it matches one we expect and understand.
8f7dc5e9 338
0044dbdb
PD
339Anything unexpected should stop the processing and lead to the discarding of the complete data set.
340If a smaller data set can be safely discarded, and it is more important to load an incomplete set than to assure the integrity of the complete data set, only the faulty data set can be discarded instead.
8f7dc5e9 341
1caeb33c 342## Alignment Issues
0044dbdb
PD
343
344When structured, binary data is received from the network or read from a file, it might be tempting to map it to an existing structure directly to make the parsing easier.
345But one must be careful about alignment issues on some architectures:
8f7dc5e9
RG
346
347```c++
348struct my_struct {
349 uint32_t foo;
350 uint32_t bar;
351};
352```
353
354It might be tempting to directly cast the received data:
355
356```c++
357void func(char* data, size_t offset, size_t length) {
358 // bounds check left out!
359 const struct my_struct* tmp = reinterpret_cast<const struct my_struct*>(data + offset);
360 ...
361}
362```
363
0044dbdb
PD
364Unfortunately this leads to undefined behaviour because the offset might not be aligned with the alignment requirement of the struct.
365One solution is to do a copy:
8f7dc5e9
RG
366
367```c++
368void func(char* data, size_t offset, size_t length) {
369 // bounds check left out!
370 struct my_struct tmp;
371 memcpy(&tmp, data + offset, sizeof(tmp));
372 /* ... */
373}
374```
375
1caeb33c 376## Signed vs. Unsigned
8f7dc5e9 377
b638b4d2
PD
378Signed integers might overflow, and the resulting value is unpredictable, as this overflow is undefined behaviour.
379That means that this code results in an unpredictable value:
8f7dc5e9
RG
380
381```c++
382int8_t a = std::numeric_limits<int8_t>::max();
383a++;
384```
385
b638b4d2 386One such example led to [2006-01: Malformed TCP queries can lead to a buffer overflow which might be exploitable](https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2006-01.html).
8f7dc5e9 387
b638b4d2 388It would be necessary to check that the value cannot overflow first.
0044dbdb
PD
389Another possibility would be to instruct the compiler to treat signed overflow as it does for unsigned values, by wrapping.
390This can be done with `-fwrapv` with g++.
8f7dc5e9 391
0044dbdb
PD
392An operation on an unsigned integer will never result in an overflow, because the value will simply wrap around.
393This might still result in an unexpected value, possibly bypassing a critical check:
8f7dc5e9
RG
394
395```c++
396void parse_untrusted_data(uint8_t* data, uint16_t length)
397{
398 /* parse a record, first two bytes are the size of the record data, second two bytes are the type of the record */
399 if (length < 4) {
400 return;
401 }
402 /* read the first two bytes which hold the length of the next record */
403 uint16_t recordLen = data[0] * 256 + data[1];
404 /* let's assume that recordLen is equal to 65535 */
405 uint16_t totalRecordLen = /* size of the type */ sizeof(uint16_t) + recordLen; // <-- this results in a wrapped value of 65535 + 2 = 65537 = 1
406 if (totalRecordLen > length) {
407 return;
408 }
409 /* ... */
410}
411```
412
413A valid version to prevent the overflow:
414
415```c++
416void parse_untrusted_data(uint8_t* data, uint16_t length)
417{
418 /* parse a record, first two bytes are the size of the record data, second two bytes are the type of the record */
419 if (length < 4) {
420 return;
421 }
422 /* read the first two bytes which hold the length of the next record */
423 uint16_t recordLen = data[0] * 256 + data[1];
424 if (recordLen > length || (length - recordLen) < sizeof(uint16_t)) {
425 return;
426 }
427 /* ... */
428}
429```
430
431Converting from unsigned to signed will lose the high order bytes, and should be avoided, or the value should be checked before-hand:
432
433```c++
434uint64_t u = std::numeric_limits<uint64_t>::max();
435int64_t s = static_cast<int64_t>(u); /* Wrong, and the cast eliminates any warning */
436if (u <= std::numeric_limit<int64_t>::max()) {
437 int64_t s = static_cast<int64_t>(u); /* OK */
438}
439```
440
441The `pdns::checked_conv()` function can be used, ensuring that the conversion can safely be done and raising an exception otherwise.
442
443`-Wsign-conversion` can be used to warn about dangerous conversions (disabled by default in g++, and note that a cast disables the warning).
444
1caeb33c 445## Fuzzing
8f7dc5e9 446
0044dbdb 447Fuzzing is a very useful way to test a piece of code that parses untrusted data.
b638b4d2 448Efficient fuzzers are often doing coverage-based fuzzing, where the code that they test has been compiled in a special way to allow the fuzzer to detect which branches are executed and which are not, so that the fuzzer can see the effect of mutating specific bytes of the input on the code path.
8f7dc5e9 449
b638b4d2 450PowerDNS has a few fuzzing targets that can be used with libFuzzer or AFL in the `pdns/` directory, and are built when `--enable-fuzzing-target` is passed to `configure`.
0044dbdb 451More information can be found in the [fuzzing/README.md](https://github.com/PowerDNS/pdns/blob/master/fuzzing/README.md) file.
8f7dc5e9
RG
452The existing fuzzing targets are run on the OSS-Fuzz infrastructure for a short time every time a pull request is opened, and for a longer time on the HEAD of the repository.
453
1caeb33c 454# Other Potential Issues
8f7dc5e9
RG
455
456## TOCTOU
457
b638b4d2 458The time-of-check to time-of-use vulnerability is a very easy mistake to make when dealing with files or directories.
0044dbdb
PD
459The gist of it is that there is a small race condition between the time where a program might check the ownership, permissions or even existence of a file and the time it will actually do something with it.
460This time might be enough to allow an attacker to create a symbolic link to a critical file at the place of that exact file, for example.
461Since the program has enough rights to edit this file, this might allow an attacker to trick the program into writing into a completely different file.
8f7dc5e9
RG
462
463This is hard to avoid in all cases, but some mitigations do help:
464
465* opening a file first (handling errors if that fails) then getting the needed metadata via the file descriptor instead of the path (`fstat`, `fchmod`, `fchown`) ;
466* opening with the `O_NOFOLLOW` flag set, so that the operation will fail if the target exists and is a symbolic link ;
467* always creating temporary files via the `mkstemp()` function, which guarantees that the file did not exist before and has been created with the right permissions ;
468* using operations that are guaranteed to be atomic, like renaming a file on the same filesystem (for example in the same directory).
469
470## Secrets
471
b638b4d2
PD
472Try very hard not to load sensitive information into memory.
473And of course do not write to disk!
8f7dc5e9
RG
474
475If you have to:
476
b638b4d2
PD
477* use an object that can't be copied, by deleting the copy constructors and assignments operators,
478* try to lock the memory so it cannot be swapped out to disk, or included in a core dump, via `sodium_malloc()` or `sodium_mlock()`, for example ;
479* wipe the content before releasing the memory, so it will not linger around.
480 Do note that `memset()` is very often optimized out by the compiler, so function like `sodium_munlock()`, `explicit_bzero()` or `explicit_memset()` should be used instead.
8f7dc5e9 481
1caeb33c 482### Constant-Time Comparison
8f7dc5e9 483
b638b4d2
PD
484Don't compare secret against data using a naive string comparison, as the timing of the operation will leak information about the content of the secret.
485Ideally, a constant-time comparison should be used instead (see `constantTimeStringEquals()` in the PowerDNS code base) but it is not always easy to achieve.
0044dbdb 486One option might be to compute a HMAC of the secret using a key randomly generated at startup, and compare it against a HMAC of the supplied data computed with the same key.
8f7dc5e9 487
1caeb33c 488## Virtual Destructors
8f7dc5e9 489
0044dbdb
PD
490Any class that is expected to be sub-classed should provide a virtual destructor.
491Not doing so will prevent the destructor of a derived class from being called if the object is held as the base type:
8f7dc5e9
RG
492
493```c++
494class Parent
495{
496 virtual void doVirtualCall();
497};
498
499class Child: public Parent
500{
501 Child()
502 {
503 d_fd = fopen(..);
504 }
505 ~Child()
506 {
507 if (d_fd) {
508 fclose(d_fd);
509 f_fd = nullptr;
510 }
511 }
512 void doVirtualCall() override;
513};
514
515std::vector<Parent> myObjects;
516myObjects.push_back(Child());
517```
518
b638b4d2 519Note that defining a destructor will prevent the automatic creation of move operators for that class, since they are generated only if these conditions are met:
8f7dc5e9 520
b638b4d2
PD
521* no copy operators are declared ;
522* no move operators are declared ;
8f7dc5e9
RG
523* no destructor is declared.
524
b638b4d2 525If the Parent class holds data that is costly to copy, it might make sense to declare the move operators explicitly:
8f7dc5e9
RG
526
527```c++
528class Parent
529{
530 Parent(Parent&&) = default;
531 Parent& operator=(Parent&&) = default;
532
533 virtual ~Parent()
534 {
535 }
536
537 virtual void doVirtualCall();
538private:
539 FILE* d_fd{nullptr};
540};
541```
542
b638b4d2 543Note that declaring the move operators disables the copy operators, so if they are still needed:
8f7dc5e9
RG
544
545```c++
546class Parent
547{
548 Parent(Parent&&) = default;
549 Parent& operator=(Parent&&) = default;
550
551 Parent(const Parent&) = default;
552 Parent& operator=(const Parent&) = default;
553
554 virtual ~Parent()
555 {
556 }
557
558 virtual void doVirtualCall();
559};
560```
561
0044dbdb 562On a related topic, virtual methods should not be called from constructors or destructors.
b638b4d2 563While this is allowed under certain restrictions, it is very hard to know exactly which method (base or derived) will be called, and whether all sub-objects contained in the class would have been correctly constructed at that point.
8f7dc5e9 564
1caeb33c 565## Hash Collisions
8f7dc5e9 566
0044dbdb
PD
567Hashes are a very useful tool, used in `unordered_map` and `unordered_set` among others.
568They are also used in our caches.
569An important caveat that developers need to be aware about regarding hashes are that the probability of a collision is often a lot higher than expected.
b638b4d2 570This is well-known as the birthday paradox, the fact that the probability of having two entries colliding is a lot higher than the probability of finding a collision for a specific entry.
0044dbdb 571This means that it is important to verify that the entries are actually identical, and just not that they hash to the same value.
8f7dc5e9
RG
572
573This is especially important when hashing attacker-controlled values, as they can be specially crafted to trigger collisions to cause:
574
b638b4d2
PD
575* cache pollution (see [2018-06: Packet cache pollution via crafted query](https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2018-06.html)) ;
576* denial of service via hash table flooding (in a map, all entries that hash to the same value are often placed into a linked-list, making it possible to cause a linear scan of entries by making all of them hash to that same value).
8f7dc5e9 577
0044dbdb
PD
578The first issue can be prevented by comparing the entries and not just the value they hash to.
579The second one can be used by using some sort of secret when computing the hash so that the result cannot be guessed by the attacker.
580That can be achieved by using an unpredictable seed for certain hash algorithms, or a secret for some other like `SipHash`.
8f7dc5e9 581
1caeb33c 582# Readability Tips
8f7dc5e9
RG
583
584Some of these tips are actually enforced by `clang-tidy` nowadays, but it is still useful to keep them in mind.
585
1caeb33c 586## `auto`
8f7dc5e9 587
b638b4d2 588C++11 introduced automatic type deduction, using the `auto` keyword.
0044dbdb
PD
589In addition to saving the typing of a few more letters, using automatic type deduction prevents nasty surprises if the variable is initialized from another one, or from a function, and the other type is changed to a different one.
590The code might still compile while now involving a copy or worse.
591
1caeb33c 592## Boolean Expressions
8f7dc5e9 593
1caeb33c 594## Explicit Comparisons
8f7dc5e9 595
b638b4d2 596* compare numerical values with `== 0` or `!= 0` explicitly ;
8f7dc5e9
RG
597* compare to `false` explicitly, which is easier to read ;
598* compare to `nullptr` for the same reason.
599
600## Initialization
601
602Use braced initialization for members as often as possible:
603
604* it does forbid narrowing conversions
b638b4d2 605* and avoids C++'s "[most vexing parse](https://en.wikipedia.org/wiki/Most_vexing_parse)" which is to declare a function instead of calling the default constructor:
8f7dc5e9
RG
606
607```c++
608Object a(); // declares a function named a that returns an object
609```
610
1caeb33c 611## `nullptr`
8f7dc5e9 612
0044dbdb 613When representing a pointer, using `nullptr` makes it immediately obvious that we are dealing with a pointer, as opposed to the use of `0`.
b638b4d2 614It also cannot be silently taken as an integer, which can happens with `0` but also with `NULL`.
8f7dc5e9 615
1caeb33c 616## `const`-ness
8f7dc5e9 617
0044dbdb 618* Mark parameters and variables that should not be modified as `const`.
b638b4d2 619 This is especially important for references and pointers that comes from outside the function, but it also makes sense to do it for local variables or parameters passed by value because it might help detect a logic error later.
8f7dc5e9
RG
620* Mark const methods as such (and make them thread-safe)
621* Prefer using `at()` on containers so that no insertion can take place by mistake, and to get bounds checking.
622
1caeb33c 623## `static`
8f7dc5e9
RG
624
625Functions that are only used inside a single file should be marked as `static`, so that:
626
627* the compiler knows that these functions will not be called from a different compilation unit and thus that no symbol needs to be generated, making it more likely for the function to be inlined ;
628* the reader knows that this function is only used there and can be altered without causing an issue somewhere else.
629
630For the same reason, global variables that are only accessed from a single file should be marked static as well.
631
632## Variables
633
0044dbdb
PD
634Try to declare variables in the innermost scope possible and avoid uninitialized variables as much as possible.
635Declare and initialize them when the values needed to initialize them are available.
8f7dc5e9
RG
636
637## Exceptions
638
b638b4d2 639Should be reserved for unexpected events (corrupted data, timeouts, ...) and should not be triggered in normal processing.
8f7dc5e9 640
b638b4d2 641Do not be afraid of using them, though, as the cost of an exception that is not thrown is usually very small, thanks to the zero-cost exception model.
0044dbdb 642It might still force the compiler to refrain from some optimizations, so it might make sense to avoid them in some very performance-sensitive, narrow code paths.
8f7dc5e9 643
1caeb33c 644### Custom Exceptions
8f7dc5e9
RG
645
646Exceptions defined by the standards should be used whenever possible, as they already cover a lot of use cases.
647
0044dbdb
PD
648If custom exceptions are necessary, to be able to catch them explicitly, they should still derive from `std::exception`, directly or indirectly, so that they can still be caught in a more generic way to prevent the program from terminating.
649For example, the main connection handling function of a server can catch `std::exception` and just terminate the current connection if an uncaught exception bubbles up.
8f7dc5e9 650
1caeb33c 651### Catching Exceptions
8f7dc5e9
RG
652
653Catching exceptions should always be done by const reference:
654
655```c+++
656try {
657}
658catch (const std::exception& e) {
659 std::cerr << e.what() <<endl;
660}
661```
662
663Not using a reference would result in the exception object being sliced, meaning that a custom exception derived from `std::exception` would not see its overriding `what()` method called but the one from the base class instead.
664
665## Casts
666
0044dbdb
PD
667C-style casts should be avoided, as the compiler does almost no check on the validity of the operation.
668They are also very hard to spot in a code.
669C++-style casts can easily be spotted in a code, which makes it easy to review them.
670
671* `const_cast` can be used to remove the const qualifier on a variable.
672 It's usually a bad sign, but sometimes it is needed to call a function that will not modify the variable but lacks the const qualifier, for example.
673* `dynamic_cast` can be used to cast a pointer to a derived class or to a base class, while checking that the operation is valid.
b638b4d2
PD
674 If the casted object is not valid for the intended type, a nullptr value will be returned (or a bad_cast exception for references) so the result of the operation should be checked!
675 Note that the RTTI check needed to verify that the casted object is valid has a non-negligible CPU cost.
0044dbdb
PD
676 Not checking the return value might lead to remote denial of service by nullptr dereference, as happened with the issue described in this advisory: https://docs.powerdns.com/recursor/security-advisories/powerdns-advisory-2017-08.html
677* `static_cast` can perform downcast in place of `dynamic_cast`, with none of the cost associated to the check, but can only be done if the cast is known to be valid.
678 It can also do implicit conversion between types (from `ssize_t` to `size_t`, AFTER checking that the value is greater or equal to zero).
679* `reinterpret_cast` is quite dangerous, since it can be used to turn a type into a different one.
b638b4d2 680 It cannot be be used to remove a const qualifier.
0044dbdb 681 When used to reinterpret the content of a buffer it can quickly lead to alignment issues, as described in the [alignment issues] section.
8f7dc5e9 682
1caeb33c 683## `errno`
8f7dc5e9 684
0044dbdb
PD
685`errno` is only guaranteed to be set on failing system calls and not set on succeeding system calls.
686A library call may clobber `errno`, even when it succeeds.
b638b4d2 687Safe practice is:
8f7dc5e9
RG
688
689* Only look at `errno` on failing systems calls or when a library function is documented to set `errno`.
b638b4d2 690* Immediately save the value of `errno` in a local variable after a system call for later decision making.