All our MSVS Project files share the same intermediate directory. We
know that this doesn't cause any problems, so we can just disable the
detection in the project files.
Example of the warning:
warning MSB8028: The intermediate directory (.\Release\) contains files shared from another project (dnssectool.vcxproj). This can lead to incorrect clean and rebuild behavior.
Due to a way the stdatomic.h shim is implemented on Windows, the MSVC
always things that the outside type is the largest - atomic_(u)int_fast64_t.
This can lead to false positives as this one:
lib\dns\adb.c(3678): warning C4477: 'fprintf' : format string '%u' requires an argument of type 'unsigned int', but variadic argument 2 has type 'unsigned __int64'
We workaround the issue by loading the value in a scoped local variable
with correct type first.
MSVC documentation states: "This warning can be caused when a pointer to
a const or volatile item is assigned to a pointer not declared as
pointing to const or volatile."
Unfortunately, this happens when we dynamically allocate and deallocate
block of atomic variables using isc_mem_get and isc_mem_put.
Couple of examples:
lib\isc\hp.c(134): warning C4090: 'function': different 'volatile' qualifiers [C:\builds\isc-projects\bind9\lib\isc\win32\libisc.vcxproj]
lib\isc\hp.c(144): warning C4090: 'function': different 'volatile' qualifiers [C:\builds\isc-projects\bind9\lib\isc\win32\libisc.vcxproj]
lib\isc\stats.c(55): warning C4090: 'function': different 'volatile' qualifiers [C:\builds\isc-projects\bind9\lib\isc\win32\libisc.vcxproj]
lib\isc\stats.c(87): warning C4090: 'function': different 'volatile' qualifiers [C:\builds\isc-projects\bind9\lib\isc\win32\libisc.vcxproj]
The InterlockedOr8() and InterlockedAnd8() first argument was cast
to (atomic_int_fast8_t) instead of (atomic_int_fast8_t *), this was
reported by MSVC as:
warning C4024: '_InterlockedOr8': different types for formal and actual parameter 1
warning C4024: '_InterlockedAnd8': different types for formal and actual parameter 1
Set WarningLevel to Level1 for Release, treat warnings as errors
Our vcxproj files set the WarningLevel to Level3, which is too verbose
for a code that needs to be portable. That basically leads to ignoring
all the errors that MSVC produces. This commits downgrades the
WarningLevel to Level1 and enables treating warnings as errors for
Release builds. For the Debug builds the WarningLevel got upgraded to
Level4, and treating warnings as errors is explicitly disabled.
We should eventually make the code clean of all MSVC warnings, but it's
a long way to go for Level4, so it's more reasonable to start at Level1.
For reference[1], these are the warning levels as described by MSVC
documentation:
* /W0 suppresses all warnings. It's equivalent to /w.
* /W1 displays level 1 (severe) warnings. /W1 is the default setting
in the command-line compiler.
* /W2 displays level 1 and level 2 (significant) warnings.
* /W3 displays level 1, level 2, and level 3 (production quality)
warnings. /W3 is the default setting in the IDE.
* /W4 displays level 1, level 2, and level 3 warnings, and all level 4
(informational) warnings that aren't off by default. We recommend
that you use this option to provide lint-like warnings. For a new
project, it may be best to use /W4 in all compilations. This option
helps ensure the fewest possible hard-to-find code defects.
* /Wall displays all warnings displayed by /W4 and all other warnings
that /W4 doesn't include — for example, warnings that are off by
default.
* /WX treats all compiler warnings as errors. For a new project, it
may be best to use /WX in all compilations; resolving all warnings
ensures the fewest possible hard-to-find code defects.
Michał Kępień [Wed, 15 Apr 2020 09:38:40 +0000 (11:38 +0200)]
Fix "srcid" on Windows
Windows BIND releases produced by GitLab CI are built from Git
repositories, not from release tarballs, which means the "srcid" file is
not present in the top source directory when MSBuild is invoked. This
causes the Git commit hash for such builds to be set to "unset_id".
Enable win32utils/Configure to try determining the commit hash for a
build by invoking Git on the build host if the "srcid" file is not
present (which is what its Unix counterpart does).
Ondřej Surý [Mon, 30 Mar 2020 12:18:01 +0000 (14:18 +0200)]
Add pylint and flake8 tests to GitLab CI
Our python code didn't adhere to any coding standard. In this commit, we add
flame8 (https://pypi.org/project/flake8/), and pylint (https://www.pylint.org/).
There's couple of exceptions:
- ans.py scripts are not checked, nor fixed as part of this MR
- pylint's missing-*-docstring and duplicate-code checks have
been disabled via .pylintrc
Michał Kępień [Wed, 8 Apr 2020 12:27:33 +0000 (14:27 +0200)]
Work around an MSVC bug
The assembly code generated by MSVC for at least some signed comparisons
involving atomic variables incorrectly uses unsigned conditional jumps
instead of signed ones. In particular, the checks in isc_log_wouldlog()
are affected in a way which breaks logging on Windows and thus also all
system tests involving a named instance. Work around the issue by
assigning the values returned by atomic_load_acquire() calls in
isc_log_wouldlog() to local variables before performing comparisons.
Diego Fronza [Thu, 13 Feb 2020 23:35:25 +0000 (20:35 -0300)]
Add test for the proposed fix
This test asserts that option "deny-answer-aliases" works correctly
when forwarding requests.
As a matter of example, the behavior expected for a forwarder BIND
instance, having an option such as deny-answer-aliases { "domain"; }
is that when forwarding a request for *.anything-but-domain, it is
expected that it will return SERVFAIL if any answer received has a CNAME
for "*.domain".
Diego Fronza [Thu, 13 Feb 2020 23:17:13 +0000 (20:17 -0300)]
Fixed rebinding protection bug when using forwarder setups
BIND wasn't honoring option "deny-answer-aliases" when configured to
forward queries.
Before the fix it was possible for nameservers listed in "forwarders"
option to return CNAME answers pointing to unrelated domains of the
original query, which could be used as a vector for rebinding attacks.
The fix ensures that BIND apply filters even if configured as a forwarder
instance.
Increate the DNSKEY TTL of the migrate.kasp zone for the following
reason: The key states are initialized depending on the timing
metadata. If a key is present long enough in the zone it will be
initialized to OMNIPRESENT. Long enough here is the time when it
was published (when the setup script was run) plus DNSKEY TTL.
Otherwise it is set to RUMOURED, or to HIDDEN if no timing metadata
is set or the time is still in the future.
Since the TTL is "only" 5 minutes, the DNSKEY state may be
initialized to OMNIPRESENT if the test is slow, but we expect it
to be in RUMOURED state. If we increase the TTL to a couple of
hours it is very unlikely that it will be initialized to something
else than RUMOURED.
This fixes another intermittent failure in the kasp system test.
It does not happen often, except for in the Windows platform tests
where it takes a long time to run the tests.
In the "kasp" system test, there is an "rndc reconfig" call which
triggers a new rekey event. check_next_key_event() verifies the time
remaining from the moment "rndc reconfig" is called until the next key
event. However, the next key event time is calculated from the key
times provided during key creation (i.e. during test setup). Given
this, if "rndc reconfig" is called a significant amount of time after
the test is started, some check_next_key_event() checks will fail.
Fix by calculating the time passed since the start of the test and
when 'rndc reconfig' happens. Substract this time from the
calculated next key event.
This only needs to be done after an "rndc reconfig" on zones where
the keymgr needs to wait for a period of time (for example for keys
to become OMNIPRESENT, or HIDDEN). This is on step 2 and step 5 of
the algorithm rollover. In step 2 there is a waiting period before
the DNSKEY is OMNIPRESENT, in step 5 there is a waiting period
before the DNSKEY is HIDDEN.
In step 1 new keys are created, in step 3 and 4 key states just
entered OMNIPRESENT, and in step 6 we no longer care because the
key lifetime is unlimited and we default to checking once per hour.
Regardless of our indifference about the next key event after step 6,
change some of the key timings in the setup script to better
reflect reality: DNSKEY is in HIDDEN after step 5, DS times have
changed when the new DS became active.
Ondřej Surý [Thu, 26 Mar 2020 09:57:38 +0000 (10:57 +0100)]
Fix the statistic counter underflow in ns_client_t
In case of normal fetch, the .recursionquota is attached and
ns_statscounter_recursclients is incremented when the fetch is created. Then
the .recursionquota is detached and the counter decremented in the
fetch_callback().
In case of prefetch or rpzfetch, the quota is attached, but the counter is not
incremented. When we reach the soft-quota, the function returns early but don't
detach from the quota, and it gets destroyed during the ns_client_endrequest(),
so no memory was leaked.
But because the ns_statscounter_recursclients is only incremented during the
normal fetch the counter would be incorrectly decremented on two occassions:
1) When we reached the softquota, because the quota was not properly detached
2) When the prefetch or rpzfetch was cancelled mid-flight and the callback
function was never called.
Rather than group key ids together, group key id with its
corresponding counters. This should make growing / shrinking easier
than having keyids then counters.
Matthijs Mekking [Thu, 26 Mar 2020 15:13:55 +0000 (16:13 +0100)]
Add test for many keys
Add a statschannel test case for DNSSEC sign metrics that has more
keys than there are allocated stats counters for. This will produce
gibberish, but at least it should not crash.
Matthijs Mekking [Thu, 26 Mar 2020 15:02:36 +0000 (16:02 +0100)]
Redesign dnssec sign statistics
The first attempt to add DNSSEC sign statistics was naive: for each
zone we allocated 64K counters, twice. In reality each zone has at
most four keys, so the new approach only has room for four keys per
zone. If after a rollover more keys have signed the zone, existing
keys are rotated out.
The DNSSEC sign statistics has three counters per key, so twelve
counters per zone. First counter is actually a key id, so it is
clear what key contributed to the metrics. The second counter
tracks the number of generated signatures, and the third tracks
how many of those are refreshes.
This means that in the zone structure we no longer need two separate
references to DNSSEC sign metrics: both the resign and refresh stats
are kept in a single dns_stats structure.
Incrementing dnssecsignstats:
Whenever a dnssecsignstat is incremented, we look up the key id
to see if we already are counting metrics for this key. If so,
we update the corresponding operation counter (resign or
refresh).
If the key is new, store the value in a new counter and increment
corresponding counter.
If all slots are full, we rotate the keys and overwrite the last
slot with the new key.
Dumping dnssecsignstats:
Dumping dnssecsignstats is no longer a simple wrapper around
isc_stats_dump, but uses the same principle. The difference is that
rather than dumping the index (key tag) and counter, we have to look
up the corresponding counter.
Add a test to ensure migration from 'auto-dnssec maintain;' to
dnssec-policy works even if the algorithm is changed. The existing
keys should not be removed immediately, but their goal should be
changed to become hidden, and the new keys with the different
algorithm should be introduced immediately.
If we initialize goals on all keys, superfluous keys that match
the policy all desire to be active. For example, there are six
keys available for a policy that needs just two, we only want to
set the goal state to OMNIPRESENT on two keys, not six.
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Earlier commit deals with migration where existing keys
match the policy. This commit deals with migration where existing
keys do not match the policy. In that case, named must not
immediately delete the existing keys, but gracefully roll to the
dnssec-policy.
However, named did remove the existing keys immediately. This is
because the legacy key states were initialized badly. Because
those keys had their states initialized to HIDDEN or RUMOURED, the
keymgr decides that they can be removed (because only when the key
has its states in OMNIPRESENT it can be used safely).
The original thought to initialize key states to HIDDEN (and
RUMOURED to deal with existing keys) was to ensure that those keys
will go through the required propagation time before the keymgr
decides they can be used safely. However, those keys are already
in the zone for a long time and making the key states represent
otherwise is dangerous: keys may be pulled out of the zone while
in fact they are required to establish the chain of trust.
Fix initializing key states for existing keys by looking more closely
at the time metadata. Add TTL and propagation delays to the time
metadata and see if the DNSSEC records have been propagated.
Initialize the state to OMNIPRESENT if so, otherwise initialize to
RUMOURED. If the time metadata is in the future, or does not exist,
keep initializing the state to HIDDEN.
The added test makes sure that new keys matching the policy are
introduced, but existing keys are kept in the zone until the new
keys have been propagated.
A few kasp system test tweaks to improve test failure debugging and
deal with tests related to migration to dnssec-policy.
1. When clearing a key, set lifetime to "none". If "none", skip
expect no lifetime set in the state file. Legacy keys that
are migrated but don't match the dnssec-policy will not have a
lifetime.
2. The kasp system test prints which key id and file it is checking.
Log explicitly if we are checking the id or a file.
3. Add quotes around "ID" when setting the key id, for consistency.
4. Fix a typo (non -> none).
5. Print which key ids are found, this way it is easier to see what
KEY[1-4] failed to match one of the key files.
Matthijs Mekking [Fri, 27 Mar 2020 09:28:22 +0000 (10:28 +0100)]
Fix and test migration to dnssec-policy
Migrating from 'auto-dnssec maintain;' to dnssec-policy did not
work properly, mainly because the legacy keys were initialized
badly. Several adjustments in the keymgr are required to get it right:
- Set published time on keys when we calculate prepublication time.
This is not strictly necessary, but it is weird to have an active
key without the published time set.
- Initalize key states also before matching keys. Determine the
target state by looking at existing time metadata: If the time
data is set and is in the past, it is a hint that the key and
its corresponding records have been published in the zone already,
and the state is initialized to RUMOURED. Otherwise, initialize it
as HIDDEN. This fixes migration to dnssec-policy from existing
keys.
- Initialize key goal on keys that match key policy to OMNIPRESENT.
These may be existing legacy keys that are being migrated.
- A key that has its goal to OMNIPRESENT *or* an active key can
match a kasp key. The code was changed with CHANGE 5354 that
was a bugfix to prevent creating new KSK keys for zones in the
initial stage of signing. However, this caused problems for
restarts when rollovers are in progress, because an outroducing
key can still be an active key.
The test for this introduces a new KEY property 'legacy'. This is
used to skip tests related to .state files.
The rwlock introduced to protect the .logconfig member of isc_log_t
structure caused a significant performance drop because of the rwlock
contention. It was also found, that the debug_level member of said
structure was not protected from concurrent read/writes.
The .dynamic and .highest_level members of isc_logconfig_t structure
were actually just cached values pulled from the assigned channels.
We introduced an even higher cache level for .dynamic and .highest_level
members directly into the isc_log_t structure, so we don't have to
access the .logconfig member in the isc_log_wouldlog() function.
Evan Hunt [Tue, 31 Mar 2020 22:04:20 +0000 (15:04 -0700)]
incrementally clean up old RPZ records during updates
After an RPZ zone is updated via zone transfer, the RPZ summary
database is updated, inserting the newly added names in the policy
zone and deleting the newly removed ones. The first part of this
was quantized so it would not run too long and starve other tasks
during large updates, but the second part was not quantized, so
that an update in which a large number of records were deleted
could cause named to become briefly unresponsive.
Witold Kręcicki [Thu, 26 Mar 2020 13:25:06 +0000 (14:25 +0100)]
Deactivate the handle before sending the async close callback.
We could have a race between handle closing and processing async
callback. Deactivate the handle before issuing the callback - we
have the socket referenced anyway so it's not a problem.
Witold Kręcicki [Tue, 24 Mar 2020 10:42:16 +0000 (11:42 +0100)]
Add a quota attach function with a callback, some code cleanups.
We introduce a isc_quota_attach_cb function - if ISC_R_QUOTA is returned
at the time the function is called, then a callback will be called when
there's quota available (with quota already attached). The callbacks are
organized as a LIFO queue in the quota structure.
It's needed for TCP client quota - with old networking code we had one
single place where tcp clients quota was processed so we could resume
accepting when the we had spare slots, but it's gone with netmgr - now
we need to notify the listener/accepter that there's quota available so
that it can resume accepting.
Remove unused isc_quota_force() function.
The isc_quote_reserve and isc_quota_release were used only internally
from the quota.c and the tests. We should not expose API we are not
using.
Mark Andrews [Mon, 16 Mar 2020 05:44:51 +0000 (16:44 +1100)]
Typedef my_bool if missing.
ORACLE MySQL 8.0 has dropped the my_bool type, so we need to reinstate
it back when compiling with that version or higher. MariaDB is still
keeping the my_bool type. The numbering between the two (MariaDB 5.x
jumped to MariaDB 10.x) doesn't make the life of the developer easy.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Include compiler name in all build/test job names
Most build/test job names already contain a "clang", "gcc", or "msvc"
prefix which indicates the compiler used for a given job. Apply that
naming convention to all build/test job names.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Refactor TSAN unit test job definitions
Multiple YAML keys have identical values for both TSAN unit test job
definitions. Extract these common keys to a YAML anchor and use it in
TSAN unit test job definitions to reduce code duplication.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Run "kyua report-html" for TSAN unit test jobs
Definitions of jobs running unit tests under TSAN contain an
"after_script" YAML key. Since the "unit_test_job" anchor is included
in those job definitions before "after_script" is defined, the
job-specific value of that key overrides the one defined in the included
anchor. This prevents "kyua report-html" from being run for TSAN unit
test jobs. Moving the invocation of "kyua report-html" to the "script"
key in the "unit_test_job" anchor is not acceptable as it would cause
the exit code of that command to determine the result of all unit test
jobs and we need that to be the exit code of "make unit". Instead, add
"kyua report-html" invocations to the "after_script" key of TSAN unit
test job definitions to address the problem without affecting other job
definitions.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Refactor TSAN system test job definitions
Multiple YAML keys have identical values for both TSAN system test job
definitions. Extract these common keys to a YAML anchor and use it in
TSAN system test job definitions to reduce code duplication.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Drop "before_script" key from TSAN job definitions
Both "system_test_job" and "unit_test_job" YAML anchors contain a
"before_script" key. TSAN job definitions first specify their own value
of the "before_script" key and then include the aforementioned YAML
anchors, which results in the value of the "before_script" key being
overridden with the value specified by the included anchor. Given this,
remove "before_script" definitions specific to TSAN jobs as they serve
no practical purpose.
Michał Kępień [Thu, 26 Mar 2020 10:03:52 +0000 (11:03 +0100)]
Define TSAN options in a global variable
All assignments for the TSAN_OPTIONS variable are identical across the
entire .gitlab-ci.yml file. Define a global TSAN_OPTIONS_COMMON
variable and use it in job definitions to reduce code duplication.
Ondřej Surý [Tue, 24 Mar 2020 08:43:45 +0000 (09:43 +0100)]
Adjust the GitLab CI jobs to match the new images
The custom builds (oot, asan, tsan) were mostly built using Debian sid
amd64 image. The problem was that this image broke too easily, because
it's Debian "unstable" after all.
This commit introduces "base_image" that should be most stable with
extra bits on top (clang, coccinelle, cppcheck, ...). Currently, that
would be Debian buster amd64.
Other changes introduced by this commit:
* Change the default clang version to 10
* Run both ASAN and TSAN with both gcc and clang compilers
* Remove Clang Debian stretch i386 job