Fix bug: error'd repositories weren't logged if a child repo was synced
+If a parent repository wasn't successfully synced (eg. LACNIC) but a child repository was synced (eg. Brazil), the errors related to the parent repository weren't logged to the operation log.
+Fix this by popping the working repository from the TA, since this was causing the error. All the repositories were erroneously related, so on success of any of them, the error logs were discarded.
+Two additional updates are done: don't rsync when forcing the download of a URI whose ancestor had a previous error, and remove line breaks from the stale repositories summary.
+Rename some local variables to aid dev reading.
pcarana [Fri, 26 Jun 2020 23:09:32 +0000 (18:09 -0500)]
Fix bug: didn't search local files when an RRDP URI failed previously
+Whenever an RRDP repository can't be fetched, an attempt to work with local files must be made. If RSYNC was disabled and there was an error fetching the RRDP repository, the next time that repository was found on a certificate, it was being rejected; the right thing to do is to consider that scenario and keep working locally.
pcarana [Fri, 26 Jun 2020 19:38:23 +0000 (14:38 -0500)]
Avoid additional operations after calling fork()
+Based on https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html, the function called by the child process now avoids malloc's and only redirects its output to the corresponding pipes before doing the rsync execution with execvp().
+This fixes #35. Something in the musl implementation (very specific to docker+alpine) hangs the child process right after its creation; the parent process waits for the child to end but it never does, so the container runs forever and never ends a validation cycle.
+Also, flush stderr/stdout before fork() to avoid a possible (in docker+alpine, almost sure) deadlock between parent process and its forked child.
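A minimal sketch of the pattern described above, not the project's actual code: flush the stdio buffers before fork(), and have the child do nothing but redirect its output to a pipe and call execvp(). The helper name `run_and_capture` and its buffer-based interface are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] and copy up to outlen - 1 bytes of its
 * stdout into out. Returns the child's exit status, or -1 on error.
 */
int
run_and_capture(char *const argv[], char *out, size_t outlen)
{
	int fds[2];
	pid_t pid;
	ssize_t n;
	size_t used = 0;
	int status;

	assert(argv != NULL && out != NULL);
	if (outlen == 0 || pipe(fds) != 0)
		return -1;

	/* Flush before fork() so the child doesn't inherit pending output. */
	fflush(stdout);
	fflush(stderr);

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: no malloc, no stdio; just redirect and exec. */
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		close(fds[1]);
		execvp(argv[0], argv);
		_exit(127); /* Only reached if execvp() failed. */
	}

	/* Parent: read until the buffer is full or the pipe is closed. */
	close(fds[1]);
	while (used < outlen - 1 &&
	    (n = read(fds[0], out + used, outlen - 1 - used)) > 0)
		used += (size_t)n;
	out[used] = '\0';
	close(fds[0]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit() instead of exit() so that no stdio buffers or atexit handlers copied from the parent are run a second time.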
pcarana [Fri, 19 Jun 2020 22:30:43 +0000 (17:30 -0500)]
Fix several bugs related to sync errors, update some log messages
+Fix bug: an endless loop when a requested URI error was removed.
+Fix bug: some error'd URIs could be logged even though their repository data was successfully fetched with another access method.
+Fix bug: if a TAL has more than one URI and there was an error fetching a URI, the following URIs in the list weren't considered to get the TA certificate.
+Only 'stderr' rsync output will be sent to operation log considering '--stale-repository-period', the 'stdout' rsync output will be sent to validation log at level info.
+Messages of rsync/rrdp retries are 'upgraded' from level info to warn (all on validation logs).
+Add a warning message (validation log) whenever local data is going to be utilized due to previous errors fetching repositories or TA certificates.
+Log all communication errors if 'log.level=debug'.
pcarana [Sat, 13 Jun 2020 00:34:57 +0000 (19:34 -0500)]
Replace args '*log.prefix' with '*log.tag', add help message.
+Do the replacement at code, docs and unit tests.
+Add a help message that's printed whenever there's an error at the configuration arguments.
+Fix a broken unit test.
+Fix the description of 'validation-log.tag'.
+Fix some errors at configuration examples ('examples/config.json') and at the web docs ('usage.html#--configuration-file').
pcarana [Thu, 4 Jun 2020 23:14:03 +0000 (18:14 -0500)]
Fix bug when applying SLURM, and configure the log level on empty dirs
+The bug: when a SLURM was successfully loaded, instead of stopping the internal flow on success (a 'return' was needed), it continued to the error flow. This led to worse errors later, such as a segfault when a valid SLURM was applied.
+Log error whenever the TALs configured directory is empty, log warning if the SLURM directory is empty.
pcarana [Wed, 13 May 2020 23:43:43 +0000 (18:43 -0500)]
Allow to work with cache on requests errors, common func to get date
+ Create common function to get the current date and time.
+ Identify request errors, specifically after trying to fetch data via http/rsync without success. This helps to identify if a whole repository can't be downloaded after a considerable time (it can be configured).
+ Allow to work with local files even when there was a download error.
+ Add 'stale-repository-period' argument to set the time period that must lapse to warn about stale repositories (this will be logged to the operation log).
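A rough sketch of the kind of check 'stale-repository-period' implies, assuming the validator remembers when a repository first failed to download. The struct and function names are hypothetical, not taken from the code base.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical record: when a repository first failed to download. */
struct repo_error {
	time_t first_error; /* 0 means there's no pending error */
};

/* True if the error has lasted at least period_secs as of now. */
bool
repo_is_stale(const struct repo_error *err, time_t now,
    unsigned int period_secs)
{
	assert(err != NULL);
	if (err->first_error == 0)
		return false;
	return difftime(now, err->first_error) >= (double)period_secs;
}
```

A caller would pass the configured period and the current time, and log to the operation log only when the function returns true.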
- Code wasn't validating null result on strdup
- If a validation thread was interrupted,
`perform_standalone_validation()` was reading an uninitialized
exit status.
More or less as a side effect, I also merged the structures
`pthread_param` and `thread`, because their usage was similar
and shared ~50% of their members.
`do_file_validation()` is no longer responsible for freeing its
generic argument.
pcarana [Wed, 25 Mar 2020 00:39:01 +0000 (18:39 -0600)]
Add new incidences regarding manifest validation.
-Related to #28.
-'incid-file-at_mft-not-found': when a file listed in a manifest isn't found at the manifest publication point.
-'incid-file-at-mft-hash-not-match': the file hash doesn't match the hash listed at the manifest.
-Both incidences will be an error by default.
pcarana [Thu, 19 Mar 2020 22:55:20 +0000 (16:55 -0600)]
Update SLURM loading logic (use a cache to load new data).
+Stop searching for duplicate elements in the same file or in distinct files, also stop searching for covered prefixes at the same file; those checks don't exist at the RFC and they had a huge processing cost.
+Implement a SLURM cache when a new file is loaded; this way it's easier to check the rule at RFC 8416 section 4.2.
+Remove the whole context properties that were utilized to know on which file the loader was working.
pcarana [Fri, 13 Mar 2020 17:47:37 +0000 (11:47 -0600)]
Check for time condition met/unmet due to old libcurl impl
The 'problem' was found at CentOS 7, the libcurl implementation makes the 'If-Modified-Since' check at the client side. So, if the server responds with an HTTP OK (200) code but the dates don't match, the response content is ignored.
What's the problem? For us (HTTP client) the response looks ok and we take the download as correct, but the downloaded file doesn't have content, so when it's read bad things happen (actually the error is logged and the fallback is to mark such repository as invalid and try the download from another repo, if such repo is available).
pcarana [Fri, 13 Mar 2020 16:36:04 +0000 (10:36 -0600)]
Stop holding the write lock when the SLURM is loaded
There's no need to hold the lock, the SLURM loading action doesn't modify the current DB state; it's altering the new DB state, which will be utilized later to replace the current DB state (and that's where the lock is needed).
+Add missing dependency at some distros (libcurl) and update 'libxml' package (the '*-devel' package is required).
+Add '--timeout' parameter to rsync command, the default value is the same as 'http.idle-timeout' (15 secs). Update docs where this value is referred.
+Verify that the manifest file exists locally after downloading a repo, if the file doesn't exist, then the repo is discarded and (possibly) another repository will be utilized (eg. rrdp or rsync repo).
pcarana [Sat, 1 Feb 2020 00:06:00 +0000 (18:06 -0600)]
Set 'root-except-ta' as default, fix bugs deleting RRDP repo files.
+Remove unnecessary functions at 'visited_uris.h', rename function that deletes local files.
+Refactor the way the old repository files related to an RRDP URI are deleted, instead of deleting the 'best guess' of the root dir, delete each root dir of the mft uris stored at visited uris struct. The daemon will do its best effort to remove the files.
+Update year references from 2019 to 2020.
+Use 'root-except-ta' rsync strategy as default (and update docs as well), to prevent rsyncs from overwriting repositories fetched via RRDP.
+Remove 'create_snapshot' logic from 'rrdp_parser'; it wasn't of much help since the 'If-Modified-Since' impl already avoids loading unnecessary data.
+Remove local repository files related to an RRDP URI only on session ID updates; also, reset RSYNC visited URIs of a TAL if an RRDP repository sync fails, this helps to refresh the repo via rsync (if rsync is the secondary option to fetch it).
+Fix 'tal_test.c' error comparing loaded URIs.
pcarana [Wed, 29 Jan 2020 20:34:43 +0000 (14:34 -0600)]
Fix bugs (base64 sanitize function, TAL URIs validations) and memleak.
+The base64 sanitize function was setting a nul char at a wrong location.
+Validate the TAL URIs syntax when they are loaded, not until they are utilized.
+Possible memleak at 'x509stack_push' when the function error'd.
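For illustration only, one plausible shape of such a sanitize step, assuming it strips whitespace from the base64 text in place (the actual function isn't shown here). The point is that the nul char goes at the write index, not at the read index or the original length.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: remove whitespace from str in place and return the
 * new length.
 */
size_t
strip_whitespace(char *str)
{
	size_t r, w = 0;

	assert(str != NULL);
	for (r = 0; str[r] != '\0'; r++) {
		if (!isspace((unsigned char)str[r]))
			str[w++] = str[r];
	}
	str[w] = '\0'; /* Terminate where writing stopped. */
	return w;
}
```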
pcarana [Tue, 28 Jan 2020 19:03:04 +0000 (13:03 -0600)]
Fix bug: wasn't requesting RRDP repos after the initial repo download.
-The way to 'decide' if an RRDP repository should be requested was made using the visited uris data, well, this was wrong; update the logic to consider if the RRDP repository was already utilized during the current cycle (use the RRDP repositories request status).
pcarana [Wed, 22 Jan 2020 23:40:56 +0000 (17:40 -0600)]
Fix bugs (RRDP processing, unclosed file), add new docs for routers.
+Bug at RRDP processing: base64 content with middle spaces (or line breaks) wasn't decoded.
+Bug at file/dir validation during config: file wasn't closed on success.
+Add 'Routers' section at docs to indicate the basics on communication between routers and validator.
pcarana [Fri, 17 Jan 2020 23:40:18 +0000 (17:40 -0600)]
Avoid HTTP requests on previously error'd URIs.
+Create type to set RRDP URIs request status (error, unvisited, visited). The status is set according to the result of the last request and processing of the RRDP Update Notification URIs; during RRDP loading, the status is checked to either do a request (the URI hasn't been visited) or skip it (it was previously visited or had an error). If a request had an error, then continue the access methods flow considering priorities.
+Update RET_NOT_FOUND_URI macro, it always returned the same error code.
+Remove files downloaded via HTTP (and their local directory structure) whenever there's an error during the download process.
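The request status logic above can be sketched roughly as follows. The enum and function names are hypothetical; only the three states come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three states listed above. */
typedef enum {
	REQ_UNVISITED,
	REQ_VISITED,
	REQ_ERROR
} req_status_t;

/* Map the result of a request (0 = success) to the status to store. */
req_status_t
status_after_request(int result)
{
	return (result == 0) ? REQ_VISITED : REQ_ERROR;
}

/* Only unvisited URIs trigger a new request during the cycle. */
bool
should_request(req_status_t status)
{
	return status == REQ_UNVISITED;
}

/* After an error, move on to the next access method by priority. */
bool
should_try_next_method(req_status_t status)
{
	return status == REQ_ERROR;
}
```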
pcarana [Thu, 16 Jan 2020 00:59:20 +0000 (18:59 -0600)]
Load previous valid SLURM on any error, validate tal/slurm conf args.
+Previous valid SLURM was applied only if a newer SLURM had syntax errors; this has changed, now it's applied on any error.
+Log error when the version isn't set at SLURM file.
+Validate configured location (can be a file or directory) of 'tal' and 'slurm' args when the application starts.
pcarana [Wed, 15 Jan 2020 21:56:04 +0000 (15:56 -0600)]
Fix bugs at snapshot processing, and uint args parsing.
+The errors raised during snapshot files processing were ignored. Although the affected files were deleted, the validation flow kept going, which resulted in incorrect behavior.
+Unsigned integer arguments were treating an empty string as '0'.
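A minimal sketch of a stricter parser, assuming the arguments are plain decimal numbers; `parse_uint` is a hypothetical name, not the project's function. strtoul() returns 0 for an empty string, so the empty case has to be rejected explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical parser: returns 0 and stores the value on success, or
 * -EINVAL if str is empty, not a plain decimal number, or out of range.
 */
int
parse_uint(const char *str, unsigned int *result)
{
	char *end;
	unsigned long value;

	assert(result != NULL);

	/*
	 * strtoul() returns 0 for "" and also accepts signs and leading
	 * whitespace, so require a leading digit.
	 */
	if (str == NULL || str[0] < '0' || str[0] > '9')
		return -EINVAL;

	errno = 0;
	value = strtoul(str, &end, 10);
	if (errno != 0 || *end != '\0' || value > UINT_MAX)
		return -EINVAL;

	*result = (unsigned int)value;
	return 0;
}
```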
pcarana [Tue, 14 Jan 2020 23:42:31 +0000 (17:42 -0600)]
Add 'rsync.retry.*' and 'rrdp.retry.*' conf args.
+The new arguments are 'rsync.retry.count', 'rsync.retry.interval', 'rrdp.retry.count', and 'rrdp.retry.interval'. Utilized whenever there's an rsync or rrdp sync error, the validator will retry at most '*.retry.count' times, waiting '*.retry.interval' between each retry.
+Ensure that HTTP files download returns a negative error in case of error.
+Wrap files download function at rrdp_parser.
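The retry behavior described above can be sketched like this, assuming '*.retry.interval' is expressed in seconds and that the sync operation returns 0 on success and a negative value on error. All names are hypothetical; `fail_until_zero` is only a stand-in for a real rsync or RRDP sync.

```c
#include <assert.h>
#include <unistd.h>

typedef int (*sync_cb)(void *arg);

/*
 * Hypothetical helper: call cb once, then retry at most retry_count more
 * times while it fails, sleeping retry_interval seconds between tries.
 * Returns 0 on success or the last error.
 */
int
sync_with_retries(sync_cb cb, void *arg, unsigned int retry_count,
    unsigned int retry_interval)
{
	unsigned int retries = 0;
	int error;

	assert(cb != NULL);
	for (;;) {
		error = cb(arg);
		if (error == 0 || retries == retry_count)
			return error;
		retries++;
		sleep(retry_interval);
	}
}

/* Stand-in operation: fails while the counter at arg is positive. */
int
fail_until_zero(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

With a retry count of 2, an operation that fails twice and then succeeds returns 0 on the third call; one that fails three times gives up and returns the last error.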
pcarana [Tue, 14 Jan 2020 21:17:03 +0000 (15:17 -0600)]
Use SO_REUSEADDR at server socket, log rsync execution output.
+The SO_REUSEADDR sockopt allows the server address and port to be reused immediately after the service has been stopped (or killed).
+Fix bug: the output of rsync execution (either error or verbose) wasn't being logged when 'log.output' was syslog. The stderr of rsync fork is sent to 'pr_err' function, and stdout is sent to 'pr_info' function.
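A short sketch of setting SO_REUSEADDR on a server socket. The helper name, the loopback address and the backlog value are assumptions for the example; the option has to be set before bind() to have any effect.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket listening on the loopback
 * address with SO_REUSEADDR set. Returns the descriptor, or -1 on error.
 */
int
create_listener(unsigned short port)
{
	struct sockaddr_in addr;
	int fd, reuse = 1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Set the option before bind(). */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse,
	    sizeof(reuse)) < 0)
		goto fail;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;
	if (listen(fd, 16) < 0)
		goto fail;
	return fd;

fail:
	close(fd);
	return -1;
}
```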
pcarana [Mon, 13 Jan 2020 19:58:47 +0000 (13:58 -0600)]
Add extra rsync and rrdp configurations (enabled and priority).
+The new configuration properties are: 'rsync.enabled', 'rsync.priority', 'rsync.strategy', 'rrdp.enabled', 'rrdp.priority', and 'work-offline'.
+'sync-strategy' will be deprecated, but it can still be set. Whenever it's set, its value will be set to 'rsync.priority'.
+Fix possible bug at 'visited_uris', a nul char was being set at a wrong location.
+Consider configured priorities and enabled flags whenever an access method is utilized while processing certificates.
+Boolean configuration parameters value can now be set also at command line, using the syntax '--key=value'.
+Remove 'http.disabled' and 'rrdp-disabled', they aren't needed anymore.
pcarana [Tue, 7 Jan 2020 18:05:52 +0000 (12:05 -0600)]
Fix SLURM bugs and uninitialized var warning.
+Initialize serial var when logging validation run information.
+Use a write lock when removing non-visited tals RRDP info.
+There was a segfault on two scenarios:
- When run as server and using a slurm file, during the second run, the validator couldn't access RRDP data from the previous run. Fix: the RRDP TAL DB must be static (lives at the parent stack).
- When SLURM was discarded due to bad file content (eg. empty file, or malformed JSON) and during the next run the file content was valid again, the previous SLURM pointer was freed but wasn't set to NULL (and this was expected). Fix: point at NULL when the whole SLURM is discarded.
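The second fix boils down to the usual free-and-reset idiom, sketched here with a hypothetical, reduced `struct slurm`. Taking a pointer to the pointer lets the helper reset the caller's reference, so a later run sees NULL instead of a dangling address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for the loaded SLURM data. */
struct slurm {
	int version;
};

/* Free the SLURM and leave the caller's pointer at NULL. */
void
slurm_discard(struct slurm **slurm)
{
	assert(slurm != NULL);
	free(*slurm); /* free(NULL) is a no-op, so repeated calls are safe. */
	*slurm = NULL;
}
```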
pcarana [Wed, 18 Dec 2019 19:03:58 +0000 (13:03 -0600)]
Fix segfaults at visited_uris.c, add minor updates and RRDP debug logs.
+The segfaults were due to a bad initialization of visited URI elements and reference count.
+Add some debug logs when processing RRDP files.
+Replace 'Valid ROAs' label with 'Valid Prefixes' when there are updates at VRPS DB (and update docs where this label is referenced).
pcarana [Tue, 17 Dec 2019 23:52:11 +0000 (17:52 -0600)]
Add args to disable rrdp/http, update docs and setup script.
+The new arguments are 'rrdp-disabled' and 'http.disabled', both are treated as flags.
+Update docs to include: new arguments, rrdp support, new 'libxml2' dependency.
+Update configuration file example to include new arguments.
+Fix bug at arguments whose value is expected to be a path, this '--tal=' was treated as valid when it isn't, so validate that no empty paths are received.
+Update unit tests impersonator with new args.
+Updates at setup script:
- Fix bug: paths that included a space weren't handled correctly.
- Use wget always.
- Ignore case when accepting ARIN's RPA.
pcarana [Mon, 16 Dec 2019 20:49:32 +0000 (14:49 -0600)]
Delete temporary XML files and unused TALs RRDP data.
+Remove XML files and their directory structure once they have been utilized.
+Mark as visited TAL information related to RRDP once per cycle; if no RRDP information was visited (means, the TAL wasn't validated during the cycle) the data is forgotten, including its local data.
+Move the directory tree removal functions to 'common.h', so that it can be called from multiple parts.
+Remove created file in case of error during an HTTP download.
pcarana [Fri, 13 Dec 2019 17:50:46 +0000 (11:50 -0600)]
Refactor RRDP URIs storage, implement session ID update.
+Delete dir daemon: detach thread, renames the directory that's going to be deleted.
+Update logic (structs and relations) to remember RRDP URIs: each TAL thread will hold its own RRDP URIs, and each URI (update notification URI) will have its own visited uris struct; the main thread holds each TAL's information, so that it can be accessed during every validation run. This way we know who owns what, and in case of a session ID update it's easier to remove the whole file system directory tree related to an RRDP URI.
+Rename 'visited_uris' of rsync to 'rsync_visited_uris', in validation state struct.
+Ensure that update notification files are requested only once per cycle (in case they're found as the preferred access method).
+Implement session ID update, remove all files related to the previous session ID.
pcarana [Wed, 11 Dec 2019 00:30:53 +0000 (18:30 -0600)]
Remember which manifests were fetched using RRDP, remove rrdp_handler.
+Remember all manifests URIs that were processed from a snapshot or delta file, this will aid to avoid unnecessary rsync's on child CAs.
+Create 'visited_uris' struct and methods to remember URIs from RRDP snapshot/delta file(s). This should be updated to use another struct more efficient than an SLIST.
+Remove 'rrdp_handler' and do its calls directly where needed.
+Add warning message whenever an access method fails and the secondary access method is utilized.
+Assure that RRDP Update Notification URIs are visited only once per validation run.
+In case there's a manifest error, don't retry the repository download if the accessMethod to get the manifest was RRDP.
pcarana [Tue, 10 Dec 2019 21:03:31 +0000 (15:03 -0600)]
Parse XML docs using a reader, don't load the whole DOM at memory.
+Use 'libxml/xmlreader.h' functions to validate and parse XML documents, this decreases the use of memory that was being allocated using other functions.
+Update the logic at 'rrdp_parser.c' to parse a document element by element, using an 'xmlTextReader'.
+Update unit test to use the XML text reader.
pcarana [Thu, 5 Dec 2019 23:57:19 +0000 (17:57 -0600)]
Validate list of deltas at update notification file.
+Assure that the list of deltas is ordered to facilitate the validation of contiguous serials, and the processing of only the required deltas (only if there's a delta update).
+Change enum 'rrdp_uri_cmp_result' to a type 'rrdp_uri_cmp_result_t'.
+Process the snapshot if there's an error processing deltas.
+Make 'delta_head' attributes public, global and doc data init methods are now void.
+Remove 'SLIST' usage at 'deltas_head' struct, use instead an array list implementation, ready to store a defined amount of elements.
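A sketch of the delta list validation on an array, under two assumptions: each delta is reduced to its serial, and (following the usual reading of RFC 8182) the serials must be contiguous and end at the serial of the update notification. The names are hypothetical and don't correspond to the 'deltas_head' struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced view of a delta entry: only its serial. */
struct delta {
	unsigned long serial;
};

static int
delta_cmp(const void *a, const void *b)
{
	unsigned long sa = ((const struct delta *)a)->serial;
	unsigned long sb = ((const struct delta *)b)->serial;

	return (sa > sb) - (sa < sb);
}

/*
 * Sort deltas by serial, then check that they are contiguous and that the
 * last one matches the notification's serial.
 */
bool
deltas_sort_and_validate(struct delta *deltas, size_t len,
    unsigned long notification_serial)
{
	size_t i;

	if (len == 0)
		return true;
	assert(deltas != NULL);

	qsort(deltas, len, sizeof(deltas[0]), delta_cmp);
	for (i = 1; i < len; i++) {
		if (deltas[i].serial != deltas[i - 1].serial + 1)
			return false;
	}
	return deltas[len - 1].serial == notification_serial;
}

/* Index of the first delta newer than local_serial (len if none). */
size_t
first_required_delta(const struct delta *deltas, size_t len,
    unsigned long local_serial)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (deltas[i].serial > local_serial)
			break;
	}
	return i;
}
```

Once the array is sorted, only the deltas from `first_required_delta()` onwards need to be processed; if validation fails, the caller can fall back to the snapshot.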