Frantisek Sumsal [Mon, 25 Dec 2023 10:28:26 +0000 (11:28 +0100)]
coccinelle: rework how we run the Coccinelle transformations
Turns out that the original way we did things was quite broken, as it
skipped a _lot_ of code. This was because we just threw everything into
one pile and tried to spatch it, but this made Coccinelle sad, like when
man page examples redefined some of our macros, causing typedef
conflicts.
For example, with a minimal reproducer that defines a cleanup macro in
two source files, Coccinelle has no issues when spatch-ing each one
separately:
$ spatch --verbose-parsing --sp-file zz-drop-braces.cocci main.c
init_defs_builtins: /usr/lib64/coccinelle/standard.h
HANDLING: main.c
SPECIAL NAMES: adding _cleanup_ as a attribute with arguments
SPECIAL NAMES: adding _cleanup_free_ as a attribute
$ spatch --verbose-parsing --sp-file zz-drop-braces.cocci
logcontrol-example.c
init_defs_builtins: /usr/lib64/coccinelle/standard.h
HANDLING: logcontrol-example.c
SPECIAL NAMES: adding _cleanup_ as a attribute with arguments
But when you try to spatch both of them at once, Coccinelle starts
complaining and skipping the "bad" code:
$ spatch --verbose-parsing --sp-file zz-drop-braces.cocci main.c logcontrol-example.c
init_defs_builtins: /usr/lib64/coccinelle/standard.h
HANDLING: main.c logcontrol-example.c
SPECIAL NAMES: adding _cleanup_ as a attribute with arguments
SPECIAL NAMES: adding _cleanup_free_ as a attribute
remapping: _cleanup_ to an ident in macro name
ERROR-RECOV: found sync end of #define, line 44
parsing pass2: try again
ERROR-RECOV: found sync end of #define, line 44
parse error
= File "logcontrol-example.c", line 44, column 21, charpos = 1719
around = '__attribute__',
whole content = #define _cleanup_(f) __attribute__((cleanup(f)))
badcount: 2
bad: #include <systemd/sd-journal.h>
bad:
BAD:!!!!! #define _cleanup_(f) __attribute__((cleanup(f)))
This was, unfortunately, hidden as it is visible only with
--verbose-parsing (or --parse-error-msg).
Another issue was how we handled includes. The original way of throwing
them into the pile of source files doesn't really work, leading up to
similar issues as above. The better way is to let Coccinelle properly
resolve all includes by telling it where to find our own include files
(basically the same thing we already do during compilation).
After fixing all this, Coccinelle now has a chance to process much more
of our code (there are still some issues in more complex macros, but
that requires further investigation). However, there's a huge downside
from all of this - doing a _proper_ code analysis is surprisingly time
and resource heavy; meaning that processing just one Coccinelle rule now
takes 15 - 30 minutes.
To make this slightly less painful, Coccinelle supports caching the
generated ASTs, which actually helps a lot - it gets the runtime of one
rule from 15 - 30 minutes down to ~1 minute. It, of course, has its own
downside - the cache is _really_ big (ATTOW the cache takes ~15 GiB).
However, even with the aggressive AST caching you're still looking at
~1 hour for one full Coccinelle run, which is a bit annoying, but I
guess that's the price of doing things _properly_ (but I'll definitely
look into ways of further optimizing this).
Frantisek Sumsal [Sun, 24 Dec 2023 13:49:23 +0000 (14:49 +0100)]
busctl: avoid asserting on NULL message
Avoid passing a NULL message to sd_bus_message_is_signal(), to not trip
over an assertion:
[ 132.869436] H testsuite-82.sh[614]: + systemctl --no-block --check-inhibitors=yes soft-reboot
[ 132.967386] H systemd[1]: Created slice system-systemd\x2dcoredump.slice.
[ 133.018292] H systemd[1]: Starting inhibit.service...
[ 133.122610] H systemd[1]: Started systemd-coredump@0-665-0.service.
[ 133.163643] H systemd[1]: Started inhibit.service.
[ 133.206836] H testsuite-82.sh[614]: + exec sleep infinity
[ 133.236762] H systemd-logind[611]: The system will reboot now!
[ 135.891607] H systemd-coredump[667]: [🡕] Process 663 (busctl) of user 0 dumped core.
Frantisek Sumsal [Sun, 24 Dec 2023 11:53:53 +0000 (12:53 +0100)]
test: flush the socket once the triggered unit exits
Since the triggered unit intentionally fails without consuming any data
from the socket, we'd try to trigger it again and again, and we might
try to check the unit state in one of the "in-between" states, failing
the test:
[ 165.271698] H testsuite-07.sh[1032]: + systemctl start badbin_assert.socket
[ 165.977637] H testsuite-07.sh[1032]: + socat - ABSTRACT-CONNECT:badbin_assert.socket
[ 165.983787] H systemd[1]: Cannot find unit for notify message of PID 1039, ignoring.
[ 166.817187] H testsuite-07.sh[1032]: + timeout 10 sh -c 'while systemctl is-active badbin_assert.service; do sleep .5; done'
[ 167.049218] H testsuite-07.sh[1065]: active
[ 167.146854] H systemd[1]: Listening on badbin_assert.socket.
[ 167.163473] H systemd[1]: badbin_assert.socket: Incoming traffic
[ 167.542626] H systemd[1]: Cannot find unit for notify message of PID 1065, ignoring.
[ 167.543437] H (badbin)[1062]: badbin_assert.service: Failed to execute /tmp/badbin: Exec format error
[ 167.548346] H systemd[1]: badbin_assert.service: Main process exited, code=exited, status=203/EXEC
[ 167.549482] H systemd[1]: badbin_assert.service: Failed with result 'exit-code'.
[ 167.561537] H systemd[1]: badbin_assert.socket: Incoming traffic
[ 167.933390] H systemd[1]: Started badbin_assert.service.
[ 167.950489] H (badbin)[1070]: badbin_assert.service: Failed to execute /tmp/badbin: Exec format error
[ 167.956318] H systemd[1]: badbin_assert.service: Main process exited, code=exited, status=203/EXEC
[ 167.957173] H systemd[1]: badbin_assert.service: Failed with result 'exit-code'.
[ 167.974609] H systemd[1]: badbin_assert.socket: Incoming traffic
[ 168.042838] H testsuite-07.sh[1072]: failed
[ 168.094431] H testsuite-07.sh[1075]: ++ systemctl show -P ExecMainStatus badbin_assert.service
[ 168.704022] H systemd[1]: Started badbin_assert.service.
[ 168.778680] H (badbin)[1074]: badbin_assert.service: Failed to execute /tmp/badbin: Exec format error
[ 168.826881] H systemd[1]: badbin_assert.service: Main process exited, code=exited, status=203/EXEC
[ 168.833825] H systemd[1]: badbin_assert.service: Failed with result 'exit-code'.
[ 168.923931] H testsuite-07.sh[1032]: + [[ 0 == 203 ]]
[ 168.951492] H systemd[1]: Cannot find unit for notify message of PID 1075, ignoring.
[ 168.999862] H testsuite-07.sh[615]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/testsuite-07.issue-30412.sh failed'
[ 168.999862] H testsuite-07.sh[615]: Subtest /usr/lib/systemd/tests/testdata/units/testsuite-07.issue-30412.sh failed
Luca Boccassi [Sat, 23 Dec 2023 08:56:31 +0000 (09:56 +0100)]
meson: check for pefile dependency before enabling ukify
ukify (and all the tests, including the autogenerated check-version-ukify)
does not work unless pefile is available, so track it as a dependency
in meson to avoid unit test failures later
Yu Watanabe [Fri, 17 Nov 2023 19:18:44 +0000 (04:18 +0900)]
log: make assert_return() critical when -Dmode=developer
Triggering assert_return() should be a bug in general, and we should
really fix that. But, previously, it is hard to notice such bug, as
it was not critical.
This is for making CI or our testing environment fail if we unexpectedly
trigger assert_return(). So, hopefully we can easily find such bugs.
Yu Watanabe [Sat, 23 Dec 2023 16:49:57 +0000 (01:49 +0900)]
test: make assert_return() critical by default on fuzzer and unit tests
Several test cases intentionally trigger assert_return(). So, to avoid
the entire test fails, this introduces several macros that tentatively
make assert_return() not critical.
Frantisek Sumsal [Sat, 23 Dec 2023 14:35:26 +0000 (15:35 +0100)]
test: redirect stdout/stderr of TEST-04-JOURNAL to console as well
This effectively reverts fa6f37c043 just for TEST-04, as we nuke the
journal repeatedly in this test which makes it particularly hard to
debug. Let's hope the issue behind fa6f37c043 won't bite us back in this
case.
Frantisek Sumsal [Sat, 23 Dec 2023 12:33:11 +0000 (13:33 +0100)]
test: make sure the dummy CA certificate is marked as such
With OpenSSL 3.2.0+ this is necessary, otherwise the verification
of such CA certificate fails badly:
$ openssl s_client -CAfile /run/systemd/remote-pki/ca.crt -connect localhost:19532
...
Connecting to ::1
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 C=CZ, L=Brno, O=Foo, OU=Bar, CN=Test CA
verify error:num=79:invalid CA certificate
verify return:1
depth=1 C=CZ, L=Brno, O=Foo, OU=Bar, CN=Test CA
verify error:num=26:unsuitable certificate purpose
verify return:1
...
---
SSL handshake has read 1566 bytes and written 409 bytes
Verification error: unsuitable certificate purpose
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
This TLS version forbids renegotiation.
Compression: NONE
Expansion: NONE
No ALPN negotiated
Early data was not sent
Verify return code: 26 (unsuitable certificate purpose)
$ openssl version
OpenSSL 3.0.9 30 May 2023 (Library: OpenSSL 3.0.9 30 May 2023)
$ openssl x509 -in cert.pem -text -noout | grep Issuer
Issuer: C = XX, L = Default City, O = Default Company Ltd
Making test-ukify unhappy:
> assert 'Issuer: CN = SecureBoot signing key on host' in out
E AssertionError: assert 'Issuer: CN = SecureBoot signing key on host' in '<...snip...>Issuer: CN=SecureBoot signing key on host archlinux2\n...'
Luca Boccassi [Fri, 22 Dec 2023 20:58:04 +0000 (21:58 +0100)]
man: conditionalize sd-pcrlock and sd-measure on the same variable as their binaries
The binaries are built and installed if HAVE_TPM2 is set, and ignore ENABLE_BOOTLOADER,
so do the same for the manpages.
For the sd-pcrlock case this also installs the manpage aliases for the units, which
are not installed with -Dbootloader=disabled, but there's no way to conditionalize
the aliases, so on balance it's better to have too much documentation rather than
too little.
Yu Watanabe [Wed, 22 Nov 2023 03:57:45 +0000 (12:57 +0900)]
sd-device: modernize device_update_db() and friends
- introduce device_should_have_db(),
- split out device_get_db_path(),
- update log messages, especially clarify which stage is failed,
- use _cleanup_(unlink_and_freep) attribute,
- clear existing database file also when failed to create database directory
and when failed to create temporary file.
Let's get networkd onto Varlink. This only adds the most basic of
operations.
I'd love to see networkd do Varlink for all its basic operations so that
networkctl can use that, and work correctly before D-Bus is up. Right
now, many of networkctls calls simply don't work before D-Bus, and I'd
like to see that improved.
service: don't try to determine selinux label for socket activation if RootImage= is used
We cannot determine the SELinux label ahead of time if RootImage= is
used, since we'd have to mount the image then, hence don't, and handle
this cleanly, and gracefully.
While we are at it, stop "reaching over" so much from the socket code to
the service code, and instead provide function that most of the hard
work in service.c that socket.c just calls.
While we are at it, add debug logging and stuff.
I noticed the issue when also noticing #30560, but that one is harder to
fix, hence I avoided it for now.