This introduces a has_data boolean field in struct unit_files which can
be used to detect the end of the array.
Use a _cleanup_ for struct unit_files in acquire_time_data and its
callers. Code for acquire_time_data is also simplified by replacing
goto's with straight returns.
Tested: By running the commands below, also checking them under valgrind.
- build/systemd-analyze blame
- build/systemd-analyze critical-chain
- build/systemd-analyze plot
scsi_id: use _cleanup_free_ on buffer allocated by get_file_options
This simplifies the code a bit and hopefully fixes Coverity finding
CID 1382966. There was not actually a resource leak here (Coverity
seemed to be confused by thinking log_oom() could actually return 0),
but the fix doesn't hurt and should make this code more resilient to
future refactorings.
Tested: builds fine, manually called scsi_id, seems to work ok.
lldp: check that lldp neighbor raw data size is in expected range
This fixes an insecure use of tainted data as argument to functions that
allocate memory and read from files, which could be tricked into getting
networkctl to allocate a large amount of memory and fill it with file
data.
This was uncovered by Coverity. Fixes CID 1393254.
udev-builtin-usb_id: Check full range of size returned by read()
This shouldn't be necessary, since read() should never return a size
larger than the size of the buffer passed in, but Coverity doesn't seem
to understand that.
We could possibly fix this with a model file for Coverity, but given
changing the code is not that much of a biggie, let's just do that
instead.
Fixes CID 996458: Overflowed or truncated value (or a value computed
from an overflowed or truncated value) `pos` used as array index.
Tested: `ninja -C build/ test`, builds without warnings, test cases pass.
CODING_STYLE: allow c99-style mixed code and declarations
We already allowed variables to be declared in the middle of a function
(whenever a new scope was opened), so this isn't such a big change. Sometimes
we would open a scope just to work around this prohibition.
But sometimes the code can be much clearer if the variable is declared
somewhere in the middle of a scope, in particular if the declaration is
combined with initialization or acquisition of some resources. So let's allow
this, but keep things in the old style, unless there's a good reason to move
the variable declaration to a different place.
core: enumerate perpetual units in a separate per-unit-type method
Previously the enumerate() callback defined for each unit type would do
two things:
1. It would create perpetual units (i.e. -.slice, system.slice, -.mount and
init.scope)
2. It would enumerate units from /proc/self/mountinfo, /proc/swaps and
the udev database
With this change these two parts are split into two seperate methods:
enumerate() now only does #2, while enumerate_perpetual() is responsible
for #1. Why make this change? Well, perpetual units should have a
slightly different effect that those found through enumeration: as
perpetual units should be up unconditionally, perpetually and thus never
change state, they should also not pull in deps by their state changing,
not even when the state is first set to active. Thus, their state is
generally initialized through the per-device coldplug() method in
similar fashion to the deserialized state from a previous run would be
put into place. OTOH units found through regular enumeration should
result in state changes (and thus pull in deps due to state changes),
hence their state should be put in effect in the catchup() method
instead. Hence, given this difference, let's also separate the
functions, so that the rule is:
1. What is created in enumerate_perpetual() should be started in
coldplug()
2. What is created in enumerate() should be started in catchup().
This makes sure that any device changes that might have happened while
we were restarting/reloading will be noticed properly. For that we'll
now properly serialize/deserialize both the device unit state and the
device "found" flags, and restore these initially in the "coldplug"
phase of the manager deserialization. While enumerating the udev devices
during startup we'll put together a new "found" flags mask, which we'll
the switch to in the "catchup" phase of the manager deserialization,
which follows the "coldplug" phase.
Note that during the "coldplug" phase no unit state change events are
generated, which is different for the "catchall" phase which will do
that. Thus we correctly make sure that the deserialized state won't pull
in new deps, but any device's change while we were reloading would.
This is very similar to the existing unit method coldplug() but is
called a bit later. The idea is that that coldplug() restores the unit
state from before any prior reload/restart, i.e. puts the deserialized
state in effect. The catchup() call is then called a bit later, to
catch up with the system state for which we missed notifications while
we were reloading. This is only really useful for mount, swap and device
mount points were we should be careful to generate all missing unit
state change events (i.e. call unit_notify() appropriately) for
everything that happened while we were reloading.
let's drop the "now" argument, it's exactly what MANAGER_IS_RUNNING()
returns, hence let's use that instead to simplify things.
Moreover, let's change the add/found argument pair to become found/mask,
which allows us to change multiple flags at the same time into opposing
directions, which will be useful later on.
Also, let's change the return type to void. It's a notifier call where
callers will ignore the return value anyway as it is nothing actionable.
We do this checks as protection against bind mount cycles on the same
file system. However, the check wasn't really effective for that, as
it would only detect cycles A → B → A this way. By using
fs_is_mount_point() we'll also detect cycles A → A.
Also, while we are at it, make these file system boundary checks
optional. This is not used anywhere, but might be eventually...
Most importantly though add a longer blurb explanation the why.
copy: only check for traversing mount points on directories
This fixes the copy routines on overlay filesystem, which typically
returns the underlying st_dev for files, symlinks, etc.
The value of st_dev is guaranteed to be the same for directories, so
checking it on directories only fixes this code on overlay filesystem
and still keeps it from traversing mount points (which was the original
intent.)
There's a small side effect here, by which regular (non-directory) files
with bind mounts will be copied by the new logic (while they were
skipped by the previous logic.)
Tested: ./build/test-copy with an overlay on /tmp.
This adds a function sd_bus_slot_set_destroy_callback() to set a function
which can free userdata or perform other cleanups.
sd_bus_slot_get_destory_callback() queries the callback, and is included
for completeness.
Without something like this, for floating asynchronous callbacks, which might
be called or not, depending on the sequence of events, it's hard to perform
resource cleanup. The alternative would be to always perform the cleanup from
the caller too, but that requires more coordination and keeping of some shared
state. It's nicer to keep the cleanup contained between the callback and the
function that requests the callback.
test-bus-util: add a simple test for bus_request_name_async_may_reload_dbus()
This shows a minor memleak:
==1883== 24 bytes in 1 blocks are definitely lost in loss record 1 of 1
==1883== at 0x4C2DBAB: malloc (vg_replace_malloc.c:299)
==1883== by 0x4E9D385: malloc_multiply (alloc-util.h:69)
==1883== by 0x4EA2959: bus_request_name_async_may_reload_dbus (bus-util.c:1841)
==1883== by ...
The exchange of messages is truncated at two different points: once right
after the first callback is requested, and the second time after the full
sequence has run (usually resulting in an error because of policy).
man: xinclude the generic text to talk about libsystemd pkgconfig
The only difference is that functions are not individually listed by name,
but that seems completely pointless, since all functions that are documented
are always exported, so the generic text tells the user all she or he needs
to know.
resolved: reformat message about a revoked trust anchor
LOG_MESSAGE is just a wrapper, but it keeps the arguments indented together
with the format string, so put the argument inside of the macro invocation.
(No functional change.)
Also use lowercase for "trust anchor" — it should either be all capitaled or not
at all, and it's not a proper name, so let's make it all lowercase.
Also, add a newline, to make the string more readable. "%s" can expand to
something that is quite long.
test: disable QEMU based testing for TEST-16-EXTEND-TIMEOUT
The test is heavily dependent on timeouts, and if we are run in
potentially very slow QEMU instances there's a good chance we'll miss
some which we normally wouldn't miss. Hence, let's test this one in
nspawn only. Given that the test is purely in service management it
shouldn't matter whether it runs in nspawn or qemu, hence keep running
it in nspawn, but don't bother with qemu.
Similar, do this for TEST-03-JOBS, too, which operates with relatively
short sleep times internally.
core: move destruction of old time event sources to manager_setup_time_change()
It's a bit prettier that day as the function won't silently overwrite
any possibly pre-initialized field, and destroy it right before we
allocate a new event source.
sd-event: add test for the new sd_event_add_inotify() API
This tests a couple of corner cases of the sd-event API including
changing priorities of existing event sources, as well as overflow
conditions of the inotify queue.
sd-event: add new API for subscribing to inotify events
This adds a new call sd_event_add_inotify() which allows watching for
inotify events on specified paths.
sd-event will try to minimize the number of inotify fds allocated, and
will try to add file watches to the same inotify fd objects as far as
that's possible. Doing this kind of inotify object should optimize
behaviour in programs that watch a limited set of mostly independent
files as in most cases a single inotify object will suffice for watching
all files.
Traditionally, this kind of coalescing logic (i.e. that multiple event
sources are implemented on top of a single inotify object) was very hard
to do, as the inotify API had serious limitations: it only allowed
adding watches by path, and would implicitly merge watches installed on
the same node via different path, without letting the caller know about
whether such merging took place or not.
With the advent of O_PATH this issue can be dealt with to some point:
instead of adding a path to watch to an inotify object with
inotify_add_watch() right away, we can open the path with O_PATH first,
call fstat() on the fd, and check the .st_dev/.st_ino fields of that
against a list of watches we already have in place. If we find one we
know that the inotify_add_watch() will update the watch mask of the
existing watch, otherwise it will create a new watch. To make this
race-free we use inotify_add_watch() on the /proc/self/fd/ path of the
O_PATH fd, instead of the original path, so that we do the checking and
watch updating with guaranteed the same inode.
This approach let's us deal safely with inodes that may appear under
various different paths (due to symlinks, hardlinks, bind mounts, fs
namespaces). However it's not a perfect solution: currently the kernel
has no API for changing the watch mask of an existing watch -- unless
you have a path or fd to the original inode. This means we can "merge"
the watches of the same inode of multiple event sources correctly, but
we cannot "unmerge" it again correctly in many cases, as access to the
original inode might have been lost, due to renames, mount/unmount, or
deletions. We could in theory always keep open an O_PATH fd of the inode
to watch so that we can change the mask anytime we want, but this is
highly problematics, as it would consume too many fds (and in fact the
scarcity of fds is the reason why watch descriptors are a separate
concepts from fds) and would keep the backing mounts busy (wds do not
keep mounts busy, fds do). The current implemented approach to all this:
filter in userspace and accept that the watch mask on some inode might
be higher than necessary due to earlier installed event sources that
might have ceased to exist. This approach while ugly shouldn't be too
bad for most cases as the same inodes are probably wacthed for the same
masks in most implementations.
In order to implement priorities correctly a seperate inotify object is
allocated for each priority that is used. This way we get separate
per-priority event queues, of which we never dequeue more than a few
events at a time.