rule-syntax-check: fix handling of runaway strings in comma splitting (#8298)
A runaway string should still be returned by the code that splits on
commas, so add a '?' to the regex so that the last '"?' in a string
still produces a valid block for the split code.
Tested:
ACTION=="remove\"GOTO=""
Which then produced:
$ test/rule-syntax-check.py src/login/70-uaccess.rules
# looking at src/login/70-uaccess.rules
Invalid line src/login/70-uaccess.rules:10: ACTION=="remove\"GOTO=""
clause: ACTION=="remove\"GOTO=""
seccomp: rework functions for parsing system call filters
This reworks system call filter parsing, and replaces a couple of "bool"
function arguments by a single flags parameter.
This shouldn't change behaviour, except for one case: when we
recursively call our parsing function on our own syscall list, then
we'll lower the log level to LOG_DEBUG from LOG_WARNING, because at that
point things are just a problem in our own code rather than in the user
configuration we are parsing, and we shouldn't hence generate confusing
warnings about syntax errors.
It doesn't work, spits out only rubbish and was already excluded of
run-coccinelle.sh. It's a pitty it doesn't work, but let's drop this
dead piece of code for now.
doc: add a new doc/ directory, and move two markdown docs into them
I figure sooneror later we'll have more of these docs, hence let's give
them a clean place to be.
This leaves NEWS and README/README.md as well as the LICENSE texts in
the root directory of the project since that appears to be customary for
Free Software projects.
Franck Bui [Fri, 23 Feb 2018 15:54:40 +0000 (16:54 +0100)]
rule-syntax-check: PROGRAM is not supposed to get value assigned
In udev man page, "PROGRAM" key is part of the keys which are used for
matching purposes so it should only be used with the compare operator "==".
Actually it doesn't really make sense to assign it a value.
udev code allows both "=" and "==" for PROGRAM and both are handled the same
way but for consistencies it's better to have only the compare operator allowed
by the rule syntax checker.
No rules shipped by systemd use PROGRAM key so nothing need to be changed in
our rule files.
gcc warns about unitialized memory access because it notices that ssize_t which
is < 0 could be cast to positive int value. We know that this can't really
happen because only -1 can be returned, but OTOH, in principle a large
*positive* value cannot be cast properly. This is unlikely too, since xattrs
cannot be too large, but it seems cleaner to just use a size_t to return the
value and avoid the cast altoghter. This makes the code simpler and gcc is
happy too.
The following warning goes away:
[113/1502] Compiling C object 'src/basic/basic@sta/xattr-util.c.o'.
In file included from ../src/basic/alloc-util.h:28:0,
from ../src/basic/xattr-util.c:30:
../src/basic/xattr-util.c: In function ‘fd_getcrtime_at’:
../src/basic/macro.h:207:60: warning: ‘b’ may be used uninitialized in this function [-Wmaybe-uninitialized]
UNIQ_T(A,aq) < UNIQ_T(B,bq) ? UNIQ_T(A,aq) : UNIQ_T(B,bq); \
^
../src/basic/xattr-util.c:155:19: note: ‘b’ was declared here
usec_t a, b;
^
Before
systemd-detect-virt[18498]: No virtualization found in DMI
systemd-detect-virt[18498]: No virtualization found in CPUID
systemd-detect-virt[18498]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[18498]: This platform does not support /proc/device-tree
systemd-detect-virt[18498]: Failed to check for virtualization: No such file or directory
The first four lines are at debug level, so the user would only see that last
one usually, which is not very enlightening.
This now becomes:
systemd-detect-virt[21172]: No virtualization found in DMI
systemd-detect-virt[21172]: No virtualization found in CPUID
systemd-detect-virt[21172]: Virtualization XEN not found, /proc/xen does not exist
systemd-detect-virt[21172]: This platform does not support /proc/device-tree
systemd-detect-virt[21172]: /proc/cpuinfo not found, assuming no UML virtualization.
systemd-detect-virt[21172]: This platform does not support /proc/sysinfo
systemd-detect-virt[21172]: Found VM virtualization none
systemd-detect-virt[21172]: none
basic/log: add an assert that does not recurse into logging functions
Then it can be used in the asserts in logging functions without causing
infinite recursion. The error is just printed to stderr, it should be
good enough for the common case.
gcc-8 throws an error if it knows snprintf might truncate output and the
return value is ignored:
../src/udev/udev-builtin-net_id.c: In function 'dev_pci_slot':
../src/udev/udev-builtin-net_id.c:297:47: error: '%s' directive output may be truncated writing up to 255 bytes into a region of size between 0 and 4095 [-Werror=format-truncation=]
snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
^~
../src/udev/udev-builtin-net_id.c:297:17: note: 'snprintf' output between 10 and 4360 bytes into a destination of size 4096
snprintf(str, sizeof str, "%s/%s/address", slots, dent->d_name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
Let's check all return values. This actually makes the code better, because there's
no point in trying to open a file when the name has been truncated, etc.
basic/log: make sure header is printed correctly, add test
If log_do_header() was called with overly long parameters, it'd generate
improper output. Essentially, it'd be truncated at random point, in particular
missing a newline at the end, so it'd run with the next field, usually MESSAGE=.
log_do_header is called with parameters from compiled code (file name, lien
nubmer, etc), so in practice this was unlikely to ever be a problem, but it is
possible. In particular, if systemd was compiled from sources in some deeply
nested directory (which happens for example in mock and other build roots), the
filename could be very long.
As a safety measure, let's truncate all parameters to 256 bytes. So we have
5 fields which are 256 bytes (plus the field name prefix), and a few other
fields with fixed width. This must always fit in the 2048 byte buffer.
I don't think there's much gain in calculating the required length precisely,
since it's a lot of fields and a few bytes allocated on the stack don't matter.
basic/log: fix confusion with parameters to log_dispatch_internal
log_dispatch_internal has only one caller where the extra_field/extra
params are not null: log_unit_full. When log_unit_full() was called,
when we got to log_dispatch_internal, our header would look like this:
PRIORITY=7
SYSLOG_FACILITY=3
CODE_FILE=../src/core/manager.c
CODE_LINE=2145
CODE_FUNC=manager_invoke_sigchld_event
USER_UNIT=gnome-terminal-server.service
65dffa7a3b984a6d9a46f0b8fb57710bUSER_INVOCATION_ID=
SYSLOG_IDENTIFIER=systemd
It took me a while to understand why I'm not seeing mangled messages in the
journal (after all, "" is a valid rvalue for log messages). The answer is that
journald rejects any field name which starts with a digit, and the MESSAGE_ID
that was used here starts with a digit. Hence, those lines would be silently
filtered out.
Peter Hutterer [Fri, 23 Feb 2018 08:36:45 +0000 (18:36 +1000)]
udev: don't assign INPUT_ID_MOUSE to a touchpad/joystick/touchscreen (#8259)
If a touchpad has MT axes only but not ABS_X/ABS_Y (DualShock 4 controller),
then we hit both the conditions is_touchpad and the later check for
!has_abs_axes here, assigning is_mouse and ID_INPUT_MOUSE later.
This is a bug, we historically only assigned either of of the pointing device
tags ID_INPUT_MOUSE/TOUCHPAD/JOYSTICK/TOUCHSCREEN, never multiple of them.
Note that we cannot just check for has_abs_axes and has_mt_coordinates because
the apple touch mouse has both. We really need to check if the device has
already been assigned something else.
Alan Jenkins [Thu, 22 Feb 2018 20:38:44 +0000 (20:38 +0000)]
login: fix user@.service case, so we don't allow nested sessions (#8051)
> logind sessions are mostly bound to the audit session concept, and audit
> sessions remain unaffected by "su", in fact they are defined to be
> "sealed off", i.e. in a way that if a process entered a session once, it
> will always stay with it, and so will its children, i.e. the only way to
> get a new session is by forking off something off PID 1 (or something
> similar) that never has been part of a session.
The code had a gap. user@.service is a special case PAM session which does
not create a logind session. Let's remember to check for it.
Fix format-truncation compile failure by typecasting USB IDs (#8250)
This patch adds safe_atoux16 for parsing an unsigned hexadecimal 16bit int, and
uses that for parsing USB device and vendor IDs.
This fixes a compile error with gcc-8 because while we know that USB IDs are 2 bytes,
the compiler does not know that.
../src/udev/udev-builtin-hwdb.c:80:38: error: '%04X' directive output may be
truncated writing between 4 and 8 bytes into a region of size between 2 and 6
[-Werror=format-truncation=]
Signed-off-by: Adam Williamson <awilliam@redhat.com> Signed-off-by: Patrick Uiterwijk <puiterwijk@redhat.com>
libsystemd-network: fix endianness in ARP BPF filter (#8255)
Commit f11cba7479fe ("libsystemd-network: fix unaligned loads (issue #7654)")
changed the way in which the MAC address is read to use native endiannes:
This is wrong because loads done with BPF_LD + BPF_ABS are big-endian, as it
can be seen for the ethertype and arp-operation loads above in the
filter. Also, the same commit changed:
htobe32(*((unsigned int *)x) -> unaligned_read_be32(x)
in _bind_raw_socket(), which is the correct form.
The commit broke IPv4LL in presence of loops, as the sender now considers its
own packets as conflicting.
systemctl,man: use UNIT as the placeholder for a unit name
NAME is kind of meaningless, because everything has a name. "Unit"
makes it more obvious that a name of a unit is necessary. I was always
momentarily baffled by "set-property NAME ASSIGNMENT...", where there
are two objects (the unit and the property), and it's not clear which of
the two "NAME" is supposed to signify.
Sergey Ptashnick [Thu, 22 Feb 2018 13:25:11 +0000 (16:25 +0300)]
Update Russian translation (#8248)
Used "in"-form here (i.e. "зарегистрировать службу *в* DNS-SD") because
simply "служба DNS-SD" may be confused with resolved itself (at least in
Russian).
nologin: extend the /run/nologin descriptions a bit (#8244)
This is an attempt to improve #8228 a bit, by extending the /run/nologin
a bit, but still keeping it somewhat brief.
On purpose I used the vague wording "unprivileged user" rather than
"non-root user" so that pam_nologin can be updated to disable its
behaviour for members of the "wheel" group one day, and our messages
would still make sense.
When the user is dynamic, and we are setting up state, cache, or logs dirs,
behaviour is unchanged, we always do a recursive chown. This is necessary
because the user number might change between invocations.
But when setting up a directory for non-dynamic user, or a runtime directory
for a dynamic user, do any ownership or mode changes only when the directory
is initially created. Nothing says that the files under those directories have
to be all recursively owned by our user. This restores behaviour before 3536f49e8fa281539798a7bc5004d73302f39673, so modifications to the state of
the runtime directory persist between ExecStartPre's and ExecStart's, and even
longer in case the directory is persistent.
I think it _would_ be a nice property if setting a user would automatically
propagate to ownership of any Runtime/Logs/Cache directories. But this is
incompatible with another nice property, namely preserving changes to those
directories made by an admin, and with allowing change of ownership of files
in those directories by the service (e.g. to allow other users to access them).
Of the two, I think the second property is more important. Also, it's backwards
compatible.
There is no need to chmod a directory we just created, so move that step
up into a branch. After that, 'effective' is only used once, so get rid of
it too.
Generally we prefer 'return' from main() over exit() so that automatic
cleanups and such work correct. Let's do that in shutdown.c too, becuase
there's not really any reason not to.
With this we are pretty good in consistently using return from main()
rather than exit() all across the codebase. Yay!
basic: split out update_reboot_parameter_and_warn() into its own .c/.h files
This is primarily preparation for a follow-up commit that adds a common
implementation of the other side of the reboot parameter file, i.e. the
code that reads the file and issues reboot() for it.
basic: add a common syscall wrapper around reboot()
This mimics the raw_clone() call we have in place already and
establishes a new syscall wrapper raw_reboot() that wraps the kernel's
reboot() system call in a bit more low-level fashion that glibc's
reboot() wrapper. The main difference is that the extra "arg" argument
is supported.
Ultimately this just replaces the syscall wrapper implementation we
currently have at three places in our codebase by a single one.
With this change this means that all our syscall() invocations are
neatly separated out in static inline system call wrappers in our header
functions.
tree-wide: reopen log when we need to log in FORK_CLOSE_ALL_FDS children
In a number of occasions we use FORK_CLOSE_ALL_FDS when forking off a
child, since we don't want to pass fds to the processes spawned (either
because we later want to execve() some other process there, or because
our child might hang around for longer than expected, in which case it
shouldn't keep our fd pinned). This also closes any logging fds, and
thus means logging is turned off in the child. If we want to do proper
logging, explicitly reopen the logs hence in the child at the right
time.
This is particularly crucial in the umount/remount children we fork off
the shutdown binary, as otherwise the children can't log, which is
why #8155 is harder to debug than necessary: the log messages we
generate about failing mount() system calls aren't actually visible on
screen, as they done in the child processes where the log fds are
closed.
shutdown: explicitly set a log target in shutdown.c
We used to set this, but this was dropped when shutdown got taught to
get the target passed in from the regular PID 1. Let's readd this to
make things more explanatory, and cover all grounds, since after all the
target passed is in theory an optional part of the protocol between the
regular PID 1 and the shutdown PID 1.
log: only open kmsg on fallback if we actually want to use it
Previously, we'd try to open kmsg on failure of the journal/syslog even
if no automatic fallback to kmsg was requested — and we wouldn't even
use the open connection afterwards...
hwdb: drop bad definition for Cordless Wave Pro keyboard (#8230)
[I'm just submitting the solution originally suggested by @barzog.
Nevertheless, this looks pretty straightforward, we don't want to define
any keys on a universal receiver.
Note that this definition was added back in aedc2eddd16e48d468e6ad0aea2caf00c7d37365, when we didn't yet have
support for figuring out what hardware is connected behind a logitech
receiver.]
In 60-keyboard.hwdb there is a definition of # Cordless Wave Pro
evdev:input:b0003v046DpC52[9B]*
which in fact not a cordless keyboard but an USB receiver to which different
types of keyboard can be connected. The solution is to completely clean
definition evdev:input:b0003v046DpC52B* from there.
bpf: reset "extra" IP accounting counters when turning off IP accounting for a unit
We maintain an "extra" set of IP accounting counters that are used when
we systemd is reloaded to carry over the counters from the previous run.
Let's reset these to zero whenever IP accounting is turned off. If we
don't do this then turning off IP accounting and back on later wouldn't
reset the counters, which is quite surprising and different from how our
CPU time counting works.
bpf: rework how we keep track and attach cgroup bpf programs
So, the kernel's management of cgroup/BPF programs is a bit misdesigned:
if you attach a BPF program to a cgroup and close the fd for it it will
stay pinned to the cgroup with no chance of ever removing it again (or
otherwise getting ahold of it again), because the fd is used for
selecting which BPF program to detach. The only way to get rid of the
program again is to destroy the cgroup itself.
This is particularly bad for root the cgroup (and in fact any other
cgroup that we cannot realistically remove during runtime, such as
/system.slice, /init.scope or /system.slice/dbus.service) as getting rid
of the program only works by rebooting the system.
To counter this let's closely keep track to which cgroup a BPF program
is attached and let's implicitly detach the BPF program when we are
about to close the BPF fd.
This hence changes the bpf_program_cgroup_attach() function to track
where we attached the program and changes bpf_program_cgroup_detach() to
use this information. Moreover bpf_program_unref() will now implicitly
call bpf_program_cgroup_detach().
In order to simplify things, bpf_program_cgroup_attach() will now
implicitly invoke bpf_program_load_kernel() when necessary, simplifying
the caller's side.
Finally, this adds proper reference counting to BPF programs. This
is useful for working with two BPF programs in parallel: the BPF program
we are preparing for installation and the BPF program we so far
installed, shortening the window when we detach the old one and reattach
the new one.
bpf-program: make bpf_program_load_kernel() idempotent
Let's "seal" off the BPF program as soo as bpf_program_load_kernel() is
called, which allows us to make it idempotent: since the program can't
be modified anymore after being turned into a kernel object it's safe to
shortcut behaviour if called multiple times.
mount-setup: always use the same source as fstype for the API VFS we mount
So far, for all our API VFS mounts we used the fstype also as mount
source, let's do that for the cgroupsv2 mounts too. The kernel doesn't
really care about the source for API VFS, but it's visible to the user,
hence let's clean this up and follow the rule we otherwise follow.
bpf: use BPF_F_ALLOW_MULTI flag if it is available
This new kernel 4.15 flag permits that multiple BPF programs can be
executed for each packet processed: multiple per cgroup plus all
programs defined up the tree on all parent cgroups.
We can use this for two features:
1. Finally provide per-slice IP accounting (which was previously
unavailable)
2. Permit delegation of BPF programs to services (i.e. leaf nodes).
This patch beefs up PID1's handling of BPF to enable both.
Note two special items to keep in mind:
a. Our inner-node BPF programs (i.e. the ones we attach to slices) do
not enforce IP access lists, that's done exclsuively in the leaf-node
BPF programs. That's a good thing, since that way rules in leaf nodes
can cancel out rules further up (i.e. for example to implement a
logic of "disallow everything except httpd.service"). Inner node BPF
programs to accounting however if that's requested. This is
beneficial for performance reasons: it means in order to provide
per-slice IP accounting we don't have to add up all child unit's
data.
b. When this code is run on pre-4.15 kernel (i.e. where
BPF_F_ALLOW_MULTI is not available) we'll make IP acocunting on slice
units unavailable (i.e. revert to behaviour from before this commit).
For leaf nodes we'll fallback to non-ALLOW_MULTI mode however, which
means that BPF delegation is not available there at all, if IP
fw/acct is turned on for the unit. This is a change from earlier
behaviour, where we use the BPF_F_ALLOW_OVERRIDE flag, so that our
fw/acct would lose its effect as soon as delegation was turned on and
some client made use of that. I think the new behaviour is the safer
choice in this case, as silent bypassing of our fw rules is not
possible anymore. And if people want proper delegation then the way
out is a more modern kernel or turning off IP firewalling/acct for
the unit algother.
We make heavy use of BPF functionality these days, hence expose the BPF
file system too by default now. (Note however, that we don't actually
make use bpf file systems object yet, but we might later on too.)