resolved: respond to local resolver requests on 127.0.0.53:53
In order to improve compatibility with local clients that speak DNS directly
(and do not use NSS or our bus API) listen locally on 127.0.0.53:53 and process
any queries made that way.
Note that resolved does not implement a full DNS server on this port, but
simply enough to allow normal, local clients to resolve RRs through resolved.
Specifically it does not implement queries without the RD bit set (these are
requests where recursive lookups are explicitly disabled), and neither queries
with DNSSEC DO set in combination with DNSSEC CD (i.e. DNSSEC lookups with
validation turned off). It also refuses zone transfers and obsolete RR types.
All lookups done this way will be rejected with a clean error code, so that the
client side can repeat the query with a reduced feature set.
The code will set the DNSSEC AD flag however, depending on whether the data
resolved has been validated (or comes from a local, trusted source).
Lookups made via this mechanisms are propagated to LLMNR and mDNS as necessary,
but this is only partially useful as DNS packets cannot carry IP scope data
(i.e. the ifindex), and hence link-local addresses returned cannot be used
properly (and given that LLMNR/mDNS are mostly about link-local communication
this is quite a limitation). Also, given that DNS tends to use IDNA for
non-ASCII names, while LLMNR/mDNS uses UTF-8 lookups cannot be mapped 1:1.
In general this should improve compatibility with clients bypassing NSS but
it is highly recommended for clients to instead use NSS or our native bus API.
This patch also beefs up the DnsStream logic, as it reuses the code for local
TCP listening. DnsStream now provides proper reference counting for its
objects.
In order to avoid feedback loops resolved will no silently ignore 127.0.0.53
specified as DNS server when reading configuration.
resolved listens on 127.0.0.53:53 instead of 127.0.0.1:53 in order to leave
the latter free for local, external DNS servers or forwarders.
This also changes the "etc.conf" tmpfiles snippet to create a symlink from
/etc/resolv.conf to /usr/lib/systemd/resolv.conf by default, thus making this
stub the default mode of operation if /etc is not populated.
resolved: when using the ResolveRecord() bus call, adjust TTL for caching time
When we return the full RR wire data, let's make sure the TTL included in it is
adjusted by the time the RR sat in the cache.
As an optimization we do this only for ResolveRecord() and not for
ResolveHostname() and friends, since adjusting the TTL means copying the RR
object, and we don#t want to do that if there's no reason to.
(ResolveHostname() and friends don't return the TTL hence there's no reason to
in that case)
sd-bus: make sure bus_map_all_properties() handle booleans right
sd-bus generally exposes bools as "int" instead of "bool" in the public API.
This is relevant when unmarshaling booleans, as the relevant functions expect
an int* pointer and no bool* pointer. Since sizeof(bool) is not necessarily the
same as sizeof(int) this is problematic and might result in memory corruption.
Let's fix this, and make sure bus_map_all_properties() handles booleans as
ints, as the rest of sd-bus, and make all users of it expect the right thing.
It's weird having a "negative" function link_is_unmanaged(), let's invert it
and get rid of the negation this way, by renaming it to link_is_managed().
Internally we stored this as a positive boolean already, hence let's do this
for the function too.
Let's split the code from the inner loop out, into its own function
link_update_dns_server_one(). This matches how things are already handled for
the search domain logic. Also, this is preparation for a later commit that
persists DNS server data pushed in via the bus.
string-table: make sure DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN() handles NULL strings nicely
xyz_from_string() functions defined with DEFINE_STRING_TABLE_LOOKUP() properly
handle NULL strings already. make sure the equivalent functions defined with
DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN() do the same.
The new command shows the per-link and global DNS configuration currently in
effect. This is useful to quickly see the DNS settings resolved acquired from
networkd and that was pushed into it via the bus APIs.
resolved: export the effective per-link DNSSEC setting, not the internal one
Internally, we store the per-link DNSSEC setting as -1 (invalid) if there's no
link-specific setting configured, and the global setting should be used. When
exporting this one the bus we really should export the effective DNSSEC
setting however, i.e. return the global one if there's non set per-link.
Lukáš Nykrýn [Sun, 19 Jun 2016 17:22:46 +0000 (19:22 +0200)]
man: match runlevel symlinks recommendation with our makefile (#3563)
In makefile we create symlinks runlevel5.target to graphical.target and
runlevel2-4.target to multi-user.target. Let's say the same thing in
systemd.special manpage.
Fixes:
==27917== 3 bytes in 1 blocks are definitely lost in loss record 1 of 1
==27917== at 0x4C28BF6: malloc (vg_replace_malloc.c:299)
==27917== by 0x55083D9: strdup (in /usr/lib64/libc-2.22.so)
==27917== by 0x1140DA: find_converted_keymap (keymap-util.c:524)
==27917== by 0x110844: test_find_converted_keymap (test-keymap-util.c:52)
==27917== by 0x1124FE: main (test-keymap-util.c:213)
==27917==
Dave Reisner [Fri, 10 Jun 2016 13:50:16 +0000 (09:50 -0400)]
Ensure kdbus isn't used (#3501)
Delete the dbus1 generator and some critical wiring. This prevents
kdbus from being loaded or detected. As such, it will never be used,
even if the user still has a useful kdbus module loaded on their system.
Sort of fixes #3480. Not really, but it's better than the current state.
resolved: when restarting a transaction make sure to not touch it anymore (#3553)
dns_transaction_maybe_restart() is supposed to return 1 if the the transaction
has been restarted and 0 otherwise. dns_transaction_process_dnssec() relies on
this behaviour. Before this change in case of restart we'd call
dns_transaction_go() when restarting the lookup, returning its return value
unmodified. This is wrong however, as that function returns 1 if the
transaction is pending, and 0 if it completed immediately, which is a very
different set of return values. Fix this, by always returning 1 on redirection.
The wrong return value resulted in all kinds of bad memory accesses as we might
continue processing a transaction that was redirected and completed immediately
(and thus freed).
This patch also adds comments to the two functions to clarify the return values
for the future.
systemctl: delay pager/polkit agent opening as much as possible
In https://github.com/systemd/systemd/issues/3543, we would open the pager
before starting ssh, and the pipe fd was "leaked" into the ssh child as the
stderr fd. Previous commit fixes bus-socket to nullify stderr before launching
the child, but it seems reasonable to also delay starting the pager.
If we are going to croak when trying to open the transport, it seems better
to do this before starting the pager.
systemctl: make sure we terminate the bus connection first, and then close the pager (#3550)
If "systemctl -H" is used, let's make sure we first terminate the bus
connection, and only then close the pager. If done in this order ssh will get
an EOF on stdin (as we speak D-Bus through ssh's stdin/stdout), and then
terminate. This makes sure the standard error we were invoked on is released by
ssh, and only that makes sure we don't deadlock on the pager which waits for
all clients closing its input pipe.
(Similar fixes for the various other xyzctl tools that support both pagers and
-H)
If for whatever reason the file system is "corrupted", we want
to be resilient and ignore the error, as long as we can load the units
from a different place.
Arch bug https://bugs.archlinux.org/task/49547.
A user had an ntfs symlink (essentially a file) instead of a directory after
restoring from backup. We should just ignore that like we would treat a missing
directory, for general resiliency.
We should treat permission errors similarly. For example an unreadable
/usr/local/lib directory would prevent (user) instances of systemd from
loading any units. It seems better to continue.
core: set $JOURNAL_STREAM to the dev_t/ino_t of the journal stream of executed services
This permits services to detect whether their stdout/stderr is connected to the
journal, and if so talk to the journal directly, thus permitting carrying of
metadata.
- The passed max_length is also applied to the "comm" name, if comm_fallback is
set.
- The right thing happens if max_length == 1 is specified
- when the cmdline "foobar" is abbreviated to 6 characters the result is not
"foobar" instead of "foo...".
- trailing whitespace are removed before the ... suffix is appended. The 7
character abbreviation of "foo barz" is hence "foo..." instead of "foo ...".
- leading whitespace are suppressed from the cmdline
resolved: in the ResolveHostname() bus call, accept IP addresses with scope
When we get a literal IP address as string that includes a zone suffix, process
this properly and return the parsed ifindex back to the client, and include it
in the canonical name in case of a link-local IP address.
Apparently newer gcc versions are a bit more forgiving when assigning an
"unsigned char*" pointer to something of a different type. Let's add the
missing cast so that old gcc versions are fine, too.
systemctl: allow percent-based MemoryLimit= settings via systemctl set-property
The unit files already accept relative, percent-based memory limit
specification, let's make sure "systemctl set-property" support this too.
Since we want the physical memory size of the destination machine to apply we
pass the percentage in a new set of properties that only exist for this
purpose, and can only be set.
core: optionally, accept a percentage value for MemoryLimit= and related settings
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
util: when determining the amount of memory on this system, take cgroup limit into account
When determining the amount of RAM in the system, let's make sure we also read
the root-level cgroup memory limit into account. This isn't particularly useful
on the host, but in containers it makes sure that whatever memory the container
got assigned is actually used for RAM size calculations.
Lukáš Nykrýn [Tue, 14 Jun 2016 12:20:56 +0000 (14:20 +0200)]
manager: reduce complexity of unit_gc_sweep (#3507)
When unit is marked as UNSURE, we are trying to find if it state was
changed over and over again. So lets not go through the UNSURE states
again. Also when we find a GOOD unit lets propagate the GOOD state to
all units that this unit reference.
This is a problem on machines with a lot of initscripts with different
starting priority, since those units will reference each other and the
original algorithm might get to n! complexity.
Thanks HATAYAMA Daisuke for the expand_good_state code.
Andrew Jeddeloh [Tue, 14 Jun 2016 09:09:06 +0000 (02:09 -0700)]
build: fix missing symbol for old kernel headers (#3530)
Fix issue where IN6_ADDR_GEN_MODE_STABLE_PRIVACY is undefined but
IFLA_INET6_ADDR_GEN_MODE is defined and thus the former does not get
fixed in missing.h. This occurs with kernel headers new enough to have
the IFLA_INET6_ADDR_GEN_MODE but old enough to not yet have
IN6_ADDR_GEN_MODE_STABLE_PRIVACY (e.g. 3.18).
This reworks "systemctl status" and "systemctl show" a bit. It removes the
definition of the `property_info` structure, because we can simply reuse the
existing UnitStatusInfo type for that.
The "could not be found" message is now printed by show_one() itself (and not
its caller), so that it is shown regardless by who the function is called.
(This makes it necessary to pass the unit name to the function.)
This also adds all properties found to a set, and then checks if any of the
properties passed via "--property=" is mising in it, if so, a proper error is
generated.
Support for checking the PID file of a unit is removed, as this cannot be done
reasonably client side (since the systemd instance we are talking to might sit
on another host)
Let's block access to the kernel keyring and a number of obsolete system calls.
Also, update list of syscalls that may alter the system clock, and do raw IO
access. Filter ptrace() if CAP_SYS_PTRACE is not passed to the container and
acct() if CAP_SYS_PACCT is not passed.
This also changes things so that kexec(), some profiling calls, the swap calls
and quotactl() is never available to containers, not even if CAP_SYS_ADMIN is
passed. After all we currently permit CAP_SYS_ADMIN to containers by default,
but these calls should not be available, even then.
This adds three new seccomp syscall groups: @keyring for kernel keyring access,
@cpu-emulation for CPU emulation features, for exampe vm86() for dosemu and
suchlike, and @debug for ptrace() and related calls.
Also, the @clock group is updated with more syscalls that alter the system
clock. capset() is added to @privileged, and pciconfig_iobase() is added to
@raw-io.
Finally, @obsolete is a cleaned up. A number of syscalls that never existed on
Linux and have no number assigned on any architecture are removed, as they only
exist in the man pages and other operating sytems, but not in code at all.
create_module() is moved from @module to @obsolete, as it is an obsolete system
call. mem_getpolicy() is removed from the @obsolete list, as it is not
obsolete, but simply a NUMA API.
Susant Sahani [Mon, 13 Jun 2016 13:57:38 +0000 (19:27 +0530)]
networkd: fix NULL pointer (#3523)
Not every link has kind associated with it.
(gdb) r
Starting program: /home/sus/tt/systemd/systemd-networkd
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.23.1-7.fc24.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
vboxnet0: Gained IPv6LL
wlp3s0: Gained IPv6LL
enp0s25: Gained IPv6LL
Enumeration completed
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6e27ade in __strcmp_sse2_unaligned () from /lib64/libc.so.6
(gdb) bt
src/network/networkd-link.c:2008
src/network/networkd-link.c:2059
src/network/networkd-link.c:2442
m=0x555555704a30, userdata=0x55555570bfe0) at src/network/networkd-link.c:2497
at src/libsystemd/sd-netlink/sd-netlink.c:347
src/libsystemd/sd-netlink/sd-netlink.c:402
src/libsystemd/sd-netlink/sd-netlink.c:432
userdata=0x5555556f7470) at src/libsystemd/sd-netlink/sd-netlink.c:739
src/libsystemd/sd-event/sd-event.c:2275
src/libsystemd/sd-event/sd-event.c:2626
timeout=18446744073709551615) at src/libsystemd/sd-event/sd-event.c:2685
bus=0x5555556f9af0, name=0x555555692315 "org.freedesktop.network1",
timeout=30000000,
check_idle=0x55555556ac84 <manager_check_idle>, userdata=0x5555556f6b20) at
src/shared/bus-util.c:134
src/network/networkd-manager.c:1128
src/network/networkd.c:127
(gdb) f 1
src/network/networkd-link.c:2008
2008 if (link->network->bridge || streq("bridge", link->kind)) {
(gdb) p link->kind
$1 = 0x0
Jouke Witteveen [Mon, 13 Jun 2016 10:50:12 +0000 (12:50 +0200)]
core/execute: pass env vars to PAM session setup (#3503)
Move the merger of environment variables before setting up the PAM
session and pass the aggregate environment to PAM setup. This allows
control over the PAM session hooks through environment variables.
PAM session initiation may update the environment. On successful
initiation of a PAM session, we adopt the environment of the
PAM context.
... as well as halt/poweroff/kexec/suspend/hibernate/hybrid-sleep.
Running those commands will fail in user mode, but we try to set the wall
message first, which might even succeed for privileged users. Best to nip
the whole sequence in the bud.
Our functions that query /proc/pid/ support using pid==0 to mean
self. get_process_id also seemed to support that, but it was not implemented
correctly: the result should be in *uid, not returned, and also it gave
completely bogus result when called from get_process_gid(). But afaict,
get_process_{uid,gid} were never called with pid==0, so it's not an actual
bug. Remove the broken code to avoid confusion.