Closes https://github.com/systemd/systemd/issues/35405. Apparently some
watchdog devices can be opened, but then the pings start failing after some
time. Since the timestamp of the last successful ping is not updated, we try to
ping again immediately, causing a busy loop and excessive logging.
After trying a few different approaches to fit this into the existing framework
without changing the logic too much, I settled on an approach with a second
timestamp. In particular, the timestamp of the last successful ping is public,
exposed as WatchdogLastPingTimestamp over dbus. It'd be wrong to redefine this
to mean the last ping *attempt*. So we need a second timestamp in some form.
Also, if we give up on pinging, we probably should attempt to disarm the
watchdog. It's possible that the pinging fails, but the watchdog would still
fire. I don't think we want that, since it seems that our internal loop is
working, it's just the watchdog that is broken.
Structured message with SD_MESSAGE_WATCHDOG_PING_FAILED is logged if we fail
to ping.
I tested this by attaching gdb to pid 1 and calling close(watchdog_fd).
We get a bunch of warning messages and then an attempt to close the watchdog:
Mar 21 15:46:17 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:20 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:23 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:26 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:29 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:32 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:35 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:37 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:40 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:43 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:46 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:49 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:52 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:55 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to ping hardware watchdog /dev/watchdog0, closing watchdog after 15 attempts: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disable hardware watchdog, ignoring: Bad file descriptor
Mar 21 15:46:58 fedora systemd[1]: Failed to disarm watchdog timer, ignoring: Bad file descriptor