---
title: Automatic Boot Assessment
category: Booting
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Automatic Boot Assessment

systemd provides support for automatically reverting back to the previous
version of the OS or kernel in case the system consistently fails to boot. The
[Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting)
describes how to annotate boot loader entries with a counter that specifies how
many attempts should be made to boot it. This document describes how systemd
implements this scheme.

The many different components involved in the implementation may be used
independently and in combination with other software to, for example, support
other boot loaders or take actions outside of the boot loader.

Here's a brief overview of the complete set of components:

* The
  [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
  script can optionally create boot loader entries that carry an initial boot
  counter (the initial counter is configurable in `/etc/kernel/tries`).

* The
  [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
  boot loader optionally maintains a per-boot-loader-entry counter described by
  the [Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting)
  that is decreased by one on each attempt to boot the entry, prioritizing
  entries that have non-zero counters over those which already reached a
  counter of zero when choosing the entry to boot.

* The `boot-complete.target` target unit (see
  [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
  serves as a generic extension point both for units that are necessary to
  consider a boot successful (e.g. `systemd-boot-check-no-failures.service`
  described below), and units that want to act only if the boot is
  successful (e.g. `systemd-bless-boot.service` described below).

* The
  [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
  service is a simple service health check tool. When enabled it becomes an
  indirect dependency of `systemd-bless-boot.service` (by means of
  `boot-complete.target`, see below), ensuring that the boot will not be
  considered successful if there are any failed services.

* The
  [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
  service automatically marks a boot loader entry, for which boot counting as
  mentioned above is enabled, as "good" when a boot has been determined to be
  successful, thus turning off boot counting for it.

* The
  [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
  generator automatically pulls in `systemd-bless-boot.service` when use of
  `systemd-boot` with boot counting enabled is detected.

## Details

As described in the
[Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting),
the boot counting data is stored in the file name of the boot loader entries as
a plus (`+`), followed by a number, optionally followed by `-` and another
number, right before the file name suffix (`.conf` or `.efi`).

The first number is the "tries left" counter encoding how many attempts to boot
this entry shall still be made. The second number is the "tries done" counter,
encoding how many failed attempts to boot it have already been made. Each time
a boot loader entry marked this way is booted the first counter is decremented,
and the second one incremented. (If the second counter is missing, then it is
assumed to be equivalent to zero.) If the boot attempt completed successfully
the entry's counters are removed from the name (entry state "good"), thus
turning off boot counting for the future.

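The naming scheme described above can be illustrated with a small parser. This is a minimal sketch, not code taken from systemd: the `BootCounters` type, the `parse_entry_name()` helper and the regular expression are assumptions made for illustration, derived only from the format described in this section.

```python
import re
from typing import NamedTuple, Optional


class BootCounters(NamedTuple):
    """Hypothetical container for the parts of a boot loader entry file name."""
    stem: str                  # entry name without counter tag and suffix
    suffix: str                # ".conf" or ".efi"
    tries_left: Optional[int]  # None means boot counting is off ("good")
    tries_done: int            # assumed to be 0 when not present in the name


# "+LEFT" optionally followed by "-DONE", right before the suffix.
_ENTRY_RE = re.compile(
    r"^(?P<stem>.+?)"
    r"(?:\+(?P<left>\d+)(?:-(?P<done>\d+))?)?"
    r"(?P<suffix>\.conf|\.efi)$"
)


def parse_entry_name(name: str) -> BootCounters:
    """Split an entry file name into stem, counters and suffix."""
    m = _ENTRY_RE.match(name)
    if m is None:
        raise ValueError(f"not a boot loader entry file name: {name!r}")
    left = m.group("left")
    done = m.group("done")
    return BootCounters(
        stem=m.group("stem"),
        suffix=m.group("suffix"),
        tries_left=int(left) if left is not None else None,
        tries_done=int(done) if done is not None else 0,
    )
```

For example, `parse_entry_name("4.14.11-300.fc27.x86_64+2-1.conf")` yields two tries left and one try done, while a name without a `+` tag yields `tries_left=None`, i.e. boot counting is turned off for that entry.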
## Walkthrough

Here's an example walkthrough of how this all fits together.

1. The user runs `echo 3 >/etc/kernel/tries` to enable boot counting.

2. A new kernel is installed. `kernel-install` is used to generate a new boot
   loader entry file for it. Let's say the version string for the new kernel is
   `4.14.11-300.fc27.x86_64`; a new boot loader entry
   `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.

3. The system is booted for the first time after the new kernel has been
   installed. The boot loader now sees the `+3` counter in the entry file
   name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`,
   indicating that at this point one attempt has started.
   After the rename has completed, the entry is booted as usual.

4. Let's say this attempt to boot fails. On the following boot the boot loader
   will see the `+2-1` tag in the name, and will hence rename the entry file to
   `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.

5. Let's say the boot fails again. On the subsequent boot the loader will
   see the `+1-2` tag, rename the file to
   `4.14.11-300.fc27.x86_64+0-3.conf`, and boot it.

6. If this boot also fails, on the next boot the boot loader will see the tag
   `+0-3`, i.e. the counter reached zero. At this point the entry will be
   considered "bad", and ordered after all non-bad entries. The next newest
   boot entry is now tried, i.e. the system automatically reverted to an
   earlier version.

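The rename sequence in the steps above can be reproduced with a short simulation. This is a simplified sketch under stated assumptions, not the boot loader's actual code: all function names are hypothetical, an entry with zero tries left is simply left unchanged, and the ordering helper models only the rule "bad entries go last" (a real boot loader applies further sorting criteria).

```python
import re
from typing import List

# Counter tag "+LEFT[-DONE]" right before the ".conf"/".efi" suffix.
_TAGGED = re.compile(
    r"^(?P<stem>.+?)\+(?P<left>\d+)(?:-(?P<done>\d+))?(?P<suffix>\.(?:conf|efi))$"
)


def initial_entry_name(version: str, tries: int) -> str:
    """Name of a freshly installed entry, e.g. with tries read from /etc/kernel/tries."""
    return f"{version}+{tries}.conf"


def start_attempt(name: str) -> str:
    """Return the name an entry is renamed to right before it is booted."""
    m = _TAGGED.match(name)
    if m is None:
        return name  # no counter tag: boot counting is off
    left = int(m["left"])
    done = int(m["done"] or 0)  # a missing "tries done" counter counts as zero
    if left == 0:
        return name  # assumption of this sketch: nothing left to decrement
    return f"{m['stem']}+{left - 1}-{done + 1}{m['suffix']}"


def is_bad(name: str) -> bool:
    """An entry is "bad" once its "tries left" counter has reached zero."""
    m = _TAGGED.match(name)
    return m is not None and int(m["left"]) == 0


def order_entries(names: List[str]) -> List[str]:
    """Keep the given order, but move bad entries after all non-bad ones."""
    return sorted(names, key=is_bad)  # stable sort: False sorts before True


def simulate_failures(version: str, tries: int) -> List[str]:
    """File names the entry goes through if every boot attempt fails."""
    name = initial_entry_name(version, tries)
    history = [name]
    while not is_bad(name):
        name = start_attempt(name)
        history.append(name)
    return history
```

Calling `simulate_failures("4.14.11-300.fc27.x86_64", 3)` walks through the `+3`, `+2-1`, `+1-2` and `+0-3` names from the steps above. Passing the final name to `order_entries()` together with an older, untagged entry moves the older entry to the front, which corresponds to the automatic revert in step 6.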
The above describes the walkthrough when the selected boot entry continuously
fails. Let's have a look at an alternative ending to this walkthrough. In this
scenario the first four steps are the same as above:

1. *as above*

2. *as above*

3. *as above*

4. *as above*

5. Let's say the second boot succeeds. The kernel initializes properly, systemd
   is started and invokes all generators.

6. One of the generators started is `systemd-bless-boot-generator`, which
   detects that boot counting is used. It hence pulls
   `systemd-bless-boot.service` into the initial transaction.

7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
   `boot-complete.target` unit. This unit is hence also pulled into the initial
   transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
   that are required to succeed for the boot process to be considered
   successful. One such unit is `systemd-boot-check-no-failures.service`.

9. The graphical desktop environment installed on the machine starts a
   service called `graphical-session-good.service`, which registers a D-Bus
   endpoint and is also ordered before `boot-complete.target`.

10. `systemd-boot-check-no-failures.service` is run after all its own
    dependencies have completed, and assesses that the boot completed
    successfully. It hence exits cleanly.

11. `graphical-session-good.service` waits for a user to log in. In the user
    desktop environment, one minute after the user has logged in and started the
    first program, a user service is invoked which makes a D-Bus call to
    `graphical-session-good.service`. Upon receiving that call,
    `graphical-session-good.service` exits cleanly.

12. This allows `boot-complete.target` to be reached. This signifies to the
    system that this boot attempt shall be considered successful.

13. This in turn permits `systemd-bless-boot.service` to run. It now
    determines which boot loader entry file was used to boot the system, and
    renames it, dropping the counter tag. Thus
    `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
    `4.14.11-300.fc27.x86_64.conf`. From this moment boot counting is turned
    off for this entry.

14. On the following boot (and all subsequent boots after that) the entry is
    now seen with boot counting turned off; no further renaming takes place.

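The successful ending can be sketched in the same spirit. The code below is an illustrative model only: `bless()` and `finish_boot()` are hypothetical helpers, and the dictionary of check results stands in for the units ordered before `boot-complete.target`. In the real system this gating is done by unit dependencies, not by a function call.

```python
import re
from typing import Mapping

# Counter tag "+LEFT[-DONE]" directly in front of the ".conf"/".efi" suffix.
_COUNTER_TAG = re.compile(r"\+\d+(?:-\d+)?(?=\.(?:conf|efi)$)")


def bless(name: str) -> str:
    """Drop the counter tag from an entry file name, marking it "good"."""
    return _COUNTER_TAG.sub("", name, count=1)


def finish_boot(entry: str, checks: Mapping[str, bool]) -> str:
    """Return the entry name after this boot.

    The entry is blessed only if every required check succeeded, mirroring
    how boot-complete.target is reached only when all units it requires
    have succeeded.
    """
    if all(checks.values()):
        return bless(entry)
    return entry  # target not reached: the counters stay in place


checks = {
    "systemd-boot-check-no-failures.service": True,
    "graphical-session-good.service": True,
}
print(finish_boot("4.14.11-300.fc27.x86_64+1-2.conf", checks))
```

If any value in `checks` is `False`, the name is returned unchanged, so the counters remain and the next boot attempt decrements them again.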
## How to adapt this scheme to other setups

Many components of the stack described above may be replaced or augmented. Here
are a couple of recommendations.

1. To support alternative boot loaders in place of `systemd-boot` two scenarios
   are recommended:

    a. Boot loaders already implementing the Boot Loader Specification can
       simply implement the same rename logic, and thus integrate fully with
       the rest of the stack.

    b. Boot loaders that want to implement boot counting and store the counters
       elsewhere can provide their own replacements for
       `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
       continue to use `boot-complete.target` and thus support any services
       ordered before that.

2. To support additional components that shall succeed before the boot is
   considered successful, simply place them in units (if they aren't already)
   and order them before the generic `boot-complete.target` target unit,
   combined with `Requires=` dependencies from the target, so that the target
   cannot be reached when any of the units fail. You may add any number of
   units like this, and the boot entry is marked as good only if they all
   succeed. Note that the target unit shall pull in these boot checking units,
   not the other way around.

   Depending on the setup, it may be most convenient to pull in such units
   through normal enablement symlinks, or during early boot using a
   [`generator`](https://www.freedesktop.org/software/systemd/man/systemd.generator.html),
   or even during later boot. In the last case, care must be taken to ensure
   that the start job is created before `boot-complete.target` has been
   reached.

3. To support additional components that shall only run on boot success, simply
   wrap them in a unit and order them after `boot-complete.target`, pulling it
   in.

   Such a unit would typically be wanted (or required) by one of the
   [`bootup`](https://www.freedesktop.org/software/systemd/man/bootup.html) targets,
   for example, `multi-user.target`. To avoid potential loops due to conflicting
   [default dependencies](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Default%20Dependencies)
   ordering, it is recommended to also add an explicit dependency (e.g.
   `After=multi-user.target`) to the unit. This overrides the implicit ordering
   and allows `boot-complete.target` to start after the given bootup target.

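Recommendations 2 and 3 could translate into unit files along the following lines. Both units are hypothetical examples: the unit names, descriptions and `ExecStart=` paths are invented for illustration, and only the dependency and ordering directives follow from the text above.

```ini
# /etc/systemd/system/example-health-check.service
# (hypothetical) A check that must succeed before the boot counts as good.
[Unit]
Description=Example health check gating boot-complete.target
# Ordered before the target, as described in recommendation 2.
Before=boot-complete.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical program; a non-zero exit status makes the unit fail.
ExecStart=/usr/local/bin/example-health-check

[Install]
# Enabling the unit creates the Requires= dependency from the target,
# so the target pulls in the check and not the other way around.
RequiredBy=boot-complete.target


# /etc/systemd/system/example-on-success.service
# (hypothetical) An action that runs only after a successful boot.
[Unit]
Description=Example action run after boot-complete.target
# Pull in the target and order this unit after it (recommendation 3).
Requires=boot-complete.target
# The explicit After=multi-user.target overrides the implicit ordering.
After=boot-complete.target multi-user.target

[Service]
Type=oneshot
# Hypothetical program.
ExecStart=/usr/local/bin/example-on-success

[Install]
WantedBy=multi-user.target
```

With this layout, enabling the first unit through `systemctl enable` uses the normal enablement symlinks mentioned above, and the second unit starts only once every unit required by `boot-complete.target` has succeeded.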
## FAQ

1. *I have a service which, when it fails, should immediately cause a
   reboot. How does that fit in with the above?* That's orthogonal to
   the above; please use `FailureAction=` in the unit file for this.

2. *Under some condition I want to mark the current boot loader entry as bad
   right away, so that it is never tried again. How do I do that?* You may
   invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
   current boot loader entry as "bad" right away so that it isn't tried again
   on later boots.
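
For the second question, a program that detects the condition could wrap the invocation as sketched below. This is only an illustration: `mark_bad_if()` and its `run` parameter are hypothetical, the binary path is the one quoted in the answer above, and the call typically requires sufficient privileges to rename the boot loader entry.

```python
import subprocess
from typing import Callable, List, Optional

# Path as quoted in the FAQ answer; it may differ between distributions.
BLESS_BOOT = "/usr/lib/systemd/systemd-bless-boot"


def mark_bad_if(condition: bool,
                run: Optional[Callable[[List[str]], object]] = None) -> bool:
    """Mark the current boot loader entry as "bad" when condition holds.

    `run` is a hypothetical injection point for the command runner, which
    makes the function testable without touching the boot loader entries.
    Returns True if the command was invoked.
    """
    if not condition:
        return False
    if run is None:
        # Default: execute the real binary and raise if it fails.
        def run(argv: List[str]) -> object:
            return subprocess.run(argv, check=True)
    run([BLESS_BOOT, "bad"])
    return True
```

A caller would pass its own check as `condition`, for example the result of a hardware self-test, and leave `run` unset to execute the real command.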