---
title: Automatic Boot Assessment
category: Booting
layout: default
---

# Automatic Boot Assessment

systemd provides support for automatically reverting to the previous
version of the OS or kernel in case the system consistently fails to boot. This
support is built into several of its components. When used together, these
components provide a complete solution on UEFI systems, built as an add-on to the
[Boot Loader
Specification](https://systemd.io/BOOT_LOADER_SPECIFICATION). However, the
different components may also be used independently, and in combination with
other software, to implement similar schemes, for example with other boot
loaders or for non-UEFI systems. Here's a brief overview of the complete set of
components:

* The
  [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
  boot loader optionally maintains a per-boot-loader-entry counter that is
  decreased by one on each attempt to boot the entry, prioritizing entries that
  have non-zero counters over those which already reached a counter of zero
  when choosing the entry to boot.

* The
  [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
  service automatically marks a boot loader entry, for which boot counting as
  mentioned above is enabled, as "good" when a boot has been determined to be
  successful, thus turning off boot counting for it.

* The
  [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
  generator automatically pulls in `systemd-bless-boot.service` when use of
  `systemd-boot` with boot counting enabled is detected.

* The
  [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
  service is a simple health check tool that determines whether the boot
  completed successfully. When enabled it becomes an indirect dependency of
  `systemd-bless-boot.service` (by means of `boot-complete.target`, see
  below), ensuring that the boot will not be considered successful if there are
  any failed services.

* The `boot-complete.target` target unit (see
  [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
  serves as a generic extension point, both for units that must succeed for a
  boot to be considered successful (example:
  `systemd-boot-check-no-failures.service` as described above) and for units
  that want to act only if the boot is successful (example:
  `systemd-bless-boot.service` as described above).

* The
  [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
  script can optionally create boot loader entries that carry an initial boot
  counter (the initial counter is configurable in `/etc/kernel/tries`).

## Details

The boot counting data that `systemd-boot` and `systemd-bless-boot.service`
manage is stored in the file names of the boot loader entries. If a boot loader
entry file name contains `+` followed by one or two numbers (if two numbers,
then those need to be separated by `-`) right before the `.conf` suffix, then
boot counting is enabled for it. The first number is the "tries left" counter,
encoding how many attempts to boot this entry shall still be made. The second
number is the "tries done" counter, encoding how many failed attempts to boot
it have already been made. Each time a boot loader entry marked this way is
booted, the first counter is decreased by one and the second one increased by
one. (If the second counter is missing, it is assumed to be zero.) If the
"tries left" counter is above zero, the entry is still considered for booting
(the entry's state is considered to be "indeterminate"); as soon as it reaches
zero, the entry is not tried anymore (entry state "bad"). If the boot attempt
completes successfully, the entry's counters are removed from the name (entry
state "good"), thus turning off boot counting for the future.

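The naming scheme described above can be illustrated with a short Python sketch. It is not taken from systemd; the names `parse_entry_name`, `entry_state` and `BootCounter` are hypothetical, and the sketch assumes that a file name without a counter tag is treated as "good".

```python
import re
from typing import NamedTuple, Optional

# "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix, as described above.
_COUNTER_RE = re.compile(
    r"^(?P<base>.+)\+(?P<left>[0-9]+)(?:-(?P<done>[0-9]+))?\.conf$"
)


class BootCounter(NamedTuple):
    """Hypothetical container for the data encoded in an entry file name."""
    base: str                  # entry name without counter tag and suffix
    tries_left: Optional[int]  # None means boot counting is off
    tries_done: int


def parse_entry_name(filename: str) -> BootCounter:
    """Split an entry file name into its base name and counters."""
    m = _COUNTER_RE.match(filename)
    if m is None:
        # No counter tag: boot counting is not enabled for this entry.
        base = filename[:-len(".conf")] if filename.endswith(".conf") else filename
        return BootCounter(base, None, 0)
    # A missing "tries done" counter is treated as zero.
    return BootCounter(m["base"], int(m["left"]), int(m["done"] or 0))


def entry_state(counter: BootCounter) -> str:
    """Map the counters to the states named in the text."""
    if counter.tries_left is None:
        return "good"
    return "indeterminate" if counter.tries_left > 0 else "bad"


for name in ("4.14.11-300.fc27.x86_64+3.conf",
             "4.14.11-300.fc27.x86_64+0-3.conf",
             "4.14.11-300.fc27.x86_64.conf"):
    print(name, "->", entry_state(parse_entry_name(name)))
```

The regular expression anchors the tag to the end of the name, so a `+` elsewhere in the entry name is not mistaken for a counter.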
## Walkthrough

Here's an example walkthrough of how this all fits together.

1. The user runs `echo 3 > /etc/kernel/tries` to enable boot counting.

2. A new kernel is installed. `kernel-install` is used to generate a new boot
   loader entry file for it. Let's say the version string for the new kernel is
   `4.14.11-300.fc27.x86_64`; a new boot loader entry
   `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.

3. The system is booted for the first time after the new kernel is
   installed. The boot loader now sees the `+3` counter in the entry file
   name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`,
   indicating that one attempt has now been started and thus one fewer is
   left. After the rename has completed, the entry is booted as usual.

4. Let's say this attempt to boot fails. On the following boot the boot loader
   will hence see the `+2-1` tag in the name, and hence rename the entry file to
   `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.

5. Let's say the boot fails again. On the subsequent boot the loader hence will
   see the `+1-2` tag, and rename the file to
   `4.14.11-300.fc27.x86_64+0-3.conf` and boot it.

6. If this boot also fails, on the next boot the boot loader will see the
   tag `+0-3`, i.e. the counter reached zero. At this point the entry will be
   considered "bad", and ordered to the beginning of the list of entries. The
   next newest boot entry is now tried, i.e. the system has automatically
   reverted to an earlier version.

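The rename sequence in the steps above can be reproduced with a small Python simulation. This is only a sketch of the naming logic and not the boot loader's actual code; `name_for_next_attempt` and `simulate_failed_boots` are hypothetical helper names, and no files are touched. For the example entry it prints the `+3`, `+2-1`, `+1-2` and `+0-3` names in that order.

```python
import re
from typing import List, Optional

# "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix.
_TAG_RE = re.compile(r"^(.+)\+([0-9]+)(?:-([0-9]+))?\.conf$")


def name_for_next_attempt(filename: str) -> Optional[str]:
    """Return the name an entry would get right before a boot attempt.

    The name is returned unchanged if boot counting is off, and None is
    returned if "tries left" already reached zero (entry state "bad").
    """
    m = _TAG_RE.match(filename)
    if m is None:
        return filename
    base, left, done = m.group(1), int(m.group(2)), int(m.group(3) or 0)
    if left == 0:
        return None
    # One attempt is consumed: "tries left" goes down, "tries done" goes up.
    return f"{base}+{left - 1}-{done + 1}.conf"


def simulate_failed_boots(filename: str) -> List[str]:
    """List every name the entry takes on if each boot attempt fails."""
    names = [filename]
    while True:
        renamed = name_for_next_attempt(names[-1])
        if renamed is None or renamed == names[-1]:
            return names
        names.append(renamed)


for name in simulate_failed_boots("4.14.11-300.fc27.x86_64+3.conf"):
    print(name)
```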
The steps above describe the case where the selected boot entry fails
continuously. Let's have a look at an alternative ending to this walkthrough.
In this scenario the first 4 steps are the same as above:

1. *as above*

2. *as above*

3. *as above*

4. *as above*

5. Let's say the second boot succeeds. The kernel initializes properly, systemd
   is started and invokes all generators.

6. One of the generators started is `systemd-bless-boot-generator` which
   detects that boot counting is used. It hence pulls
   `systemd-bless-boot.service` into the initial transaction.

7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
   `boot-complete.target` unit. This unit is hence also pulled into the initial
   transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
   that are required to succeed for the boot process to be considered
   successful. One such unit is `systemd-boot-check-no-failures.service`.

9. `systemd-boot-check-no-failures.service` is run after all its own
   dependencies completed, and assesses that the boot completed
   successfully. It hence exits cleanly.

10. This allows `boot-complete.target` to be reached. This signifies to the
    system that this boot attempt shall be considered successful.

11. This in turn permits `systemd-bless-boot.service` to run. It now
    determines which boot loader entry file was used to boot the system, and
    renames it, dropping the counter tag. Thus
    `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
    `4.14.11-300.fc27.x86_64.conf`. From this moment on, boot counting is
    turned off.

12. On the following boot (and all subsequent boots after that) the entry is
    now seen with boot counting turned off, so no further renaming takes place.

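The final rename in the successful path can be sketched in Python as follows. This is an illustration only, not how `systemd-bless-boot.service` is implemented; `blessed_name` and `bless` are hypothetical names, and the sketch assumes the caller already knows which entry file was booted. It operates on a throwaway temporary directory rather than on a real boot partition.

```python
import re
import tempfile
from pathlib import Path

# Counter tag "+LEFT" or "+LEFT-DONE" directly in front of the ".conf" suffix.
_TAG_RE = re.compile(r"\+[0-9]+(?:-[0-9]+)?(?=\.conf$)")


def blessed_name(filename: str) -> str:
    """Return the file name with the counter tag dropped, if there is one."""
    return _TAG_RE.sub("", filename, count=1)


def bless(entry: Path) -> Path:
    """Rename the entry so that it no longer carries a counter tag."""
    target = entry.with_name(blessed_name(entry.name))
    if target != entry:
        # A single rename; the contents of the entry file are never rewritten.
        entry.rename(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for an entries directory; the file contents are irrelevant here.
    entry = Path(tmp) / "4.14.11-300.fc27.x86_64+1-2.conf"
    entry.write_text("title Example\n")
    print(bless(entry).name)
```

Calling `bless` on an entry that has no counter tag leaves the file untouched, which mirrors step 12 above.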
## How to adapt this scheme to other setups

Of the stack described above many components may be replaced or augmented. Here
are a couple of recommendations.

1. To support alternative boot loaders in place of `systemd-boot` two scenarios
   are recommended:

   a. Boot loaders already implementing the Boot Loader Specification can simply
      implement an equivalent file rename based logic, and thus integrate fully
      with the rest of the stack.

   b. Boot loaders that want to implement boot counting and store the counters
      elsewhere can provide their own replacements for
      `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
      continue to use `boot-complete.target` and thus support any services
      ordered before that.

2. To support additional components that shall succeed before the boot is
   considered successful, simply place them in units (if they aren't already)
   and order them before the generic `boot-complete.target` target unit,
   combined with `Requires=` dependencies from the target, so that the target
   cannot be reached when any of the units fail. You may add any number of
   units like this, and the boot entry is marked as good only if they all
   succeed. Note that the target unit shall pull in these boot checking units,
   not the other way around.

3. To support additional components that shall only run on boot success, simply
   wrap them in a unit and order them after `boot-complete.target`, pulling it
   in.

## FAQ

1. *Why do you use file renames to store the counter? Why not a regular file?*
   Mainly two reasons: it's relatively likely that renames can be implemented
   atomically even in simpler file systems, while writing to file contents has
   a much bigger chance of resulting in incomplete or corrupt data, as renaming
   generally avoids allocating or releasing data blocks. Moreover it has the
   benefit that the boot count metadata is directly attached to the boot loader
   entry file, and thus the lifecycle of the metadata and the entry itself are
   bound together. This means no additional clean-up needs to take place to
   drop the boot loader counting information for an entry when it is removed.

2. *Why not use EFI variables for storing the boot counter?* The memory chips
   used to back the persistent EFI variables are generally not of the highest
   quality, hence shouldn't be written to more than necessary. This means we
   can't really use them for changes made regularly during boot, but only for
   seldom-made configuration changes.

3. *I have a service which, when it fails, should immediately cause a
   reboot. How does that fit in with the above?* Well, that's orthogonal to
   the above; please use `FailureAction=` in the unit file for this.

4. *Under some condition I want to mark the current boot loader entry as bad
   right away, so that it is never tried again. How do I do that?* You may
   invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
   current boot loader entry as "bad" right away so that it isn't tried again
   on later boots.