---
title: Automatic Boot Assessment
category: Booting
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Automatic Boot Assessment

systemd provides support for automatically reverting to the previous
version of the OS or kernel if the system consistently fails to boot. The
[Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting)
describes how to annotate boot loader entries with a counter that specifies how
many attempts should be made to boot them. This document describes how systemd
implements this scheme.

The many different components involved in the implementation may be used
independently and in combination with other software to, for example, support
other boot loaders or take actions outside of the boot loader.

Here's a brief overview of the complete set of components:

* The
  [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
  script can optionally create boot loader entries that carry an initial boot
  counter (the initial counter is configurable in `/etc/kernel/tries`).

* The
  [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
  boot loader optionally maintains a per-boot-loader-entry counter described by
  the [Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting)
  that is decreased by one on each attempt to boot the entry, prioritizing
  entries that have non-zero counters over those which already reached a
  counter of zero when choosing the entry to boot.

* The `boot-complete.target` target unit (see
  [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
  serves as a generic extension point both for units that are necessary to
  consider a boot successful (e.g. `systemd-boot-check-no-failures.service`
  described below), and units that want to act only if the boot is
  successful (e.g. `systemd-bless-boot.service` described below).

* The
  [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
  service is a simple service health check tool. When enabled it becomes an
  indirect dependency of `systemd-bless-boot.service` (by means of
  `boot-complete.target`, see below), ensuring that the boot will not be
  considered successful if there are any failed services.

* The
  [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
  service automatically marks a boot loader entry, for which boot counting as
  mentioned above is enabled, as "good" when a boot has been determined to be
  successful, thus turning off boot counting for it.

* The
  [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
  generator automatically pulls in `systemd-bless-boot.service` when use of
  `systemd-boot` with boot counting enabled is detected.

## Details

As described in the
[Boot Loader Specification](https://uapi-group.org/specifications/specs/boot_loader_specification/#boot-counting),
the boot counting data is stored in the file name of the boot loader entries as
a plus (`+`), followed by a number, optionally followed by `-` and another
number, right before the file name suffix (`.conf` or `.efi`).

The first number is the "tries left" counter encoding how many attempts to boot
this entry shall still be made. The second number is the "tries done" counter,
encoding how many failed attempts to boot it have already been made. Each time
a boot loader entry marked this way is booted the first counter is decremented,
and the second one incremented. (If the second counter is missing, then it is
assumed to be equivalent to zero.) If the boot attempt completed successfully
the entry's counters are removed from the name (entry state "good"), thus
turning off boot counting for the future.
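The naming scheme above can be illustrated with a small parser. This is only a sketch under stated assumptions: the names `EntryName` and `parse_entry_name` are hypothetical and not part of systemd, and the regular expression covers just the `+LEFT[-DONE]` form and the two suffixes described in this section.

```python
import re
from typing import NamedTuple, Optional


class EntryName(NamedTuple):
    """Parts of a boot loader entry file name (hypothetical helper type)."""
    stem: str
    tries_left: Optional[int]  # None means boot counting is off ("good")
    tries_done: int
    suffix: str


# "+LEFT" or "+LEFT-DONE" sits right before the ".conf" or ".efi" suffix.
_PATTERN = re.compile(
    r"^(?P<stem>.+?)"
    r"(?:\+(?P<left>\d+)(?:-(?P<done>\d+))?)?"
    r"(?P<suffix>\.(?:conf|efi))$"
)


def parse_entry_name(name: str) -> EntryName:
    """Split an entry file name into stem, counters and suffix."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a boot loader entry file name: {name!r}")
    left, done = m.group("left"), m.group("done")
    return EntryName(
        stem=m.group("stem"),
        tries_left=int(left) if left is not None else None,
        # A missing second counter is treated as zero.
        tries_done=int(done) if done is not None else 0,
        suffix=m.group("suffix"),
    )
```

For instance, `parse_entry_name("4.14.11-300.fc27.x86_64+3.conf")` yields a "tries left" value of 3 and a "tries done" value of 0, while a name without a counter tag yields `tries_left=None`.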

## Walkthrough

Here's an example walkthrough of how this all fits together.

1. The user runs `echo 3 >/etc/kernel/tries` to enable boot counting.

2. A new kernel is installed. `kernel-install` is used to generate a new boot
   loader entry file for it. Let's say the version string for the new kernel is
   `4.14.11-300.fc27.x86_64`; a new boot loader entry
   `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.

3. The system is booted for the first time after the new kernel has been
   installed. The boot loader now sees the `+3` counter in the entry file
   name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`
   indicating that at this point one attempt has started.
   After the rename has completed, the entry is booted as usual.

4. Let's say this attempt to boot fails. On the following boot the boot loader
   will hence see the `+2-1` tag in the name, and will hence rename the entry file to
   `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.

5. Let's say the boot fails again. On the subsequent boot the loader will hence
   see the `+1-2` tag, and rename the file to
   `4.14.11-300.fc27.x86_64+0-3.conf` and boot it.

6. If this boot also fails, on the next boot the boot loader will see the tag
   `+0-3`, i.e. the counter reached zero. At this point the entry will be
   considered "bad", and ordered after all non-bad entries. The next newest
   boot entry is now tried, i.e. the system automatically reverted to an
   earlier version.
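The rename sequence in steps 3 to 6 can be sketched as follows. This is an illustrative model and not the boot loader's actual code: `name_for_next_attempt` and `order_entries` are hypothetical helpers, leaving a "bad" entry's name untouched is an assumption of this sketch, and the ordering only moves bad entries behind non-bad ones while keeping whatever order the caller supplied (the real selection logic considers more than this).

```python
import re

# "+LEFT" or "+LEFT-DONE" right before the ".conf" or ".efi" suffix.
_TAG = re.compile(r"\+(\d+)(?:-(\d+))?(?=\.(?:conf|efi)$)")


def name_for_next_attempt(name: str) -> str:
    """Return the file name an entry would carry once a boot attempt starts."""
    m = _TAG.search(name)
    if m is None:
        return name  # boot counting is off for this entry
    left = int(m.group(1))
    done = int(m.group(2) or 0)  # missing "tries done" counts as zero
    if left == 0:
        return name  # already "bad"; assumed to be left as-is in this sketch
    return f"{name[:m.start()]}+{left - 1}-{done + 1}{name[m.end():]}"


def is_bad(name: str) -> bool:
    """An entry is "bad" once its "tries left" counter has reached zero."""
    m = _TAG.search(name)
    return m is not None and int(m.group(1)) == 0


def order_entries(names):
    """Keep the given order, but place bad entries after all non-bad ones."""
    return sorted(names, key=is_bad)  # sorted() is stable


name = "4.14.11-300.fc27.x86_64+3.conf"
for _ in range(3):
    name = name_for_next_attempt(name)
    print(name)
# prints the +2-1, +1-2 and +0-3 names from the walkthrough, in that order
```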

The above describes the walkthrough when the selected boot entry repeatedly
fails. Let's have a look at an alternative ending to this walkthrough. In this
scenario the first 4 steps are the same as above:

1. *as above*

2. *as above*

3. *as above*

4. *as above*

5. Let's say the second boot succeeds. The kernel initializes properly, systemd
   is started and invokes all generators.

6. One of the generators started is `systemd-bless-boot-generator` which
   detects that boot counting is used. It hence pulls
   `systemd-bless-boot.service` into the initial transaction.

7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
   `boot-complete.target` unit. This unit is hence also pulled into the initial
   transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
   that are required to succeed for the boot process to be considered
   successful. One such unit is `systemd-boot-check-no-failures.service`.

9. The graphical desktop environment installed on the machine starts a
   service called `graphical-session-good.service` that registers a D-Bus
   endpoint and is also ordered before `boot-complete.target`.

10. `systemd-boot-check-no-failures.service` is run after all its own
    dependencies have completed, and assesses that the boot completed
    successfully. It hence exits cleanly.

11. `graphical-session-good.service` waits for a user to log in. In the user
    desktop environment, one minute after the user has logged in and started the
    first program, a user service is invoked which makes a D-Bus call to
    `graphical-session-good.service`. Upon receiving that call,
    `graphical-session-good.service` exits cleanly.

12. This allows `boot-complete.target` to be reached. This signifies to the
    system that this boot attempt shall be considered successful.

13. This in turn permits `systemd-bless-boot.service` to run. It now
    determines which boot loader entry file was used to boot the system, and
    renames it, dropping the counter tag. Thus
    `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
    `4.14.11-300.fc27.x86_64.conf`. From this moment boot counting is turned
    off for this entry.

14. On the following boot (and all subsequent boots after that) the entry is
    now seen with boot counting turned off; no further renaming takes place.
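The rename performed in step 13 amounts to dropping the counter tag from the file name. A minimal sketch, assuming the same `+LEFT[-DONE]` tag syntax as described under "Details"; the function name `blessed_name` is hypothetical and this is not the code of `systemd-bless-boot` itself:

```python
import re

# "+LEFT" or "+LEFT-DONE" right before the ".conf" or ".efi" suffix.
_TAG = re.compile(r"\+(\d+)(?:-(\d+))?(?=\.(?:conf|efi)$)")


def blessed_name(name: str) -> str:
    """Return the entry file name with its counter tag removed ("good" state)."""
    # Names without a tag are returned unchanged, so the call is idempotent.
    return _TAG.sub("", name, count=1)


print(blessed_name("4.14.11-300.fc27.x86_64+1-2.conf"))
# prints "4.14.11-300.fc27.x86_64.conf"
```

Because the result no longer carries a tag, applying the function again changes nothing, which mirrors step 14: once an entry is "good", no further renaming takes place.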

## How to adapt this scheme to other setups

Many components of the stack described above may be replaced or augmented. Here
are a couple of recommendations.

1. To support alternative boot loaders in place of `systemd-boot` two scenarios
   are recommended:

    a. Boot loaders already implementing the Boot Loader Specification can
       simply implement the same rename logic, and thus integrate fully with
       the rest of the stack.

    b. Boot loaders that want to implement boot counting and store the counters
       elsewhere can provide their own replacements for
       `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
       continue to use `boot-complete.target` and thus support any services
       ordered before that.

2. To support additional components that shall succeed before the boot is
   considered successful, simply place them in units (if they aren't already)
   and order them before the generic `boot-complete.target` target unit,
   combined with `Requires=` dependencies from the target, so that the target
   cannot be reached when any of the units fail. You may add any number of
   units like this, and the boot entry is marked as good only if they all
   succeed. Note that the target unit shall pull in these boot checking units,
   not the other way around.

   Depending on the setup, it may be most convenient to pull in such units
   through normal enablement symlinks, or during early boot using a
   [`generator`](https://www.freedesktop.org/software/systemd/man/systemd.generator.html),
   or even during later boot. In the last case, care must be taken to ensure
   that the start job is created before `boot-complete.target` has been
   reached.

3. To support additional components that shall only run on boot success, simply
   wrap them in a unit and order them after `boot-complete.target`, pulling it
   in.

   Such a unit would typically be wanted (or required) by one of the
   [`bootup`](https://www.freedesktop.org/software/systemd/man/bootup.html) targets,
   for example, `multi-user.target`. To avoid potential loops due to conflicting
   [default dependencies](https://www.freedesktop.org/software/systemd/man/systemd.unit.html#Default%20Dependencies)
   ordering, it is recommended to also add an explicit dependency (e.g.
   `After=multi-user.target`) to the unit. This overrides the implicit ordering
   and allows `boot-complete.target` to start after the given bootup target.
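To make recommendations 2 and 3 more concrete, the following sketch renders two unit files as text. Everything specific in it is an assumption for illustration: the helper names, the descriptions and the `/usr/local/bin/...` paths are hypothetical, and the choice of `Type=oneshot` services is just one plausible shape. The first unit is ordered before `boot-complete.target` and, once enabled, is required by it through `RequiredBy=`; the second is ordered after the target, pulls it in with `Requires=`, and carries the explicit `After=multi-user.target` mentioned above.

```python
def render_check_unit(description: str, exec_start: str) -> str:
    """Unit text for a check that must succeed before the boot counts as good."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "Before=boot-complete.target",
        "",
        "[Service]",
        "Type=oneshot",
        "RemainAfterExit=yes",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        # Enabling the unit makes the target require it, not the other way around.
        "RequiredBy=boot-complete.target",
        "",
    ])


def render_on_success_unit(description: str, exec_start: str) -> str:
    """Unit text for an action that shall only run after a successful boot."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "Requires=boot-complete.target",
        # Explicit ordering after the bootup target avoids ordering loops.
        "After=boot-complete.target multi-user.target",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])


# Hypothetical example programs; replace with real ones.
print(render_check_unit("Example health check", "/usr/local/bin/example-health-check"))
print(render_on_success_unit("Example post-success action", "/usr/local/bin/example-on-success"))
```

The generated text would be saved as ordinary unit files and enabled in the usual way; generating it from Python is only a convenience of this sketch.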

## FAQ

1. *I have a service which, when it fails, should immediately cause a
   reboot. How does that fit in with the above?* That's orthogonal to
   the above; please use `FailureAction=` in the unit file for this.

2. *Under some condition I want to mark the current boot loader entry as bad
   right away, so that it is never tried again. How do I do that?* You may
   invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
   current boot loader entry as "bad" right away so that it isn't tried again
   on later boots.
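In terms of the file naming scheme, marking an entry "bad" corresponds to a "tries left" counter of zero. The sketch below models that effect on the name only. It rests on assumptions: `name_marked_bad` is a hypothetical helper, the "tries done" counter is assumed to be kept as-is, untagged names are assumed to be left alone, and the exact name written by the real tool may differ in detail.

```python
import re

# "+LEFT" or "+LEFT-DONE" right before the ".conf" or ".efi" suffix.
_TAG = re.compile(r"\+(\d+)(?:-(\d+))?(?=\.(?:conf|efi)$)")


def name_marked_bad(name: str) -> str:
    """Return the entry file name with its "tries left" counter set to zero."""
    m = _TAG.search(name)
    if m is None:
        # Assumption of this sketch: entries without boot counting stay untouched.
        return name
    done = int(m.group(2) or 0)  # keep the "tries done" counter
    return f"{name[:m.start()]}+0-{done}{name[m.end():]}"


print(name_marked_bad("4.14.11-300.fc27.x86_64+2-1.conf"))
# prints "4.14.11-300.fc27.x86_64+0-1.conf"
```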