# SPDX-License-Identifier: GPL-2.0

A living document. The basic algorithm.

TODO: (D == DONE)

0) Need to bring some sanity into the case of flags that can
        be set in the secondaries at mkfs time but reset or cleared
        in the primary later in the filesystem's life.

0) Clear the persistent read-only bit if set. Clear the
        shared bit if set and the version number is zero. This
        brings the filesystem back to a known state.

0) make sure that superblock geometry code checks the logstart
        value against whether or not we have an internal log.
        If we have an internal log and a logdev, that's ok.
        (Maybe we just aren't using it). If we have an external
        log (logstart == 0) but no logdev, that's right out.
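
A minimal sketch of the logstart check described in the item above. The
enum, the function name and the have_logdev parameter are hypothetical
names made up for illustration; they are not taken from the repair code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum log_check {
    LOG_OK,                 /* geometry and devices agree */
    LOG_OK_LOGDEV_UNUSED,   /* internal log; a logdev was given but is unused */
    LOG_BAD_NO_LOGDEV       /* external log but no device to find it on */
};

/*
 * logstart is the log start block from the superblock; zero means the
 * log is external.  have_logdev says whether a log device was supplied.
 */
static enum log_check check_log_config(uint64_t logstart, bool have_logdev)
{
    bool internal = (logstart != 0);

    if (internal)
        return have_logdev ? LOG_OK_LOGDEV_UNUSED : LOG_OK;
    return have_logdev ? LOG_OK : LOG_BAD_NO_LOGDEV;
}
```

Only the last case (external log, no logdev) is treated as fatal here,
matching the "that's right out" wording above.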

0) write secondary superblock search code. Rewrite initial
        superblock parsing code to be less complicated. Just
        use variables to indicate primary, secondary, etc.,
        and use a function to get the SB given a specific location
        or something.

2) For inode alignment, if the SB bit is set and the
        inode alignment size field in the SB is set, then
        believe that the fs inodes MUST be aligned and
        disallow any non-aligned inodes. Likewise, if
        the SB bit isn't set (or earlier version) and
        the inode alignment size field is zero, then
        never set the bit even if the inodes are aligned.
        Note that the bits and alignment values are
        replicated in the secondary superblocks.

0) add feature specification options to parse_arguments

0) add logic to add_inode_ref(), add_inode_reached()
        to detect nlink overflows in cases where the fs
        (or user had indicated fs) doesn't support new nlinks.

6) check to make sure that the inodes containing btree blocks
        with # recs < minrecs aren't legit -- e.g. the only
        descendant of a root block.

7) inode di_size value sanity checking -- should never be greater than
        the biggest filebno offset mentioned in the bmaps. Doesn't
        have to be equal though since we're allowed to overallocate
        (it just wastes a little space). This is for both regular
        files and directories (have to modify the existing directory
        check).

        Add tracking of largest offset in bmap scanning code. Compare
        value against di_size. Should be >= di_size.

        Alternatively, you could pass the inode down through
        the extent record processing layer and make the checks
        there.

        Add knowledge of quota inodes. size of quota inode is
        always zero. We should maintain that.
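
The di_size check in item 7 could be sketched roughly as below. This only
illustrates the rule as stated here (track the largest mapped offset, then
require it to be >= di_size). The struct, the function names and the
blocklog parameter are assumptions for the example, not the real bmap
scanning interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record; units are filesystem blocks. */
struct extent {
    uint64_t startoff;    /* file block offset where the extent begins */
    uint64_t blockcount;  /* number of blocks in the extent */
};

/* Return the first file block offset past the end of every extent. */
static uint64_t max_mapped_fsblock(const struct extent *ext, size_t n)
{
    uint64_t max_end = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t end = ext[i].startoff + ext[i].blockcount;

        if (end > max_end)
            max_end = end;
    }
    return max_end;
}

/* di_size (in bytes) passes if it does not run past the mapped blocks. */
static bool di_size_ok(uint64_t di_size, const struct extent *ext, size_t n,
                       unsigned int blocklog)
{
    assert(blocklog < 32);
    return di_size <= (max_mapped_fsblock(ext, n) << blocklog);
}
```

With 4k blocks (blocklog 12) and extents ending at file block 12, any
di_size up to 12 * 4096 passes and anything larger gets flagged.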

8) Basic quota stuff.

        Invariants
                if quota feature bit is set, the quota inodes
                if set, should point to disconnected, 0 len inodes.

D -             if quota inodes exist, the quota bits must be
                turned on. It's ok for the quota flags to be
                zeroed but they should be in a legal state
                (see xfs_quota.h).

D -             if the quota flags are non-zero, the corresponding
                quota inodes must exist.

                quota inodes are never deleted, only their space
                is freed.

        if quotas are being downgraded, then check quota inodes
        at the end of phase 3. If they haven't been cleared yet,
        clear them. Regardless, then clear sb flags (quota inode
        fields, quota flags, and quota bit).


5) look at verify_inode_chunk(). it's probably really broken.


9) Complicated quota stuff. Add code to bmap scan code to
        track used blocks. Add another pair of AVL trees
        to track user and project quota limits. Set AVL
        trees up at the beginning of phase 3. Quota inodes
        can be rebuilt or corrected later if damaged.


D - 0) fix directory processing. phase 3, if an entry references
        a free inode, *don't* mark it used. wait for the rest of
        phase 3 processing to hit that inode. If it looks like it's
        in use, we'll mark in use then. If not, we'll clear it and
        mark the inode map. then in phase 4, you can depend on the
        inode map. should probably set the parent info in phase 4.
        So we have a check_dups flag. Maybe we should change the
        name of check_dir to discover_inodes. During phase 3
        (discover_inodes == 1), uncertain inodes are added to list.
        During phase 4 (discover_inodes == 0), they aren't. And
        we never mark inodes in use from the directory code.
        During phase 4, we shouldn't complain about names with
        a leading '/' since we made those names in phase 3.

        Have to change dino_chunks.c (parent setting), dinode.c
        and dir.c.

D - 0) make sure we don't screw up filesystems with real-time inodes.
        remember to initialize real-time map with all blocks XR_E_FREE.

D - 4) check contents of symlinks as well as lengths in process_symlinks()
        in dinode.c. Right now, we only check lengths.


D - 1) Feature mismatches -- for quotas and attributes,
        if the stuff exists in the filesystem, set the
        superblock version bits.

D - 0) rewrite directory leaf block holemap comparison code.
        probably should just check the leaf block hole info
        against our incore bitmap. If the hole flag is not
        set, then we know that there can only be one hole and
        it has to be between the entry table and the top of heap.
        If the hole flag is set, then it's ok if the on-disk
        holemap doesn't describe everything as long as what
        it does describe doesn't conflict with reality.

D - 0) rewrite setting nlinks handling -- for version 1
        inodes, set both nlinks and onlinks (zero projid_lo/hi
        and pad) if we have to change anything. For
        version 2, I think we're ok.

D - 0) Put awareness of quota inode into mark_standalone_inodes.


D - 8) redo handling of superblocks with bad version numbers. need
        to bail out (without harming) fs's that have sbs that
        are newer than we are.

D - 0) How do we handle feature mismatches between fs and
        superblock? For nlink, check each inode after you
        know it's good. If onlinks is 0 and nlinks is > 0
        and it's a version 2 inode, then it really is a version
        2 inode and the nlinks flag in the SB needs to be set.
        If it's a version 2 inode and the SB agrees but onlink
        is non-zero, then clear onlink.

D - 3) keep cumulative counts of freeblocks, inodes, etc. to set in
        the superblock at the end of phase 5. Remember that
        agf freeblock counters don't include blocks used by
        the non-root levels of the freespace trees but that
        the sb free block counters include those.
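
A rough sketch of the counter bookkeeping from the item above, following
only the rule stated there: the superblock total adds the non-root
freespace btree blocks back in on top of the per-AG freeblock counts.
The struct and its field names are made up for illustration, and any
other contribution to the on-disk counters is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-AG tallies gathered while rebuilding the trees. */
struct ag_counts {
    uint32_t agf_freeblks;      /* free blocks as the agf counts them */
    uint32_t fs_btree_nonroot;  /* blocks in non-root freespace tree levels */
};

/* Sum the per-AG tallies into a superblock-style free block count. */
static uint64_t sb_free_blocks(const struct ag_counts *ag, size_t nags)
{
    uint64_t total = 0;

    assert(ag != NULL || nags == 0);
    for (size_t i = 0; i < nags; i++)
        total += (uint64_t)ag[i].agf_freeblks + ag[i].fs_btree_nonroot;
    return total;
}
```

Accumulating in 64 bits matters here since each per-AG counter is 32 bits
in this sketch and the sum over many AGs may not fit in 32.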

D - 0) Do parent setting in directory code (called by phase 3).
        actually, I put it in process_inode_set and propagated
        the parent up to it from the process_dinode/process_dir
        routines. seemed cleaner than pushing the irec down
        and letting them bang on it.

D - 0) If we clear a file in phase 4, make sure that if it's
        a directory that the parent info is cleared also.

D - 0) put inode tree flashover (call to add_ino_backptrs) into phase 5.

D - 0) do set/get_inode_parent functions in incore_ino.c.
        also do is/set/ inode_processed.

D - 0) do a versions.c to extract feature info and set global vars
        from the superblock version number and possibly feature bits

D - 0) change longform_dir_entry_check + shortform_dir_entry_check
        to return a count of how many illegal '/' entries exist.
        if > 0, then process_dirstack needs to call prune_dir_entry
        with a hash value of 0 to delete the entries.

D - 0) add the "processed" bitfield
        to the backptrs_t struct that gets attached after
        phase 4.

D- ) Phase 6 !!!

D - 0) look at usage of XFS_MAKE_IPTR(). It does the right
        arithmetic assuming you count your offsets from the
        beginning of the buffer.


D - 0) look at references to XFS_INODES_PER_CHUNK. change the
        ones that really mean sizeof(uint64_t)*NBBY to
        something else (like a constant defined only for that,
        INOS_PER_IREC). this isn't as important since
        XFS_INODES_PER_CHUNK will never change.


D - 0) look at junk_zerolen_dir_leaf_entries() to make sure it isn't hosing
        the freemap since it assumed that bytes between the
        end of the table and firstused didn't show up in the
        freemap when they actually do.

D - 0) track down XFS_INO_TO_OFFSET() usage. I don't think I'm
        using it right. (e.g. I think
        it gives you the offset of an inode into a block but
        on small block filesystems, I may be reading in inodes
        in multiblock buffers and working from the start of
        the buffer plus I'm using it to get offsets into
        my ino_rec's which may not be a good idea since I
        use 64-inode ino_rec's whereas the offset macro
        works off blocksize).

D - 0.0) put buffer -> dirblock conversion macros into xfs kernel code

D - 0.2) put in sibling pointer checking and path fixup into
        bmap (long form) scan routines in scan.c
D - 0.3) find out if bmap btrees with only root blocks are legal. I'm
        betting that they're not because they'd be extent inodes
        instead. If that's the case, rip some code out of
        process_btinode()


Algorithm (XXX means not done yet):

Phase 1 -- get a superblock and zero log

        get a superblock -- either read in primary or
                find a secondary (ag header), check ag headers

                To find secondary:

                        Go for brute force and read in the filesystem N meg
                                at a time looking for a superblock. as a
                                slight optimization, we could maybe skip
                                ahead some number of blocks to try and get
                                towards the end of the first ag.

                        After you find a secondary, try and find at least
                                other ags as a verification that the
                                secondary is a good superblock.
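
The brute-force search above boils down to scanning each chunk read from
the device for the superblock magic number ("XFSB", 0x58465342, stored
big-endian). A minimal sketch follows. The 512-byte scan step, the
function names and the -1 return convention are assumptions for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_MAGIC     0x58465342u  /* "XFSB" */
#define SECTOR_SIZE  512u         /* assumed scan granularity for this sketch */

/* Read a big-endian 32-bit value without assuming host byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Scan buf[from..len) at sector granularity.  Returns the offset of the
 * first sector that starts with the magic number, or -1 if there is none.
 */
static long find_sb_candidate(const unsigned char *buf, size_t len, size_t from)
{
    assert(from % SECTOR_SIZE == 0);
    for (size_t off = from; off + 4 <= len; off += SECTOR_SIZE) {
        if (get_be32(buf + off) == SB_MAGIC)
            return (long)off;
    }
    return -1;
}
```

A hit is only a candidate. As noted above, it still has to be verified
against other ags before it's believed.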

XXX - Ugh. Have to take growfs'ed filesystems into account.
        The root superblock geometry info may not be right if
        recovery hasn't run or it's been trashed. The old ag's
        may or may not be right since the system could have crashed
        during growfs or the bwrite() to the superblocks could have
        failed and the buffer been reused. So we need to check
        to see if another ag exists beyond the "last" ag
        to see if a growfs happened. If not, then we know that
        the geometry info is good and treat the fs as a non-growfs'ed
        fs. If we do have inconsistencies, then the smaller geometry
        is the old fs and the larger the new. We can check the
        new superblocks to see if they're good. If not, then we
        know the system crashed at or soon after the growfs and
        we can choose to either accept the new geometry info or
        trash it and truncate the fs back to the old geometry
        parameters.

        Cross-check geometry information in secondary sb's with
        primary to ensure that it's correct.

        Use sim code to allow mounting filesystems *without* reading
        in root inode. This sets up the xfs_mount_t structure
        and allows us to use XFS_* macros that we wouldn't
        otherwise be able to use.

        Note, I split phase 1 and 2 into separate pieces because I want
        to initialize the xfs_repair incore data structures after phase 1.

        parse superblock version and feature flags and set appropriate
        global vars to reflect the flags (attributes, quotas, etc.)

        Workaround for the mkfs "not zeroing the superblock buffer" bug.
        Determine what field is the last valid non-zero field in
        the superblock. The trick here is to be able to differentiate
        the last valid non-zero field in the primary superblock and
        secondaries because they may not be the same. Fields in
        the primary can be set as the filesystem gets upgraded but
        the upgrades won't touch the secondaries. This means that
        we need to find some number of secondaries and check them.
        So we do the checking here and the setting in phase2.

Phase 2 -- check integrity of allocation group allocation structures

        zero the log if in no modify mode

        sanity check ag headers -- superblocks match, agi isn't
                trashed -- the agf and agfl
                don't really matter because we can
                just recreate them later.

        Zero part of the superblock buffer if necessary

        Walk the freeblock trees to get an
                initial idea of what the fs thinks is free.
                Files that disagree (claim free'd blocks)
                can be salvaged or deleted. If the btree is
                internally inconsistent, when in doubt, mark
                blocks free. If they're used, they'll be stolen
                back later. don't have to check sibling pointers
                for each level since we're going to regenerate
                all the trees anyway.
        Walk the inode allocation trees and
                make sure they're ok, otherwise the sim
                inode routines will probably just barf.
                mark inode allocation tree blocks and ag header
                blocks as used blocks. If the trees are
                corrupted, this phase will generate "uncertain"
                inode chunks. Those chunks go on a list and
                will have to be verified later. Record the blocks
                that are used to detect corruption and multiply
                claimed blocks. These trees will be regenerated
                later. Mark the blocks containing inodes referenced
                by uncorrupted inode trees as being used by inodes.
                The other blocks will get marked when/if the inodes
                are verified.

        calculate root and realtime inode numbers from the
                filesystem geometry, fix up mount structure's
                incore superblock if they're wrong.

ASSUMPTION: at end of phase 2, we've got superblocks and ag headers
        that are not garbage (some data in them like counters and the
        freeblock and inode trees may be inconsistent but the header
        is readable and otherwise makes sense).

XXX if in no_modify mode, check for blocks claimed by one freespace
        btree and not the other

Phase 3 -- traverse inodes to make the inodes, bmaps and freespace maps
        consistent. For each ag, use either the incore inode map or
        scan the ag for inodes.
        Let's use the incore inode map, now that we've made one
        up in phase2. If we lose the maps, we'll locate inodes
        when we traverse the directory hierarchy. If we lose both,
        we could scan the disk. Ugh. Maybe make that a command-line
        option that we support later.

        ASSUMPTION: we know if the ag allocation btrees are intact (phase 2)

        First - Walk and clear the ag unlinked lists. We'll process
                the inodes later. Check and make sure that the unlinked
                lists reference known inodes. If not, add to the list
                of uncertain inodes.

        Second, check the uncertain inode list generated in phase2 and
                above and get them into the inode tree if they're good.
                The incore inode cluster tree *always* has good
                clusters (alignment, etc.) in it.

        Third, make sure that the root inode is known. If not,
                and we know the inode number from the superblock,
                discover that inode and its chunk.

        Then, walk the incore inode-cluster tree.

        Maintain an in-core bitmap over the entire fs for block allocation.

        traverse each inode, make sure inode mode field matches free/allocated
                bit in the incore inode allocation tree. If there's a mismatch,
                assume that the inode is in use.

                - for each in-use inode, traverse each bmap/dir/attribute
                        map or tree. Maintain a map (extent list?) for the
                        current inode.

                - For each block marked as used, check to see if already known
                        (referenced by another file or directory) and sanity
                        check the contents of the block as well if possible
                        (in the case of meta-blocks).

                - if the inode claims already used blocks, mark the blocks
                        as multiply claimed (duplicate) and go on. the inode
                        will be cleared in phase 4.

                - if metablocks are garbaged, clear the inode after
                        traversing what you can of the bmap and
                        proceed to next inode. We don't have to worry
                        about trashing the maps or trees in cleared inodes
                        because the blocks will show up as free in the
                        ag freespace trees that we set up in phase 5.

                - clear the di_next_unlinked pointer -- all unlinked
                        but active files go bye-bye.

                - All blocks start out unknown. We need the last state
                        in case we run into a case where we need to step
                        on a block to store filesystem meta-data and it
                        turns out later that it's referenced by some inode's
                        bmap. In that case, the inode loses because we've
                        already trashed the block. This shouldn't happen
                        in the first version unless some inode has a bogus
                        bmap referencing blocks in the ag header but the
                        4th state will keep us from inadvertently doing
                        something stupid in that case.

                - If inode is allocated, mark all blocks allocated to the
                        current inode as allocated in the incore freespace
                        bitmap.

                - If inode is good and a directory, scan through it to
                        find leaf entries and discover any unknown inodes.

                        For shortform, we correct what we can.

                        If the directory is corrupt, we try and fix it in
                        place. If it has zero good entries, then we blast it.

                        All unknown inodes get put onto the uncertain inode
                        list. This is safe because we only put inodes onto
                        the list when we're processing known inodes so the
                        uncertain inode list isn't in use.

                        We fix only one problem -- entries that have
                        mathematically invalid inode numbers in them.
                        If that's the case, we replace the inode number
                        with NULLFSINO and we'll fix up the entry in
                        phase 6.

                        That info may conflict with the inode information,
                        but we'll straighten out any inconsistencies there
                        in phase4 when we process the inodes again.

                        Errors involving bogus forward/back links,
                        zero-length entries make the directory get
                        trashed.

                        if an entry references a free inode, ignore that
                        fact for now. wait for the rest of phase 3
                        processing to hit that inode. If it looks like it's
                        in use, we'll mark in use then. If not, we'll
                        clear it and mark the inode map. then in phase
                        4, you can depend on the inode map.

                        Entries that point to non-existent or free
                        inodes, and extra blocks in the directory
                        will get fixed in place in a later pass.

                        Entries that point to a quota inode are
                        marked TBD.

                        If the directory internally points to the same
                        block twice, the directory gets blown away.

        Note that processing uncertain inodes can add more inodes
        to the uncertain list if they're directories. So we loop
        until the uncertain list is empty.

        During inode verification, if the inode blocks are unknown,
        mark them as in-use by inodes.
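
The block bookkeeping in this phase can be pictured as a small per-block
state machine. The sketch below is one way to express the "claim a block,
and mark it duplicate if somebody already has it" rule from the list
above. The state names, the one-byte-per-block map and claim_block() are
hypothetical, chosen for readability rather than taken from the incore
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical block states; every block starts out BS_UNKNOWN (zero). */
enum blk_state { BS_UNKNOWN = 0, BS_FREE, BS_INUSE, BS_MULT };

/*
 * Record that the current inode claims block bno.  Returns true if this
 * is the first claim; otherwise marks the block multiply claimed and
 * returns false so the caller can flag the inode for phase 4.
 */
static bool claim_block(unsigned char *map, size_t nblocks, size_t bno)
{
    assert(bno < nblocks);
    switch (map[bno]) {
    case BS_UNKNOWN:
    case BS_FREE:       /* "stolen back" from the freespace trees */
        map[bno] = BS_INUSE;
        return true;
    default:            /* already in use or already a duplicate */
        map[bno] = BS_MULT;
        return false;
    }
}
```

A block that phase 2 marked free is simply taken over by the first inode
that claims it, which matches the "they'll be stolen back later" remark
in phase 2.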

XXX HEURISTIC -- if we blow an inode away that has space,
        assume that the freespace btree is now out of whack.
        If it was ok earlier, it's certain to be wrong now.
        And the odds of this space free cancelling out the
        existing error is so small I'm willing to ignore it.
        Should probably do this via a global var and complain
        about this later.

Assumption: All known inodes are now marked as in-use or free. Any
        inodes that we haven't found by now are hosed (lost) since
        we can't reach them via either the inode btrees or via directory
        entries.

        Directories are semi-clean. All '.' entries are good.
        Root '..' entry is good if root inode exists. All entries
        referencing non-existent inodes, free inodes, etc.

XXX verify that either quota inode is 0 or NULLFSINO or
        if sb quota flag is non zero, verify that quota inode
        is NULLFSINO or is referencing a used, but disconnected
        inode.

XXX if in no_modify mode, check for unclaimed blocks

- Phase 4 - Check for inodes referencing duplicate blocks

        At this point, all known duplicate blocks are marked in
        the block map. However, some of the claimed blocks in
        the bmap may in fact be free because they belong to inodes
        that have to be cleared either due to being a trashed
        directory or because it's the first inode to claim a
        block that was then claimed later. There's a similar
        problem with meta-data blocks that are referenced by
        inode bmaps that are going to be freed once the inode
        (or directory) gets cleared.

        So at this point, we collect the duplicate blocks into
        extents and put them into the duplicate extent list.
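
Collecting duplicate blocks into extents is just run-length coalescing
over the block map. A minimal sketch, assuming the duplicate marks have
already been boiled down to a bool per block; struct dup_extent and
collect_dup_extents() are made-up names, and the real list is not
necessarily a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical duplicate extent: a run of consecutive duplicate blocks. */
struct dup_extent {
    size_t start;  /* first block of the run */
    size_t len;    /* number of blocks in the run */
};

/*
 * Coalesce runs of duplicate blocks into extents.  Stores at most max_out
 * extents in out[] and returns the total number of runs found, so the
 * caller can tell whether out[] was big enough.
 */
static size_t collect_dup_extents(const bool *is_dup, size_t nblocks,
                                  struct dup_extent *out, size_t max_out)
{
    size_t n = 0;
    size_t i = 0;

    assert(out != NULL || max_out == 0);
    while (i < nblocks) {
        if (!is_dup[i]) {
            i++;
            continue;
        }
        size_t start = i;

        while (i < nblocks && is_dup[i])
            i++;
        if (n < max_out) {
            out[n].start = start;
            out[n].len = i - start;
        }
        n++;
    }
    return n;
}
```

Checking an inode's bmap against a short list of extents is then much
cheaper than probing the block map one block at a time.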

        Mark the ag header blocks as in use.

        We then process each inode twice -- the first time
        we check to see if the inode claims a duplicate extent
        and we do NOT set the block bitmap. If the inode claims
        a duplicate extent, we clear the inode. Since the bitmap
        hasn't been set, that automatically frees all blocks associated
        with the cleared inode. If the inode is ok, process it a second
        time and set the bitmap since we know that this inode will live.

        The unlinked list gets cleared in every inode at this point as
        well. We no longer need to preserve it since we've discovered
        every inode we're going to find from it.

        verify existence of root inode. if it exists, check for
        existence of "lost+found". If it exists, mark the entry
        to be deleted, and clear the inode. All the inodes that
        were connected to the lost+found will be reconnected later.

XXX HEURISTIC -- if we blow an inode away that has space,
        assume that the freespace btree is now out of whack.
        If it was ok earlier, it's certain to be wrong now.
        And the odds of this space free cancelling out the
        existing error is so small I'm willing to ignore it.
        Should probably do this via a global var and complain
        about this later.

        Clear the quota inodes if the inode btree says that
        they're not in use. The space freed will get picked
        up by phase 5.

XXX Clear the quota inodes if the filesystem is being downgraded.

- Phase 5 - Build inode allocation trees, freespace trees and
        agfl's for each ag. After this, we should be able to
        unmount the filesystem and remount it for real.

        For each ag: (if not in no_modify mode)

        scan bitmap first to figure out number of extents.

        calculate space required for all trees. Start with inode trees.
        Setup the btree cursor which includes the list of preallocated
        blocks. As a by-product, this will delete the extents required
        for the inode tree from the incore extent tree.

        Calculate how many extents will be required to represent the
        remaining free extent tree on disk (twice, one for bybno and
        one for bycnt). You have to iterate on this because consuming
        extents can alter the number of blocks required to represent
        the remaining extents. If there's slop left over, you can
        put it in the agfl though.
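
For a feel of the space calculation, here is a small sketch that counts
the blocks needed for one btree given the number of records and the
per-block capacities. It assumes every block is packed full, which a real
builder may well not do, and the function name and parameters are
hypothetical. It covers a single sizing pass only. The iteration
described above (re-counting after extents get consumed) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Blocks needed for a btree holding nrecs records when a leaf holds up
 * to leaf_max records and an interior node up to node_max pointers.
 */
static uint64_t btree_blocks(uint64_t nrecs, uint32_t leaf_max,
                             uint32_t node_max)
{
    assert(leaf_max >= 1 && node_max >= 2);
    if (nrecs == 0)
        return 1;   /* an empty tree still has a root block */

    uint64_t level = (nrecs + leaf_max - 1) / leaf_max;  /* leaf blocks */
    uint64_t total = level;

    /* Add one interior level at a time until a single root remains. */
    while (level > 1) {
        level = (level + node_max - 1) / node_max;
        total += level;
    }
    return total;
}
```

Since the free extents are indexed twice (bybno and bycnt), one sizing
pass would need 2 * btree_blocks(nextents, ...) blocks, after which the
extent count has to be re-checked as described above.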

        Then, manually build the trees, agi, agfs, and agfls.

XXX if in no_modify mode, scan the on-disk inode allocation
        trees and compare against the incore versions. Don't have
        to scan the freespace trees because we caught the problems
        there in phase2 and phase3. But if we cleared any inodes
        with space during phases 3 or 4, now is the time to complain.

XXX - Free duplicate extent lists. ???

Assumptions: at this point, sim code having to do with inode
        creation/modification/deletion and space allocation
        work because the inode maps, space maps, and bmaps
        for all files in the filesystem are good. The only
        structures that are screwed up are the directory contents,
        which means that lookup may not work for beans, the
        root inode which exists but may be completely bogus and
        the link counts on all inodes which may also be bogus.

        Free the bitmap, the freespace tree.

        Flash the incore inode tree over from parent list to having
        full backpointers.

        realtime processing, if any --

                (Skip to below if running in no_modify mode).

                Generate the realtime bitmap from the incore realtime
                extent map and slam the info into the realtime bitmap
                inode. Generate summary info from the realtime extent map.

XXX if in no_modify mode, compare contents of realtime bitmap
                inode to the incore realtime extent map. generate the
                summary info from the incore realtime extent map.
                compare against the contents of the realtime summary inode.
                complain if bad.

        reset superblock counters, sync version numbers

- Phase 6 - directory traversal -- check reference counts,
        attach disconnected inodes, fix up bogus directories

        Assumptions: all on-disk space and inode trees are structurally
                sound. Incore and on-disk inode trees agree on whether
                an inode is in use.

                Directories are structurally sound. All hashvalues
                are monotonically increasing and interior nodes are
                correct so lookups work. All legal directory entries
                point to inodes that are in use and exist. Shortform
                directories are fine except that the links haven't been
                checked for conflicts (cycles, ".." being correct, etc.).
                Longform directories haven't been checked for those problems
                either PLUS longform directories may still contain
                entries beginning with '/'. No zero-length entries
                exist (they've been deleted or converted to '/').

                Root directory may or may not exist. orphanage may
                or may not exist. Contents of either may be completely
                bogus.

                Entries may point to free or non-existent inodes.

        At this point, we may need new incore structures and
        may be able to trash an old one (like the filesystem
        block map)

        If '/' is trashed, then reinitialize it.

        If no realtime inodes, make them and if necessary, slam the
                summary info into the realtime summary
                inode. Ditto with the realtime bitmap inode.

        Make orphanage (lost+found ???).

        Traverse each directory from '/' (unless it was created).
                Check directory structure and each directory entry.
                If the entry is bogus (points to a non-existent or
                free inode, for example), mark that entry TBD. Maintain
                link counts on all inodes. Currently, traversal is
                depth-first.

                Mark every inode reached as "reached" (includes
                bumping up link counts).

                If an entry points to a directory but the parent (..)
                disagrees, then blow away the entry. if the directory
                being pointed to winds up disconnected, it'll be moved
                to the orphanage (and the link count incremented to
                account for the link and the reached bit set then).

                If an entry points to a directory that we've already
                reached, then some entry is bad and should be blown
                away. It's easiest to blow away the current entry
                plus since presumably the parent entry in the
                reached directory points to another directory,
                then it's far more likely that the current
                entry is bogus (otherwise the parent should point
                at it).

                If an entry points to a non-existent or free inode,
                blow the entry away.

                Every time a good entry is encountered update the
                link count for the inode that the entry points to.

        After traversal, scan incore inode map for directories not
                reached. Go to first one and try and find its root
                by following .. entries. Once at root, run traversal
                algorithm. When algorithm terminates, move subtree
                root inode to the orphanage. Repeat as necessary
                until all disconnected directories are attached.
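
Finding the root of a disconnected subtree by following ".." entries can
be sketched as below. Directories are plain array indexes here, with
parent[] standing in for the ".." lookups and reached[] for the reached
bit. NO_PARENT, the arrays and find_subtree_root() are all assumptions
for the example. The step limit is there because ".." links in a damaged
fs can form a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_PARENT SIZE_MAX  /* ".." is missing or unusable */

/*
 * Starting from an unreached directory, walk ".." until the parent is
 * missing, out of range, self-referential, or already reached by the
 * main traversal.  The directory we stop at is the subtree root.
 */
static size_t find_subtree_root(const size_t *parent, const bool *reached,
                                size_t ndirs, size_t start)
{
    assert(start < ndirs);
    size_t cur = start;

    for (size_t steps = 0; steps < ndirs; steps++) {
        size_t up = parent[cur];

        if (up == NO_PARENT || up >= ndirs || up == cur || reached[up])
            return cur;
        cur = up;
    }
    return cur;  /* ".." links form a cycle; give up after ndirs steps */
}
```

The directory returned is the one that would get moved to the orphanage
once the traversal algorithm has been run on its subtree.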

        Move all disconnected inodes to orphanage.

- Phase 7: reset reference counts if required.

        Now traverse the on-disk inodes again, and make sure on-disk
        reference counts are correct. Reset if necessary.

        SKIP all unused inodes -- that also makes us
        skip the orphanage inode which we think is
        unused but is really used. However, the ref counts
        on that should be right so that's ok.

---

multiple TB xfs_repair

modify above to work in a couple of AGs at a time. The bitmaps
should span only the current set of AGs.

The key is to scan the inode bmaps and keep a list of inodes
that span multiple AG sets and keep the list in a data structure
that's keyed off AG set # as well as inode # and also has a bit
to indicate whether or not the inode will be cleared.

Then in each AG set, when doing duplicate extent processing,
you have to process all multi-AG-set inodes that claim blocks in
the current AG set. If there's a conflict, you clear the
inode in the current AG and you mark the multi-AG inode as
"to be cleared".

After going through all AGs, you can clear the to-be-cleared
multi-AG-set inodes and pull them off the list.

When building up the AG freespace trees, you walk the bmaps
of all multi-AG-set inodes that are in the AG-set and include
blocks claimed in the AG by the inode as used.

This probably involves adding a phase 3-0 which would have to
check all the inodes to see which ones are multi-AG-set inodes
and set up the multi-AG-set inode data structure. Plus the
process_dinode routines may have to be altered just a bit
to do the right thing if running in terabyte mode (call
out to routines that check the multi-AG-set inodes when
appropriate).

To make things go faster, phase 3-0 could probably run
in parallel. It should be possible to run phases 2-5
in parallel as well once the appropriate synchronization
is added to the incore routines and the static directory
leaf block bitmap is changed to be on the stack.

Phase 7 probably can be in parallel as well.

By in parallel, I mean that assuming that an AG-set
contains 4 AGs, you could run 4 threads, 1 per AG
in parallel to process the AG set.

I don't see how phase 6 can be run in parallel though.

And running Phase 8 in parallel is just silly.