git.ipfire.org Git - thirdparty/kernel/linux.git/log
6 months ago bcachefs: Minor bucket alloc optimization
Kent Overstreet [Sat, 7 Dec 2024 03:37:42 +0000 (22:37 -0500)] 
bcachefs: Minor bucket alloc optimization

Check open buckets and buckets waiting for journal commit before doing
other expensive lookups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Mark more errors autofix
Kent Overstreet [Sat, 7 Dec 2024 00:49:46 +0000 (19:49 -0500)] 
bcachefs: Mark more errors autofix

tested repairing from a bug uncovered by the merge_torture_flakey test

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: fix bch2_btree_node_header_to_text() format string
Kent Overstreet [Sat, 7 Dec 2024 01:11:16 +0000 (20:11 -0500)] 
bcachefs: fix bch2_btree_node_header_to_text() format string

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Journal space calculations should skip durability=0 devices
Kent Overstreet [Thu, 5 Dec 2024 17:35:17 +0000 (12:35 -0500)] 
bcachefs: Journal space calculations should skip durability=0 devices

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: factor out str_hash.c
Kent Overstreet [Thu, 5 Dec 2024 04:36:33 +0000 (23:36 -0500)] 
bcachefs: factor out str_hash.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: kill flags param to bch2_subvolume_get()
Kent Overstreet [Thu, 5 Dec 2024 04:40:26 +0000 (23:40 -0500)] 
bcachefs: kill flags param to bch2_subvolume_get()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't call bch2_btree_interior_update_will_free_node() until after update...
Kent Overstreet [Thu, 5 Dec 2024 01:43:01 +0000 (20:43 -0500)] 
bcachefs: Don't call bch2_btree_interior_update_will_free_node() until after update succeeds

Originally, btree splits always succeeded once we got to the point of
recursing to the btree_insert_node() call.

But that changed when we switched to not taking intent locks all the way
up to the root, and that introduced a bug, because
bch2_btree_interior_update_will_free_node() cancels pending writes and
reparents a node that's going to be made visible on disk by another
btree update to the current btree update.

This was discovered in recent backpointers work, because
bch2_btree_interior_update_will_free_node() also clears the
will_make_reachable flag, causing backpointer target lookup to
spuriously think it had found a dangling backpointer (when the
backpointer just hadn't been created yet by
btree_update_nodes_written()).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Make sure __bch2_run_explicit_recovery_pass() signals to rewind
Kent Overstreet [Thu, 5 Dec 2024 00:46:35 +0000 (19:46 -0500)] 
bcachefs: Make sure __bch2_run_explicit_recovery_pass() signals to rewind

We should always signal to rewind if the requested pass hasn't been run,
even if called multiple times.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Call bch2_btree_lost_data() on btree read error
Kent Overstreet [Thu, 5 Dec 2024 00:41:38 +0000 (19:41 -0500)] 
bcachefs: Call bch2_btree_lost_data() on btree read error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Journal write path refactoring, debug improvements
Kent Overstreet [Wed, 4 Dec 2024 23:14:14 +0000 (18:14 -0500)] 
bcachefs: Journal write path refactoring, debug improvements

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: dev_alloc_list.devs -> dev_alloc_list.data
Kent Overstreet [Thu, 5 Dec 2024 00:21:22 +0000 (19:21 -0500)] 
bcachefs: dev_alloc_list.devs -> dev_alloc_list.data

This lets us use darray macros on dev_alloc_list (and it will become a
darray eventually, when we increase the maximum number of devices).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix failure to allocate journal write on discard retry
Kent Overstreet [Wed, 4 Dec 2024 23:16:25 +0000 (18:16 -0500)] 
bcachefs: Fix failure to allocate journal write on discard retry

When allocating a journal write fails, then retries after doing
discards, we were failing to count already allocated replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
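
A rough sketch of the accounting involved, not the actual bcachefs code: replicas obtained before the discard still have to count when the allocation is retried. The function name and the boolean arrays are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: claim devices until `want` replicas are held.
 * `*have` and `dev_used` persist across attempts, so a retry continues
 * from what the previous attempt already obtained.
 */
static bool journal_write_alloc_sketch(const bool *dev_has_space, bool *dev_used,
				       size_t nr_devs, unsigned want, unsigned *have)
{
	assert(have);

	for (size_t i = 0; i < nr_devs && *have < want; i++)
		if (dev_has_space[i] && !dev_used[i]) {
			dev_used[i] = true;	/* never count a device twice */
			(*have)++;
		}

	return *have >= want;
}
```

If the caller reset `*have` to zero before the retry, a device claimed on the first attempt would be skipped (it is already in use) but no longer counted, and the retry could fail even though enough replicas are held.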
6 months ago bcachefs: BCH_ERR_insufficient_journal_devices
Kent Overstreet [Wed, 4 Dec 2024 22:53:38 +0000 (17:53 -0500)] 
bcachefs: BCH_ERR_insufficient_journal_devices

kill another standard error code use

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Silence "unable to allocate journal write" if we're already RO
Kent Overstreet [Wed, 4 Dec 2024 22:48:06 +0000 (17:48 -0500)] 
bcachefs: Silence "unable to allocate journal write" if we're already RO

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: trace_accounting_mem_insert
Kent Overstreet [Wed, 4 Dec 2024 22:44:25 +0000 (17:44 -0500)] 
bcachefs: trace_accounting_mem_insert

Add a tracepoint for inserting new accounting entries: we're seeing odd
spinning behaviour in accounting read.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Advance to next bp on BCH_ERR_backpointer_to_overwritten_btree_node
Kent Overstreet [Wed, 4 Dec 2024 06:19:28 +0000 (01:19 -0500)] 
bcachefs: Advance to next bp on BCH_ERR_backpointer_to_overwritten_btree_node

Don't spin.

Fixes: de95cc201a97 ("bcachefs: Kill bch2_get_next_backpointer()")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Simplify disk accounting validate late
Kent Overstreet [Wed, 4 Dec 2024 03:03:18 +0000 (22:03 -0500)] 
bcachefs: Simplify disk accounting validate late

The validate late path was iterating over accounting entries in
eytzinger order, which is unnecessarily tricky when we may have to
remove entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: logged ops only use inum 0 of logged ops btree
Kent Overstreet [Mon, 2 Dec 2024 02:35:11 +0000 (21:35 -0500)] 
bcachefs: logged ops only use inum 0 of logged ops btree

We wish to use the logged ops btree for other items that aren't strictly
logged ops: cursors for inode allocation.

There's no reason to create another cached btree for inode allocator
cursors - so reserve different parts of the keyspace for different
purposes.

Older versions will ignore or delete the cursors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: rcu_pending now works in userspace
Kent Overstreet [Wed, 4 Dec 2024 02:22:26 +0000 (21:22 -0500)] 
bcachefs: rcu_pending now works in userspace

Introduce a typedef to handle the difference between unsigned
long/struct urcu_gp_poll_state.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: BCACHEFS_PATH_TRACEPOINTS should depend on TRACING
Geert Uytterhoeven [Tue, 3 Dec 2024 16:40:10 +0000 (17:40 +0100)] 
bcachefs: BCACHEFS_PATH_TRACEPOINTS should depend on TRACING

When tracing is disabled, there is no point in asking the user about
enabling extra btree_path tracepoints in bcachefs.

Fixes: 32ed4a620c5405be ("bcachefs: Btree path tracepoints")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix allocating too big journal entry
Kent Overstreet [Tue, 3 Dec 2024 04:36:38 +0000 (23:36 -0500)] 
bcachefs: Fix allocating too big journal entry

The "journal space available" calculations didn't take into account
mismatched bucket sizes; we need to take the minimum space available out
of our devices.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
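
A minimal sketch of the calculation described above, assuming a simplified per-device view; the struct and function names are hypothetical and are not the bcachefs ones. An entry that is written to several devices has to fit on each of them, so the usable size is the minimum across devices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device view; not the real bcachefs structures. */
struct dev_journal_space {
	unsigned bucket_sectors;	/* bucket size on this device */
	unsigned sectors_free;		/* free sectors in the current journal bucket */
};

/*
 * Largest journal entry, in sectors, that fits on every device.
 * Returns 0 when there are no devices.
 */
static unsigned journal_entry_sectors_max(const struct dev_journal_space *devs,
					  size_t nr)
{
	unsigned ret = 0;

	assert(devs || !nr);

	for (size_t i = 0; i < nr; i++) {
		/* A device can never offer more than one bucket's worth. */
		unsigned avail = devs[i].sectors_free < devs[i].bucket_sectors
			? devs[i].sectors_free
			: devs[i].bucket_sectors;

		if (!i || avail < ret)
			ret = avail;
	}
	return ret;
}
```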
6 months ago bcachefs: Improve "unable to allocate journal write" message
Kent Overstreet [Sun, 1 Dec 2024 21:39:54 +0000 (16:39 -0500)] 
bcachefs: Improve "unable to allocate journal write" message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: fix bch2_journal_key_insert_take() seq
Kent Overstreet [Sun, 1 Dec 2024 04:27:45 +0000 (23:27 -0500)] 
bcachefs: fix bch2_journal_key_insert_take() seq

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: bch2_async_btree_node_rewrites_flush()
Kent Overstreet [Fri, 29 Nov 2024 23:53:26 +0000 (18:53 -0500)] 
bcachefs: bch2_async_btree_node_rewrites_flush()

Add a method to flush btree node rewrites at the end of recovery, to
ensure that corrected errors are persisted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: If we did repair on a btree node, make sure we rewrite it
Kent Overstreet [Fri, 29 Nov 2024 23:17:00 +0000 (18:17 -0500)] 
bcachefs: If we did repair on a btree node, make sure we rewrite it

Ensure that "invalid bkey" repair gets persisted, so that it doesn't
repeatedly spam the logs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: bkey_fsck_err now respects errors_silent
Kent Overstreet [Fri, 29 Nov 2024 23:20:42 +0000 (18:20 -0500)] 
bcachefs: bkey_fsck_err now respects errors_silent

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: list_pop_entry()
Kent Overstreet [Sat, 30 Nov 2024 00:13:54 +0000 (19:13 -0500)] 
bcachefs: list_pop_entry()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Convert write path errors to inum_to_path()
Kent Overstreet [Thu, 14 Nov 2024 04:08:57 +0000 (23:08 -0500)] 
bcachefs: Convert write path errors to inum_to_path()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: bch2_inum_to_path()
Kent Overstreet [Sat, 28 Sep 2024 19:40:49 +0000 (15:40 -0400)] 
bcachefs: bch2_inum_to_path()

Add a function for walking backpointers to find a path from a given
inode number, and convert various error messages to use it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix fsck.c build in userspace
Kent Overstreet [Sat, 30 Nov 2024 02:12:47 +0000 (21:12 -0500)] 
bcachefs: Fix fsck.c build in userspace

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Add missing parameter description to bch2_bucket_alloc_trans()
Yang Li [Fri, 29 Nov 2024 06:38:27 +0000 (14:38 +0800)] 
bcachefs: Add missing parameter description to bch2_bucket_alloc_trans()

The function bch2_bucket_alloc_trans() lacked a description for the
nowait parameter in its documentation comment block. This patch adds the
missing description to ensure all parameters are properly documented.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=12179
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't recurse in check_discard_freespace_key
Kent Overstreet [Fri, 29 Nov 2024 00:30:23 +0000 (19:30 -0500)] 
bcachefs: Don't recurse in check_discard_freespace_key

When calling check_discard_freespace_key from the allocator, we can't
repair without recursing - run it asynchronously instead.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Check for extent crc uncompressed/compressed size mismatch
Kent Overstreet [Fri, 29 Nov 2024 00:02:18 +0000 (19:02 -0500)] 
bcachefs: Check for extent crc uncompressed/compressed size mismatch

When not compressed, these must be equal - this fixes an assertion pop
in bch2_rechecksum_bio().

Reported-by: syzbot+50d3544c9b8db9c99fd2@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
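
The invariant being checked can be sketched as follows. This is an illustration under assumed names: the struct is a hypothetical, simplified stand-in for an extent checksum entry, not the on-disk format.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an extent checksum entry. */
struct extent_crc_sketch {
	bool     compressed;
	unsigned compressed_size;	/* size on disk, in sectors */
	unsigned uncompressed_size;	/* logical size, in sectors */
};

/*
 * For uncompressed data the two sizes describe the same bytes, so they
 * must be equal. Compressed data may legitimately differ.
 */
static bool extent_crc_sizes_valid(const struct extent_crc_sketch *crc)
{
	assert(crc);

	if (!crc->compressed)
		return crc->compressed_size == crc->uncompressed_size;
	return true;
}
```

Rejecting the key at validation time lets later code rely on the invariant instead of asserting on it.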
6 months ago bcachefs: bch2_trans_relock() is trylock for lockdep
Kent Overstreet [Thu, 28 Nov 2024 23:05:06 +0000 (18:05 -0500)] 
bcachefs: bch2_trans_relock() is trylock for lockdep

fix some spurious lockdep splats

Reported-by: syzbot+e088be3c2d5c05aaac35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: cryptographic MACs on superblock are not (yet?) supported
Kent Overstreet [Thu, 28 Nov 2024 22:57:55 +0000 (17:57 -0500)] 
bcachefs: cryptographic MACs on superblock are not (yet?) supported

We should add support for cryptographic MACs on the superblock - and it
won't be hard, but it'll need an incompatible feature bit (and we have a
new incompatible feature versioning scheme coming).

For now, just add a guard to avoid a null ptr deref in gen_poly_key().

Reported-by: syzbot+dd3d9835055dacb66f35@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Check for inode journal seq in the future
Kent Overstreet [Thu, 28 Nov 2024 22:48:20 +0000 (17:48 -0500)] 
bcachefs: Check for inode journal seq in the future

More check and repair code: this fixes a warning in
bch2_journal_flush_seq_async()

Reported-by: syzbot+d119b445ec739e7f3068@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Check for bucket journal seq in the future
Kent Overstreet [Thu, 28 Nov 2024 21:59:40 +0000 (16:59 -0500)] 
bcachefs: Check for bucket journal seq in the future

This fixes an assertion pop in bch2_journal_noflush_seq() - log the
error to the superblock and continue instead.

Reported-by: syzbot+85700120f75fc10d4e18@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: do_fsck_ask_yn()
Kent Overstreet [Thu, 28 Nov 2024 21:25:41 +0000 (16:25 -0500)] 
bcachefs: do_fsck_ask_yn()

__bch2_fsck_err() is huge, and badly needs more refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't error out when logging fsck error
Kent Overstreet [Thu, 28 Nov 2024 21:14:06 +0000 (16:14 -0500)] 
bcachefs: Don't error out when logging fsck error

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: mark more errors AUTOFIX
Kent Overstreet [Thu, 28 Nov 2024 21:09:15 +0000 (16:09 -0500)] 
bcachefs: mark more errors AUTOFIX

mark errors as autofix where syzbot has hit the repair paths

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: add missing printbuf_reset()
Kent Overstreet [Thu, 28 Nov 2024 21:09:04 +0000 (16:09 -0500)] 
bcachefs: add missing printbuf_reset()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix journal_iter list corruption
Kent Overstreet [Thu, 28 Nov 2024 20:10:24 +0000 (15:10 -0500)] 
bcachefs: Fix journal_iter list corruption

Fix exiting an iterator that wasn't initialized.

Reported-by: syzbot+2f7c2225ed8a5cb24af1@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Guard against backpointers to unknown btrees
Kent Overstreet [Thu, 28 Nov 2024 03:29:54 +0000 (22:29 -0500)] 
bcachefs: Guard against backpointers to unknown btrees

Reported-by: syzbot+997f0573004dcb964555@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Issue a transaction restart after commit in repair
Kent Overstreet [Thu, 28 Nov 2024 03:09:29 +0000 (22:09 -0500)] 
bcachefs: Issue a transaction restart after commit in repair

transaction commits invalidate pointers to btree values, and they also
downgrade intent locks.

This breaks the interior btree update path, which takes intent locks and
then calls into the allocator.

This isn't an ideal solution: we can't unconditionally issue a restart
after a transaction commit, because that would break other codepaths.

Reported-by: syzbot+78d82470c16a49702682@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Guard against journal seq overflow
Kent Overstreet [Thu, 28 Nov 2024 02:58:43 +0000 (21:58 -0500)] 
bcachefs: Guard against journal seq overflow

Wraparound is impractical to handle since in various places we use 0 as
a sentinel value - but 64 bits (or 56, because the btree write buffer
steals a few bits) is enough for all practical purposes.

Reported-by: syzbot+73ed43fbe826227bd4e0@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
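
A sketch of what such a guard could look like, assuming 56 usable bits as mentioned above. The constant and function names are hypothetical, and the real check may sit elsewhere and report the error differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption for illustration: 8 bits are reserved, leaving 56 usable
 * bits for the sequence number. The constant name is hypothetical.
 */
#define JOURNAL_SEQ_MAX_SKETCH	((UINT64_C(1) << 56) - 1)

/*
 * Stores the next sequence number and returns true, or returns false if
 * advancing would exceed the limit. Because the counter is refused
 * rather than wrapped, it can never return to 0, which stays available
 * as a sentinel.
 */
static bool journal_seq_next(uint64_t cur, uint64_t *next)
{
	assert(next);

	if (cur >= JOURNAL_SEQ_MAX_SKETCH)
		return false;

	*next = cur + 1;
	return true;
}
```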
6 months ago bcachefs: BCH_FS_recovery_running
Kent Overstreet [Wed, 27 Nov 2024 08:00:54 +0000 (03:00 -0500)] 
bcachefs: BCH_FS_recovery_running

If we're autofixing topology errors, we shouldn't shut down if we're
still in recovery.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Make topology errors autofix
Kent Overstreet [Mon, 25 Nov 2024 02:28:07 +0000 (21:28 -0500)] 
bcachefs: Make topology errors autofix

These repair paths are well tested, so we can repair these errors without
explicit user intervention.

This also tweaks bch2_topology_error() so that we run topology repair if
we're in recovery, not just fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: struct bkey_validate_context
Kent Overstreet [Wed, 27 Nov 2024 05:29:52 +0000 (00:29 -0500)] 
bcachefs: struct bkey_validate_context

Add a new parameter to bkey validate functions, and use it to improve
invalid bkey error messages: we can now print the btree and depth it
came from, or if it came from the journal, or is a btree root.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Ignore empty btree root journal entries
Kent Overstreet [Wed, 27 Nov 2024 06:03:41 +0000 (01:03 -0500)] 
bcachefs: Ignore empty btree root journal entries

There's no reason to treat them as errors: just ignore them, and go with
a previous btree root if we had one.

Reported-by: syzbot+e22007d6acb9c87c2362@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix null ptr deref in btree_path_lock_root()
Kent Overstreet [Wed, 27 Nov 2024 03:59:27 +0000 (22:59 -0500)] 
bcachefs: Fix null ptr deref in btree_path_lock_root()

Historically, we required that all btree node roots point to a valid
(possibly fake) node, but we're improving our ability to continue in the
presence of errors.

Reported-by: syzbot+e22007d6acb9c87c2362@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Go RW earlier, for normal rw mount
Kent Overstreet [Wed, 27 Nov 2024 02:27:16 +0000 (21:27 -0500)] 
bcachefs: Go RW earlier, for normal rw mount

Previously, when mounting read-write after a clean shutdown, we wouldn't
go read-write until after all the recovery passes completed.

Now, go RW early in recovery, the same as in any other situation where we
need to go read-write. This fixes a bug where we discover unlinked inodes
after a clean shutdown: repair fails because we're read only.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix bch2_btree_node_update_key_early()
Kent Overstreet [Tue, 26 Nov 2024 20:16:57 +0000 (15:16 -0500)] 
bcachefs: Fix bch2_btree_node_update_key_early()

Fix an assertion pop from the recent btree cache freelist fixes.

Fixes: baefd3f849ed ("bcachefs: btree_cache.freeable list fixes")
Reported-by: Tyler <th020394@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Change "disk accounting version 0" check to commit only
Kent Overstreet [Mon, 25 Nov 2024 22:03:13 +0000 (17:03 -0500)] 
bcachefs: Change "disk accounting version 0" check to commit only

6.11 had a bug where we'd sometimes create disk accounting keys with
version 0, which causes issues for journal replay - but we don't need to
delete existing accounting keys with version 0.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't try to en/decrypt when encryption not available
Kent Overstreet [Mon, 25 Nov 2024 07:05:02 +0000 (02:05 -0500)] 
bcachefs: Don't try to en/decrypt when encryption not available

If a btree node says it's encrypted, but the superblock never had an
encryption key - whoops, that needs to be handled.

Reported-by: syzbot+026f1857b12f5eb3f9e9@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix dup/misordered check in btree node read
Kent Overstreet [Mon, 25 Nov 2024 06:26:56 +0000 (01:26 -0500)] 
bcachefs: Fix dup/misordered check in btree node read

We were checking for out of order keys, but not duplicate keys.

Reported-by: syzbot+dedbd67513939979f84f@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
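
To illustrate the distinction, here is a minimal sketch that treats key positions as plain integers. That is a simplification: real btree keys have multi-field positions, and the enum and function names here are hypothetical. Keys in a node must be strictly increasing, so an equal neighbour is as invalid as a smaller one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum key_order_sketch { KEYS_OK, KEYS_MISORDERED, KEYS_DUPLICATE };

/*
 * Walk adjacent pairs. Testing only `pos[i] < pos[i - 1]` catches
 * misordered keys but lets duplicates through, so equality is checked
 * as well.
 */
static enum key_order_sketch check_key_order(const uint64_t *pos, size_t nr)
{
	assert(pos || !nr);

	for (size_t i = 1; i < nr; i++) {
		if (pos[i] < pos[i - 1])
			return KEYS_MISORDERED;
		if (pos[i] == pos[i - 1])
			return KEYS_DUPLICATE;
	}
	return KEYS_OK;
}
```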
6 months ago bcachefs: Bad btree roots are now autofix
Kent Overstreet [Mon, 25 Nov 2024 05:21:27 +0000 (00:21 -0500)] 
bcachefs: Bad btree roots are now autofix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Kill bch2_bucket_alloc_new_fs()
Kent Overstreet [Mon, 25 Nov 2024 04:28:21 +0000 (23:28 -0500)] 
bcachefs: Kill bch2_bucket_alloc_new_fs()

The early-early allocation path, bch2_bucket_alloc_new_fs(), is no
longer needed - and inconsistencies around new_fs_bucket_idx have been a
frequent source of bugs.

Reported-by: syzbot+592425844580a6598410@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix btree node scan when unknown btree IDs are present
Kent Overstreet [Mon, 25 Nov 2024 03:57:01 +0000 (22:57 -0500)] 
bcachefs: Fix btree node scan when unknown btree IDs are present

btree_root entries for unknown btree IDs are created during recovery,
before reading those btree roots.

But btree_node_scan may find btree nodes with unknown btree IDs when we
haven't seen roots for those btrees.

Reported-by: syzbot+1f202d4da221ec6ebf8e@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: backpointer_to_missing_ptr is now autofix
Kent Overstreet [Mon, 25 Nov 2024 03:45:25 +0000 (22:45 -0500)] 
bcachefs: backpointer_to_missing_ptr is now autofix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix accounting_read when we rewind
Kent Overstreet [Mon, 25 Nov 2024 03:28:41 +0000 (22:28 -0500)] 
bcachefs: Fix accounting_read when we rewind

If we rewind recovery to run topology repair, that causes
accounting_read to run twice.

This fixes accounting being double counted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
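
One way to make a read pass safe to run twice is to have it rebuild its totals from scratch on every run. The sketch below shows that idea only; the names are hypothetical, and the actual fix in bcachefs may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory accounting state. */
struct accounting_sketch {
	uint64_t sectors;
};

/*
 * Rebuild the in-memory total from the on-disk values. Clearing the
 * total first makes the pass idempotent: running it again after a
 * rewind gives the same result instead of adding everything twice.
 */
static void accounting_read_sketch(struct accounting_sketch *acc,
				   const uint64_t *on_disk, size_t nr)
{
	assert(acc);

	acc->sectors = 0;
	for (size_t i = 0; i < nr; i++)
		acc->sectors += on_disk[i];
}
```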
6 months ago bcachefs: disk_accounting: bch2_dev_rcu -> bch2_dev_rcu_noerror
Kent Overstreet [Mon, 25 Nov 2024 03:23:41 +0000 (22:23 -0500)] 
bcachefs: disk_accounting: bch2_dev_rcu -> bch2_dev_rcu_noerror

Accounting keys that reference invalid devices are corrected by fsck;
they shouldn't cause an emergency shutdown.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: errcode cleanup: journal errors
Kent Overstreet [Mon, 25 Nov 2024 02:49:08 +0000 (21:49 -0500)] 
bcachefs: errcode cleanup: journal errors

Instead of throwing standard error codes, we should be throwing
dedicated private error codes; this greatly improves debuggability.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Use separate rhltable for bch2_inode_or_descendents_is_open()
Kent Overstreet [Mon, 25 Nov 2024 01:15:30 +0000 (20:15 -0500)] 
bcachefs: Use separate rhltable for bch2_inode_or_descendents_is_open()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: BCH_ERR_btree_node_read_error_cached
Kent Overstreet [Sun, 24 Nov 2024 03:12:58 +0000 (22:12 -0500)] 
bcachefs: BCH_ERR_btree_node_read_error_cached

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: btree_write_buffer_flush_seq() no longer closes journal
Kent Overstreet [Tue, 23 Apr 2024 06:18:18 +0000 (02:18 -0400)] 
bcachefs: btree_write_buffer_flush_seq() no longer closes journal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: discard fastpath now uses bch2_discard_one_bucket()
Kent Overstreet [Fri, 22 Nov 2024 01:09:45 +0000 (20:09 -0500)] 
bcachefs: discard fastpath now uses bch2_discard_one_bucket()

The discard bucket fastpath previously was using its own code for
discarding buckets and clearing them in the need_discard btree, which
didn't have any of the consistency checks of the main discard path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Bias reads more in favor of faster device
Kent Overstreet [Sat, 23 Nov 2024 21:47:10 +0000 (16:47 -0500)] 
bcachefs: Bias reads more in favor of faster device

Per reports of performance issues on mixed multi device filesystems
where we're issuing too much IO to the spinning rust - tweak this
algorithm.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: trivial btree write buffer refactoring
Kent Overstreet [Sat, 23 Nov 2024 23:21:12 +0000 (18:21 -0500)] 
bcachefs: trivial btree write buffer refactoring

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Can now block journal activity without closing cur entry
Kent Overstreet [Sat, 23 Nov 2024 21:27:47 +0000 (16:27 -0500)] 
bcachefs: Can now block journal activity without closing cur entry

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: New backpointers helpers
Kent Overstreet [Fri, 15 Nov 2024 02:34:43 +0000 (21:34 -0500)] 
bcachefs: New backpointers helpers

- bch2_backpointer_del()
- bch2_backpointer_maybe_flush()

Kill a bit of open coding and make sure we're properly handling the
btree write buffer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: kill bch_backpointer.bucket_offset usage
Kent Overstreet [Sun, 17 Nov 2024 23:26:54 +0000 (18:26 -0500)] 
bcachefs: kill bch_backpointer.bucket_offset usage

bch_backpointer.bucket_offset is going away - it's no longer needed
since we no longer store backpointers in alloc keys, the same
information is in the key position itself.

And we'll be reclaiming the space in bch_backpointer for the bucket
generation number.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix check_backpointers_to_extents range limiting
Kent Overstreet [Mon, 18 Nov 2024 05:32:57 +0000 (00:32 -0500)] 
bcachefs: Fix check_backpointers_to_extents range limiting

bch2_get_btree_in_memory_pos() will return positions that refer directly
to the btree that it's checking will fit in memory - i.e. backpointer
positions, not buckets.

This also means check_bp_exists() no longer has to refer to the device,
and we can delete some code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: bch_backpointer -> bkey_i_backpointer
Kent Overstreet [Fri, 15 Nov 2024 22:36:09 +0000 (17:36 -0500)] 
bcachefs: bch_backpointer -> bkey_i_backpointer

Since we no longer store backpointers in alloc keys, there's no reason
not to pass around bkey_i_backpointers; this means we don't have to pass
the bucket pos separately.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Drop swab code for backpointers in alloc keys
Kent Overstreet [Fri, 15 Nov 2024 22:45:44 +0000 (17:45 -0500)] 
bcachefs: Drop swab code for backpointers in alloc keys

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: bucket_pos_to_bp_end()
Kent Overstreet [Fri, 15 Nov 2024 21:30:30 +0000 (16:30 -0500)] 
bcachefs: bucket_pos_to_bp_end()

Better helpers for iterating over backpointers within a specific bucket

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: check for backpointers to invalid device
Kent Overstreet [Mon, 18 Nov 2024 05:16:52 +0000 (00:16 -0500)] 
bcachefs: check for backpointers to invalid device

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: fix bp_pos_to_bucket_nodev_noerror
Kent Overstreet [Fri, 15 Nov 2024 03:49:40 +0000 (22:49 -0500)] 
bcachefs: fix bp_pos_to_bucket_nodev_noerror

_noerror means don't produce inconsistent errors, so it should be using
bch2_dev_rcu_noerror().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Fix evacuate_bucket tracepoint
Kent Overstreet [Mon, 9 Dec 2024 11:18:49 +0000 (06:18 -0500)] 
bcachefs: Fix evacuate_bucket tracepoint

86a494c8eef9 ("bcachefs: Kill bch2_get_next_backpointer()") dropped some
things the tracepoint emitted because bch2_evacuate_bucket() no longer
looks at the alloc key - but we did want at least some of that.

We still no longer look at the alloc key so we can't report on the
fragmentation number, but that's a direct function of dirty_sectors and
a copygc concern anyways - copygc should get its own tracepoint that
includes information from the fragmentation LRU.

But we can report on the number of sectors we moved and the bucket size.

Co-developed-by: Piotr Zalewski <pZ010001011111@proton.me>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: fix O(n^2) issue with whiteouts in journal keys
Kent Overstreet [Sun, 17 Nov 2024 07:23:24 +0000 (02:23 -0500)] 
bcachefs: fix O(n^2) issue with whiteouts in journal keys

The journal_keys array can't be substantially modified after we go RW,
because lookups need to be able to check it locklessly - thus we're
limited on what we can do when a key in the journal has been
overwritten.

This is a problem when there are many overwrites to skip over for peek()
operations. To fix this, add tracking of ranges of overwrites: we create
a range entry when there's more than one contiguous whiteout.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
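
The range-tracking idea can be sketched as follows. This is an illustration under simplifying assumptions, not the bcachefs data structure: here each overwritten key stores the index just past the end of its run, and all names are hypothetical. A peek can then jump over a whole run in one step instead of visiting every overwritten key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical journal key: a position plus overwrite bookkeeping. */
struct jkey_sketch {
	uint64_t pos;
	bool     overwritten;
	size_t   skip_to;	/* for overwritten keys: index just past the run */
};

/*
 * Walk backwards so that each overwritten key inherits the end of the
 * run from its right-hand neighbour. Live keys point at themselves.
 */
static void build_overwrite_ranges(struct jkey_sketch *k, size_t nr)
{
	for (size_t i = nr; i-- > 0;) {
		if (!k[i].overwritten)
			k[i].skip_to = i;
		else if (i + 1 < nr && k[i + 1].overwritten)
			k[i].skip_to = k[i + 1].skip_to;
		else
			k[i].skip_to = i + 1;
	}
}

/* Index of the first live key at or after idx, or nr if there is none. */
static size_t peek_live(const struct jkey_sketch *k, size_t nr, size_t idx)
{
	assert(k || !nr);

	while (idx < nr && k[idx].overwritten)
		idx = k[idx].skip_to;	/* one jump per run, not one step per key */
	return idx;
}
```

With the ranges in place, repeated peeks over a long run of overwritten keys cost one jump each rather than a scan of the whole run, which is what removes the quadratic behaviour.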
6 months ago bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when prefetching
Kent Overstreet [Sun, 17 Nov 2024 19:39:46 +0000 (14:39 -0500)] 
bcachefs: btree_and_journal_iter: don't iterate over too many whiteouts when prefetching

To help ameliorate issues with peek operations having to skip over
deletions in the journal - just bail out if all we're doing is
prefetching btree nodes.

Since btree node prefetching runs every time we iterate to a new node,
and has to sequentially scan ahead, this avoids another O(n^2).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: journal keys: sort keys for interior nodes first
Kent Overstreet [Sun, 17 Nov 2024 19:20:35 +0000 (14:20 -0500)] 
bcachefs: journal keys: sort keys for interior nodes first

There's an unavoidable issue with btree lookups when we're overlaying
journal keys and the journal has many deletions for keys present in the
btree - peek operations will have to iterate over all those deletions to
find the next live key to return.

This is mainly a problem for lookups in interior nodes, if we have to
traverse to a leaf. Looking up an insert position in a leaf (for journal
replay) doesn't have to find the next live key, but walking down the
btree does.

So to ameliorate this, change journal key sort ordering so that we
replay keys from roots and interior nodes first.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
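
As an illustration of such an ordering, the sketch below sorts hypothetical journal keys by descending btree level (roots and interior nodes first) and then by ascending position. The struct, the field names, and the exact tie-breaking are assumptions for the example, not the bcachefs comparison function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical journal key: level 0 is a leaf, higher levels are interior. */
struct jkey_sort_sketch {
	unsigned level;
	uint64_t pos;
};

/* Higher levels sort first; within a level, lower positions sort first. */
static int jkey_cmp_interior_first(const void *_l, const void *_r)
{
	const struct jkey_sort_sketch *l = _l, *r = _r;

	if (l->level != r->level)
		return l->level > r->level ? -1 : 1;
	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return 0;
}

static void sort_journal_keys_sketch(struct jkey_sort_sketch *keys, size_t nr)
{
	assert(keys || !nr);
	qsort(keys, nr, sizeof(*keys), jkey_cmp_interior_first);
}
```

Replaying in this order means the interior nodes are already up to date by the time leaf keys are replayed, so walking down the tree no longer has to skip journal deletions at the interior levels.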
6 months ago bcachefs: kill bch2_journal_entries_free()
Kent Overstreet [Sun, 17 Nov 2024 04:54:19 +0000 (23:54 -0500)] 
bcachefs: kill bch2_journal_entries_free()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't BUG_ON() when superblock feature wasn't set for compressed data
Kent Overstreet [Fri, 15 Nov 2024 04:03:40 +0000 (23:03 -0500)] 
bcachefs: Don't BUG_ON() when superblock feature wasn't set for compressed data

We don't allocate the mempools for compression/decompression unless we
need them - but that means there's an inconsistency to check for.

Reported-by: syzbot+cb3fbcfb417448cfd278@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months ago bcachefs: Don't use a shared decompress workspace mempool
Kent Overstreet [Fri, 15 Nov 2024 05:52:20 +0000 (00:52 -0500)] 
bcachefs: Don't use a shared decompress workspace mempool

gzip and zstd require different decompress workspace sizes, and if we
start with one and then start using the other at runtime we may not get
the correct size.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: compression workspaces should be indexed by opt, not type
Kent Overstreet [Sun, 17 Nov 2024 02:03:53 +0000 (21:03 -0500)] 
bcachefs: compression workspaces should be indexed by opt, not type

type includes lz4 and lz4_old, which do not get different compression
workspaces, and incompressible, a fake type - BCH_COMPRESSION_OPTS() is
the correct enum to use.
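
A minimal sketch of the idea, assuming simplified enums: several on-disk
types can map to one option, and the workspace array is sized and indexed
by option. All names here (comp_type, comp_opt, type_to_opt,
workspace_for) are hypothetical stand-ins, not the bcachefs identifiers,
and mapping the fake "incompressible" type to the "none" slot is a choice
made only for the example.

```c
#include <stddef.h>

/* Hypothetical on-disk compression types, including aliases and a fake type. */
enum comp_type {
	TYPE_none,
	TYPE_lz4_old,
	TYPE_gzip,
	TYPE_lz4,
	TYPE_zstd,
	TYPE_incompressible,
	TYPE_NR
};

/* Hypothetical user-visible options: one entry per real algorithm. */
enum comp_opt {
	OPT_none,
	OPT_lz4,
	OPT_gzip,
	OPT_zstd,
	OPT_NR
};

static enum comp_opt type_to_opt(enum comp_type t)
{
	switch (t) {
	case TYPE_lz4_old:
	case TYPE_lz4:
		return OPT_lz4;		/* both lz4 variants share a workspace */
	case TYPE_gzip:
		return OPT_gzip;
	case TYPE_zstd:
		return OPT_zstd;
	default:
		return OPT_none;	/* none, incompressible: no real workspace */
	}
}

struct workspace {
	size_t	size;
};

/* One slot per option, so each algorithm can have its own size. */
static struct workspace workspaces[OPT_NR];

static struct workspace *workspace_for(enum comp_type t)
{
	return &workspaces[type_to_opt(t)];
}
```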

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: add missing BTREE_ITER_intent
Kent Overstreet [Sun, 17 Nov 2024 08:31:01 +0000 (03:31 -0500)] 
bcachefs: add missing BTREE_ITER_intent

this fixes excessive transaction restarts due to trans_commit having to
upgrade

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Kill bch2_get_next_backpointer()
Kent Overstreet [Fri, 15 Nov 2024 02:53:38 +0000 (21:53 -0500)] 
bcachefs: Kill bch2_get_next_backpointer()

Since for quite some time backpointers have only been stored in the
backpointers btree, not alloc keys (an aborted experiment, support for
which has been removed) - we can replace get_next_backpointer() with
simple btree iteration.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Delete backpointers check in try_alloc_bucket()
Kent Overstreet [Fri, 15 Nov 2024 02:28:40 +0000 (21:28 -0500)] 
bcachefs: Delete backpointers check in try_alloc_bucket()

try_alloc_bucket() has a "safety" check, which avoids allocating a
bucket if there are any backpointers present.

But backpointers are not the source of truth for live data in a bucket,
the bucket sector counts are; this check was fairly useless, and we're
also deferring backpointers checks from fsck to runtime in the near
future.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: peek_prev_min(): Search forwards for extents, snapshots
Kent Overstreet [Sat, 26 Oct 2024 00:41:06 +0000 (20:41 -0400)] 
bcachefs: peek_prev_min(): Search forwards for extents, snapshots

With extents and snapshots, for slightly different reasons, we may have
to search forwards to find a key that compares equal to iter->pos (i.e.
a key that peek_prev() should return, as it returns keys <= iter->pos).

peek_slot() does this, and is an easy way to fix this case.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Implement bch2_btree_iter_prev_min()
Kent Overstreet [Fri, 25 Oct 2024 02:12:37 +0000 (22:12 -0400)] 
bcachefs: Implement bch2_btree_iter_prev_min()

A user contributed a filesystem dump, where the dump was actually
corrupted (due to being taken while the filesystem was online), but
which exposed an interesting bug in fsck - reconstruct_inode().

When iterating in BTREE_ITER_filter_snapshots mode, it's required to
give an end position for the iteration and it can't span inode numbers;
continuing into the next inode might mean we start seeing keys from a
different snapshot tree, that the is_ancestor() checks always filter,
thus we're never able to return a key and stop iterating.

Backwards iteration never implemented the end position because nothing
else needed it - except for reconstruct_inode().

Additionally, backwards iteration is now able to overlay keys from the
journal, which will be useful if we ever decide to start doing journal
replay in the background.
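
The role of the end position in backwards iteration can be sketched with
a toy model: keys live in a sorted array, and the search returns the last
key <= start but gives up once it would cross below end. The struct and
the function toy_peek_prev_min() are hypothetical and ignore snapshots,
the journal overlay, and the real btree iterator API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key position: (inode, offset), compared in that order. */
struct pos {
	uint64_t	inode;
	uint64_t	offset;
};

static int pos_cmp(struct pos l, struct pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	return 0;
}

/*
 * Return the last key <= start in a sorted array, or NULL if there is
 * none at or above end. Stopping at end keeps the walk from continuing
 * into a different inode.
 */
static const struct pos *toy_peek_prev_min(const struct pos *keys, size_t nr,
					   struct pos start, struct pos end)
{
	for (size_t i = nr; i-- > 0;) {
		if (pos_cmp(keys[i], start) > 0)
			continue;	/* still after the start position */
		if (pos_cmp(keys[i], end) < 0)
			return NULL;	/* crossed the end position: stop */
		return &keys[i];
	}
	return NULL;
}
```

A caller walking inode 2 backwards would pass end = (2, 0), so a miss
within that inode returns NULL instead of wandering into inode 1.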

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: discard_one_bucket() now uses need_discard_or_freespace_err()
Kent Overstreet [Sun, 27 Oct 2024 03:25:17 +0000 (23:25 -0400)] 
bcachefs: discard_one_bucket() now uses need_discard_or_freespace_err()

More conversion of inconsistent errors to fsck errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_err
Kent Overstreet [Sun, 27 Oct 2024 02:21:20 +0000 (22:21 -0400)] 
bcachefs: bch2_bucket_do_index(): inconsistent_err -> fsck_err

Factor out a common helper, need_discard_or_freespace_err(), which is
now used by both fsck and the runtime checks, and can repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: try_alloc_bucket() now uses bch2_check_discard_freespace_key()
Kent Overstreet [Sun, 27 Oct 2024 04:40:43 +0000 (00:40 -0400)] 
bcachefs: try_alloc_bucket() now uses bch2_check_discard_freespace_key()

check_discard_freespace_key() was doing all the same checks as
try_alloc_bucket(), but with repair.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: rework bch2_bucket_alloc_freelist() freelist iteration
Kent Overstreet [Mon, 28 Oct 2024 00:47:03 +0000 (20:47 -0400)] 
bcachefs: rework bch2_bucket_alloc_freelist() freelist iteration

Prep work for converting try_alloc_bucket() to use
bch2_check_discard_freespace_key().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: kill inconsistent err in invalidate_one_bucket()
Kent Overstreet [Sun, 27 Oct 2024 04:05:54 +0000 (00:05 -0400)] 
bcachefs: kill inconsistent err in invalidate_one_bucket()

Change it to a normal fsck_err() - meaning it'll get repaired at runtime
when that's flipped on.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Don't delete reflink pointers to missing indirect extents
Kent Overstreet [Mon, 21 Oct 2024 00:27:44 +0000 (20:27 -0400)] 
bcachefs: Don't delete reflink pointers to missing indirect extents

To avoid tragic loss in the event of transient errors (i.e., a btree
node topology error that was later corrected by btree node scan), we
can't delete reflink pointers to correct errors.

This adds a new error bit to bch_reflink_p, indicating that it is known
to point to a missing indirect extent, and the error has already been
reported.

Indirect extent lookups now use bch2_lookup_indirect_extent(), which on
error reports it as a fsck_err() and sets the error bit, and clears it
if necessary on successful lookup.

This also gets rid of the bch2_inconsistent_error() call in
__bch2_read_indirect_extent, and in the reflink_p trigger: part of the
online self healing project.

An on disk format change isn't necessary here: setting the error bit
will be interpreted by older versions as pointing to a different index,
which will also be missing - which is fine.
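
To make the compatibility argument concrete, here is a small sketch that
assumes the pointer's index is a 64-bit word whose top 8 bits are used
for flags. The 56/8 split, the position of the error bit, and every
helper name are assumptions for illustration; the real bch_reflink_p
layout may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: low 56 bits hold the index, high 8 bits hold flags. */
#define IDX_BITS	56
#define IDX_MASK	((UINT64_C(1) << IDX_BITS) - 1)
#define FLAG_ERROR	(UINT64_C(1) << IDX_BITS)

/* Index as seen by a version that knows about the flag bits. */
static uint64_t p_idx(uint64_t v)
{
	return v & IDX_MASK;
}

static bool p_error(uint64_t v)
{
	return (v & FLAG_ERROR) != 0;
}

static uint64_t p_set_error(uint64_t v, bool err)
{
	return err ? (v | FLAG_ERROR) : (v & ~FLAG_ERROR);
}

/*
 * Index as seen by an older version that treats the whole word as the
 * index: with the error bit set, this is a different, very large index.
 */
static uint64_t p_idx_legacy(uint64_t v)
{
	return v;
}

/* Bounds check: a valid index must fit below the flag bits. */
static bool idx_in_bounds(uint64_t idx)
{
	return idx <= IDX_MASK;
}
```

Under these assumptions, an old reader sees a pointer with the error bit
set as pointing far past any real index, so its lookup also misses, which
matches the behaviour described above.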

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Reorganize reflink.c a bit
Kent Overstreet [Thu, 31 Oct 2024 05:25:09 +0000 (01:25 -0400)] 
bcachefs: Reorganize reflink.c a bit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Reserve 8 bits in bch_reflink_p
Kent Overstreet [Tue, 29 Oct 2024 03:43:16 +0000 (23:43 -0400)] 
bcachefs: Reserve 8 bits in bch_reflink_p

Better repair for reflink pointers, as well as propagating new inode
options to indirect extents, are going to require a few extra bits in
bch_reflink_p, so claim a few from the high end of the destination
index.

Also add some missing bounds checking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: Kill FSCK_NEED_FSCK
Kent Overstreet [Tue, 29 Oct 2024 01:27:23 +0000 (21:27 -0400)] 
bcachefs: Kill FSCK_NEED_FSCK

If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().

These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
6 months agobcachefs: lru errors are expected when reconstructing alloc
Kent Overstreet [Tue, 29 Oct 2024 05:17:08 +0000 (01:17 -0400)] 
bcachefs: lru errors are expected when reconstructing alloc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>