git.ipfire.org Git - thirdparty/kernel/linux.git/log
Kent Overstreet [Wed, 28 May 2025 19:08:19 +0000 (15:08 -0400)] 
bcachefs: bch2_get_snapshot_overwrites()

New helper for getting a list of snapshot IDs that have overwritten a
given key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 18:26:33 +0000 (14:26 -0400)] 
bcachefs: bch2_dev_journal_bucket_delete()

Recover from "journal and btree in same bucket".

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 02:20:27 +0000 (22:20 -0400)] 
bcachefs: Runtime self healing for keys for deleted snapshots

If snapshot deletion incorrectly misses some keys and leaves keys for
deleted snapshots, that causes a bit of a problem for data move: we
can't move an extent for a nonexistent snapshot, because the extent
might have to be fragmented, and maintaining correct visibility in child
snapshots doesn't work if the extent doesn't have a snapshot.

Previously we'd just skip these keys, but it turns out that causes
copygc to spin.

So we need runtime self healing, i.e. calling check_key_has_snapshot()
from the data move path.

Snapshot deletion v2 included sentinel values for deleted snapshot
nodes, so this is quite safe.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 20:06:07 +0000 (16:06 -0400)] 
bcachefs: Don't unlock trans before data_update_init()

data_update_init() needs to do btree operations, so delay the
unlock-before-IO until after it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 15:31:51 +0000 (11:31 -0400)] 
bcachefs: Use bch2_err_matches() for BCH_ERR_fsck_(fix|ignore)

We'll be adding subtypes of these errors, and new error code tracing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 15:27:59 +0000 (11:27 -0400)] 
bcachefs: Mark bch_errcode helpers __attribute__((const))

These don't access global memory or dereference pointer arguments, so
the compiler can apply common subexpression elimination (CSE) to them.
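To illustrate the CSE point, here is a small sketch in which an error class is derived purely arithmetically from the code. The error values, `err_parent()`, `err_matches()` and `count_fix_matches()` are hypothetical and are not bcachefs's actual errcode tables; the point is only that a function marked `const` reads nothing but its arguments, so repeated calls with the same arguments may be merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes: a subtype's parent is derived arithmetically,
 * so no lookup table in global memory is needed. */
enum {
	ERR_FSCK_FIX		= 2000,
	ERR_FSCK_IGNORE		= 2001,
	ERR_FSCK_FIX_SUBTYPE	= 2100,	/* hypothetical subtype of ERR_FSCK_FIX */
};

/* Result depends only on 'err': no globals read, no pointers dereferenced. */
__attribute__((const))
static int err_parent(int err)
{
	return err >= ERR_FSCK_FIX_SUBTYPE ? ERR_FSCK_FIX : err;
}

/* Accepts negative error returns, like kernel-style -ERRNO values. */
__attribute__((const))
static bool err_matches(int err, int want)
{
	if (err < 0)
		err = -err;
	return err == want || err_parent(err) == want;
}

/* Both calls are identical; because err_matches() is const, the compiler
 * is allowed to evaluate it once and reuse the result. */
static int count_fix_matches(int ret)
{
	return err_matches(ret, ERR_FSCK_FIX) + err_matches(ret, ERR_FSCK_FIX);
}
```

If a helper did read a global table, `__attribute__((pure))` would be the weaker, still CSE-friendly alternative.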

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 May 2025 19:02:37 +0000 (15:02 -0400)] 
bcachefs: Add missing printbuf_reset() in bch2_check_dirent_inode_dirent()

We were accidentally including the contents from the previous
fsck_err().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 05:00:34 +0000 (01:00 -0400)] 
bcachefs: sysfs/errors

Make the superblock error counters available in sysfs; the only other
way they can be seen is 'show-super', but we don't write the superblock
every time the error count gets incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 00:51:00 +0000 (20:51 -0400)] 
bcachefs: bch2_check_fix_ptrs() can now repair btree roots

This is straightforward enough: check_fix_ptrs() currently only runs
before we go RW, so updating the btree root pointer in c->btree_roots
suffices - it'll be written out in the first journal write we do.

For that, do_bch2_trans_commit_to_journal_replay() now handles
JSET_ENTRY_btree_root entries.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 27 May 2025 18:39:43 +0000 (14:39 -0400)] 
bcachefs: Include b->ob.nr in cached_btree_node_to_text()

We have a bug report that looks like we might be leaking open buckets -
let's check if they got left attached to the cached btree node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 21:03:48 +0000 (17:03 -0400)] 
bcachefs: Move devs_sorted to alloc_request

More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 21:15:11 +0000 (17:15 -0400)] 
bcachefs: reduce stack usage in alloc_sectors_start()

With typical config options, variables in different inline functions
don't share stack space, and these are slowpaths.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 18:24:19 +0000 (14:24 -0400)] 
bcachefs: bch2_alloc_v4_to_text()

Specialize the .to_text() for alloc_v4, to avoid the temporary on the
stack for conversion from old versions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 17:26:10 +0000 (13:26 -0400)] 
bcachefs: Tweak bch2_data_update_init() for stack usage

- Separate out a slowpath for bkey_nocow_lock()
- Don't call bch2_bkey_ptrs_c() or loop over pointers more than
  necessary

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 18:15:28 +0000 (14:15 -0400)] 
bcachefs: kill replicas_sectors arg to __trigger_extent()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 20:16:17 +0000 (16:16 -0400)] 
bcachefs: Don't stack allocate bch_writepage_state

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 20:29:56 +0000 (16:29 -0400)] 
bcachefs: factor out break_cycle_fail()

More stack usage work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 16:48:19 +0000 (12:48 -0400)] 
bcachefs: btree_node_missing_err()

Factor out an error path for a small stack usage improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 May 2025 21:56:45 +0000 (17:56 -0400)] 
bcachefs: Kill bkey_buf in btree_path_down()

Allocate some (smaller) temporary storage in btree_trans for this -
btree_path_down() is in our max-stack call stack.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 00:37:21 +0000 (20:37 -0400)] 
bcachefs: Add missing error logging in delete_dead_inodes()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 02:06:04 +0000 (22:06 -0400)] 
bcachefs: Fix misaligned bucket check in journal space calculations

Fix an assertion pop in the tiering_misaligned test: rounding down to
bucket size at the end of the journal space calculations leaves
cur_entry_sectors == 0, which is incorrect with !cur_entry_err.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 00:37:50 +0000 (20:37 -0400)] 
bcachefs: Fix incorrect multiple dev check in journal write path

It's uncommon to have multiple devices with journalling only on a subset,
but it can be specified with the 'data_allowed' option. We need to know if
we're doing data/metadata writes to multiple devices, as that requires
issuing flushes before the journal writes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 01:45:56 +0000 (21:45 -0400)] 
bcachefs: Catch data_update_done events in trace_io_move_start_fail

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 May 2025 01:54:22 +0000 (21:54 -0400)] 
bcachefs: io_move_evacuate_bucket tracepoint, counter

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 27 May 2025 03:00:21 +0000 (23:00 -0400)] 
bcachefs: trace_io_move_pred

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 May 2025 21:04:11 +0000 (17:04 -0400)] 
bcachefs: Fix infinite loop in journal_entry_btree_keys_to_text()

Fix an infinite loop when bkey_i->k.u64s is 0.

This only happens in userspace, where 'bcachefs list_journal' can print
the entire contents of the journal, and non-dirty entries aren't
validated.
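A minimal sketch of the failure mode and the guard, assuming a simplified layout in which each key's first u64 holds its own length in u64s (a hypothetical stand-in for `bkey_i->k.u64s`). Advancing by a length of zero never makes progress, so a walker over unvalidated entries has to stop there; `key_u64s()` and `count_keys()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed key: the first u64 holds the key's total size in u64s. */
static size_t key_u64s(const uint64_t *k)
{
	return (size_t) k[0];
}

/*
 * Count keys in an entry of 'n' u64s. Without the zero-length check, an
 * unvalidated key with u64s == 0 would leave 'k' unchanged forever.
 */
static size_t count_keys(const uint64_t *entry, size_t n)
{
	const uint64_t *k = entry, *end = entry + n;
	size_t nr = 0;

	while (k < end) {
		size_t u64s = key_u64s(k);

		if (!u64s || u64s > (size_t) (end - k))
			break;	/* corrupt or truncated key: stop here */
		nr++;
		k += u64s;
	}
	return nr;
}
```

The same check also bounds a key that claims to extend past the end of the entry.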

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 16:21:57 +0000 (12:21 -0400)] 
bcachefs: Journal read error message improvements

- Don't print a checksum error when we first read a journal entry: we
  print a checksum error later if we'll be using the journal entry.

- Continuing with the theme of improving error messages and grouping
  errors into a single log message per error, print a single 'checksum
  error' message per journal entry, and use bch2_journal_ptr_to_text()
  to print out where on the device it was.

- Factor out checksum error messages and checking for missing journal
  entries into helpers; bch2_journal_read() has gotten obnoxiously big.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 May 2025 15:12:53 +0000 (11:12 -0400)] 
bcachefs: Don't rewind to run a recovery pass we already ran

Fix a small regression from the "run recovery passes" rewrite, which
enabled async recovery passes.

This fixes getting stuck in a loop in recovery.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 May 2025 15:51:33 +0000 (11:51 -0400)] 
bcachefs: Move unicode message to after the startup message

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 23:53:03 +0000 (19:53 -0400)] 
bcachefs: Fix missing commit in check_dirents

Other repair code seems to do its own commits, but
check_key_has_snapshot() does not.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 19:29:50 +0000 (15:29 -0400)] 
bcachefs: Fix lost rebalance wakeups

Fix a missing wakeup in

'bcachefs set-file-option' -> xattr option update -> inode_write

This was missing because the wakeup needs to happen after transaction
commit. Also, add a 'kick' counter, to make sure we don't miss a wakeup
that occurred right after we finished checking the rebalance_work btree.
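A 'kick' counter like this follows a common pattern for avoiding lost wakeups: the waiter samples a counter before scanning for work, and only sleeps if the counter is unchanged afterwards, so a wakeup that lands during the scan is not missed. The sketch below is single-threaded and uses C11 atomics; `struct rebalance_state`, `rebalance_kick()`, `rebalance_scan_begin()` and `should_sleep()` are hypothetical names, not the bcachefs implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state shared between wakers and the rebalance thread. */
struct rebalance_state {
	atomic_ulong kick;	/* bumped by every wakeup, after commit */
};

/* Called after the transaction that created new work has committed. */
static void rebalance_kick(struct rebalance_state *s)
{
	atomic_fetch_add(&s->kick, 1);
	/* ...then wake the thread (e.g. wake_up_process() in the kernel). */
}

/* Rebalance thread: sample the counter before scanning for work. */
static unsigned long rebalance_scan_begin(struct rebalance_state *s)
{
	return atomic_load(&s->kick);
}

/* After finding no work: sleep only if nobody kicked us in the meantime. */
static bool should_sleep(struct rebalance_state *s, unsigned long seen)
{
	return atomic_load(&s->kick) == seen;
}
```

In a real sleeper the final check would be done after setting the task state, so a kick between the check and the sleep still wakes it.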

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 19:24:00 +0000 (15:24 -0400)] 
bcachefs: bch2_kthread_io_clock_wait_once()

Add a version of bch2_kthread_io_clock_wait() that only schedules once -
behaving more like schedule_timeout().

This will be used for fixing rebalance wakeups.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 18:37:20 +0000 (14:37 -0400)] 
bcachefs: Ensure we print output of run_recovery_pass if it errors

Also, don't error out in bucket_ref_update_err(): we don't want to
return -BCH_ERR_cannot_rewind_recovery if it's not an insert; if it's an
overwrite, we continue.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 18:20:58 +0000 (14:20 -0400)] 
bcachefs: Fix missing BTREE_UPDATE_internal_snapshot_node

Repair code will do updates on older snapshot versions, so needs the
correct annotation.

Reported-by: syzbot+42581416dba62b364750@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 05:56:10 +0000 (01:56 -0400)] 
bcachefs: fix REFLINK_P_MAY_UPDATE_OPTIONS

If we're doing a reflink copy of existing reflinked data, we may only
set REFLINK_P_MAY_UPDATE_OPTIONS if it was set on the reflink pointer
we're copying from.
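A minimal sketch of the flag rule, using a hypothetical bit value and helper name (not bcachefs's actual reflink code): the flag is granted on the new pointer only when requested and, if the source is itself a reflink pointer, only when the source already carries it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAY_UPDATE_OPTIONS	(1U << 0)	/* hypothetical bit position */

/*
 * Flags for the reflink pointer produced by a reflink copy.
 * 'want' says whether the caller would like the flag set.
 */
static uint32_t reflink_p_flags(bool want, bool src_is_reflink_p,
				uint32_t src_flags)
{
	if (!want)
		return 0;
	if (src_is_reflink_p && !(src_flags & MAY_UPDATE_OPTIONS))
		return 0;	/* may not grant more than the source had */
	return MAY_UPDATE_OPTIONS;
}
```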

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 01:59:12 +0000 (21:59 -0400)] 
bcachefs: Don't mount bs > ps without TRANSPARENT_HUGEPAGE

Large folios aren't supported without TRANSPARENT_HUGEPAGE.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 May 2025 00:11:43 +0000 (20:11 -0400)] 
bcachefs: Fix btree_iter_next_node() for new locking asserts

We can't unlock a should_be_locked path unless we're in a transaction
restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 May 2025 18:03:06 +0000 (14:03 -0400)] 
bcachefs: Ensure we don't use a blacklisted journal seq

Different versions differ on the size of the blacklist range; it is
theoretically possible that we could end up with blacklisted journal
sequence numbers newer than the newest seq we find in the journal, and
pick a new start seq that's blacklisted.

Explicitly check for this in bch2_fs_journal_start().
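A hedged sketch of such a check: start just after the newest sequence number found in the journal, and keep advancing past any blacklisted range that contains the candidate. `struct seq_range`, `seq_is_blacklisted()` and `pick_start_seq()` are hypothetical; ranges are assumed half-open [start, end), and the actual check in `bch2_fs_journal_start()` may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry covering sequence numbers [start, end). */
struct seq_range {
	uint64_t start, end;
};

static int seq_is_blacklisted(const struct seq_range *r, size_t nr, uint64_t seq)
{
	for (size_t i = 0; i < nr; i++)
		if (seq >= r[i].start && seq < r[i].end)
			return 1;
	return 0;
}

/*
 * Start after the newest seq we found, skipping blacklisted seqs: a
 * blacklist range may extend past the end of the journal we read.
 * Repeats because ranges may be unsorted or adjacent.
 */
static uint64_t pick_start_seq(uint64_t newest_seq,
			       const struct seq_range *r, size_t nr)
{
	uint64_t seq = newest_seq + 1;
	int moved;

	do {
		moved = 0;
		for (size_t i = 0; i < nr; i++)
			if (seq >= r[i].start && seq < r[i].end) {
				seq = r[i].end;	/* first seq past this range */
				moved = 1;
			}
	} while (moved);

	return seq;
}
```

The loop terminates because `seq` only ever increases and there are finitely many ranges.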

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 May 2025 18:19:25 +0000 (14:19 -0400)] 
bcachefs: Small check_fix_ptr fixes

We don't want to change the bucket gen on gen mismatch: it's possible
to have multiple btree nodes with different gens in the same bucket that
we want to keep, if we have to recover from btree node scan.

It's also not necessary to set g->gen_valid; add a comment to that
effect.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 May 2025 22:31:53 +0000 (18:31 -0400)] 
bcachefs: Fix opts.recovery_pass_last

This was lost in the giant recovery pass rework - but it's used heavily
by bcachefs subcommand utilities.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 May 2025 22:30:10 +0000 (18:30 -0400)] 
bcachefs: Fix allocate -> self healing path

When we go to allocate and find that a bucket in the freespace btree is
actually allocated, we're supposed to return nonzero to tell the
allocator to skip it.

This fixes an emergency read-only due to a bucket/ptr gen mismatch; we
also don't return the correct bucket gen when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 May 2025 17:13:44 +0000 (13:13 -0400)] 
bcachefs: Fix endianness in casefold check/repair

Fixes: 010c89468134 ("bcachefs: Check for casefolded dirents in non casefolded dirs")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Apr 2024 03:53:57 +0000 (23:53 -0400)] 
bcachefs: Path must be locked if trans->locked && should_be_locked

If path->should_be_locked is true, that means user code (of the btree
API) has seen, in this transaction, something guarded by the node this
path has locked, and we have to keep it locked until the end of the
transaction.

Assert that we're not violating this; should_be_locked should also be
cleared only in _very_ special situations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 19:52:15 +0000 (15:52 -0400)] 
bcachefs: Simplify bch2_path_put()

Simplify the "do we need to keep this locked?" checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 19:33:14 +0000 (15:33 -0400)] 
bcachefs: Plumb btree_trans for more locking asserts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 20:04:15 +0000 (16:04 -0400)] 
bcachefs: Clear trans->locked before unlock

We're adding new should_be_locked assertions: it's going to be illegal
to unlock a should_be_locked path when trans->locked is true.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 20:03:08 +0000 (16:03 -0400)] 
bcachefs: Clear should_be_locked before unlock in key_cache_drop()

We're adding new should_be_locked assertions; also add a comment
explaining why clearing should_be_locked is safe here.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 22:12:54 +0000 (18:12 -0400)] 
bcachefs: bch2_path_get() reuses paths if upgrade_fails & !should_be_locked

Small additional optimization over the previous patch, bringing us
closer to the original behaviour, except when we need to clone to avoid
a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 22:00:45 +0000 (18:00 -0400)] 
bcachefs: Give out new path if upgrade fails

Avoid transaction restarts due to failure to upgrade - we can traverse a
new iterator without a transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 20:54:31 +0000 (16:54 -0400)] 
bcachefs: Fix btree_path_get_locks when not doing trans restart

btree_path_get_locks, on failure, shouldn't unlock if we're not issuing
a transaction restart: we might drop locks we're not supposed to (if
path->should_be_locked is set).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 22:03:32 +0000 (18:03 -0400)] 
bcachefs: btree_node_locked_type_nowrite()

Small helper to improve locking assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 19:40:24 +0000 (15:40 -0400)] 
bcachefs: Kill bch2_path_put_nokeep()

bch2_path_put_nokeep() was intended for paths we wouldn't need to
preserve for a transaction restart - it always frees them right away
when the ref hits 0.

But since paths are shared, freeing unconditionally is a bug: the path
might have been used elsewhere and have should_be_locked set, i.e. we
need to keep it locked until the end of the transaction.
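The rule that replaces an unconditional free can be modeled as follows: dropping the last reference frees a path only if no user has marked it `should_be_locked`; otherwise it must survive, locked, until the transaction ends. `struct path` and `path_put()` are hypothetical simplifications, not the real `btree_path` or `bch2_path_put()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified btree path. */
struct path {
	unsigned ref;
	bool should_be_locked;	/* someone relied on what this path locks */
	bool freed;
};

/*
 * Drop a reference. Paths are shared, so reaching ref == 0 is not enough:
 * a path another user marked should_be_locked must stay until commit.
 * Returns true if the path was freed.
 */
static bool path_put(struct path *p)
{
	assert(p->ref > 0);
	if (--p->ref)
		return false;
	if (p->should_be_locked)
		return false;	/* kept locked until the end of the transaction */
	p->freed = true;
	return true;
}
```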

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 7 May 2025 01:54:35 +0000 (21:54 -0400)] 
bcachefs: bch2_journal_write_checksum()

We need to delay checksumming the journal write; we don't know the
blocksize until after we allocate the write.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 16:50:22 +0000 (12:50 -0400)] 
bcachefs: Reduce stack usage in data_update_index_update()

Separate tracepoint message generation and other slowpath code into
non-inline functions, and use bch2_trans_log_str() instead of using a
printbuf for our journal message.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 16:49:56 +0000 (12:49 -0400)] 
bcachefs: bch2_trans_log_str()

The data update path doesn't need a printbuf for its log message - this
will help reduce stack usage.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 22 May 2025 16:34:40 +0000 (12:34 -0400)] 
bcachefs: Kill bkey_buf usage in data_update_index_update()

Reduce stack usage: bkey_buf has a 96-byte buffer on the stack, but the
btree_trans bump allocator works just fine here.
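The trade-off is easier to see with a toy bump allocator: allocations are carved sequentially from one preallocated buffer and released all at once when the transaction ends, so a temporary key needs no fixed-size stack buffer. `struct bump`, `bump_alloc()` and `bump_reset()` are hypothetical and far simpler than the real btree_trans allocator (which, among other things, can grow its buffer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transaction arena; 'buf' is assumed 8-byte aligned. */
struct bump {
	uint8_t *buf;
	size_t size, used;
};

/* Return 8-byte-aligned memory from the arena, or NULL when full. */
static void *bump_alloc(struct bump *b, size_t n)
{
	size_t off = (b->used + 7) & ~(size_t) 7;

	if (off > b->size || n > b->size - off)
		return NULL;
	b->used = off + n;
	return b->buf + off;
}

/* Ending (or restarting) the transaction frees everything at once. */
static void bump_reset(struct bump *b)
{
	b->used = 0;
}
```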

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 19:54:56 +0000 (15:54 -0400)] 
bcachefs: Drop empty accounting updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 07:19:18 +0000 (03:19 -0400)] 
bcachefs: Improve trace_trans_restart_upgrade

- Convert to a 'fs_str' tracepoint that just emits a string: this
  lets us build up the tracepoint with a printbuf, using our pretty
  printers, and they're much easier to manage.

- Include locks_held, before and after

- Include the btree node pointer we failed on (error pointer, null, or
  real node)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 02:59:58 +0000 (22:59 -0400)] 
bcachefs: fix bch2_inum_snapshot_to_path()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 May 2025 00:15:39 +0000 (20:15 -0400)] 
bcachefs: fix duplicate printk

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 14:31:44 +0000 (10:31 -0400)] 
bcachefs: BCH_INODE_has_case_insensitive

Add a flag for tracking whether a directory has case-insensitive
descendants, so that overlayfs can disallow mounting, even though the
filesystem supports case insensitivity.

This is a new on-disk format version, with a (cheap) upgrade to ensure
the flag is correctly set on existing inodes.

Create, rename and fssetxattr are all plumbed to ensure the new flag is
set, and we've got new fsck code that hooks into check_inode().
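One way to maintain such a flag, offered as a hedged model rather than the bcachefs code: when a case-insensitive directory appears under some directory, set the flag on that directory and every ancestor up to the root, stopping early once an ancestor already has it (if the invariant holds, its ancestors do too). Inodes are modeled as an array with parent indices; `struct inode_model` and `mark_ancestors()` are hypothetical and ignore subvolumes and snapshots.

```c
#include <assert.h>
#include <stdbool.h>

#define ROOT 0

/* Hypothetical inode: index of its parent directory and the flag. */
struct inode_model {
	int parent;
	bool has_case_insensitive;
};

/*
 * 'dir' is the parent of a newly created (or moved-in) case-insensitive
 * directory. Mark it and its ancestors; returns how many were newly set.
 */
static int mark_ancestors(struct inode_model *inodes, int dir)
{
	int marked = 0;

	for (;;) {
		if (inodes[dir].has_case_insensitive)
			break;	/* invariant: ancestors are already marked */
		inodes[dir].has_case_insensitive = true;
		marked++;
		if (dir == ROOT)
			break;
		dir = inodes[dir].parent;
	}
	return marked;
}
```

An fsck check would verify the same invariant in reverse: a flagged directory's parent must also be flagged.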

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 14:10:19 +0000 (10:10 -0400)] 
bcachefs: bch2_inode_find_by_inum_snapshot()

Move a fsck.c helper into inode.c, eliminate some duplicate code, and
organize the inode lookup helpers.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 13:48:50 +0000 (09:48 -0400)] 
bcachefs: bch2_inum_snapshot_to_path()

Add a better helper for printing out paths of inodes when we don't know
the subvolume, for fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 13:17:39 +0000 (09:17 -0400)] 
bcachefs: bch2_rename_trans() only runs rename-to-dir code if needed

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 13:12:49 +0000 (09:12 -0400)] 
bcachefs: subvol_inum_eq()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 19 May 2025 13:15:31 +0000 (09:15 -0400)] 
bcachefs: Don't set bi_casefold on non directories

bi_casefold only makes sense for directories, and since it's one of the
variable-length fields, setting it unnecessarily wastes space.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Alan Huang [Mon, 19 May 2025 11:51:04 +0000 (19:51 +0800)] 
bcachefs: Remove duplicate call to bch2_trans_begin()

There is one in for_each_btree_key_max().

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 May 2025 20:45:44 +0000 (16:45 -0400)] 
bcachefs: Call bch2_bkey_set_needs_rebalance() earlier in write path

There's no reason to be running this inside our transaction; it forces
us to copy the key we're updating to a temporary, which we'd like to
skip.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 May 2025 21:07:06 +0000 (17:07 -0400)] 
bcachefs: Simplify bch2_extent_atomic_end()

It used to be that we had a fixed maximum number of btree paths to work
with - 64.

That's no longer the case, so bch2_extent_atomic_end() doesn't have to
be as strict.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 00:43:18 +0000 (20:43 -0400)] 
bcachefs: Coalesce accounting in trans commit

Accounting has gotten quite heavy, and there's lots of redundancy in
accounting updates within a transaction, as we often add/delete multiple
extents that touch the same accounting counters.

This will reduce the amount of data that we journal, and reduce pressure
downstream on the btree write buffer.
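A hedged sketch of the coalescing idea: updates to the same accounting key within one transaction are merged by summing their deltas, and entries whose deltas cancel out are dropped entirely. The flat array and `struct acct_update` are simplifications; bcachefs keys accounting by a disk accounting position, and its actual data structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accounting update: a counter key and a signed delta. */
struct acct_update {
	uint64_t key;
	int64_t delta;
};

/*
 * Merge updates with the same key in place and drop ones that net to
 * zero. Returns the new count. O(n^2), fine for a handful of updates.
 */
static size_t coalesce_accounting(struct acct_update *u, size_t nr)
{
	size_t out = 0, kept = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t j;

		for (j = 0; j < out; j++)
			if (u[j].key == u[i].key) {
				u[j].delta += u[i].delta;
				break;
			}
		if (j == out)
			u[out++] = u[i];	/* first update for this key */
	}

	/* Second pass: remove entries whose deltas cancelled out. */
	for (size_t i = 0; i < out; i++)
		if (u[i].delta)
			u[kept++] = u[i];
	return kept;
}
```

Fewer, merged entries means less data journaled and fewer keys for the btree write buffer to process.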

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 00:26:01 +0000 (20:26 -0400)] 
bcachefs: Split out accounting in transaction commit

There can be a lot of redundancy in accounting updates within a single
btree transaction.

Split out accounting updates so that they can be deduped, in the next
commit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 00:23:58 +0000 (20:23 -0400)] 
bcachefs: btree_trans_subbuf

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 18 May 2025 02:32:51 +0000 (22:32 -0400)] 
bcachefs: Make accounting mismatch errors more readable

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 23:54:39 +0000 (19:54 -0400)] 
bcachefs: async objs now support bch_write_ops

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 23:53:50 +0000 (19:53 -0400)] 
bcachefs: fix bch2_debugfs_flush_buf() when tabstops are in use

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 19:58:23 +0000 (15:58 -0400)] 
bcachefs: fsck: Include loops in error messages

This fixes the subvol loop checking and directory loop checking to print
the loop.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 9 May 2025 21:01:05 +0000 (17:01 -0400)] 
bcachefs: bch2_check_bucket_backpointer_mismatch()

Detect buckets with missing backpointers, and run repair on demand.

__bch2_move_data_phys() now calls
bch2_check_bucket_backpointer_mismatch() as it walks buckets, which
checks for missing backpointers by comparing backpointers against bucket
sector counts.

When missing backpointers are detected, we kick off
bch2_check_extents_to_backpointers() asynchronously - right away if
we're trying to evacuate, or with a threshold if we're just running
copygc.
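A hedged sketch of the detection rule as described: sum the sectors covered by a bucket's backpointers and compare against the bucket's sector count; a shortfall means backpointers are missing. The repair policy (start immediately when evacuating, otherwise only after enough mismatched sectors accumulate) is modeled with hypothetical names and a made-up `copygc_threshold` parameter; the real accounting and thresholds may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bucket and backpointer views. */
struct bucket_info {
	uint32_t dirty_sectors;		/* from the alloc key */
};

struct backpointer {
	uint32_t bucket_len;		/* sectors this backpointer accounts for */
};

/* Returns the number of sectors not covered by any backpointer. */
static uint32_t missing_bp_sectors(const struct bucket_info *b,
				   const struct backpointer *bp, size_t nr)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < nr; i++)
		sum += bp[i].bucket_len;
	return sum >= b->dirty_sectors ? 0 : b->dirty_sectors - (uint32_t) sum;
}

/* Decide whether to kick off the extents-to-backpointers repair now. */
static bool should_start_repair(uint64_t total_missing, bool evacuating,
				uint64_t copygc_threshold)
{
	if (!total_missing)
		return false;
	return evacuating || total_missing >= copygc_threshold;
}
```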

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 19:05:26 +0000 (15:05 -0400)] 
bcachefs: Improve bucket_bitmap code

Add some more helpers; mismatches is now a superset of the empty
bitmap, which simplifies most checks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 10 May 2025 03:28:01 +0000 (23:28 -0400)] 
bcachefs: Run recovery passes asynchronously

When we request a recovery pass to be run online, i.e. not during
recovery, if it's an online pass it'll now be run in the background,
instead of waiting for the next mount.

To avoid situations where recovery passes are running continuously, this
also includes ratelimiting: if the RUN_RECOVERY_PASS_ratelimit flag is
passed, the pass may be deferred until later - depending on the runtime
and last run stats in the recovery_passes superblock section.
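One plausible shape for such a ratelimit, offered only as an assumption: defer a pass if it last ran recently relative to how long it took, so that a pass can use at most a fixed fraction of wall-clock time. The `struct pass_stats` fields, `pass_ratelimited()` and the `budget_div` parameter are hypothetical; this is not the policy actually encoded in the recovery_passes superblock section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pass stats, loosely modeled on "runtime and last run". */
struct pass_stats {
	uint64_t last_run;	/* seconds since epoch when it last ran */
	uint64_t last_runtime;	/* how long that run took, in seconds */
};

/*
 * Returns true if the pass should be deferred: not enough time has
 * elapsed for it to stay within 1/budget_div of wall-clock time
 * (e.g. budget_div = 10 allows about 10%).
 */
static bool pass_ratelimited(const struct pass_stats *s, uint64_t now,
			     uint64_t budget_div)
{
	uint64_t elapsed = now > s->last_run ? now - s->last_run : 0;

	return elapsed < s->last_runtime * budget_div;
}
```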

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 14 May 2025 19:54:20 +0000 (15:54 -0400)] 
bcachefs: bch2_run_explicit_recovery_pass() cleanup

Consolidate the run_explicit_recovery_pass() interfaces by adding a
flags parameter; this will also let us add a RUN_RECOVERY_PASS_ratelimit
flag.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 10 May 2025 22:23:41 +0000 (18:23 -0400)] 
bcachefs: bch2_recovery_pass_status_to_text()

Show recovery pass status in sysfs - important now that we're running
them automatically in the background.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 13 May 2025 21:36:55 +0000 (17:36 -0400)] 
bcachefs: Reduce usage of recovery.curr_pass

We want recovery.curr_pass to be private to the recovery passes code,
for better showing recovery pass status; also, it may rewind and is
generally not the correct member to use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 10 May 2025 21:45:45 +0000 (17:45 -0400)] 
bcachefs: __bch2_run_recovery_passes()

Consolidate bch2_run_recovery_passes() and
bch2_run_online_recovery_passes(), prep work for automatically
scheduling and running recovery passes in the background.

- Now takes a mask of which passes to run, automatic background repair
  will pass in sb.recovery_passes_required.

- Skips passes that are failing: a pass that failed may be reattempted
  after another pass succeeds (some passes depend on repair done by
  other passes for successful completion).

- bch2_recovery_passes_match() helper to skip alloc passes on a
  filesystem without alloc info.
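The mask-driven loop with retry-after-progress described in the list above can be sketched as a hypothetical model: each pass is a function pointer, `mask` selects which passes run, a failing pass is skipped for the rest of the current sweep, and sweeps repeat only while some pass succeeded, since its repairs may let a previously failing pass complete. `run_recovery_passes()` and the example passes are illustrative names, not the bcachefs functions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PASSES 4

/* Hypothetical pass: returns 0 on success, nonzero on failure. */
typedef int (*recovery_pass_fn)(void *ctx);

/*
 * Run every pass in 'mask'. A pass that fails stays pending and is
 * retried on the next sweep, but only if some other pass made progress.
 * Returns the mask of passes that never succeeded.
 */
static uint32_t run_recovery_passes(recovery_pass_fn passes[NR_PASSES],
				    uint32_t mask, void *ctx)
{
	uint32_t pending = mask;
	int progress;

	do {
		progress = 0;
		for (unsigned i = 0; i < NR_PASSES; i++) {
			if (!(pending & (1U << i)))
				continue;
			if (!passes[i](ctx)) {
				pending &= ~(1U << i);
				progress = 1;
			}
		}
	} while (pending && progress);

	return pending;
}

/* Example passes over a shared int: pass_b only succeeds after pass_c ran. */
static int pass_a(void *ctx) { (void) ctx; return 0; }
static int pass_b(void *ctx) { return *(int *) ctx ? 0 : -1; }
static int pass_c(void *ctx) { *(int *) ctx = 1; return 0; }
static int pass_fail(void *ctx) { (void) ctx; return -1; }
```

With all four passes selected, `pass_b` fails on the first sweep, succeeds on the second after `pass_c` has run, and only `pass_fail` remains in the returned mask.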

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 10 May 2025 22:21:49 +0000 (18:21 -0400)] 
bcachefs: struct bch_fs_recovery

bch_fs has gotten obnoxiously big; let's start organizing things a bit
better.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 May 2025 03:12:09 +0000 (23:12 -0400)] 
bcachefs: kill copy in bch2_disk_accounting_mod()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Optimize bch2_trans_start_alloc_update()
Kent Overstreet [Fri, 16 May 2025 21:29:53 +0000 (17:29 -0400)] 
bcachefs: Optimize bch2_trans_start_alloc_update()

Avoid doing more updates if we already have one.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: btree key cache asserts
Kent Overstreet [Thu, 15 May 2025 11:45:52 +0000 (07:45 -0400)] 
bcachefs: btree key cache asserts

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: journal path now uses discard_opt_enabled()
Kent Overstreet [Fri, 16 May 2025 21:18:27 +0000 (17:18 -0400)] 
bcachefs: journal path now uses discard_opt_enabled()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: relock_fail tracepoint now includes btree
Kent Overstreet [Fri, 16 May 2025 21:21:00 +0000 (17:21 -0400)] 
bcachefs: relock_fail tracepoint now includes btree

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: do_rebalance_scan() now only updates bch_extent_rebalance
Kent Overstreet [Thu, 15 May 2025 14:08:06 +0000 (10:08 -0400)] 
bcachefs: do_rebalance_scan() now only updates bch_extent_rebalance

This ensures that our accounting of pending rebalance work becomes
accurate quickly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: better error message for subvol_fs_path_parent_wrong
Kent Overstreet [Thu, 15 May 2025 13:15:24 +0000 (09:15 -0400)] 
bcachefs: better error message for subvol_fs_path_parent_wrong

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Improve bch2_repair_inode_hash_info()
Kent Overstreet [Thu, 15 May 2025 12:41:26 +0000 (08:41 -0400)] 
bcachefs: Improve bch2_repair_inode_hash_info()

Improve this so it can be used by fsck.c check_inode(); it provides a
much better error message than the check_inode() version.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: bch2_inode_find_snapshot_root()
Kent Overstreet [Thu, 15 May 2025 12:31:02 +0000 (08:31 -0400)] 
bcachefs: bch2_inode_find_snapshot_root()

Factor out a small common helper.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Early return to avoid unnecessary lock
Alan Huang [Mon, 12 May 2025 18:44:26 +0000 (02:44 +0800)] 
bcachefs: Early return to avoid unnecessary lock

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Kill BTREE_TRIGGER_bucket_invalidate
Alan Huang [Thu, 15 May 2025 14:29:50 +0000 (22:29 +0800)] 
bcachefs: Kill BTREE_TRIGGER_bucket_invalidate

Signed-off-by: Alan Huang <mmpgouride@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Fix opt hooks in sysfs for non sb option
Kent Overstreet [Wed, 14 May 2025 21:58:00 +0000 (17:58 -0400)] 
bcachefs: Fix opt hooks in sysfs for non sb option

We weren't checking if the option changed for non-superblock options -
this led to rebalance not waking up when enabling the
"rebalance_enabled" option.
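A minimal sketch of the pattern this fix restores, assuming an option
that lives only in memory (not in the superblock) and a hook that should
fire when its value changes. struct opt_state, opt_set() and
rebalance_wakeup() are hypothetical stand-ins, not the bcachefs sysfs
code; the point is that the old/new comparison must happen for every
option, not only superblock options.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory option plus a counter standing in for wakeups */
struct opt_state {
	uint64_t	rebalance_enabled;
	unsigned	wakeups;
};

/* Hypothetical hook: in the real code this would wake the rebalance thread */
static void rebalance_wakeup(struct opt_state *s)
{
	s->wakeups++;
}

/* Store path for a non-superblock option: compare before overwriting */
static void opt_set(struct opt_state *s, uint64_t v)
{
	bool changed = s->rebalance_enabled != v;

	s->rebalance_enabled = v;

	/* Run the hook only when the value actually changed */
	if (changed)
		rebalance_wakeup(s);
}
```

Setting the option to 1 twice triggers a single wakeup; setting it back
to 0 triggers a second one.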

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: fix can_write_extent()
Kent Overstreet [Wed, 14 May 2025 14:44:21 +0000 (10:44 -0400)] 
bcachefs: fix can_write_extent()

We were failing to check the return value of bch2_dev_rcu(), so we could
(technically) race with device removal.
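A minimal sketch of the missing check, assuming (as the message implies)
that the device lookup can return NULL once a device has been removed.
dev_lookup() and can_write_to() are hypothetical stand-ins for
bch2_dev_rcu() and can_write_extent(); real RCU locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev {
	bool	rw;	/* hypothetical: device accepts writes */
};

/* Hypothetical lookup: NULL if the index is out of range or the slot is empty */
static struct dev *dev_lookup(struct dev **devs, unsigned nr, unsigned idx)
{
	return idx < nr ? devs[idx] : NULL;
}

static bool can_write_to(struct dev **devs, unsigned nr, unsigned idx)
{
	struct dev *ca = dev_lookup(devs, nr, idx);

	/* A removed device is simply not writable: never dereference NULL */
	return ca && ca->rw;
}
```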

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Add tracepoint, counter for io_move_created_rebalance
Kent Overstreet [Tue, 13 May 2025 17:49:51 +0000 (13:49 -0400)] 
bcachefs: Add tracepoint, counter for io_move_created_rebalance

Internal moves shouldn't add new rebalance_work, but it's been reported
that this seems to be happening. Add a tracepoint and counter so we can
see what's going on.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: move_buckets in rhashtable when allocated
Kent Overstreet [Thu, 8 May 2025 21:19:10 +0000 (17:19 -0400)] 
bcachefs: move_buckets in rhashtable when allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
4 weeks agobcachefs: Move pending buckets queue to buckets_in_flight
Kent Overstreet [Thu, 8 May 2025 21:17:17 +0000 (17:17 -0400)] 
bcachefs: Move pending buckets queue to buckets_in_flight

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>