git.ipfire.org Git - thirdparty/kernel/linux.git/log

Kent Overstreet [Mon, 22 Aug 2022 19:29:53 +0000 (15:29 -0400)]

bcachefs: Delete old deadlock avoidance code

This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 23 Aug 2022 03:12:11 +0000 (23:12 -0400)]

bcachefs: Print deadlock cycle in debugfs

In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Mon, 22 Aug 2022 17:23:47 +0000 (13:23 -0400)]

bcachefs: Deadlock cycle detector

We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.

Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.

That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
   deadlock would have actually occurred. Lock ordering restarts have
   become our primary cause of transaction restarts, on some workloads
   totalling 20% of actual transaction commits.

- To avoid deadlock or livelock, we'd often have to take intent locks
   when we only wanted a read lock: with the lock ordering approach, it
   is actually illegal to hold _any_ read lock while blocking on an intent
   lock, and this has been causing us unnecessary lock contention.

- It was getting fragile - the various lock ordering rules are not
   trivial, and we'd been seeing occasional livelock issues related to
   this machinery.

So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.

If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we can not allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
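
The DFS described above can be sketched on a toy wait-for graph. This is only an illustration under stated assumptions: `toy_graph`, `toy_would_deadlock()` and the fixed-size arrays are hypothetical names invented here, each transaction blocks on at most one lock, and lock modes (read/intent/write) are ignored, so every holder is treated as conflicting with every waiter. The real detector works on btree_paths and six lock waitlists.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_TRANS 8
#define TOY_MAX_LOCKS 8

/* Hypothetical toy model of a wait-for graph; not the bcachefs structures. */
struct toy_graph {
	/* holds[t][l]: transaction t currently holds lock l */
	bool holds[TOY_MAX_TRANS][TOY_MAX_LOCKS];
	/* waits_on[t]: lock that t is blocked on, or -1 if t is not blocked */
	int waits_on[TOY_MAX_TRANS];
};

static void toy_graph_init(struct toy_graph *g)
{
	memset(g->holds, 0, sizeof(g->holds));
	for (int t = 0; t < TOY_MAX_TRANS; t++)
		g->waits_on[t] = -1;
}

/* Depth-first walk: cur's held locks, then every transaction blocked on them. */
static bool toy_dfs(const struct toy_graph *g, int start, int cur, bool *visited)
{
	visited[cur] = true;

	for (int l = 0; l < TOY_MAX_LOCKS; l++) {
		if (!g->holds[cur][l])
			continue;

		for (int t = 0; t < TOY_MAX_TRANS; t++) {
			if (t == cur || g->waits_on[t] != l)
				continue;
			if (t == start)
				return true;	/* walked back to the starting transaction */
			if (!visited[t] && toy_dfs(g, start, t, visited))
				return true;
		}
	}
	return false;
}

/* Call after recording start's own wait in waits_on[], before sleeping. */
static bool toy_would_deadlock(const struct toy_graph *g, int start)
{
	bool visited[TOY_MAX_TRANS] = { false };

	assert(start >= 0 && start < TOY_MAX_TRANS);
	return toy_dfs(g, start, start, visited);
}
```

In this model a caller would set `waits_on[start]` first, mirroring "after adding ourself to the waitlist but before sleeping", and restart the transaction when `toy_would_deadlock()` returns true.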

Kent Overstreet [Fri, 5 Aug 2022 17:06:44 +0000 (13:06 -0400)]

bcachefs: Fix bch2_btree_node_upgrade()

Previously, if we were trying to upgrade from a read to an intent lock
but we held an additional read lock via another btree_path,
bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade().

This patch factors out the code that __bch2_btree_node_lock_write() uses
to temporarily drop extra read locks, so that six_lock_tryupgrade() can
succeed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 19 Sep 2022 18:14:01 +0000 (14:14 -0400)]

bcachefs: Add a debug assert

Chasing down a strange locking bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Fri, 26 Aug 2022 23:22:24 +0000 (19:22 -0400)]

six locks: Wakeup now takes lock on behalf of waiter

This brings back an important optimization, to avoid touching the wait
lists an extra time, while preserving the property that a thread is on a
lock waitlist iff it is waiting - it is never removed from the waitlist
until it has the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 15 Oct 2022 04:34:38 +0000 (00:34 -0400)]

six locks: Fix a lost wakeup

There was a lost wakeup between a read unlock in percpu mode and a write
lock. The unlock path unlocks, then executes a barrier, then checks for
waiters; correspondingly, the lock side should set the wait bit and
execute a barrier, then attempt to take the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
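
The barrier pairing described above is the classic store-buffering pattern, and can be sketched with C11 atomics. This is a hedged illustration, not the six lock code: `toy_lock`, its fields and both functions are hypothetical, a single `atomic_int` stands in for the percpu reader count, and a counter stands in for the actual wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_lock {
	atomic_int  readers;		/* stand-in for the percpu reader count */
	atomic_bool waiters;		/* stand-in for the "someone is waiting" bit */
	atomic_bool write_locked;
	atomic_int  wakeups;		/* wakeups issued, counted for illustration */
};

static void toy_lock_init(struct toy_lock *l, int readers)
{
	atomic_init(&l->readers, readers);
	atomic_init(&l->waiters, false);
	atomic_init(&l->write_locked, false);
	atomic_init(&l->wakeups, 0);
}

/* Unlock side: unlock, then barrier, then check for waiters. */
static void toy_read_unlock(struct toy_lock *l)
{
	int old = atomic_fetch_sub_explicit(&l->readers, 1, memory_order_relaxed);

	assert(old > 0);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&l->waiters, memory_order_relaxed))
		atomic_fetch_add_explicit(&l->wakeups, 1, memory_order_relaxed);
}

/* Lock side: set the wait bit, then barrier, then attempt to take the lock. */
static bool toy_write_lock_attempt(struct toy_lock *l)
{
	atomic_store_explicit(&l->waiters, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&l->readers, memory_order_relaxed) != 0)
		return false;	/* caller would sleep; the wait bit is already visible */

	atomic_store_explicit(&l->write_locked, true, memory_order_relaxed);
	atomic_store_explicit(&l->waiters, false, memory_order_relaxed);
	return true;
}
```

With a full fence on both sides, either the writer observes the reader count already dropped, or the unlocking reader observes the wait bit and issues a wakeup. If the writer checked for readers before publishing the wait bit, both checks could miss each other.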

Kent Overstreet [Sat, 24 Sep 2022 04:13:56 +0000 (00:13 -0400)]

six locks: Enable lockdep

Now that we have lockdep_set_no_check_recursion(), we can enable lockdep
checking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 24 Sep 2022 05:33:13 +0000 (01:33 -0400)]

six locks: Add start_time to six_lock_waiter

This is needed by the cycle detector in bcachefs - we need a way to
iterate over waitlist entries while dropping and retaking the waitlist
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 20:22:51 +0000 (16:22 -0400)]

six locks: six_lock_waiter()

This allows passing in the wait list entry - to be used for a deadlock
cycle detector.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Thu, 25 Aug 2022 14:49:52 +0000 (10:49 -0400)]

six locks: Simplify wait lists

This switches to a single list of waiters, instead of separate lists for
read and intent, and switches write locks to also use the wait lists
instead of being handled differently.

Also, removal from the wait list is now done by the process waiting on
the lock, not the process doing the wakeup. This is needed for the new
deadlock cycle detector - we need tasks to stay on the waitlist until
they've successfully acquired the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 18 Sep 2022 21:10:33 +0000 (17:10 -0400)]

bcachefs: Add private error codes for ENOSPC

Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 18 Sep 2022 19:43:50 +0000 (15:43 -0400)]

bcachefs: Errcodes can now subtype standard error codes

The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
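
One way to picture error codes that subtype standard ones is a parent table that is walked until a standard errno is reached. The sketch below is an assumption-laden illustration: the `TOY_ERR_*` names, the numbering base and the table layout are invented here, and only the idea of a `bch2_err_class()`-style lookup at module boundaries comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical private codes, numbered above the standard errno range. */
enum {
	TOY_ERR_START = 2048,
	TOY_ERR_ENOSPC_disk_reservation = TOY_ERR_START,
	TOY_ERR_ENOSPC_bucket_alloc,
	TOY_ERR_MAX,
};

/* Parent of each private code; here both are subtypes of ENOSPC. */
static const int toy_err_parent[TOY_ERR_MAX - TOY_ERR_START] = {
	[TOY_ERR_ENOSPC_disk_reservation - TOY_ERR_START] = ENOSPC,
	[TOY_ERR_ENOSPC_bucket_alloc     - TOY_ERR_START] = ENOSPC,
};

static bool toy_is_private(int e)
{
	return e >= TOY_ERR_START && e < TOY_ERR_MAX;
}

/* Map a negative error value to the standard code callers outside expect. */
static int toy_err_class(int err)
{
	int e = -err;

	assert(e >= 0);
	while (toy_is_private(e))
		e = toy_err_parent[e - TOY_ERR_START];
	return -e;
}

/* True if err is cls itself or one of its subtypes. */
static bool toy_err_matches(int err, int cls)
{
	int e = -err, c = -cls;

	while (e != c && toy_is_private(e))
		e = toy_err_parent[e - TOY_ERR_START];
	return e == c;
}
```

Internal code can then log or trace the specific private code, while anything returned across a module boundary is first passed through the class lookup so that userspace still sees plain -ENOSPC.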

Kent Overstreet [Sun, 18 Sep 2022 17:37:34 +0000 (13:37 -0400)]

bcachefs: Make an assertion more informative

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Fri, 16 Sep 2022 18:42:38 +0000 (14:42 -0400)]

bcachefs: All held locks must be in a btree path

With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.

This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 17 Sep 2022 18:36:24 +0000 (14:36 -0400)]

bcachefs: bch2_btree_path_upgrade() now emits transaction restart

Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 17 Sep 2022 19:20:13 +0000 (15:20 -0400)]

bcachefs: Add a manual trigger for lock wakeups

Spotted a lockup once that appeared to be a lost wakeup. Adding a manual
trigger for lock wakeups will make it easy to tell if that's what it is
next time it occurs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Fri, 16 Sep 2022 22:39:01 +0000 (18:39 -0400)]

bcachefs: Fix sb_field_counters formatting

We have counters with longer names now, so adjust the tabstop - also,
make sure there's always a space printed between the name and the
number.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 18:10:12 +0000 (14:10 -0400)]

bcachefs: Re-enable hash_redo_key()

When subvolumes & snapshots were rolled out, hash_redo_key() was
disabled due to some new complications - namely, bch2_hash_set() works
at the subvolume level, and fsck does not run in a defined subvolume,
instead working at the snapshot ID level.

This patch splits out bch2_hash_set_snapshot() from bch2_hash_set(), and
makes one small tweak for fsck:

- Normally, bch2_hash_set() (and other dirent code) needs to know what
   subvolume we're in, because dirents that point to other subvolumes
   should only be visible in the subvolume they were created in, not
   other snapshots. We can't check that in fsck, so we just assume that
   all dirents are visible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Mon, 12 Sep 2022 06:22:47 +0000 (02:22 -0400)]

bcachefs: Kill journal_keys->journal_seq_base

This removes an optimization that didn't actually save us any memory,
due to alignment, but did make the code more complicated than it needed
to be. We were also seeing a bug where journal_seq_base wasn't getting
correctly initialized, so hopefully it'll fix that too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 05:28:51 +0000 (01:28 -0400)]

bcachefs: Fix redundant transaction restart

Little bit of tidying up, this makes the counters a little bit clearer
as to what's happening.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 02:24:16 +0000 (22:24 -0400)]

bcachefs: Ensure intent locks are marked before taking write locks

Locks must be correctly marked for the cycle detector to work.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 3 Sep 2022 02:59:39 +0000 (22:59 -0400)]

bcachefs: Avoid using btree_node_lock_nopath()

With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.

All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 02:07:31 +0000 (22:07 -0400)]

bcachefs: Fix usage of six lock's percpu mode, key cache version

Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks
have a percpu mode, but we can't switch between percpu and non percpu
modes while a lock is in use: threads attempting to take a read lock may
race, and we'll end up with the read count permanently off.

Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would
require an RCU barrier, and we don't want to do that - instead, we have
to permanently segregate percpu and non percpu objects, including when
on freelists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 01:14:53 +0000 (21:14 -0400)]

bcachefs: Refactor bkey_cached_alloc() path

Clean up the arguments passed and make them more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 4 Sep 2022 01:09:54 +0000 (21:09 -0400)]

bcachefs: Convert more locking code to btree_bkey_cached_common

Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_bkey_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Wed, 31 Aug 2022 22:53:42 +0000 (18:53 -0400)]

bcachefs: btree_bkey_cached_common->cached

Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Fri, 2 Sep 2022 02:05:16 +0000 (22:05 -0400)]

bcachefs: Fix six_lock_readers_add()

Have to be careful with bit fields - when subtracting, this was
overflowing into the write_locking bit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
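
One way this class of bug can arise is sketched below. It is an illustration only: the 28-bit reader field, the position of the write_locking bit directly above it, and all `toy_` names are assumptions made here, not the six lock layout. Encoding a negative delta as a field-width value and then adding it carries out of the field, whereas subtracting the magnitude does not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: reader count in the low bits, write_locking above. */
#define TOY_READ_BITS		28
#define TOY_READ_MASK		((UINT64_C(1) << TOY_READ_BITS) - 1)
#define TOY_WRITE_LOCKING	(UINT64_C(1) << TOY_READ_BITS)

/* Encode nr the way a bit field would store it: truncated to the field. */
static uint64_t toy_read_val(int nr)
{
	return (uint64_t)(int64_t)nr & TOY_READ_MASK;
}

/* Buggy: a negative nr becomes a large positive field value, so the add
 * carries into the neighbouring bit. */
static uint64_t toy_readers_add_buggy(uint64_t state, int nr)
{
	return state + toy_read_val(nr);
}

/* Fixed: add for positive deltas, subtract the magnitude for negative ones. */
static uint64_t toy_readers_add_fixed(uint64_t state, int nr)
{
	assert(nr >= 0 || (state & TOY_READ_MASK) >= toy_read_val(-nr));
	return nr >= 0
		? state + toy_read_val(nr)
		: state - toy_read_val(-nr);
}
```

Starting from one reader and applying a delta of -1, the buggy version leaves the reader field at zero but sets the bit above it, while the fixed version returns a clean zero state.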

Kent Overstreet [Tue, 23 Aug 2022 03:39:23 +0000 (23:39 -0400)]

bcachefs: bch2_btree_node_lock_write_nofail()

Taking a write lock will be able to fail, with the new cycle detector -
unless we pass it nofail, which is possible but not preferred.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 21 Aug 2022 18:29:43 +0000 (14:29 -0400)]

bcachefs: New locking functions

In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.

This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 26 Aug 2022 18:55:00 +0000 (14:55 -0400)]

bcachefs: Mark write locks before taking lock

six locks are unfair: while a thread is blocked trying to take a write
lock, new read locks will fail. The new deadlock cycle detector makes
use of our existing lock tracing, so we need to tell it we're holding a
write lock before we take the lock for it to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 21:47:27 +0000 (17:47 -0400)]

bcachefs: Delete time_stats for lock contended times

Since we've now got time_stats for lock hold times (per btree
transaction), we don't need this anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Tue, 30 Aug 2022 15:40:03 +0000 (11:40 -0400)]

bcachefs: Don't leak lock pcpu counts memory

This fixes a small memory leak.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 19:00:59 +0000 (15:00 -0400)]

six locks: Delete six_lock_pcpu_free_rcu()

Didn't have any users, and wasn't a good idea to begin with - delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 16:48:36 +0000 (12:48 -0400)]

bcachefs: Add persistent counters for all tracepoints

Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 16:37:05 +0000 (12:37 -0400)]

bcachefs: Fix bch2_btree_update_start() to return -BCH_ERR_journal_reclaim_would_deadlock

On failure to get a journal pre-reservation because we're called from
journal reclaim we're not supposed to return a transaction restart error
- this fixes a livelock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 16:28:09 +0000 (12:28 -0400)]

bcachefs: Improve bch2_btree_node_relock()

This moves the IS_ERR_OR_NULL() check to the inline part, since that's a
fast path event.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 16:23:38 +0000 (12:23 -0400)]

bcachefs: Improve trans_restart_journal_preres_get tracepoint

It now includes journal_flags.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 16:11:18 +0000 (12:11 -0400)]

bcachefs: Improve btree_node_relock_fail tracepoint

It now prints the error name when the btree node is an error pointer;
also, don't trace failures when the btree node is
BCH_ERR_no_btree_node_up.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sat, 27 Aug 2022 14:30:36 +0000 (10:30 -0400)]

bcachefs: Make more btree_paths available

- Don't decrease BTREE_ITER_MAX when building with CONFIG_LOCKDEP
   anymore. The lockdep table sizes are configurable now, we don't need
   this anymore.
- btree_trans_too_many_iters() is less conservative now. Previously it
   was causing a transaction restart if we had used more than
   BTREE_ITER_MAX / 2 paths, change this to BTREE_ITER_MAX - 8.

This helps with excessive transaction restarts/livelocks in the bucket
allocator path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
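
The effect of the threshold change can be worked through with a small sketch. The value 64 for the path limit is an assumption chosen for illustration (the commit message does not state it), and the `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limit, for illustration only. */
#define TOY_BTREE_ITER_MAX 64

/* Old rule: restart once more than half of the paths are in use. */
static bool toy_too_many_iters_old(unsigned nr_paths)
{
	return nr_paths > TOY_BTREE_ITER_MAX / 2;
}

/* New rule: restart only when fewer than 8 paths remain. */
static bool toy_too_many_iters_new(unsigned nr_paths)
{
	assert(nr_paths <= TOY_BTREE_ITER_MAX);
	return nr_paths > TOY_BTREE_ITER_MAX - 8;
}
```

Under that assumed limit, a transaction using 40 paths would have been restarted by the old check but is allowed to continue by the new one; both checks still trip at 57 paths.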

Kent Overstreet [Fri, 26 Aug 2022 01:42:46 +0000 (21:42 -0400)]

bcachefs: Correctly initialize bkey_cached->lock

We need to use the right class for some assertions to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Tue, 23 Aug 2022 01:05:31 +0000 (21:05 -0400)]

bcachefs: Track held write locks

The upcoming lock cycle detection code will need to know precisely which
locks every btree_trans is holding, including write locks - this patch
updates btree_node_locked_type to include write locks.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Tue, 23 Aug 2022 05:20:24 +0000 (01:20 -0400)]

bcachefs: Print lock counts in debugfs btree_transactions

Improve our debugfs output, to help in debugging deadlocks: this shows,
for every btree node we print, the current number of readers/intent
locks/write locks held.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Mon, 22 Aug 2022 17:21:10 +0000 (13:21 -0400)]

bcachefs: Switch btree locking code to struct btree_bkey_cached_common

This is just some type safety cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 23 Aug 2022 01:49:55 +0000 (21:49 -0400)]

bcachefs: Track maximum transaction memory

This patch
- tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
- makes it available in debugfs
- switches bch2_trans_init() to using that for the amount of memory to
preallocate, instead of the parameter passed in

This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
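
The high-water-mark idea can be sketched as plain bookkeeping. Everything here is hypothetical (`toy_trans`, `toy_trans_stats`, the function names), no memory is actually allocated, and the model assumes that growing a buffer which already contains allocations forces a restart because earlier pointers would become invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-transaction-function statistics, shared across runs. */
struct toy_trans_stats {
	size_t max_mem;		/* most memory any run has needed so far */
};

struct toy_trans {
	struct toy_trans_stats *stats;
	size_t mem_bytes;	/* current buffer size */
	size_t mem_top;		/* bytes handed out so far */
	unsigned restarts;
};

/* Preallocate based on the recorded maximum, not a caller-supplied guess. */
static void toy_trans_init(struct toy_trans *trans, struct toy_trans_stats *stats)
{
	trans->stats = stats;
	trans->mem_bytes = stats->max_mem;
	trans->mem_top = 0;
	trans->restarts = 0;
}

/* Returns 0 on success, -1 if the transaction had to restart. */
static int toy_trans_kmalloc(struct toy_trans *trans, size_t size)
{
	size_t new_top = trans->mem_top + size;

	if (new_top > trans->stats->max_mem)
		trans->stats->max_mem = new_top;

	if (new_top > trans->mem_bytes) {
		bool had_allocations = trans->mem_top != 0;

		trans->mem_bytes = new_top;
		if (had_allocations) {
			/* assumed: earlier pointers are now stale, so restart */
			trans->restarts++;
			trans->mem_top = 0;
			return -1;
		}
	}

	trans->mem_top = new_top;
	return 0;
}
```

In this model the first run of a transaction function pays for a restart while the statistics learn its footprint, and later runs start with a large enough buffer.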

Kent Overstreet [Mon, 22 Aug 2022 03:08:53 +0000 (23:08 -0400)]

six locks: Improve six_lock_count

six_lock_count now counts whether a write lock is held, and this patch
now also correctly counts six_lock->intent_lock_recurse.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Sun, 21 Aug 2022 21:20:42 +0000 (17:20 -0400)]

bcachefs: Kill nodes_intent_locked

Previously, we used two different bit arrays for tracking held btree
node locks. This patch switches to an array of two bit integers, which
will let us track, in a future patch, when we hold a write lock.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
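
An array of two-bit integers packed into one word can be sketched as follows. The encoding of the four states and the `toy_` helper names are assumptions for illustration; the actual values used by btree_path may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: two bits per btree level. */
enum toy_lock_type {
	TOY_UNLOCKED	= 0,
	TOY_READ	= 1,
	TOY_INTENT	= 2,
	TOY_WRITE	= 3,
};

#define TOY_MAX_LEVELS 8	/* 8 levels * 2 bits fit in a uint16_t */

/* Read the lock type recorded for one level. */
static enum toy_lock_type toy_locked_type(uint16_t nodes_locked, unsigned level)
{
	assert(level < TOY_MAX_LEVELS);
	return (enum toy_lock_type)((nodes_locked >> (level * 2)) & 3);
}

/* Return the word with one level's entry replaced by type. */
static uint16_t toy_mark_locked(uint16_t nodes_locked, unsigned level,
				enum toy_lock_type type)
{
	assert(level < TOY_MAX_LEVELS);
	nodes_locked &= (uint16_t)~(3U << (level * 2));
	nodes_locked |= (uint16_t)((unsigned)type << (level * 2));
	return nodes_locked;
}
```

Compared with two separate bitmaps, a single two-bit field per level has room for a fourth state, which is what makes it possible to record write locks later.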

Kent Overstreet [Sun, 21 Aug 2022 22:17:51 +0000 (18:17 -0400)]

bcachefs: Better use of locking helpers

Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 19 Aug 2022 23:50:18 +0000 (19:50 -0400)]

bcachefs: Reorganize btree_locking.[ch]

Tidy things up a bit before doing more work in this file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 19 Aug 2022 19:35:34 +0000 (15:35 -0400)]

bcachefs: btree_locking.c

Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 18 Aug 2022 21:57:24 +0000 (17:57 -0400)]

bcachefs: Fix adding a device with a label

Device labels are represented as pointers in the member info section: we
need to get and then set the label for it to be kept correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 18 Aug 2022 21:00:12 +0000 (17:00 -0400)]

bcachefs: fsck: Another transaction restart handling fix

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Thu, 18 Aug 2022 17:00:26 +0000 (13:00 -0400)]

bcachefs: bch2_btree_delete_range_trans() now returns -BCH_ERR_transaction_restart_nested

The new convention is that functions that handle transaction restarts
within an existing transaction context should return
-BCH_ERR_transaction_restart_nested when they did so, since they
invalidated the outer transaction context.

This also means bch2_btree_delete_range_trans() is changed to only call
bch2_trans_begin() after a transaction restart, not on every loop
iteration.

This is to fix a bug in fsck, in check_inode() when we truncate an inode
with BCH_INODE_I_SIZE_DIRTY set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
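
The convention described above can be sketched with a toy loop. All names and error values here are hypothetical stand-ins, and the "restart" is simulated by a counter: the inner helper retries on its own, calls the begin function only after a restart, and reports to its caller that the outer context was invalidated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error values, for illustration only. */
enum {
	TOY_ERR_transaction_restart = 1000,
	TOY_ERR_transaction_restart_nested,
};

struct toy_trans {
	unsigned begin_calls;		/* how often toy_trans_begin() ran */
	unsigned pending_restarts;	/* simulated restarts still to occur */
};

static void toy_trans_begin(struct toy_trans *trans)
{
	trans->begin_calls++;
}

/* One unit of work; fails with a restart while any are pending. */
static int toy_delete_one(struct toy_trans *trans)
{
	if (trans->pending_restarts) {
		trans->pending_restarts--;
		return -TOY_ERR_transaction_restart;
	}
	return 0;
}

/* Handles restarts internally, then tells the caller that it did so. */
static int toy_delete_range(struct toy_trans *trans, unsigned nr_keys)
{
	bool restarted = false;
	unsigned done = 0;

	while (done < nr_keys) {
		int ret = toy_delete_one(trans);

		if (ret == -TOY_ERR_transaction_restart) {
			toy_trans_begin(trans);	/* only after a restart */
			restarted = true;
			continue;
		}
		if (ret)
			return ret;
		done++;
	}

	return restarted ? -TOY_ERR_transaction_restart_nested : 0;
}
```

A caller that sees the nested restart value knows the work itself completed, but that anything it cached from before the call must be looked up again.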

Kent Overstreet [Thu, 18 Aug 2022 02:17:08 +0000 (22:17 -0400)]

bcachefs: Minor transaction restart handling fix

- fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
- change bch2_trans_verify_not_restarted() to call panic() - we don't
want these errors to be missed

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 17 Aug 2022 21:49:12 +0000 (17:49 -0400)]

bcachefs: Fix bch2_btree_iter_peek_slot() error path

iter->k needs to be consistent with iter->pos - required for
bch2_btree_iter_(rewind|advance) to work correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 16 Aug 2022 07:08:15 +0000 (03:08 -0400)]

bcachefs: Another should_be_locked fixup

When returning a key from the key cache, in BTREE_ITER_WITH_KEY_CACHE
mode, we don't want to set should_be_locked on iter->path; we're not
returning a key from that path, so we don't need to, and also since we
traversed the key cache iterator before setting should_be_locked on that
path it might be unlocked (if we unlocked, bch2_trans_relock() won't
have relocked it).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Sun, 14 Aug 2022 18:44:17 +0000 (14:44 -0400)]

bcachefs: bch2_bkey_packed_to_binary_text()

For debugging the eytzinger search tree code, and low level bkey packing
code, it can be helpful to see things in binary: this patch improves our
helpers for doing so.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 7 Jul 2022 04:37:46 +0000 (00:37 -0400)]

bcachefs: Add assertions for unexpected transaction restarts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 15 Aug 2022 22:55:20 +0000 (18:55 -0400)]

bcachefs: btree_path_down() optimization

We should be calling btree_node_mem_ptr_set() before path_level_init(),
since we already touched the key that btree_node_mem_ptr_set() will
modify and path_level_init() will be doing the lookup in the child btree
node we're recursing to.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 17 Aug 2022 18:20:48 +0000 (14:20 -0400)]

bcachefs: Always rebuild aux search trees when node boundaries change

Topology repair may change btree node min/max keys: when it does so, we
need to always rebuild eytzinger search trees because nodes directly
depend on those values.

This fixes a bug found by the 'kill_btree_node' test, where we'd pop an
assertion in bch2_bset_search_linear().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 15 Aug 2022 18:05:44 +0000 (14:05 -0400)]

bcachefs: Add an overflow check in set_bkey_val_u64s()

For now this is just a BUG_ON() - we may want to change this to return
an error in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
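
The kind of overflow being guarded against can be sketched as follows, assuming the total key size is stored in an 8-bit field and a fixed header of 5 u64s; both figures and all `toy_` names are assumptions for illustration. Unlike the BUG_ON() in the commit, this sketch takes the "return an error" route that the message mentions as a possible future change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed header size in u64s, for illustration only. */
#define TOY_BKEY_U64s 5

struct toy_bkey {
	uint8_t u64s;	/* total size of key plus value, in u64s */
};

/* Refuse sizes that would silently wrap the 8-bit field. */
static bool toy_set_bkey_val_u64s(struct toy_bkey *k, unsigned val_u64s)
{
	unsigned u64s = TOY_BKEY_U64s + val_u64s;

	if (u64s > UINT8_MAX)
		return false;

	k->u64s = (uint8_t)u64s;
	return true;
}
```

Without the check, a value of 251 u64s would wrap the stored size to 0 under these assumptions, and later code would read a truncated key.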

Olexa Bilaniuk [Mon, 15 Aug 2022 18:20:22 +0000 (14:20 -0400)]

bcachefs: remove dead whiteout_u64s argument.

Signed-off-by: Olexa Bilaniuk <obilaniu@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Sun, 14 Aug 2022 20:11:35 +0000 (16:11 -0400)]

bcachefs: Debugfs cleanup

This improves flush_buf() so that it always returns nonzero when we're
done reading and ready to return to userspace, and so that it returns
the value we want to return to userspace (number of bytes read, if there
wasn't an error).

In the future we'll be better abstracting this mechanism and pulling it
out of bcachefs, and using it to replace seq_file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 15 Aug 2022 18:01:56 +0000 (14:01 -0400)]

bcachefs: Fix bch2_fs_check_snapshots()

We were iterating starting at BCACHEFS_ROOT_INO, but snapshots start at
POS_MIN - meaning this code was never getting run.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Olexa Bilaniuk <obilaniu@gmail.com>

Kent Overstreet [Fri, 12 Aug 2022 16:45:01 +0000 (12:45 -0400)]

bcachefs: Increment restart count in bch2_trans_begin()

Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 12 Aug 2022 01:06:43 +0000 (21:06 -0400)]

bcachefs: Fix assertion in bch2_btree_key_cache_drop()

Turns out this assertion was something we could legitimately hit - add a
comment describing what's going on, and handle it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 12 Aug 2022 01:06:02 +0000 (21:06 -0400)]

bcachefs: Print last line in debugfs/btree_transaction_stats

We need to turn the flush_buf() thing into a proper API, to replace
seq_file.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 12 Aug 2022 00:14:54 +0000 (20:14 -0400)]

bcachefs: Track the maximum btree_paths ever allocated by each transaction

We need a way to check if the machinery for handling btree_paths within
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.

This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Thu, 11 Aug 2022 23:36:24 +0000 (19:36 -0400)]

bcachefs: Rename lock_held_stats -> btree_transaction_stats

Going to be adding more things to this in the next patch.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Thu, 11 Aug 2022 21:25:25 +0000 (17:25 -0400)]

bcachefs: Switch bch2_btree_delete_range() to bch2_trans_run()

This fixes an assertion about unexpected transaction restarts -
bch2_btree_delete_range_trans() handles transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 11 Aug 2022 17:23:04 +0000 (13:23 -0400)]

bcachefs: Fix btree_path->uptodate inconsistency

This fixes an assertion in bch2_btree_path_peek_slot().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 11 Aug 2022 00:05:14 +0000 (20:05 -0400)]

bcachefs: Fix duplicate paths left by bch2_path_put()

bch2_path_put() is supposed to drop paths that aren't needed on
transaction restart, or to hold locks that we're supposed to keep until
transaction commit: but it was failing to free paths in some cases that
it should have, leading to transaction path overflows with lots of
duplicate paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 11 Aug 2022 16:23:21 +0000 (12:23 -0400)]

bcachefs: Kill BTREE_ITER_CACHED_(NOFILL|NOCREATE)

These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 10 Aug 2022 16:42:55 +0000 (12:42 -0400)]

bcachefs: Tracepoint improvements

Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 11 Aug 2022 00:22:01 +0000 (20:22 -0400)]

bcachefs: "Snapshot deletion did not run correctly" should be a fsck err

This was noticed when a test hit this error and didn't fail, because
fsck wasn't returning that it fixed errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 10 Aug 2022 16:34:18 +0000 (12:34 -0400)]

bcachefs: six_lock_counts() is now in six.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Wed, 10 Aug 2022 23:08:30 +0000 (19:08 -0400)]

bcachefs: BTREE_ITER_NO_NODE -> BCH_ERR codes

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 10 Aug 2022 22:55:53 +0000 (18:55 -0400)]

bcachefs: Don't set should_be_locked on paths that aren't locked

It doesn't make any sense to set should_be_locked on btree_paths that
aren't locked, and is often a bug - this patch adds assertions and fixes
some of those bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 9 Aug 2022 17:47:03 +0000 (13:47 -0400)]

bcachefs: Fix missing error handling in bch2_subvolume_delete()

This fixes an assertion when the transaction has been unexpectedly
restarted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Sun, 7 Aug 2022 03:02:09 +0000 (23:02 -0400)]

bcachefs: Improve an error message

Update an error message to use bch2_err_str().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Sun, 7 Aug 2022 17:43:32 +0000 (13:43 -0400)]

bcachefs: Tracepoint improvements

- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 5 Aug 2022 21:08:35 +0000 (17:08 -0400)]

bcachefs: Fix incorrectly freeing btree_path in alloc path

Clearing path->preserve means the path will be dropped in
bch2_trans_begin() - but on transaction restart, we're likely to need
that path again.

This fixes a livelock in the allocation path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 5 Aug 2022 15:36:13 +0000 (11:36 -0400)]

bcachefs: Fix bch2_btree_trans_to_text()

bch2_btree_trans_to_text() is used to print btree_transactions owned by
other threads; thus, it needs to be particularly careful. This fixes a
null ptr deref caused by racing with the owning thread changing
path->l[].b.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 4 Aug 2022 16:46:37 +0000 (12:46 -0400)]

bcachefs: Add distinct error code for key_cache_upgrade

This aids in debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 26 Jul 2022 04:50:25 +0000 (00:50 -0400)]

bcachefs: Fix not punting to workqueue when promoting

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Fri, 22 Jul 2022 10:57:05 +0000 (06:57 -0400)]

bcachefs: fsck: Fix nested transaction handling

This uses the new trans->restart count to make sure we always correctly
return -BCH_ERR_transaction_restart_nested when we restart a nested
transaction - eliminating some other hacks and preparing for new
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
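
The restart-count check described above can be sketched roughly as follows. This is a minimal illustration and not the bcachefs code: `struct demo_trans`, `demo_inner()`, `demo_call_nested()` and `DEMO_RESTART_NESTED` are hypothetical names. It assumes the inner operation handles its own restarts and bumps a per-transaction counter each time, so the caller can detect a restart by comparing the counter before and after the call.

```c
#include <assert.h>

/* Hypothetical error value, standing in for a "nested restart" code. */
#define DEMO_RESTART_NESTED (-2050)

/* Hypothetical transaction: only the restart counter matters here. */
struct demo_trans {
	unsigned restart_count;
};

/*
 * Inner operation that handles its own restarts. Each simulated restart
 * bumps the counter; the operation itself then succeeds.
 */
static int demo_inner(struct demo_trans *t, unsigned restarts_to_simulate)
{
	while (restarts_to_simulate--)
		t->restart_count++;
	return 0;
}

/*
 * Outer caller: if the counter moved while the inner operation ran, any
 * state the caller derived before the call may be stale, so report a
 * nested restart instead of success.
 */
static int demo_call_nested(struct demo_trans *t, unsigned restarts_to_simulate)
{
	unsigned before = t->restart_count;
	int ret = demo_inner(t, restarts_to_simulate);

	if (ret)
		return ret;
	if (t->restart_count != before)
		return DEMO_RESTART_NESTED;
	return 0;
}
```

Comparing a counter avoids having every inner helper remember to translate its own restart into the nested variant.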

Kent Overstreet [Thu, 21 Jul 2022 19:41:29 +0000 (15:41 -0400)]

bcachefs: Add an O_DIRECT option (for userspace)

Sometimes we see IO errors due to O_DIRECT alignment issues - having an
option to use buffered IO will be helpful.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Thu, 21 Jul 2022 13:53:28 +0000 (09:53 -0400)]

bcachefs: Tighten up btree_path assertions

Currently seeing a very rare and difficult to explain btree_path
inconsistency - this patch adds assertions to the only place that seems
to be missing them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Sun, 17 Jul 2022 06:46:46 +0000 (02:46 -0400)]

bcachefs: bch2_bucket_alloc_trans_early -> for_each_btree_key_norestart

Nested btree transactions require special care, and an upcoming patch is
going to add assertions to that effect. We don't want to be using them
unnecessarily, so this patch switches bch2_bucket_alloc_trans_early() to not
handle transaction restarts.

This patch also adds a cursor so that on transaction restart we can
continue scanning from where the previous search for an empty bucket
left off.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
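
A rough sketch of the cursor idea from this message, under stated assumptions: the bucket array, the `budget` field that simulates a restart, and all `demo_*` names are hypothetical and are not the bcachefs allocator. The point is only that the scan position lives outside the function that can be restarted, so a retry resumes where the last attempt stopped instead of starting again from bucket 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error values for this sketch. */
#define DEMO_RESTART  (-2049)
#define DEMO_NO_SPACE (-2051)

struct demo_scan {
	size_t   cursor;	/* survives restarts */
	unsigned budget;	/* buckets examined before a simulated restart, must be >= 1 */
};

/* One attempt: scan forward from the cursor, possibly "restarting". */
static int demo_find_empty_once(const bool *in_use, size_t nr,
				struct demo_scan *s, size_t *found)
{
	unsigned scanned = 0;

	for (; s->cursor < nr; s->cursor++) {
		if (scanned++ == s->budget)
			return DEMO_RESTART;	/* cursor is left where we stopped */
		if (!in_use[s->cursor]) {
			*found = s->cursor;
			return 0;
		}
	}
	return DEMO_NO_SPACE;
}

/* Retry loop: each retry resumes from the saved cursor. */
static int demo_find_empty(const bool *in_use, size_t nr,
			   struct demo_scan *s, size_t *found,
			   unsigned *restarts)
{
	int ret;

	*restarts = 0;
	while ((ret = demo_find_empty_once(in_use, nr, s, found)) == DEMO_RESTART)
		(*restarts)++;
	return ret;
}
```

Without the saved cursor, a scan that restarts after every `budget` buckets would never get past the first `budget` entries.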

Kent Overstreet [Wed, 20 Jul 2022 21:35:57 +0000 (17:35 -0400)]

bcachefs: Fix check_i_sectors()

bch2_count_inode_sectors() uses for_each_btree_key() internally, which
handles lock restarts - the lockrestart_do() in check_i_sectors() is
redundant, and buggy here since the count that
bch2_count_inode_sectors() returns was interpreted as an error by
lockrestart_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
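
The pitfall described above (a count travelling through a return value that a retry wrapper treats as a status) can be illustrated with a small sketch. Everything here is hypothetical: `demo_retry()` stands in for a retry-on-restart helper and is not lockrestart_do(), and the out-parameter variant is one common way to avoid the ambiguity, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical restart code for this sketch. */
#define DEMO_RESTART (-2049)

typedef int (*demo_op_fn)(void *arg);

/* Retry helper: loops on restart, otherwise passes the result through. */
static int demo_retry(demo_op_fn op, void *arg)
{
	int ret;

	do {
		ret = op(arg);
	} while (ret == DEMO_RESTART);
	return ret;
}

struct demo_count {
	int	calls;
	int64_t	sectors;
};

/* Problematic shape: the count itself is the return value. */
static int demo_count_as_ret(void *arg)
{
	struct demo_count *c = arg;

	c->calls++;
	return 8;	/* "8 sectors", but it looks like a nonzero status */
}

/* Unambiguous shape: status in the return value, count in the struct. */
static int demo_count_out(void *arg)
{
	struct demo_count *c = arg;

	if (c->calls++ == 0)
		return DEMO_RESTART;	/* simulate one restart */
	c->sectors = 8;
	return 0;
}

/* Caller that, like many status checks, treats any nonzero as failure. */
static int demo_check(demo_op_fn op, struct demo_count *c)
{
	return demo_retry(op, c) ? -1 : 0;
}
```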

Kent Overstreet [Wed, 20 Jul 2022 20:50:26 +0000 (16:50 -0400)]

bcachefs: Convert debugfs code to for_each_btree_key2()

This fixes a bug where we were leaking a transaction restart error to
userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 20 Jul 2022 20:25:00 +0000 (16:25 -0400)]

bcachefs: Unit test updates

- Convert to for_each_btree_key2(), for_each_btree_key_commit(),
for_each_btree_key_reverse()
- No more bare bch2_btree_iter_peek(); we now fault-inject lock
restarts, so we always need a lockrestart_do() or equivalent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Wed, 20 Jul 2022 20:13:27 +0000 (16:13 -0400)]

bcachefs: for_each_btree_key_reverse()

This adds a new macro, like for_each_btree_key2(), but for iterating in
reverse order.

Also, change for_each_btree_key2() to properly check the return value of
bch2_btree_iter_advance().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 19 Jul 2022 21:20:18 +0000 (17:20 -0400)]

bcachefs: Convert fsck errors to errcode.h

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 18 Jul 2022 00:22:30 +0000 (20:22 -0400)]

bcachefs: Inject transaction restarts in debug mode

In CONFIG_BCACHEFS_DEBUG mode, we'll now randomly issue transaction
restarts - with a decaying probability based on the number of restarts
we've already had, to ensure that transactions eventually make forward
progress. This should help shake out some bugs.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
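
One way to get a "decaying probability" of injected restarts is sketched below. The formula is an assumption chosen for illustration (inject with probability 1/2^(n+1) after n restarts); the message above does not state the exact formula bcachefs uses, and `struct demo_trans`, `demo_rand()` and `demo_should_inject_restart()` are hypothetical names. A small xorshift generator replaces the kernel's random source so the sketch is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical transaction state for this sketch. */
struct demo_trans {
	unsigned restart_count;
	uint32_t rng_state;	/* xorshift32 state, must be nonzero */
};

/* xorshift32 pseudo-random generator (stand-in for a kernel RNG). */
static uint32_t demo_rand(struct demo_trans *t)
{
	uint32_t x = t->rng_state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return t->rng_state = x;
}

/*
 * Inject a restart when the low (restart_count + 1) random bits are all
 * zero, so each restart already taken halves the chance of another one.
 */
static bool demo_should_inject_restart(struct demo_trans *t)
{
	unsigned shift = t->restart_count < 31 ? t->restart_count + 1 : 31;
	uint32_t mask = (UINT32_C(1) << shift) - 1;

	return (demo_rand(t) & mask) == 0;
}

/* Run until no restart is injected; returns how many restarts happened. */
static unsigned demo_run(struct demo_trans *t)
{
	t->restart_count = 0;
	while (demo_should_inject_restart(t))
		t->restart_count++;
	return t->restart_count;
}
```

Because the injection probability shrinks geometrically, the expected number of injected restarts per transaction stays small, which matches the stated goal that transactions eventually make forward progress.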

Kent Overstreet [Mon, 18 Jul 2022 03:06:38 +0000 (23:06 -0400)]

bcachefs: EINTR -> BCH_ERR_transaction_restart

Now that we have error codes, with subtypes, we can switch to our own
error code for transaction restarts - and even better, a distinct error
code for each transaction restart reason: clearer code and better
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Tue, 5 Jul 2022 21:27:44 +0000 (17:27 -0400)]

bcachefs: btree_trans_too_many_iters() is now a transaction restart

All transaction restarts need a tracepoint - this is essential for
debugging.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

Kent Overstreet [Tue, 19 Jul 2022 18:51:52 +0000 (14:51 -0400)]

bcachefs: Prevent a btree iter overflow in alloc path

In bch2_bucket_alloc_trans(), we're iterating over buckets - but not
directly with an iterator, since we're iterating over the freespace
btree.

This means that we need to clear iter->path->preserve, otherwise we'll
end up retaining a btree_path for every alloc key we touched - which is
not what we want here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 18 Jul 2022 23:42:58 +0000 (19:42 -0400)]

bcachefs: Use bch2_err_str() in error messages

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

Kent Overstreet [Mon, 18 Jul 2022 02:31:21 +0000 (22:31 -0400)]

bcachefs: Improved errcodes

Instead of overloading standard error codes (EINTR/EAGAIN), and defining
short lists of error codes in multiple places that potentially end up
overlapping & conflicting, we're now going to have one master list of
error codes.

Error codes are defined with an x-macro: thus we also have
bch2_err_str() now.

Also, error codes have a class field. Now, instead of checking for
errors with ==, code should use bch2_err_matches(), which returns true
if the error is equal to or a sub-error of the error class.

This means we can define unique errors for every source location where
an error is generated, which will help improve our error messages.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
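
The x-macro scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `DEMO_*` names, the error list and the numbering are hypothetical and are not copied from bcachefs's errcode.h. One list generates the enum, the name table behind a string function, and a parent-class table that a matching function walks upward.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical master list: x(class, name). A class of 0 means the
 * error has no parent class.
 */
#define DEMO_ERRCODES()							\
	x(0,					demo_transaction_restart) \
	x(DEMO_ERR_demo_transaction_restart,	demo_restart_relock)	\
	x(DEMO_ERR_demo_transaction_restart,	demo_restart_nested)	\
	x(0,					demo_no_space)

enum demo_errcode {
	DEMO_ERR_START = 2048,	/* keep clear of standard errno values */
#define x(class, err) DEMO_ERR_##err,
	DEMO_ERRCODES()
#undef x
	DEMO_ERR_MAX
};

/* Name table generated from the same list. */
static const char * const demo_err_names[DEMO_ERR_MAX - DEMO_ERR_START] = {
#define x(class, err) [DEMO_ERR_##err - DEMO_ERR_START] = #err,
	DEMO_ERRCODES()
#undef x
};

/* Parent-class table generated from the same list. */
static const int demo_err_class[DEMO_ERR_MAX - DEMO_ERR_START] = {
#define x(class, err) [DEMO_ERR_##err - DEMO_ERR_START] = class,
	DEMO_ERRCODES()
#undef x
};

/* Accepts positive or negative codes; falls back to strerror(). */
static const char *demo_err_str(int err)
{
	err = err < 0 ? -err : err;
	if (err > DEMO_ERR_START && err < DEMO_ERR_MAX)
		return demo_err_names[err - DEMO_ERR_START];
	return strerror(err);
}

/* True if err equals class or is a (transitive) sub-error of it. */
static bool demo_err_matches(int err, int class)
{
	err = err < 0 ? -err : err;
	class = class < 0 ? -class : class;

	while (err > DEMO_ERR_START && err < DEMO_ERR_MAX) {
		if (err == class)
			return true;
		err = demo_err_class[err - DEMO_ERR_START];
	}
	return err != 0 && err == class;
}
```

With this shape, callers test for the class (for example "any transaction restart") while each call site can return its own distinct sub-error, which is what makes per-location error codes practical.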

A mirror of Linus' kernel repository