Kent Overstreet [Tue, 25 Feb 2025 23:58:46 +0000 (18:58 -0500)]
bcachefs: Stash a pointer to the filesystem for blk_holder_ops
Note that we open block devices before we allocate bch_fs, but once
attached to a filesystem they will be closed before the bch_fs is torn
down - so stashing a pointer without a refcount looks incorrect but it's
not.
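As a rough sketch of the idea (struct, callback, and message names here are illustrative, not the actual diff), the holder object handed to the block layer can carry the back-pointer:

    /*
     * Sketch only: the holder pointer is stashed without a refcount,
     * which is safe because devices are closed before bch_fs teardown.
     */
    struct bch_holder {
            struct bch_fs   *c;     /* NULL until attached to a filesystem */
    };

    static void bch2_fs_mark_dead(struct block_device *bdev, bool surprise)
    {
            struct bch_holder *h = bdev->bd_holder;

            if (h->c)       /* no ref taken - see note above */
                    pr_err("%pg: device marked dead", bdev);
    }

    static const struct blk_holder_ops bch2_holder_ops = {
            .mark_dead      = bch2_fs_mark_dead,
    };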
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 28 Feb 2025 18:59:15 +0000 (13:59 -0500)]
bcachefs: Fix read path io_ref handling
We were using our device pointer after we'd released our ref to it.
Unlikely to be a race that's practical to hit, since actually removing a
member device is a whole process beyond just taking it offline - but it
needs to be fixed.
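A minimal sketch of the bug class and the fix, with simplified context (the helper here is illustrative):

    static void read_submit_sketch(struct bch_read_bio *rbio, struct bch_dev *ca)
    {
            /*
             * Buggy ordering was:
             *      percpu_ref_put(&ca->io_ref);
             *      bio_set_dev(&rbio->bio, ca->disk_sb.bdev);  <- use after put
             */
            bio_set_dev(&rbio->bio, ca->disk_sb.bdev);      /* use the device... */
            submit_bio(&rbio->bio);
            percpu_ref_put(&ca->io_ref);                    /* ...then drop our ref */
    }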
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 25 Feb 2025 01:29:58 +0000 (20:29 -0500)]
bcachefs: bcachefs_metadata_version_extent_flags
This implements a new extent field: bitflags that apply to the whole
extent. There's been a couple of things we've wanted this for in the past,
but the immediate need is extent poisoning, to solve a rebalance issue.
Unknown extent fields can't be parsed (we won't know their size, so we
can't advance to the next field), so this is an incompat feature, and
using it prevents the filesystem from being mounted by old versions.
This also adds the BCH_EXTENT_poisoned flag; this indicates that the
data is known to be bad (i.e. there was a checksum error, and we had to
write a new checksum) and reads will return errors.
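A hedged sketch of the concept (field widths and names here are illustrative, not the on-disk encoding):

    struct bch_extent_flags {
            __u64   type:7,         /* extent entry type tag */
                    flags:57;       /* bitflags applying to the whole extent */
    };

    #define BCH_EXTENT_FLAG_poisoned        (1ULL << 0)     /* data known bad;
                                                             * reads return errors */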
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
SubmittingPatches.rst has 4 section headings, all at the same heading
level. In the absence of a title heading, these section headings all
end up as title headings in the docs output, which also affects
the index toctree (increasing the number of titles from the original 2
to 6) due to the :numbered: option.
Demote the second through last section headings, making "Submitting
patches to bcachefs" the title heading.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Bagas Sanjaya [Mon, 24 Feb 2025 12:40:26 +0000 (19:40 +0700)]
Documentation: bcachefs: Split index toctree
The bcachefs subsystem currently has 4 docs: two are development notes
and the rest are actual filesystem docs. These two groups are clearly
distinct and can be organized separately.
Split the toctree into two, one for each docs group. While at it, also
reduce :maxdepth: so that only title headings are listed in the
toctrees.
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Bagas Sanjaya [Sat, 22 Feb 2025 09:18:50 +0000 (16:18 +0700)]
Documentation: bcachefs: casefolding: Do not italicize NUL
Sphinx reports htmldocs warning:
Documentation/filesystems/bcachefs/casefolding.rst:36: WARNING: Inline interpreted text or phrase reference start-string without end-string. [docutils]
That's because the word NUL is italicized but written in plural form
(`NUL`s), and the 's' immediately following the closing emphasis marker
breaks the markup. Sphinx doesn't trip over an italicized word that is
followed by punctuation instead.
Do not italicize the word to keep Sphinx happy.
Fixes: bc5cc09246c5 ("bcachefs: bcachefs_metadata_version_casefolding")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250221162135.79be0147@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Joshua Ashton [Sun, 13 Aug 2023 17:34:17 +0000 (18:34 +0100)]
bcachefs: bcachefs_metadata_version_casefolding
This patch implements support for case-insensitive file name lookups
in bcachefs.
The implementation uses the same UTF-8 lowering and normalization that
ext4 and f2fs use.
More information is provided in Documentation/bcachefs/casefolding.rst
Compatibility notes:
This uses the new versioning scheme for incompatible features where an
incompatible feature is tied to a version number: the superblock says
"we may use incompat features up to x" and "incompat features up to x
are in use", disallowing mounting by previous versions.
Additionally, an old-style incompat feature bit is used, so that
kernels without utf8 casefolding support know whether casefolding
specifically is in use and whether they're allowed to mount.
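A hedged sketch of the version gate this describes (accessor name and return convention are illustrative):

    static int bch2_sb_check_incompat(struct bch_sb *sb, unsigned kernel_max_version)
    {
            /* sb says "incompat features up to x are in use" */
            unsigned incompat_in_use = BCH_SB_VERSION_INCOMPAT(sb);

            /* refuse to mount if that's newer than this kernel understands */
            return incompat_in_use > kernel_max_version ? -EINVAL : 0;
    }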
This is needed for proper scrub - stripe checksums need to be verified
separately from the extents within the stripe, since a block may not be
full of live extents but is still needed for reconstruction.
And this will be needed for (efficient) evacuate/repair paths.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 8 Feb 2025 00:56:11 +0000 (19:56 -0500)]
bcachefs: Advance bch_alloc.oldest_gen if no stale pointers
Now that we've got cached backpointers and aren't leaving around stale
pointers on bucket invalidation, we no longer need the periodic (rare)
gc_gens - which recalculates each bucket's oldest gen to avoid wraparound.
We can't delete that code because we've got to support existing
filesystems that will still have stale pointers, but this gets rid of
another scalability limit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This means that we'll be able to kill cached pointers in the
bucket_invalidate path, when invalidating/reusing buckets containing
cached data, instead of leaving them around to be cleaned up by gc_gens
garbage collection - which requires a full metadata scan.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 11 Feb 2025 15:09:31 +0000 (10:09 -0500)]
bcachefs: Better trigger ordering
Transactional triggers need to run in a defined ordering, which is not
quite the same as btree ID integer comparison.
Previously this was handled in a hacky way in
bch2_trans_commit_run_triggers(), since it was only the alloc btree that
needed special handling, but upcoming stripe btree changes are going to
require more ordering changes - so, define that ordering.
Next patch will change the transaction commit path to use it.
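A hedged sketch of what defining that ordering could look like (values illustrative):

    static inline unsigned btree_trigger_order(enum btree_id btree)
    {
            switch (btree) {
            case BTREE_ID_alloc:
                    return U8_MAX;          /* alloc triggers ordered last */
            case BTREE_ID_stripes:
                    return U8_MAX - 1;
            default:
                    return btree;           /* everything else: btree ID order */
            }
    }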
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
FRAGMENTATION_START was incorrectly named: there's currently only one
fragmentation LRU (at the end of the reserved bits for LRU type), and
we're getting ready to add a stripe fragmentation LRU - so give it a
better name.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 10 Feb 2025 22:04:08 +0000 (17:04 -0500)]
bcachefs: bch2_write_op_error() now prints info about data update
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.
This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.
Reported-by: Roland Vet <vet.roland@protonmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
However, these properties are no longer relied upon and no longer
necessary, so remove the corresponding asserts and forbid the use of
eytzinger1_prev(0, size) and eytzinger1_next(0, size).
This allows us to further simplify the code in eytzinger1_next() and
eytzinger1_prev(): where the left shifting happens, eytzinger1_next() is
trying to move i to the lowest child on the left, which is equivalent to
doubling i until the next doubling would cause it to be greater than
size. This is implemented by shifting i to the left so that the most
significant bits align and then shifting i to the right by one if the
result is greater than size.
Likewise, eytzinger1_prev() is trying to move to the lowest child on the
right; the same applies here.
The 1-offset in (size - 1) in eytzinger1_next() isn't needed at all, but
the equivalent offset in eytzinger1_prev() is surprisingly needed to
preserve the 'eytzinger1_prev(0, size) == eytzinger1_last(size)'
property. However, since we no longer support that property, we can get
rid of these offsets as well. This saves one addition in each function
and makes the code less confusing.
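Putting it together, the simplified eytzinger1_next() takes roughly this shape (1-based indexes; __fls()/ffz() are the kernel's find-last-set/find-first-zero helpers):

    static inline unsigned eytzinger1_next(unsigned i, unsigned size)
    {
            if (2 * i + 1 <= size) {
                    /* right child exists: go right, then repeatedly left -
                     * i.e. double i until the next doubling would exceed
                     * size, done by aligning the most significant bits */
                    i = 2 * i + 1;
                    i <<= __fls(size) - __fls(i);
                    i >>= i > size;
            } else {
                    /* no right child: climb while we're a right child */
                    i >>= ffz(i) + 1;
            }
            return i;
    }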
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: convert eytzinger sort to be 1-based (2)
In this second step, transform the eytzinger indexes i, j, and k in
eytzinger1_sort_r() from 0-based to 1-based. This step looks a bit
messy, but the resulting code is slightly better.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Several of the algorithms on eytzinger trees are implemented in terms of
the eytzinger0 primitives. However, those algorithms can just as easily
be expressed in terms of the eytzinger1 primitives, and that leads to
better and easier to understand code. Start by converting
eytzinger0_find().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add an eytzinger0_find_ge() self test similar to eytzinger0_find_gt().
Note that this test requires eytzinger0_find_ge() to return the first
matching element in the array in case of duplicates. To prevent
bisection errors, we only add this test after strengthening the original
implementation (see the previous commit).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Implement eytzinger0_find_ge() directly instead of implementing it in
terms of eytzinger0_find_le() and adjusting the result.
This turns eytzinger0_find_ge() into a minimum search, so when there are
duplicate elements, the result of eytzinger0_find_ge() will now always
point at the first matching element.
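A hedged sketch of such a direct minimum search on a 0-based eytzinger array (the in-tree version may differ in naming and index base):

    static inline int eytzinger0_find_ge(void *base, size_t nr, size_t size,
                                         cmp_func_t cmp, const void *search)
    {
            size_t i = 0;
            int ret = -1;

            while (i < nr) {
                    if (cmp(base + i * size, search) >= 0) {
                            ret = i;        /* candidate; keep searching left
                                             * so duplicates resolve to the
                                             * first matching element */
                            i = 2 * i + 1;
                    } else {
                            i = 2 * i + 2;  /* too small; go right */
                    }
            }
            return ret;
    }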
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rename eytzinger0_find_test_val() to eytzinger0_find_test_le() and add a
new eytzinger0_find_test_val() wrapper that calls it.
We have already established that the array is sorted in eytzinger order,
so we can use the eytzinger iterator functions and check the boundary
conditions to verify the result of eytzinger0_find_le().
Only scan the entire array if we get an incorrect result. When we need
to scan, use eytzinger0_for_each_prev() so that we'll stop at the
highest matching element in the array in case there are duplicates;
going through the array linearly wouldn't give us that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The iterator variable of eytzinger0_for_each() loops has been changed to
be locally scoped at some point, so remove variables defined outside the
loop that are now unused. In addition, for clarity, use a different
variable inside those loops where an outer variable would otherwise be
shadowed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 00:13:39 +0000 (19:13 -0500)]
bcachefs: Free journal bufs when not in use
Since we're increasing the number of 'struct journal_bufs', we don't
want them all permanently holding onto buffers for the journal data -
that'd be 16 * 2MB = 32MB, or potentially more.
Add a single-element mempool (open coded, since buffer size varies),
this also means we won't be hitting the memory allocator every time we
open and close a journal entry/buffer.
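A hedged sketch of the open-coded single-element pool (names illustrative):

    struct journal_free_buf {
            spinlock_t      lock;
            void            *buf;
            size_t          buf_size;
    };

    static void *journal_buf_get(struct journal_free_buf *p, size_t size)
    {
            void *buf = NULL;

            spin_lock(&p->lock);
            if (p->buf && p->buf_size >= size) {
                    buf = p->buf;
                    p->buf = NULL;
            }
            spin_unlock(&p->lock);

            return buf ?: kvmalloc(size, GFP_KERNEL);
    }

    static void journal_buf_put(struct journal_free_buf *p, void *buf, size_t size)
    {
            spin_lock(&p->lock);
            if (!p->buf) {                  /* pool empty: stash this buffer */
                    p->buf          = buf;
                    p->buf_size     = size;
                    buf             = NULL;
            }
            spin_unlock(&p->lock);

            kvfree(buf);                    /* pool was already full */
    }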
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Jan 2025 18:06:35 +0000 (13:06 -0500)]
bcachefs: Don't touch journal_buf->data->seq in journal_res_get
This is a small optimization, reducing the number of cachelines we touch
in the fast path - and it's also necessary for the next patch that
increases JOURNAL_BUF_NR.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 21:25:29 +0000 (16:25 -0500)]
bcachefs: Add a progress indicator to bch2_dev_data_drop()
This code needs quite a bit of work: we don't want to be walking all
metadata in the filesystem, we should just be walking backpointers, and
it should be switched to a data ioctl that can report progress via a
file descriptor, not the system console.
But that'll take more work - before we can safely walk only backpointers
we need to change device add to not reuse device indexes, since with
that change accounting being wrong introduces the possibility of
removing a device that still has pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Feb 2025 20:59:28 +0000 (15:59 -0500)]
bcachefs: Factor out progress.[ch]
The backpointers code has progress indicators; these aren't great, since
they print to the dmesg console, and we much prefer progress indicators
that report to a specific userspace program so they're not spamming the
system console.
But not all codepaths that need progress indicators support that yet,
and we don't want users to think "this is hung".
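A hedged sketch of the factored-out helper (the real interface in progress.[ch] may differ):

    struct progress_indicator {
            u64             keys_done;
            u64             keys_total;
            unsigned long   next_print;     /* jiffies */
    };

    static void progress_update(struct bch_fs *c, struct progress_indicator *p)
    {
            p->keys_done++;

            /* rate-limited: at most one console line every 10 seconds */
            if (time_after(jiffies, p->next_print)) {
                    p->next_print = jiffies + 10 * HZ;
                    bch_info(c, "%llu/%llu keys done",
                             p->keys_done, p->keys_total);
            }
    }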
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 29 Dec 2024 00:59:55 +0000 (19:59 -0500)]
bcachefs: Scrub
Add a new data op to walk all data and metadata in a filesystem,
checking if it can be read successfully, and on error repairing from
another copy if possible.
- New helper: bch2_dev_idx_is_online(), so that we can bail out and
report to userspace when we're unable to scrub because the device is
offline
- data_update_opts, which controls the data move path, now understands
scrub: data is only read, not written. The read path is responsible
for rewriting on read error, as with other reads.
- scrub_pred skips data extents that don't have checksums
- bch_ioctl_data has a new scrub member, which has a data_types field
for the data types to check - i.e. all data types, or only metadata (a
hedged sketch of the argument layout follows this list)
- Add new entries to bch_move_stats so that we can report numbers for
corrected and uncorrected errors
- Add a new enum to bch_ioctl_data_event for explicitly reporting
completion and return code (i.e. device offline)
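A hedged sketch of the scrub argument layout referenced above (illustrative, not the ABI; only the scrub-related fields are shown):

    struct bch_ioctl_data_scrub {
            __u32   dev;            /* device index to scrub */
            __u32   data_types;     /* bitmask: which data types to check,
                                     * e.g. everything or just btree metadata */
    };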
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Dec 2024 21:24:23 +0000 (16:24 -0500)]
bcachefs: bch2_btree_node_scrub()
Add a function for scrubbing btree nodes - reading them in, and kicking
off a rewrite if there's an error.
The btree_node_read_done() checks have to be duplicated because we're
not using a pointer to a struct btree - the btree node might already be
in cache, and we need to check a specific replica, which might not be
the one we previously read from.
This will be used in the next patch implementing high-level scrub.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 28 Dec 2024 21:20:38 +0000 (16:20 -0500)]
bcachefs: backpointer_get_key() doesn't pull in btree node
We may not need to pull in a btree node when walking backpointers -
don't do so unnecessarily when using backpointer_get_key().
It'll still fall back to backpointer_get_node() in a few situations,
including btree roots (where an iterator can't point at just the key),
and races due to the interior update path not having deleted a
backpointer to an old node yet.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Dec 2024 21:32:57 +0000 (16:32 -0500)]
bcachefs: Internal reads can now correct errors
Rework the read path so that BCH_READ_NODECODE reads now also self-heal
after a read error and a successful retry - prerequisite for scrub.
- __bch2_read_endio() now handles a read that's both BCH_READ_NODECODE
and a bounce.
Normally, we don't want a BCH_READ_NODECODE read to ever allocate a
split bch_read_bio: we want to maintain the relationship between the
bch_read_bio and the data_update it's embedded in.
But correcting read errors requires allocating a split/bounce rbio
that's embedded in a promote_op. We do still have a 1-1 relationship,
i.e. we only allocate a single split/bounce if it's a
BCH_READ_NODECODE, so things hopefully don't get too crazy.
- __bch2_read_extent() now is allowed to allocate the promote_op for
rewriting after a failed read, even if it's BCH_READ_NODECODE.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 19 Jan 2025 18:55:33 +0000 (13:55 -0500)]
bcachefs: Bail out early on alloc_nowait data updates
If a data update doesn't want to block on allocations (promotes, self
healing on read error) - check if the allocation would fail before
kicking off the data update and calling into the write path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 19 Jan 2025 18:43:44 +0000 (13:43 -0500)]
bcachefs: Rework init order in bch2_data_update_init()
Initialize the write op first, so that in the next patch we can check if
the allocator would block (for BCH_WRITE_alloc_nowait ops) and bail out
before taking nocow locks/dev refs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 18 Jan 2025 07:05:57 +0000 (02:05 -0500)]
bcachefs: Self healing writes are BCH_WRITE_alloc_nowait
If a drive is failing and we're moving data off of it, we can't
necessarily depend on capacity/disk reservation calculations to avoid
deadlocking/blocking on the allocator.
And we don't want to queue up infinite self-healing moves anyway.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 16 Jan 2025 08:43:03 +0000 (03:43 -0500)]
bcachefs: Be stricter in bch2_read_retry_nodecode()
Now that data_update embeds bch_read_bio, BCH_READ_NODECODE means that
the read is embedded in a data_update - and we can check in the retry
path if the extent has changed and bail out.
This likely fixes some subtle bugs with read errors and data moves.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>