Kent Overstreet [Fri, 15 Nov 2024 05:52:20 +0000 (00:52 -0500)]
bcachefs: Don't use a shared decompress workspace mempool
gzip and zstd require different decompress workspace sizes, and if we
start with one and then start using the other at runtime we may not get
the correct size
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Nov 2024 02:03:53 +0000 (21:03 -0500)]
bcachefs: compression workspaces should be indexed by opt, not type
type includes lz4 and lz4_old, which do not get different compression
workspaces, and incompressible, a fake type - BCH_COMPRESSION_OPTS() is
the correct enum to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 02:53:38 +0000 (21:53 -0500)]
bcachefs: Kill bch2_get_next_backpointer()
Since for quite some time backpointers have only been stored in the
backpointers btree, not alloc keys (an aborted experiment, support for
which has been removed) - we can replace get_next_backpointer() with
simple btree iteration.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 15 Nov 2024 02:28:40 +0000 (21:28 -0500)]
bcachefs: Delete backpointers check in try_alloc_bucket()
try_alloc_bucket() has a "safety" check, which avoids allocating a
bucket if there's any backpointers present.
But backpointers are not the source of truth for live data in a bucket,
the bucket sector counts are; this check was fairly useless, and we're
also deferring backpointers checks from fsck to runtime in the near
future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 26 Oct 2024 00:41:06 +0000 (20:41 -0400)]
bcachefs: peek_prev_min(): Search forwards for extents, snapshots
With extents and snapshots, for slightly different reasons, we may have
to search forwards to find a key that compares equal to iter->pos (i.e.
a key that peek_prev() should return, as it returns keys <= iter->pos).
peek_slot() does this, and is an easy way to fix this case.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 25 Oct 2024 02:12:37 +0000 (22:12 -0400)]
bcachefs: Implement bch2_btree_iter_prev_min()
A user contributed a filessytem dump, where the dump was actually
corrupted (due to being taken while the filesystem was online), but
which exposed an interesting bug in fsck - reconstruct_inode().
When itearting in BTREE_ITER_filter_snapshots mode, it's required to
give an end position for the iteration and it can't span inode numbers;
continuing into the next inode might mean we start seeing keys from a
different snapshot tree, that the is_ancestor() checks always filter,
thus we're never able to return a key and stop iterating.
Backwards iteration never implemented the end position because nothing
else needed it - except for reconstuct_inode().
Additionally, backwards iteration is now able to overlay keys from the
journal, which will be useful if we ever decide to start doing journal
replay in the background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Oct 2024 00:27:44 +0000 (20:27 -0400)]
bcachefs: Don't delete reflink pointers to missing indirect extents
To avoid tragic loss in the event of transient errors (i.e., a btree
node topology error that was later corrected by btree node scan), we
can't delete reflink pointers to correct errors.
This adds a new error bit to bch_reflink_p, indicating that it is known
to point to a missing indirect extent, and the error has already been
reported.
Indirect extent lookups now use bch2_lookup_indirect_extent(), which on
error reports it as a fsck_err() and sets the error bit, and clears it
if necessary on succesful lookup.
This also gets rid of the bch2_inconsistent_error() call in
__bch2_read_indirect_extent, and in the reflink_p trigger: part of the
online self healing project.
An on disk format change isn't necessary here: setting the error bit
will be interpreted by older versions as pointing to a different index,
which will also be missing - which is fine.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 03:43:16 +0000 (23:43 -0400)]
bcachefs: Reserve 8 bits in bch_reflink_p
Better repair for reflink pointers, as well as propagating new inode
options to indirect extents, are going to require a few extra bits
bch_reflink_p: so claim a few from the high end of the destination
index.
Also add some missing bounds checking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 29 Oct 2024 01:27:23 +0000 (21:27 -0400)]
bcachefs: Kill FSCK_NEED_FSCK
If we find an error that indicates that we need to run fsck, we can
specify that directly with run_explicit_recovery_pass().
These are now log_fsck_err() calls: we're just logging in the superblock
that an error occurred - and possibly doing an emergency shutdown,
depending on policy.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 27 Oct 2024 02:52:06 +0000 (22:52 -0400)]
bcachefs: Delete dead code from bch2_discard_one_bucket()
alloc key validation ensures that if a bucket is in need_discard state
the sector counts are all zero - we don't have to check for that.
The NEED_INC_GEN check appears to be dead code, as well: we only see
buckets in the need_discard btree, and it's an error if they aren't in
the need_discard state.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: Remove redundant initialization in bch2_vfs_inode_init()
`inode->v.i_ino` has been initialized to `inum.inum`. If `inum.inum` and
`bi->bi_inum` are not equal, BUG_ON() is triggered in
bch2_inode_update_after_write().
Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Integral [Wed, 23 Oct 2024 10:00:33 +0000 (18:00 +0800)]
bcachefs: add support for true/false & yes/no in bool-type options
Here is the patch which uses existing constant table:
Currently, when using bcachefs-tools to set options, bool-type options
can only accept 1 or 0. Add support for accepting true/false and yes/no
for these options.
Signed-off-by: Integral <integral@murena.io> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: David Howells <dhowells@redhat.com>
Kent Overstreet [Sat, 26 Oct 2024 02:31:20 +0000 (22:31 -0400)]
bcachefs: Assert that we're not violating key cache coherency rules
We're not allowed to have a dirty key in the key cache if the key
doesn't exist at all in the btree - creation has to bypass the key
cache, so that iteration over the btree can check if the key is present
in the key cache.
Things break in subtle ways if cache coherency is broken, so this needs
an assert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Oct 2024 03:52:51 +0000 (23:52 -0400)]
bcachefs: Better in_restart error
We're ramping up on checking transaction restart handling correctness -
so, in debug mode we now save a backtrace for where the restart was
emitted, which makes it much easier to track down the incorrect
handling.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Nov 2024 03:00:05 +0000 (22:00 -0500)]
bcachefs: Fix unhandled transaction restart in evacuate_bucket()
Generally, releasing a transaction within a transaction restart means an
unhandled transaction restart: but this can happen legitimately within
the move code, e.g. when bch2_move_ratelimit() tells us to exit before
we've retried.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 31 Oct 2024 04:25:36 +0000 (00:25 -0400)]
bcachefs: Improved check_topology() assert
On interior btree node updates, we always verify that we're not
introducing topology errors: child nodes should exactly span the range
of the parent node.
single_device.ktest small_nodes has been popping this assert: change it
to give us more information.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Tue, 29 Oct 2024 12:54:08 +0000 (20:54 +0800)]
bcachefs: use attribute define helper for sysfs attribute
The sysfs attribute definition has been wrapped into macro:
rw_attribute, read_attribute and write_attribute, we can
use these helpers to uniform the attribute definition.
Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Hongbo Li [Tue, 29 Oct 2024 12:53:50 +0000 (20:53 +0800)]
bcachefs: remove write permission for gc_gens_pos sysfs interface
The gc_gens_pos is used to show the status of bucket gen gc.
There is no need to assign write permissions for this attribute.
Here we can use read_attribute helper to define this attribute.
Fixes: ac516d0e7db7 ("bcachefs: Add the status of bucket gen gc to sysfs") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 01:41:20 +0000 (21:41 -0400)]
bcachefs: Simplify option logic in rebalance
Since bch2_move_get_io_opts() now synchronizes io_opts with options from
bch_extent_rebalance, delete the ad-hoc logic in rebalance.c that
previously did this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 01:41:20 +0000 (21:41 -0400)]
bcachefs: get_update_rebalance_opts()
bch2_move_get_io_opts() now synchronizes options loaded from the
filesystem and inode (if present, i.e. not walking the reflink btree
directly) with options from the bch_extent_rebalance_entry, updating the
extent if necessary.
Since bch_extent_rebalance tracks where its option came from we can
preserve "inode options override filesystem options", even for indirect
extents where we don't have access to the inode the options came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 21 Oct 2024 00:53:53 +0000 (20:53 -0400)]
bcachefs: bch2_write_inode() now checks for changing rebalance options
Previously, BCHFS_IOC_REINHERIT_ATTRS didn't trigger rebalance scans
when changing rebalance options - it had been missed, only the xattr
interface triggered them.
Ideally they'd be done by the transactional trigger, but unpacking the
inode to get the options is too heavy to be done in the low level
trigger - the inode trigger is run on every extent update, since the
bch_inode.bi_journal_seq has to be updated for fsync.
bch2_write_inode() is a good compromise, it already unpacks and repacks
and is not run in any super-fast paths.
Additionally, creating the new rebalance entry to trigger the scan is
now done in the same transaction as the inode update that changed the
options.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 20 Oct 2024 03:26:11 +0000 (23:26 -0400)]
bcachefs: Add bch_io_opts fields for indicating whether the opts came from the inode
This is going to be used in the bch_extent_rebalance improvements, which
propagate io_path options into the extent (important for rebalance,
which needs something present in the extent for transactionally tagging
them in the rebalance_work btree, and also for indirect extents).
By tracking in bch_extent_rebalance whether the option came from the
filesystem or the inode we can correctly handle options being changed on
indirect extents.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Thorsten Blum [Sat, 26 Oct 2024 15:47:04 +0000 (17:47 +0200)]
bcachefs: Annotate struct bucket_gens with __counted_by()
Add the __counted_by compiler attribute to the flexible array member b
to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
Use struct_size() to calculate the number of bytes to be allocated.
Update bucket_gens->nbuckets and bucket_gens->nbuckets_minus_first when
resizing.
Compile-tested only.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 15 Oct 2024 01:35:44 +0000 (21:35 -0400)]
bcachefs: Add block plugging to read paths
This will help with some of the btree_trans srcu lock hold time warnings
that are still turning up; submit_bio() can block for awhile if the
device is sufficiently congested.
It's not a perfect solution since blk_plug bios are submitted when
scheduling; we might want a way to disable the "submit on context
switch" behaviour, or switch to our own plugging in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 12 Oct 2024 02:53:09 +0000 (22:53 -0400)]
bcachefs: -o norecovery now bails out of recovery earlier
-o norecovery (used by the dump tool) should be doing the absolute
minimum amount of work to get the filesystem up and readable; we
shouldn't be running check and repair code, or going read-write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Sep 2024 00:21:18 +0000 (20:21 -0400)]
bcachefs: bch2_run_explicit_recovery_pass() returns different error when not in recovery
if we're not in recovery then there's no way to rewind recovery - give
this a different errcode so that any error messages will give us a
better idea of what happened.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The header files dirent_format.h and disk_groups_format.h are included
twice. Remove the redundant includes and the following warnings reported
by make includecheck:
disk_groups_format.h is included more than once
dirent_format.h is included more than once
Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Sep 2024 09:08:39 +0000 (05:08 -0400)]
bcachefs: Remove unnecessary peek_slot()
hash_lookup() used to return an errorcode, and a peek_slot() call was
required to get the key it looked up. But we're adding fault injection
for transaction restarts, so fix this old unconverted code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sun, 15 Dec 2024 23:33:41 +0000 (15:33 -0800)]
Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Limit EFI zboot to GZIP and ZSTD before it comes in wider use
- Fix inconsistent error when looking up a non-existent file in
efivarfs with a name that does not adhere to the NAME-GUID format
- Drop some unused code
* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/esrt: remove esre_attribute::store()
efivarfs: Fix error on non-existent file
efi/zboot: Limit compression options to GZIP and ZSTD
Linus Torvalds [Sun, 15 Dec 2024 23:29:07 +0000 (15:29 -0800)]
Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
missing a sentinel, and RIIC was missing rounding up"
* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: riic: Always round-up when calculating bus period
i2c: nomadik: Add missing sentinel to match table
i2c: pnx: Fix timeout in wait functions