From: Sasha Levin
Date: Sat, 23 Sep 2023 12:16:09 +0000 (-0400)
Subject: Fixes for 6.1
X-Git-Tag: v6.5.6~114
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=2d7f0d29e49eaba6e5a3ddf9ea6ecaa0b71423e3;p=thirdparty%2Fkernel%2Fstable-queue.git

Fixes for 6.1

Signed-off-by: Sasha Levin
---

diff --git a/queue-6.1/btrfs-improve-error-message-after-failure-to-add-del.patch b/queue-6.1/btrfs-improve-error-message-after-failure-to-add-del.patch
new file mode 100644
index 00000000000..3532dff6a04
--- /dev/null
+++ b/queue-6.1/btrfs-improve-error-message-after-failure-to-add-del.patch
@@ -0,0 +1,54 @@
+From fa496501a0eff85d4551f85c9adc559bb896b56d Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 28 Aug 2023 09:06:42 +0100
+Subject: btrfs: improve error message after failure to add delayed dir index
+ item
+
+From: Filipe Manana
+
+[ Upstream commit 91bfe3104b8db0310f76f2dcb6aacef24c889366 ]
+
+If we fail to add a delayed dir index item because there's already another
+item with the same index number, we print an error message (and then BUG).
+However that message isn't very helpful to debug anything because we don't
+know what's the index number and what are the values of index counters in
+the inode and its delayed inode (index_cnt fields of struct btrfs_inode
+and struct btrfs_delayed_node).
+
+So update the error message to include the index number and counters.
+
+We actually had a recent case where this issue was hit by a syzbot report
+(see the link below).
+
+Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
+Reviewed-by: Qu Wenruo
+Signed-off-by: Filipe Manana
+Reviewed-by: David Sterba
+Signed-off-by: David Sterba
+Stable-dep-of: 2c58c3931ede ("btrfs: remove BUG() after failure to insert delayed dir index item")
+Signed-off-by: Sasha Levin
+---
+ fs/btrfs/delayed-inode.c | 7 ++++---
+ 1 file changed, 4 insertions(+), 3 deletions(-)
+
+diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
+index d2cbb7733c7d6..34e843460e4db 100644
+--- a/fs/btrfs/delayed-inode.c
++++ b/fs/btrfs/delayed-inode.c
+@@ -1506,9 +1506,10 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
+ 	ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+ 	if (unlikely(ret)) {
+ 		btrfs_err(trans->fs_info,
+-			  "err add delayed dir index item(name: %.*s) into the insertion tree of the delayed node(root id: %llu, inode id: %llu, errno: %d)",
+-			  name_len, name, delayed_node->root->root_key.objectid,
+-			  delayed_node->inode_id, ret);
++"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
++			  name_len, name, index, btrfs_root_id(delayed_node->root),
++			  delayed_node->inode_id, dir->index_cnt,
++			  delayed_node->index_cnt, ret);
+ 		BUG();
+ 	}
+ 	mutex_unlock(&delayed_node->mutex);
+-- 
+2.40.1
+
diff --git a/queue-6.1/btrfs-remove-bug-after-failure-to-insert-delayed-dir.patch b/queue-6.1/btrfs-remove-bug-after-failure-to-insert-delayed-dir.patch
new file mode 100644
index 00000000000..1b8f64e3d3d
--- /dev/null
+++ b/queue-6.1/btrfs-remove-bug-after-failure-to-insert-delayed-dir.patch
@@ -0,0 +1,140 @@
+From de2e6aae108bb735f0ad075cb0d2a7650bb38629 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 28 Aug 2023 09:06:43 +0100
+Subject: btrfs: remove BUG() after failure to insert delayed dir index item
+
+From: Filipe Manana
+
+[ Upstream commit 2c58c3931ede7cd08cbecf1f1a4acaf0a04a41a9 ]
+
+Instead of calling BUG() when we fail to insert a delayed dir index item
+into the delayed node's tree, we can just release all the resources we
+have allocated/acquired before and return the error to the caller. This is
+fine because all existing call chains undo anything they have done before
+calling btrfs_insert_delayed_dir_index() or BUG_ON (when creating pending
+snapshots in the transaction commit path).
+
+So remove the BUG() call and do proper error handling.
+
+This relates to a syzbot report linked below, but does not fix it because
+it only prevents hitting a BUG(), it does not fix the issue where somehow
+we attempt to use twice the same index number for different index items.
+
+Link: https://lore.kernel.org/linux-btrfs/00000000000036e1290603e097e0@google.com/
+CC: stable@vger.kernel.org # 5.4+
+Reviewed-by: Qu Wenruo
+Signed-off-by: Filipe Manana
+Reviewed-by: David Sterba
+Signed-off-by: David Sterba
+Signed-off-by: Sasha Levin
+---
+ fs/btrfs/delayed-inode.c | 74 +++++++++++++++++++++++++---------------
+ 1 file changed, 47 insertions(+), 27 deletions(-)
+
+diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
+index 34e843460e4db..9dacf72a75d0e 100644
+--- a/fs/btrfs/delayed-inode.c
++++ b/fs/btrfs/delayed-inode.c
+@@ -1421,7 +1421,29 @@ void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info)
+ 	btrfs_wq_run_delayed_node(delayed_root, fs_info, BTRFS_DELAYED_BATCH);
+ }
+ 
+-/* Will return 0 or -ENOMEM */
++static void btrfs_release_dir_index_item_space(struct btrfs_trans_handle *trans)
++{
++	struct btrfs_fs_info *fs_info = trans->fs_info;
++	const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
++
++	if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags))
++		return;
++
++	/*
++	 * Adding the new dir index item does not require touching another
++	 * leaf, so we can release 1 unit of metadata that was previously
++	 * reserved when starting the transaction. This applies only to
++	 * the case where we had a transaction start and excludes the
++	 * transaction join case (when replaying log trees).
++	 */
++	trace_btrfs_space_reservation(fs_info, "transaction",
++				      trans->transid, bytes, 0);
++	btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
++	ASSERT(trans->bytes_reserved >= bytes);
++	trans->bytes_reserved -= bytes;
++}
++
++/* Will return 0, -ENOMEM or -EEXIST (index number collision, unexpected). */
+ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
+ 				   const char *name, int name_len,
+ 				   struct btrfs_inode *dir,
+@@ -1463,6 +1485,27 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
+ 
+ 	mutex_lock(&delayed_node->mutex);
+ 
++	/*
++	 * First attempt to insert the delayed item. This is to make the error
++	 * handling path simpler in case we fail (-EEXIST). There's no risk of
++	 * any other task coming in and running the delayed item before we do
++	 * the metadata space reservation below, because we are holding the
++	 * delayed node's mutex and that mutex must also be locked before the
++	 * node's delayed items can be run.
++	 */
++	ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
++	if (unlikely(ret)) {
++		btrfs_err(trans->fs_info,
++"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
++			  name_len, name, index, btrfs_root_id(delayed_node->root),
++			  delayed_node->inode_id, dir->index_cnt,
++			  delayed_node->index_cnt, ret);
++		btrfs_release_delayed_item(delayed_item);
++		btrfs_release_dir_index_item_space(trans);
++		mutex_unlock(&delayed_node->mutex);
++		goto release_node;
++	}
++
+ 	if (delayed_node->index_item_leaves == 0 ||
+ 	    delayed_node->curr_index_batch_size + data_len > leaf_data_size) {
+ 		delayed_node->curr_index_batch_size = data_len;
+@@ -1480,37 +1523,14 @@ int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
+ 		 * impossible.
+ 		 */
+ 		if (WARN_ON(ret)) {
+-			mutex_unlock(&delayed_node->mutex);
+ 			btrfs_release_delayed_item(delayed_item);
++			mutex_unlock(&delayed_node->mutex);
+ 			goto release_node;
+ 		}
+ 
+ 		delayed_node->index_item_leaves++;
+-	} else if (!test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) {
+-		const u64 bytes = btrfs_calc_insert_metadata_size(fs_info, 1);
+-
+-		/*
+-		 * Adding the new dir index item does not require touching another
+-		 * leaf, so we can release 1 unit of metadata that was previously
+-		 * reserved when starting the transaction. This applies only to
+-		 * the case where we had a transaction start and excludes the
+-		 * transaction join case (when replaying log trees).
+-		 */
+-		trace_btrfs_space_reservation(fs_info, "transaction",
+-					      trans->transid, bytes, 0);
+-		btrfs_block_rsv_release(fs_info, trans->block_rsv, bytes, NULL);
+-		ASSERT(trans->bytes_reserved >= bytes);
+-		trans->bytes_reserved -= bytes;
+-	}
+-
+-	ret = __btrfs_add_delayed_item(delayed_node, delayed_item);
+-	if (unlikely(ret)) {
+-		btrfs_err(trans->fs_info,
+-"error adding delayed dir index item, name: %.*s, index: %llu, root: %llu, dir: %llu, dir->index_cnt: %llu, delayed_node->index_cnt: %llu, error: %d",
+-			  name_len, name, index, btrfs_root_id(delayed_node->root),
+-			  delayed_node->inode_id, dir->index_cnt,
+-			  delayed_node->index_cnt, ret);
+-		BUG();
++	} else {
++		btrfs_release_dir_index_item_space(trans);
+ 	}
+ 	mutex_unlock(&delayed_node->mutex);
+ 
+-- 
+2.40.1
+
diff --git a/queue-6.1/dm-fix-a-race-condition-in-retrieve_deps.patch b/queue-6.1/dm-fix-a-race-condition-in-retrieve_deps.patch
new file mode 100644
index 00000000000..b46d212af66
--- /dev/null
+++ b/queue-6.1/dm-fix-a-race-condition-in-retrieve_deps.patch
@@ -0,0 +1,171 @@
+From 59253695e326ac0138f09a58dd72e3f07f70b4a2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 9 Aug 2023 12:44:20 +0200
+Subject: dm: fix a race condition in retrieve_deps
+
+From: Mikulas Patocka
+
+[ Upstream commit f6007dce0cd35d634d9be91ef3515a6385dcee16 ]
+
+There's a race condition in the multipath target when retrieve_deps
+races with multipath_message calling dm_get_device and dm_put_device.
+retrieve_deps walks the list of open devices without holding any lock
+but multipath may add or remove devices to the list while it is
+running. The end result may be memory corruption or use-after-free
+memory access.
+
+See this description of a UAF with multipath_message():
+https://listman.redhat.com/archives/dm-devel/2022-October/052373.html
+
+Fix this bug by introducing a new rw semaphore "devices_lock". We grab
+devices_lock for read in retrieve_deps and we grab it for write in
+dm_get_device and dm_put_device.
+
+Reported-by: Luo Meng
+Signed-off-by: Mikulas Patocka
+Cc: stable@vger.kernel.org
+Tested-by: Li Lingfeng
+Signed-off-by: Mike Snitzer
+Signed-off-by: Sasha Levin
+---
+ drivers/md/dm-core.h  |  1 +
+ drivers/md/dm-ioctl.c |  7 ++++++-
+ drivers/md/dm-table.c | 32 ++++++++++++++++++++++++--------
+ 3 files changed, 31 insertions(+), 9 deletions(-)
+
+diff --git a/drivers/md/dm-core.h b/drivers/md/dm-core.h
+index 28c641352de9b..71dcd8fd4050a 100644
+--- a/drivers/md/dm-core.h
++++ b/drivers/md/dm-core.h
+@@ -214,6 +214,7 @@ struct dm_table {
+ 
+ 	/* a list of devices used by this table */
+ 	struct list_head devices;
++	struct rw_semaphore devices_lock;
+ 
+ 	/* events get handed up using this callback */
+ 	void (*event_fn)(void *);
+diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
+index 2afd2d2a0f407..206e6ce554dc7 100644
+--- a/drivers/md/dm-ioctl.c
++++ b/drivers/md/dm-ioctl.c
+@@ -1566,6 +1566,8 @@ static void retrieve_deps(struct dm_table *table,
+ 	struct dm_dev_internal *dd;
+ 	struct dm_target_deps *deps;
+ 
++	down_read(&table->devices_lock);
++
+ 	deps = get_result_buffer(param, param_size, &len);
+ 
+ 	/*
+@@ -1580,7 +1582,7 @@ static void retrieve_deps(struct dm_table *table,
+ 	needed = struct_size(deps, dev, count);
+ 	if (len < needed) {
+ 		param->flags |= DM_BUFFER_FULL_FLAG;
+-		return;
++		goto out;
+ 	}
+ 
+ 	/*
+@@ -1592,6 +1594,9 @@ static void retrieve_deps(struct dm_table *table,
+ 		deps->dev[count++] = huge_encode_dev(dd->dm_dev->bdev->bd_dev);
+ 
+ 	param->data_size = param->data_start + needed;
++
++out:
++	up_read(&table->devices_lock);
+ }
+ 
+ static int table_deps(struct file *filp, struct dm_ioctl *param, size_t param_size)
+diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
+index 288f600ee56dc..dac6a5f25f2be 100644
+--- a/drivers/md/dm-table.c
++++ b/drivers/md/dm-table.c
+@@ -134,6 +134,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
+ 		return -ENOMEM;
+ 
+ 	INIT_LIST_HEAD(&t->devices);
++	init_rwsem(&t->devices_lock);
+ 
+ 	if (!num_targets)
+ 		num_targets = KEYS_PER_NODE;
+@@ -362,15 +363,19 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
+ 			return -ENODEV;
+ 	}
+ 
++	down_write(&t->devices_lock);
++
+ 	dd = find_device(&t->devices, dev);
+ 	if (!dd) {
+ 		dd = kmalloc(sizeof(*dd), GFP_KERNEL);
+-		if (!dd)
+-			return -ENOMEM;
++		if (!dd) {
++			r = -ENOMEM;
++			goto unlock_ret_r;
++		}
+ 
+ 		if ((r = dm_get_table_device(t->md, dev, mode, &dd->dm_dev))) {
+ 			kfree(dd);
+-			return r;
++			goto unlock_ret_r;
+ 		}
+ 
+ 		refcount_set(&dd->count, 1);
+@@ -380,12 +385,17 @@ int dm_get_device(struct dm_target *ti, const char *path, fmode_t mode,
+ 	} else if (dd->dm_dev->mode != (mode | dd->dm_dev->mode)) {
+ 		r = upgrade_mode(dd, mode, t->md);
+ 		if (r)
+-			return r;
++			goto unlock_ret_r;
+ 	}
+ 	refcount_inc(&dd->count);
+ out:
++	up_write(&t->devices_lock);
+ 	*result = dd->dm_dev;
+ 	return 0;
++
++unlock_ret_r:
++	up_write(&t->devices_lock);
++	return r;
+ }
+ EXPORT_SYMBOL(dm_get_device);
+ 
+@@ -421,9 +431,12 @@ static int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev,
+ void dm_put_device(struct dm_target *ti, struct dm_dev *d)
+ {
+ 	int found = 0;
+-	struct list_head *devices = &ti->table->devices;
++	struct dm_table *t = ti->table;
++	struct list_head *devices = &t->devices;
+ 	struct dm_dev_internal *dd;
+ 
++	down_write(&t->devices_lock);
++
+ 	list_for_each_entry(dd, devices, list) {
+ 		if (dd->dm_dev == d) {
+ 			found = 1;
+@@ -432,14 +445,17 @@ void dm_put_device(struct dm_target *ti, struct dm_dev *d)
+ 	}
+ 	if (!found) {
+ 		DMERR("%s: device %s not in table devices list",
+-		      dm_device_name(ti->table->md), d->name);
+-		return;
++		      dm_device_name(t->md), d->name);
++		goto unlock_ret;
+ 	}
+ 	if (refcount_dec_and_test(&dd->count)) {
+-		dm_put_table_device(ti->table->md, d);
++		dm_put_table_device(t->md, d);
+ 		list_del(&dd->list);
+ 		kfree(dd);
+ 	}
++
++unlock_ret:
++	up_write(&t->devices_lock);
+ }
+ EXPORT_SYMBOL(dm_put_device);
+ 
+-- 
+2.40.1
+
diff --git a/queue-6.1/ext4-do-not-let-fstrim-block-system-suspend.patch b/queue-6.1/ext4-do-not-let-fstrim-block-system-suspend.patch
new file mode 100644
index 00000000000..84db9d036f8
--- /dev/null
+++ b/queue-6.1/ext4-do-not-let-fstrim-block-system-suspend.patch
@@ -0,0 +1,76 @@
+From 3720e910eb24e057017973a389388c3190b7ebe8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 13 Sep 2023 17:04:55 +0200
+Subject: ext4: do not let fstrim block system suspend
+
+From: Jan Kara
+
+[ Upstream commit 5229a658f6453362fbb9da6bf96872ef25a7097e ]
+
+Len Brown has reported that system suspend sometimes fail due to
+inability to freeze a task working in ext4_trim_fs() for one minute.
+Trimming a large filesystem on a disk that slowly processes discard
+requests can indeed take a long time. Since discard is just an advisory
+call, it is perfectly fine to interrupt it at any time and the return
+number of discarded blocks until that moment. Do that when we detect the
+task is being frozen.
+
+Cc: stable@kernel.org
+Reported-by: Len Brown
+Suggested-by: Dave Chinner
+References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
+Signed-off-by: Jan Kara
+Link: https://lore.kernel.org/r/20230913150504.9054-2-jack@suse.cz
+Signed-off-by: Theodore Ts'o
+Signed-off-by: Sasha Levin
+---
+ fs/ext4/mballoc.c | 12 ++++++++++--
+ 1 file changed, 10 insertions(+), 2 deletions(-)
+
+diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
+index 41f0385f85d38..3c8300e08f412 100644
+--- a/fs/ext4/mballoc.c
++++ b/fs/ext4/mballoc.c
+@@ -16,6 +16,7 @@
+ #include
+ #include
+ #include
++#include
+ #include
+ 
+ /*
+@@ -6430,6 +6431,11 @@ static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
+ 		EXT4_CLUSTER_BITS(sb);
+ }
+ 
++static bool ext4_trim_interrupted(void)
++{
++	return fatal_signal_pending(current) || freezing(current);
++}
++
+ static int ext4_try_to_trim_range(struct super_block *sb,
+ 		struct ext4_buddy *e4b, ext4_grpblk_t start,
+ 		ext4_grpblk_t max, ext4_grpblk_t minblocks)
+@@ -6463,8 +6469,8 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ 		free_count += next - start;
+ 		start = next + 1;
+ 
+-		if (fatal_signal_pending(current))
+-			return -ERESTARTSYS;
++		if (ext4_trim_interrupted())
++			return count;
+ 
+ 		if (need_resched()) {
+ 			ext4_unlock_group(sb, e4b->bd_group);
+@@ -6586,6 +6592,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+ 	end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ 
+ 	for (group = first_group; group <= last_group; group++) {
++		if (ext4_trim_interrupted())
++			break;
+ 		grp = ext4_get_group_info(sb, group);
+ 		if (!grp)
+ 			continue;
+-- 
+2.40.1
+
diff --git a/queue-6.1/ext4-move-setting-of-trimmed-bit-into-ext4_try_to_tr.patch b/queue-6.1/ext4-move-setting-of-trimmed-bit-into-ext4_try_to_tr.patch
new file mode 100644
index 00000000000..7db54b11a3e
--- /dev/null
+++ b/queue-6.1/ext4-move-setting-of-trimmed-bit-into-ext4_try_to_tr.patch
@@ -0,0 +1,170 @@
+From 1957852d6e7de27d4a0ab3955ba02407dd546538 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 13 Sep 2023 17:04:54 +0200
+Subject: ext4: move setting of trimmed bit into ext4_try_to_trim_range()
+
+From: Jan Kara
+
+[ Upstream commit 45e4ab320c9b5fa67b1fc3b6a9b381cfcc0c8488 ]
+
+Currently we set the group's trimmed bit in ext4_trim_all_free() based
+on return value of ext4_try_to_trim_range(). However when we will want
+to abort trimming because of suspend attempt, we want to return success
+from ext4_try_to_trim_range() but not set the trimmed bit. Instead
+implementing awkward propagation of this information, just move setting
+of trimmed bit into ext4_try_to_trim_range() when the whole group is
+trimmed.
+
+Cc: stable@kernel.org
+Signed-off-by: Jan Kara
+Link: https://lore.kernel.org/r/20230913150504.9054-1-jack@suse.cz
+Signed-off-by: Theodore Ts'o
+Signed-off-by: Sasha Levin
+---
+ fs/ext4/mballoc.c | 46 +++++++++++++++++++++++++---------------------
+ 1 file changed, 25 insertions(+), 21 deletions(-)
+
+diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
+index a2c5024943d82..41f0385f85d38 100644
+--- a/fs/ext4/mballoc.c
++++ b/fs/ext4/mballoc.c
+@@ -6420,6 +6420,16 @@ __acquires(bitlock)
+ 	return ret;
+ }
+ 
++static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb,
++					   ext4_group_t grp)
++{
++	if (grp < ext4_get_groups_count(sb))
++		return EXT4_CLUSTERS_PER_GROUP(sb) - 1;
++	return (ext4_blocks_count(EXT4_SB(sb)->s_es) -
++		ext4_group_first_block_no(sb, grp) - 1) >>
++					EXT4_CLUSTER_BITS(sb);
++}
++
+ static int ext4_try_to_trim_range(struct super_block *sb,
+ 		struct ext4_buddy *e4b, ext4_grpblk_t start,
+ 		ext4_grpblk_t max, ext4_grpblk_t minblocks)
+@@ -6427,9 +6437,12 @@ __acquires(ext4_group_lock_ptr(sb, e4b->bd_group))
+ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ {
+ 	ext4_grpblk_t next, count, free_count;
++	bool set_trimmed = false;
+ 	void *bitmap;
+ 
+ 	bitmap = e4b->bd_bitmap;
++	if (start == 0 && max >= ext4_last_grp_cluster(sb, e4b->bd_group))
++		set_trimmed = true;
+ 	start = max(e4b->bd_info->bb_first_free, start);
+ 	count = 0;
+ 	free_count = 0;
+@@ -6444,16 +6457,14 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ 			int ret = ext4_trim_extent(sb, start, next - start, e4b);
+ 
+ 			if (ret && ret != -EOPNOTSUPP)
+-				break;
++				return count;
+ 			count += next - start;
+ 		}
+ 		free_count += next - start;
+ 		start = next + 1;
+ 
+-		if (fatal_signal_pending(current)) {
+-			count = -ERESTARTSYS;
+-			break;
+-		}
++		if (fatal_signal_pending(current))
++			return -ERESTARTSYS;
+ 
+ 		if (need_resched()) {
+ 			ext4_unlock_group(sb, e4b->bd_group);
+@@ -6465,6 +6476,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ 			break;
+ 	}
+ 
++	if (set_trimmed)
++		EXT4_MB_GRP_SET_TRIMMED(e4b->bd_info);
++
+ 	return count;
+ }
+ 
+@@ -6475,7 +6489,6 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+  * @start:		first group block to examine
+  * @max:		last group block to examine
+  * @minblocks:		minimum extent block count
+- * @set_trimmed:	set the trimmed flag if at least one block is trimmed
+  *
+  * ext4_trim_all_free walks through group's block bitmap searching for free
+  * extents. When the free extent is found, mark it as used in group buddy
+@@ -6485,7 +6498,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ static ext4_grpblk_t
+ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
+ 		   ext4_grpblk_t start, ext4_grpblk_t max,
+-		   ext4_grpblk_t minblocks, bool set_trimmed)
++		   ext4_grpblk_t minblocks)
+ {
+ 	struct ext4_buddy e4b;
+ 	int ret;
+@@ -6502,13 +6515,10 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
+ 	ext4_lock_group(sb, group);
+ 
+ 	if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
+-	    minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
++	    minblocks < EXT4_SB(sb)->s_last_trim_minblks)
+ 		ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
+-		if (ret >= 0 && set_trimmed)
+-			EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+-	} else {
++	else
+ 		ret = 0;
+-	}
+ 
+ 	ext4_unlock_group(sb, group);
+ 	ext4_mb_unload_buddy(&e4b);
+@@ -6541,7 +6551,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+ 	ext4_fsblk_t first_data_blk =
+ 			le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
+ 	ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
+-	bool whole_group, eof = false;
+ 	int ret = 0;
+ 
+ 	start = range->start >> sb->s_blocksize_bits;
+@@ -6560,10 +6569,8 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+ 		if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
+ 			goto out;
+ 	}
+-	if (end >= max_blks - 1) {
++	if (end >= max_blks - 1)
+ 		end = max_blks - 1;
+-		eof = true;
+-	}
+ 	if (end <= first_data_blk)
+ 		goto out;
+ 	if (start < first_data_blk)
+@@ -6577,7 +6584,6 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+ 
+ 	/* end now represents the last cluster to discard in this group */
+ 	end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+-	whole_group = true;
+ 
+ 	for (group = first_group; group <= last_group; group++) {
+ 		grp = ext4_get_group_info(sb, group);
+@@ -6596,13 +6602,11 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
+ 		 * change it for the last group, note that last_cluster is
+ 		 * already computed earlier by ext4_get_group_no_and_offset()
+ 		 */
+-		if (group == last_group) {
++		if (group == last_group)
+ 			end = last_cluster;
+-			whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+-		}
+ 		if (grp->bb_free >= minlen) {
+ 			cnt = ext4_trim_all_free(sb, group, first_cluster,
+-						 end, minlen, whole_group);
++						 end, minlen);
+ 			if (cnt < 0) {
+ 				ret = cnt;
+ 				break;
+-- 
+2.40.1
+
diff --git a/queue-6.1/ext4-replace-the-traditional-ternary-conditional-ope.patch b/queue-6.1/ext4-replace-the-traditional-ternary-conditional-ope.patch
new file mode 100644
index 00000000000..b935003a65a
--- /dev/null
+++ b/queue-6.1/ext4-replace-the-traditional-ternary-conditional-ope.patch
@@ -0,0 +1,49 @@
+From a2c3dfcdafb1da01227d5892804184b889ca5d89 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 1 Aug 2023 22:32:00 +0800
+Subject: ext4: replace the traditional ternary conditional operator with with
+ max()/min()
+
+From: Kemeng Shi
+
+[ Upstream commit de8bf0e5ee7482585450357c6d4eddec8efc5cb7 ]
+
+Replace the traditional ternary conditional operator with with max()/min()
+
+Signed-off-by: Kemeng Shi
+Reviewed-by: Ritesh Harjani (IBM)
+Link: https://lore.kernel.org/r/20230801143204.2284343-7-shikemeng@huaweicloud.com
+Signed-off-by: Theodore Ts'o
+Stable-dep-of: 45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()")
+Signed-off-by: Sasha Levin
+---
+ fs/ext4/mballoc.c | 6 ++----
+ 1 file changed, 2 insertions(+), 4 deletions(-)
+
+diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
+index 016925b1a0908..a2c5024943d82 100644
+--- a/fs/ext4/mballoc.c
++++ b/fs/ext4/mballoc.c
+@@ -6430,8 +6430,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
+ 	void *bitmap;
+ 
+ 	bitmap = e4b->bd_bitmap;
+-	start = (e4b->bd_info->bb_first_free > start) ?
+-		e4b->bd_info->bb_first_free : start;
++	start = max(e4b->bd_info->bb_first_free, start);
+ 	count = 0;
+ 	free_count = 0;
+ 
+@@ -6648,8 +6647,7 @@ ext4_mballoc_query_range(
+ 
+ 	ext4_lock_group(sb, group);
+ 
+-	start = (e4b.bd_info->bb_first_free > start) ?
+-		e4b.bd_info->bb_first_free : start;
++	start = max(e4b.bd_info->bb_first_free, start);
+ 	if (end >= EXT4_CLUSTERS_PER_GROUP(sb))
+ 		end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ 
+-- 
+2.40.1
+
diff --git a/queue-6.1/media-v4l-use-correct-dependency-for-camera-sensor-d.patch b/queue-6.1/media-v4l-use-correct-dependency-for-camera-sensor-d.patch
new file mode 100644
index 00000000000..dcb39b1765f
--- /dev/null
+++ b/queue-6.1/media-v4l-use-correct-dependency-for-camera-sensor-d.patch
@@ -0,0 +1,81 @@
+From b9a9b359e1b6fa5d6d2f56fc48ad98a83e76f8a0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 18 Aug 2023 12:51:49 +0300
+Subject: media: v4l: Use correct dependency for camera sensor drivers
+
+From: Sakari Ailus
+
+[ Upstream commit 86e16b87afac20779da1228d690a95c54d7e2ad0 ]
+
+The Kconfig option that enables compiling camera sensor drivers is
+VIDEO_CAMERA_SENSOR rather than MEDIA_CAMERA_SUPPORT as it was previously.
+Fix this.
+
+Also select VIDEO_OV7670 for marvell platform drivers only if
+MEDIA_SUBDRV_AUTOSELECT and VIDEO_CAMERA_SENSOR are enabled.
+
+Reported-by: Randy Dunlap
+Fixes: 7d3c7d2a2914 ("media: i2c: Add a camera sensor top level menu")
+Signed-off-by: Sakari Ailus
+Signed-off-by: Hans Verkuil
+Signed-off-by: Sasha Levin
+---
+ drivers/media/platform/marvell/Kconfig | 4 ++--
+ drivers/media/usb/em28xx/Kconfig       | 4 ++--
+ drivers/media/usb/go7007/Kconfig       | 2 +-
+ 3 files changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/drivers/media/platform/marvell/Kconfig b/drivers/media/platform/marvell/Kconfig
+index ec1a16734a280..d6499ffe30e8b 100644
+--- a/drivers/media/platform/marvell/Kconfig
++++ b/drivers/media/platform/marvell/Kconfig
+@@ -7,7 +7,7 @@ config VIDEO_CAFE_CCIC
+ 	depends on V4L_PLATFORM_DRIVERS
+ 	depends on PCI && I2C && VIDEO_DEV
+ 	depends on COMMON_CLK
+-	select VIDEO_OV7670
++	select VIDEO_OV7670 if MEDIA_SUBDRV_AUTOSELECT && VIDEO_CAMERA_SENSOR
+ 	select VIDEOBUF2_VMALLOC
+ 	select VIDEOBUF2_DMA_CONTIG
+ 	select VIDEOBUF2_DMA_SG
+@@ -22,7 +22,7 @@ config VIDEO_MMP_CAMERA
+ 	depends on I2C && VIDEO_DEV
+ 	depends on ARCH_MMP || COMPILE_TEST
+ 	depends on COMMON_CLK
+-	select VIDEO_OV7670
++	select VIDEO_OV7670 if MEDIA_SUBDRV_AUTOSELECT && VIDEO_CAMERA_SENSOR
+ 	select I2C_GPIO
+ 	select VIDEOBUF2_VMALLOC
+ 	select VIDEOBUF2_DMA_CONTIG
+diff --git a/drivers/media/usb/em28xx/Kconfig b/drivers/media/usb/em28xx/Kconfig
+index b3c472b8c5a96..cb61fd6cc6c61 100644
+--- a/drivers/media/usb/em28xx/Kconfig
++++ b/drivers/media/usb/em28xx/Kconfig
+@@ -12,8 +12,8 @@ config VIDEO_EM28XX_V4L2
+ 	select VIDEO_SAA711X if MEDIA_SUBDRV_AUTOSELECT
+ 	select VIDEO_TVP5150 if MEDIA_SUBDRV_AUTOSELECT
+ 	select VIDEO_MSP3400 if MEDIA_SUBDRV_AUTOSELECT
+-	select VIDEO_MT9V011 if MEDIA_SUBDRV_AUTOSELECT && MEDIA_CAMERA_SUPPORT
+-	select VIDEO_OV2640 if MEDIA_SUBDRV_AUTOSELECT && MEDIA_CAMERA_SUPPORT
++	select VIDEO_MT9V011 if MEDIA_SUBDRV_AUTOSELECT && VIDEO_CAMERA_SENSOR
++	select VIDEO_OV2640 if MEDIA_SUBDRV_AUTOSELECT && VIDEO_CAMERA_SENSOR
+ 	help
+ 	  This is a video4linux driver for Empia 28xx based TV cards.
+ 
+diff --git a/drivers/media/usb/go7007/Kconfig b/drivers/media/usb/go7007/Kconfig
+index 4ff79940ad8d4..b2a15d9fb1f33 100644
+--- a/drivers/media/usb/go7007/Kconfig
++++ b/drivers/media/usb/go7007/Kconfig
+@@ -12,8 +12,8 @@ config VIDEO_GO7007
+ 	select VIDEO_TW2804 if MEDIA_SUBDRV_AUTOSELECT
+ 	select VIDEO_TW9903 if MEDIA_SUBDRV_AUTOSELECT
+ 	select VIDEO_TW9906 if MEDIA_SUBDRV_AUTOSELECT
+-	select VIDEO_OV7640 if MEDIA_SUBDRV_AUTOSELECT && MEDIA_CAMERA_SUPPORT
+ 	select VIDEO_UDA1342 if MEDIA_SUBDRV_AUTOSELECT
++	select VIDEO_OV7640 if MEDIA_SUBDRV_AUTOSELECT && VIDEO_CAMERA_SENSOR
+ 	help
+ 	  This is a video4linux driver for the WIS GO7007 MPEG
+ 	  encoder chip.
+-- 
+2.40.1
+
diff --git a/queue-6.1/media-via-use-correct-dependency-for-camera-sensor-d.patch b/queue-6.1/media-via-use-correct-dependency-for-camera-sensor-d.patch
new file mode 100644
index 00000000000..9f296c60d62
--- /dev/null
+++ b/queue-6.1/media-via-use-correct-dependency-for-camera-sensor-d.patch
@@ -0,0 +1,39 @@
+From 8a528876ceed0122802b03e03ad62f9307ac82cf Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 22 Aug 2023 11:10:34 +0300
+Subject: media: via: Use correct dependency for camera sensor drivers
+
+From: Sakari Ailus
+
+[ Upstream commit 41425941dfcf47cc6df8e500af6ff16a7be6539f ]
+
+The via camera controller driver selected ov7670 driver, however now that
+driver has dependencies and may no longer be selected unconditionally.
+
+Reported-by: Randy Dunlap
+Fixes: 7d3c7d2a2914 ("media: i2c: Add a camera sensor top level menu")
+Signed-off-by: Sakari Ailus
+Acked-by: Randy Dunlap
+Tested-by: Randy Dunlap
+Signed-off-by: Hans Verkuil
+Signed-off-by: Sasha Levin
+---
+ drivers/media/platform/via/Kconfig | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/media/platform/via/Kconfig b/drivers/media/platform/via/Kconfig
+index 8926eb0803b27..6e603c0382487 100644
+--- a/drivers/media/platform/via/Kconfig
++++ b/drivers/media/platform/via/Kconfig
+@@ -7,7 +7,7 @@ config VIDEO_VIA_CAMERA
+ 	depends on V4L_PLATFORM_DRIVERS
+ 	depends on FB_VIA && VIDEO_DEV
+ 	select VIDEOBUF2_DMA_SG
+-	select VIDEO_OV7670
++	select VIDEO_OV7670 if VIDEO_CAMERA_SENSOR
+ 	help
+ 	   Driver support for the integrated camera controller in VIA
+ 	   Chrome9 chipsets.  Currently only tested on OLPC xo-1.5 systems
+-- 
+2.40.1
+
diff --git a/queue-6.1/netfs-only-call-folio_start_fscache-one-time-for-eac.patch b/queue-6.1/netfs-only-call-folio_start_fscache-one-time-for-eac.patch
new file mode 100644
index 00000000000..acd07957096
--- /dev/null
+++ b/queue-6.1/netfs-only-call-folio_start_fscache-one-time-for-eac.patch
@@ -0,0 +1,95 @@
+From 9d1030c277fd8e434ebf2a95ba275850e72db4c8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 18 Sep 2023 14:17:11 +0100
+Subject: netfs: Only call folio_start_fscache() one time for each folio
+
+From: Dave Wysochanski
+
+[ Upstream commit df1c357f25d808e30b216188330e708e09e1a412 ]
+
+If a network filesystem using netfs implements a clamp_length()
+function, it can set subrequest lengths smaller than a page size.
+
+When we loop through the folios in netfs_rreq_unlock_folios() to
+set any folios to be written back, we need to make sure we only
+call folio_start_fscache() once for each folio.
+
+Otherwise, this simple testcase:
+
+  mount -o fsc,rsize=1024,wsize=1024 127.0.0.1:/export /mnt/nfs
+  dd if=/dev/zero of=/mnt/nfs/file.bin bs=4096 count=1
+  1+0 records in
+  1+0 records out
+  4096 bytes (4.1 kB, 4.0 KiB) copied, 0.0126359 s, 324 kB/s
+  echo 3 > /proc/sys/vm/drop_caches
+  cat /mnt/nfs/file.bin > /dev/null
+
+will trigger an oops similar to the following:
+
+  page dumped because: VM_BUG_ON_FOLIO(folio_test_private_2(folio))
+  ------------[ cut here ]------------
+  kernel BUG at include/linux/netfs.h:44!
+  ...
+  CPU: 5 PID: 134 Comm: kworker/u16:5 Kdump: loaded Not tainted 6.4.0-rc5
+  ...
+  RIP: 0010:netfs_rreq_unlock_folios+0x68e/0x730 [netfs]
+  ...
+  Call Trace:
+    netfs_rreq_assess+0x497/0x660 [netfs]
+    netfs_subreq_terminated+0x32b/0x610 [netfs]
+    nfs_netfs_read_completion+0x14e/0x1a0 [nfs]
+    nfs_read_completion+0x2f9/0x330 [nfs]
+    rpc_free_task+0x72/0xa0 [sunrpc]
+    rpc_async_release+0x46/0x70 [sunrpc]
+    process_one_work+0x3bd/0x710
+    worker_thread+0x89/0x610
+    kthread+0x181/0x1c0
+    ret_from_fork+0x29/0x50
+
+Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers"
+Link: https://bugzilla.redhat.com/show_bug.cgi?id=2210612
+Signed-off-by: Dave Wysochanski
+Reviewed-by: Jeff Layton
+Signed-off-by: David Howells
+Link: https://lore.kernel.org/r/20230608214137.856006-1-dwysocha@redhat.com/ # v1
+Link: https://lore.kernel.org/r/20230915185704.1082982-1-dwysocha@redhat.com/ # v2
+Signed-off-by: Linus Torvalds
+Signed-off-by: Sasha Levin
+---
+ fs/netfs/buffered_read.c | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
+index 7679a68e81930..caa0a053e8a9d 100644
+--- a/fs/netfs/buffered_read.c
++++ b/fs/netfs/buffered_read.c
+@@ -47,12 +47,14 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
+ 	xas_for_each(&xas, folio, last_page) {
+ 		loff_t pg_end;
+ 		bool pg_failed = false;
++		bool folio_started;
+ 
+ 		if (xas_retry(&xas, folio))
+ 			continue;
+ 
+ 		pg_end = folio_pos(folio) + folio_size(folio) - 1;
+ 
++		folio_started = false;
+ 		for (;;) {
+ 			loff_t sreq_end;
+ 
+@@ -60,8 +62,10 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
+ 				pg_failed = true;
+ 				break;
+ 			}
+-			if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags))
++			if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
+ 				folio_start_fscache(folio);
++				folio_started = true;
++			}
+ 			pg_failed |= subreq_failed;
+ 			sreq_end = subreq->start + subreq->len - 1;
+ 			if (pg_end < sreq_end)
+-- 
+2.40.1
+
diff --git a/queue-6.1/nfs-fix-error-handling-for-o_direct-write-scheduling.patch b/queue-6.1/nfs-fix-error-handling-for-o_direct-write-scheduling.patch
new file mode 100644
index 00000000000..3831bca8993
--- /dev/null
+++ b/queue-6.1/nfs-fix-error-handling-for-o_direct-write-scheduling.patch
@@ -0,0 +1,147 @@
+From 7cba41bccbaebca1e1af90163963f93451dfe5bb Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 4 Sep 2023 12:34:37 -0400
+Subject: NFS: Fix error handling for O_DIRECT write scheduling
+
+From: Trond Myklebust
+
+[ Upstream commit 954998b60caa8f2a3bf3abe490de6f08d283687a ]
+
+If we fail to schedule a request for transmission, there are 2
+possibilities:
+1) Either we hit a fatal error, and we just want to drop the remaining
+   requests on the floor.
+2) We were asked to try again, in which case we should allow the
+   outstanding RPC calls to complete, so that we can recoalesce requests
+   and try again.
+ +Fixes: d600ad1f2bdb ("NFS41: pop some layoutget errors to application") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/direct.c | 62 ++++++++++++++++++++++++++++++++++++------------- + 1 file changed, 46 insertions(+), 16 deletions(-) + +diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c +index 3bb530d4bb5ce..d71762f32b6c4 100644 +--- a/fs/nfs/direct.c ++++ b/fs/nfs/direct.c +@@ -530,10 +530,9 @@ nfs_direct_write_scan_commit_list(struct inode *inode, + static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + { + struct nfs_pageio_descriptor desc; +- struct nfs_page *req, *tmp; ++ struct nfs_page *req; + LIST_HEAD(reqs); + struct nfs_commit_info cinfo; +- LIST_HEAD(failed); + + nfs_init_cinfo_from_dreq(&cinfo, dreq); + nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo); +@@ -551,27 +550,36 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + &nfs_direct_write_completion_ops); + desc.pg_dreq = dreq; + +- list_for_each_entry_safe(req, tmp, &reqs, wb_list) { ++ while (!list_empty(&reqs)) { ++ req = nfs_list_entry(reqs.next); + /* Bump the transmission count */ + req->wb_nio++; + if (!nfs_pageio_add_request(&desc, req)) { +- nfs_list_move_request(req, &failed); + spin_lock(&cinfo.inode->i_lock); +- dreq->flags = 0; +- if (desc.pg_error < 0) ++ if (dreq->error < 0) { ++ desc.pg_error = dreq->error; ++ } else if (desc.pg_error != -EAGAIN) { ++ dreq->flags = 0; ++ if (!desc.pg_error) ++ desc.pg_error = -EIO; + dreq->error = desc.pg_error; +- else +- dreq->error = -EIO; ++ } else ++ dreq->flags = NFS_ODIRECT_RESCHED_WRITES; + spin_unlock(&cinfo.inode->i_lock); ++ break; + } + nfs_release_request(req); + } + nfs_pageio_complete(&desc); + +- while (!list_empty(&failed)) { +- req = nfs_list_entry(failed.next); ++ while (!list_empty(&reqs)) { ++ req = nfs_list_entry(reqs.next); + nfs_list_remove_request(req); + nfs_unlock_and_release_request(req); ++ if (desc.pg_error == 
-EAGAIN) ++ nfs_mark_request_commit(req, NULL, &cinfo, 0); ++ else ++ nfs_release_request(req); + } + + if (put_dreq(dreq)) +@@ -796,9 +804,11 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, + { + struct nfs_pageio_descriptor desc; + struct inode *inode = dreq->inode; ++ struct nfs_commit_info cinfo; + ssize_t result = 0; + size_t requested_bytes = 0; + size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); ++ bool defer = false; + + trace_nfs_direct_write_schedule_iovec(dreq); + +@@ -839,19 +849,39 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, + break; + } + ++ pgbase = 0; ++ bytes -= req_len; ++ requested_bytes += req_len; ++ pos += req_len; ++ dreq->bytes_left -= req_len; ++ ++ if (defer) { ++ nfs_mark_request_commit(req, NULL, &cinfo, 0); ++ continue; ++ } ++ + nfs_lock_request(req); + req->wb_index = pos >> PAGE_SHIFT; + req->wb_offset = pos & ~PAGE_MASK; +- if (!nfs_pageio_add_request(&desc, req)) { ++ if (nfs_pageio_add_request(&desc, req)) ++ continue; ++ ++ /* Exit on hard errors */ ++ if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) { + result = desc.pg_error; + nfs_unlock_and_release_request(req); + break; + } +- pgbase = 0; +- bytes -= req_len; +- requested_bytes += req_len; +- pos += req_len; +- dreq->bytes_left -= req_len; ++ ++ /* If the error is soft, defer remaining requests */ ++ nfs_init_cinfo_from_dreq(&cinfo, dreq); ++ spin_lock(&cinfo.inode->i_lock); ++ dreq->flags = NFS_ODIRECT_RESCHED_WRITES; ++ spin_unlock(&cinfo.inode->i_lock); ++ nfs_unlock_request(req); ++ nfs_mark_request_commit(req, NULL, &cinfo, 0); ++ desc.pg_error = 0; ++ defer = true; + } + nfs_direct_release_pages(pagevec, npages); + kvfree(pagevec); +-- +2.40.1 + diff --git a/queue-6.1/nfs-fix-o_direct-locking-issues.patch b/queue-6.1/nfs-fix-o_direct-locking-issues.patch new file mode 100644 index 00000000000..238563341ba --- /dev/null +++ b/queue-6.1/nfs-fix-o_direct-locking-issues.patch @@ -0,0 +1,56 
@@ +From 6a5cfb43584ce3ad304dd70c913da210d652dc02 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Sep 2023 12:34:38 -0400 +Subject: NFS: Fix O_DIRECT locking issues + +From: Trond Myklebust + +[ Upstream commit 7c6339322ce0c6128acbe36aacc1eeb986dd7bf1 ] + +The dreq fields are protected by the dreq->lock. + +Fixes: 954998b60caa ("NFS: Fix error handling for O_DIRECT write scheduling") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/direct.c | 8 ++++---- + 1 file changed, 4 insertions(+), 4 deletions(-) + +diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c +index d71762f32b6c4..449d248fc1ec7 100644 +--- a/fs/nfs/direct.c ++++ b/fs/nfs/direct.c +@@ -555,7 +555,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + /* Bump the transmission count */ + req->wb_nio++; + if (!nfs_pageio_add_request(&desc, req)) { +- spin_lock(&cinfo.inode->i_lock); ++ spin_lock(&dreq->lock); + if (dreq->error < 0) { + desc.pg_error = dreq->error; + } else if (desc.pg_error != -EAGAIN) { +@@ -565,7 +565,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + dreq->error = desc.pg_error; + } else + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; +- spin_unlock(&cinfo.inode->i_lock); ++ spin_unlock(&dreq->lock); + break; + } + nfs_release_request(req); +@@ -875,9 +875,9 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, + + /* If the error is soft, defer remaining requests */ + nfs_init_cinfo_from_dreq(&cinfo, dreq); +- spin_lock(&cinfo.inode->i_lock); ++ spin_lock(&dreq->lock); + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; +- spin_unlock(&cinfo.inode->i_lock); ++ spin_unlock(&dreq->lock); + nfs_unlock_request(req); + nfs_mark_request_commit(req, NULL, &cinfo, 0); + desc.pg_error = 0; +-- +2.40.1 + diff --git a/queue-6.1/nfs-more-fixes-for-nfs_direct_write_reschedule_io.patch b/queue-6.1/nfs-more-fixes-for-nfs_direct_write_reschedule_io.patch new file mode 100644 index 
00000000000..59b83d7ff60 --- /dev/null +++ b/queue-6.1/nfs-more-fixes-for-nfs_direct_write_reschedule_io.patch @@ -0,0 +1,57 @@ +From 0c4a3f148b0b55c367a53d4db17cd0deaf232cdd Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Sep 2023 12:34:41 -0400 +Subject: NFS: More fixes for nfs_direct_write_reschedule_io() + +From: Trond Myklebust + +[ Upstream commit b11243f720ee5f9376861099019c8542969b6318 ] + +Ensure that all requests are put back onto the commit list so that they +can be rescheduled. + +Fixes: 4daaeba93822 ("NFS: Fix nfs_direct_write_reschedule_io()") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/direct.c | 17 +++++++++++------ + 1 file changed, 11 insertions(+), 6 deletions(-) + +diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c +index 04ebe96336304..5a976fa343df1 100644 +--- a/fs/nfs/direct.c ++++ b/fs/nfs/direct.c +@@ -782,18 +782,23 @@ static void nfs_write_sync_pgio_error(struct list_head *head, int error) + static void nfs_direct_write_reschedule_io(struct nfs_pgio_header *hdr) + { + struct nfs_direct_req *dreq = hdr->dreq; ++ struct nfs_page *req; ++ struct nfs_commit_info cinfo; + + trace_nfs_direct_write_reschedule_io(dreq); + ++ nfs_init_cinfo_from_dreq(&cinfo, dreq); + spin_lock(&dreq->lock); +- if (dreq->error == 0) { ++ if (dreq->error == 0) + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; +- /* fake unstable write to let common nfs resend pages */ +- hdr->verf.committed = NFS_UNSTABLE; +- hdr->good_bytes = hdr->args.offset + hdr->args.count - +- hdr->io_start; +- } ++ set_bit(NFS_IOHDR_REDO, &hdr->flags); + spin_unlock(&dreq->lock); ++ while (!list_empty(&hdr->pages)) { ++ req = nfs_list_entry(hdr->pages.next); ++ nfs_list_remove_request(req); ++ nfs_unlock_request(req); ++ nfs_mark_request_commit(req, NULL, &cinfo, 0); ++ } + } + + static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { +-- +2.40.1 + diff --git 
a/queue-6.1/nfs-more-o_direct-accounting-fixes-for-error-paths.patch b/queue-6.1/nfs-more-o_direct-accounting-fixes-for-error-paths.patch new file mode 100644 index 00000000000..5bd2dd0352e --- /dev/null +++ b/queue-6.1/nfs-more-o_direct-accounting-fixes-for-error-paths.patch @@ -0,0 +1,140 @@ +From aee2c7781be4dc1bee685f9eb307627a77461c8b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Sep 2023 12:34:39 -0400 +Subject: NFS: More O_DIRECT accounting fixes for error paths + +From: Trond Myklebust + +[ Upstream commit 8982f7aff39fb526aba4441fff2525fcedd5e1a3 ] + +If we hit a fatal error when retransmitting, we do need to record the +removal of the request from the count of written bytes. + +Fixes: 031d73ed768a ("NFS: Fix O_DIRECT accounting of number of bytes read/written") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/direct.c | 47 +++++++++++++++++++++++++++++++---------------- + 1 file changed, 31 insertions(+), 16 deletions(-) + +diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c +index 449d248fc1ec7..d879c3229efdb 100644 +--- a/fs/nfs/direct.c ++++ b/fs/nfs/direct.c +@@ -93,12 +93,10 @@ nfs_direct_handle_truncated(struct nfs_direct_req *dreq, + dreq->max_count = dreq_len; + if (dreq->count > dreq_len) + dreq->count = dreq_len; +- +- if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) +- dreq->error = hdr->error; +- else /* Clear outstanding error if this is EOF */ +- dreq->error = 0; + } ++ ++ if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) && !dreq->error) ++ dreq->error = hdr->error; + } + + static void +@@ -120,6 +118,18 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq, + dreq->count = dreq_len; + } + ++static void nfs_direct_truncate_request(struct nfs_direct_req *dreq, ++ struct nfs_page *req) ++{ ++ loff_t offs = req_offset(req); ++ size_t req_start = (size_t)(offs - dreq->io_start); ++ ++ if (req_start < dreq->max_count) ++ dreq->max_count = req_start; ++ if (req_start < dreq->count) ++ dreq->count 
= req_start; ++} ++ + /** + * nfs_swap_rw - NFS address space operation for swap I/O + * @iocb: target I/O control block +@@ -539,10 +549,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + + nfs_direct_join_group(&reqs, dreq->inode); + +- dreq->count = 0; +- dreq->max_count = 0; +- list_for_each_entry(req, &reqs, wb_list) +- dreq->max_count += req->wb_bytes; + nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo); + get_dreq(dreq); + +@@ -576,10 +582,14 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + req = nfs_list_entry(reqs.next); + nfs_list_remove_request(req); + nfs_unlock_and_release_request(req); +- if (desc.pg_error == -EAGAIN) ++ if (desc.pg_error == -EAGAIN) { + nfs_mark_request_commit(req, NULL, &cinfo, 0); +- else ++ } else { ++ spin_lock(&dreq->lock); ++ nfs_direct_truncate_request(dreq, req); ++ spin_unlock(&dreq->lock); + nfs_release_request(req); ++ } + } + + if (put_dreq(dreq)) +@@ -599,8 +609,6 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) + if (status < 0) { + /* Errors in commit are fatal */ + dreq->error = status; +- dreq->max_count = 0; +- dreq->count = 0; + dreq->flags = NFS_ODIRECT_DONE; + } else { + status = dreq->error; +@@ -611,7 +619,12 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) + while (!list_empty(&data->pages)) { + req = nfs_list_entry(data->pages.next); + nfs_list_remove_request(req); +- if (status >= 0 && !nfs_write_match_verf(verf, req)) { ++ if (status < 0) { ++ spin_lock(&dreq->lock); ++ nfs_direct_truncate_request(dreq, req); ++ spin_unlock(&dreq->lock); ++ nfs_release_request(req); ++ } else if (!nfs_write_match_verf(verf, req)) { + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; + /* + * Despite the reboot, the write was successful, +@@ -619,7 +632,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) + */ + req->wb_nio = 0; + nfs_mark_request_commit(req, NULL, &cinfo, 0); +- } else /* Error or match */ ++ } 
else + nfs_release_request(req); + nfs_unlock_and_release_request(req); + } +@@ -672,6 +685,7 @@ static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq) + while (!list_empty(&reqs)) { + req = nfs_list_entry(reqs.next); + nfs_list_remove_request(req); ++ nfs_direct_truncate_request(dreq, req); + nfs_release_request(req); + nfs_unlock_and_release_request(req); + } +@@ -721,7 +735,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) + } + + nfs_direct_count_bytes(dreq, hdr); +- if (test_bit(NFS_IOHDR_UNSTABLE_WRITES, &hdr->flags)) { ++ if (test_bit(NFS_IOHDR_UNSTABLE_WRITES, &hdr->flags) && ++ !test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { + if (!dreq->flags) + dreq->flags = NFS_ODIRECT_DO_COMMIT; + flags = dreq->flags; +-- +2.40.1 + diff --git a/queue-6.1/nfs-pnfs-report-einval-errors-from-connect-to-the-se.patch b/queue-6.1/nfs-pnfs-report-einval-errors-from-connect-to-the-se.patch new file mode 100644 index 00000000000..538f92ba68c --- /dev/null +++ b/queue-6.1/nfs-pnfs-report-einval-errors-from-connect-to-the-se.patch @@ -0,0 +1,36 @@ +From a86dc978ad7954b945f071a3e4f706f23678f20d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Sep 2023 12:43:58 -0400 +Subject: NFS/pNFS: Report EINVAL errors from connect() to the server + +From: Trond Myklebust + +[ Upstream commit dd7d7ee3ba2a70d12d02defb478790cf57d5b87b ] + +With IPv6, connect() can occasionally return EINVAL if a route is +unavailable. If this happens during I/O to a data server, we want to +report it using LAYOUTERROR as an inability to connect. 
+ +Fixes: dd52128afdde ("NFSv4.1/pnfs Ensure flexfiles reports all connection related errors") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/flexfilelayout/flexfilelayout.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c +index 1ec79ccf89ad2..5c69a6e9ab3e1 100644 +--- a/fs/nfs/flexfilelayout/flexfilelayout.c ++++ b/fs/nfs/flexfilelayout/flexfilelayout.c +@@ -1235,6 +1235,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg, + case -EPFNOSUPPORT: + case -EPROTONOSUPPORT: + case -EOPNOTSUPP: ++ case -EINVAL: + case -ECONNREFUSED: + case -ECONNRESET: + case -EHOSTDOWN: +-- +2.40.1 + diff --git a/queue-6.1/nfs-use-the-correct-commit-info-in-nfs_join_page_gro.patch b/queue-6.1/nfs-use-the-correct-commit-info-in-nfs_join_page_gro.patch new file mode 100644 index 00000000000..1f2ed4b6864 --- /dev/null +++ b/queue-6.1/nfs-use-the-correct-commit-info-in-nfs_join_page_gro.patch @@ -0,0 +1,150 @@ +From c70c60272a0f2b0c34e5ab2b6a607a3acea168f9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 4 Sep 2023 12:34:40 -0400 +Subject: NFS: Use the correct commit info in nfs_join_page_group() + +From: Trond Myklebust + +[ Upstream commit b193a78ddb5ee7dba074d3f28dc050069ba083c0 ] + +Ensure that nfs_clear_request_commit() updates the correct counters when +it removes them from the commit list. 
+ +Fixes: ed5d588fe47f ("NFS: Try to join page groups before an O_DIRECT retransmission") +Signed-off-by: Trond Myklebust +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/direct.c | 8 +++++--- + fs/nfs/write.c | 23 ++++++++++++----------- + include/linux/nfs_page.h | 4 +++- + 3 files changed, 20 insertions(+), 15 deletions(-) + +diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c +index d879c3229efdb..04ebe96336304 100644 +--- a/fs/nfs/direct.c ++++ b/fs/nfs/direct.c +@@ -500,7 +500,9 @@ static void nfs_direct_add_page_head(struct list_head *list, + kref_get(&head->wb_kref); + } + +-static void nfs_direct_join_group(struct list_head *list, struct inode *inode) ++static void nfs_direct_join_group(struct list_head *list, ++ struct nfs_commit_info *cinfo, ++ struct inode *inode) + { + struct nfs_page *req, *subreq; + +@@ -522,7 +524,7 @@ static void nfs_direct_join_group(struct list_head *list, struct inode *inode) + nfs_release_request(subreq); + } + } while ((subreq = subreq->wb_this_page) != req); +- nfs_join_page_group(req, inode); ++ nfs_join_page_group(req, cinfo, inode); + } + } + +@@ -547,7 +549,7 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq) + nfs_init_cinfo_from_dreq(&cinfo, dreq); + nfs_direct_write_scan_commit_list(dreq->inode, &reqs, &cinfo); + +- nfs_direct_join_group(&reqs, dreq->inode); ++ nfs_direct_join_group(&reqs, &cinfo, dreq->inode); + + nfs_clear_pnfs_ds_commit_verifiers(&dreq->ds_cinfo); + get_dreq(dreq); +diff --git a/fs/nfs/write.c b/fs/nfs/write.c +index f41d24b54fd1f..0a8aed0ac9945 100644 +--- a/fs/nfs/write.c ++++ b/fs/nfs/write.c +@@ -58,7 +58,8 @@ static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops; + static const struct nfs_commit_completion_ops nfs_commit_completion_ops; + static const struct nfs_rw_ops nfs_rw_write_ops; + static void nfs_inode_remove_request(struct nfs_page *req); +-static void nfs_clear_request_commit(struct nfs_page *req); ++static void 
nfs_clear_request_commit(struct nfs_commit_info *cinfo, ++ struct nfs_page *req); + static void nfs_init_cinfo_from_inode(struct nfs_commit_info *cinfo, + struct inode *inode); + static struct nfs_page * +@@ -502,8 +503,8 @@ nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list, + * the (former) group. All subrequests are removed from any write or commit + * lists, unlinked from the group and destroyed. + */ +-void +-nfs_join_page_group(struct nfs_page *head, struct inode *inode) ++void nfs_join_page_group(struct nfs_page *head, struct nfs_commit_info *cinfo, ++ struct inode *inode) + { + struct nfs_page *subreq; + struct nfs_page *destroy_list = NULL; +@@ -533,7 +534,7 @@ nfs_join_page_group(struct nfs_page *head, struct inode *inode) + * Commit list removal accounting is done after locks are dropped */ + subreq = head; + do { +- nfs_clear_request_commit(subreq); ++ nfs_clear_request_commit(cinfo, subreq); + subreq = subreq->wb_this_page; + } while (subreq != head); + +@@ -567,8 +568,10 @@ nfs_lock_and_join_requests(struct page *page) + { + struct inode *inode = page_file_mapping(page)->host; + struct nfs_page *head; ++ struct nfs_commit_info cinfo; + int ret; + ++ nfs_init_cinfo_from_inode(&cinfo, inode); + /* + * A reference is taken only on the head request which acts as a + * reference to the whole page group - the group will not be destroyed +@@ -585,7 +588,7 @@ nfs_lock_and_join_requests(struct page *page) + return ERR_PTR(ret); + } + +- nfs_join_page_group(head, inode); ++ nfs_join_page_group(head, &cinfo, inode); + + return head; + } +@@ -956,18 +959,16 @@ nfs_clear_page_commit(struct page *page) + } + + /* Called holding the request lock on @req */ +-static void +-nfs_clear_request_commit(struct nfs_page *req) ++static void nfs_clear_request_commit(struct nfs_commit_info *cinfo, ++ struct nfs_page *req) + { + if (test_bit(PG_CLEAN, &req->wb_flags)) { + struct nfs_open_context *ctx = nfs_req_openctx(req); + struct inode *inode = 
d_inode(ctx->dentry); +- struct nfs_commit_info cinfo; + +- nfs_init_cinfo_from_inode(&cinfo, inode); + mutex_lock(&NFS_I(inode)->commit_mutex); +- if (!pnfs_clear_request_commit(req, &cinfo)) { +- nfs_request_remove_commit_list(req, &cinfo); ++ if (!pnfs_clear_request_commit(req, cinfo)) { ++ nfs_request_remove_commit_list(req, cinfo); + } + mutex_unlock(&NFS_I(inode)->commit_mutex); + nfs_clear_page_commit(req->wb_page); +diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h +index ba7e2e4b09264..e39a8cf8b1797 100644 +--- a/include/linux/nfs_page.h ++++ b/include/linux/nfs_page.h +@@ -145,7 +145,9 @@ extern void nfs_unlock_request(struct nfs_page *req); + extern void nfs_unlock_and_release_request(struct nfs_page *); + extern struct nfs_page *nfs_page_group_lock_head(struct nfs_page *req); + extern int nfs_page_group_lock_subrequests(struct nfs_page *head); +-extern void nfs_join_page_group(struct nfs_page *head, struct inode *inode); ++extern void nfs_join_page_group(struct nfs_page *head, ++ struct nfs_commit_info *cinfo, ++ struct inode *inode); + extern int nfs_page_group_lock(struct nfs_page *); + extern void nfs_page_group_unlock(struct nfs_page *); + extern bool nfs_page_group_sync_on_bit(struct nfs_page *, unsigned int); +-- +2.40.1 + diff --git a/queue-6.1/nfsv4.1-fix-pnfs-mds-ds-session-trunking.patch b/queue-6.1/nfsv4.1-fix-pnfs-mds-ds-session-trunking.patch new file mode 100644 index 00000000000..2aa63ba2e2a --- /dev/null +++ b/queue-6.1/nfsv4.1-fix-pnfs-mds-ds-session-trunking.patch @@ -0,0 +1,134 @@ +From 2867234cbbfb40315c12bde4d7b744fd3754ad2f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 30 Aug 2023 15:29:34 -0400 +Subject: NFSv4.1: fix pnfs MDS=DS session trunking + +From: Olga Kornievskaia + +[ Upstream commit 806a3bc421a115fbb287c1efce63a48c54ee804b ] + +Currently, when GETDEVICEINFO returns multiple locations where each +is a different IP but the server's identity is same as MDS, then +nfs4_set_ds_client() finds the 
existing nfs_client structure which +has the MDS's max_connect value (and if it's 1), then the 1st IP +on the DS's list will get dropped due to MDS trunking rules. Other +IPs would be added as they fall under the pnfs trunking rules. + +For the list of IPs the 1st goes thru calling nfs4_set_ds_client() +which will eventually call nfs4_add_trunk() and call into +rpc_clnt_test_and_add_xprt() which has the check for MDS trunking. +The other IPs (after the 1st one), would call rpc_clnt_add_xprt() +which doesn't go thru that check. + +nfs4_add_trunk() is called when MDS trunking is happening and it +needs to enforce the usage of max_connect mount option of the +1st mount. However, this shouldn't be applied to pnfs flow. + +Instead, this patch proposed to treat MDS=DS as DS trunking and +make sure that MDS's max_connect limit does not apply to the +1st IP returned in the GETDEVICEINFO list. It does so by +marking the newly created client with a new flag NFS_CS_PNFS +which then used to pass max_connect value to use into the +rpc_clnt_test_and_add_xprt() instead of the existing rpc +client's max_connect value set by the MDS connection. + +For example, mount was done without max_connect value set +so MDS's rpc client has cl_max_connect=1. Upon calling into +rpc_clnt_test_and_add_xprt() and using rpc client's value, +the caller passes in max_connect value which is previously +been set in the pnfs path (as a part of handling +GETDEVICEINFO list of IPs) in nfs4_set_ds_client(). + +However, when NFS_CS_PNFS flag is not set and we know we +are doing MDS trunking, comparing a new IP of the same +server, we then set the max_connect value to the +existing MDS's value and pass that into +rpc_clnt_test_and_add_xprt(). 
+ +Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts") +Signed-off-by: Olga Kornievskaia +Signed-off-by: Anna Schumaker +Signed-off-by: Sasha Levin +--- + fs/nfs/nfs4client.c | 6 +++++- + include/linux/nfs_fs_sb.h | 1 + + net/sunrpc/clnt.c | 11 +++++++---- + 3 files changed, 13 insertions(+), 5 deletions(-) + +diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c +index d3e2b0867dc11..84b345efcec00 100644 +--- a/fs/nfs/nfs4client.c ++++ b/fs/nfs/nfs4client.c +@@ -416,6 +416,8 @@ static void nfs4_add_trunk(struct nfs_client *clp, struct nfs_client *old) + .net = old->cl_net, + .servername = old->cl_hostname, + }; ++ int max_connect = test_bit(NFS_CS_PNFS, &clp->cl_flags) ? ++ clp->cl_max_connect : old->cl_max_connect; + + if (clp->cl_proto != old->cl_proto) + return; +@@ -429,7 +431,7 @@ static void nfs4_add_trunk(struct nfs_client *clp, struct nfs_client *old) + xprt_args.addrlen = clp_salen; + + rpc_clnt_add_xprt(old->cl_rpcclient, &xprt_args, +- rpc_clnt_test_and_add_xprt, NULL); ++ rpc_clnt_test_and_add_xprt, &max_connect); + } + + /** +@@ -996,6 +998,8 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv, + __set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags); + + __set_bit(NFS_CS_DS, &cl_init.init_flags); ++ __set_bit(NFS_CS_PNFS, &cl_init.init_flags); ++ cl_init.max_connect = NFS_MAX_TRANSPORTS; + /* + * Set an authflavor equual to the MDS value. 
Use the MDS nfs_client + * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS +diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h +index ea2f7e6b1b0b5..ef8ba5fbc6503 100644 +--- a/include/linux/nfs_fs_sb.h ++++ b/include/linux/nfs_fs_sb.h +@@ -48,6 +48,7 @@ struct nfs_client { + #define NFS_CS_NOPING 6 /* - don't ping on connect */ + #define NFS_CS_DS 7 /* - Server is a DS */ + #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */ ++#define NFS_CS_PNFS 9 /* - Server used for pnfs */ + struct sockaddr_storage cl_addr; /* server identifier */ + size_t cl_addrlen; + char * cl_hostname; /* hostname of server */ +diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c +index ff6728e41e044..b3f6f67ed2521 100644 +--- a/net/sunrpc/clnt.c ++++ b/net/sunrpc/clnt.c +@@ -2890,19 +2890,22 @@ static const struct rpc_call_ops rpc_cb_add_xprt_call_ops = { + * @clnt: pointer to struct rpc_clnt + * @xps: pointer to struct rpc_xprt_switch, + * @xprt: pointer struct rpc_xprt +- * @dummy: unused ++ * @in_max_connect: pointer to the max_connect value for the passed in xprt transport + */ + int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt, + struct rpc_xprt_switch *xps, struct rpc_xprt *xprt, +- void *dummy) ++ void *in_max_connect) + { + struct rpc_cb_add_xprt_calldata *data; + struct rpc_task *task; ++ int max_connect = clnt->cl_max_connect; + +- if (xps->xps_nunique_destaddr_xprts + 1 > clnt->cl_max_connect) { ++ if (in_max_connect) ++ max_connect = *(int *)in_max_connect; ++ if (xps->xps_nunique_destaddr_xprts + 1 > max_connect) { + rcu_read_lock(); + pr_warn("SUNRPC: reached max allowed number (%d) did not add " +- "transport to server: %s\n", clnt->cl_max_connect, ++ "transport to server: %s\n", max_connect, + rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR)); + rcu_read_unlock(); + return -EINVAL; +-- +2.40.1 + diff --git a/queue-6.1/nfsv4.1-use-exchgid4_flag_use_pnfs_ds-for-ds-server.patch 
b/queue-6.1/nfsv4.1-use-exchgid4_flag_use_pnfs_ds-for-ds-server.patch new file mode 100644 index 00000000000..ba7f48df340 --- /dev/null +++ b/queue-6.1/nfsv4.1-use-exchgid4_flag_use_pnfs_ds-for-ds-server.patch @@ -0,0 +1,68 @@ +From 6e9208524ea3b70adfbf59cef2e0c04aeeb2c664 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 13 Jul 2023 13:02:38 -0400 +Subject: NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server + +From: Olga Kornievskaia + +[ Upstream commit 51d674a5e4889f1c8e223ac131cf218e1631e423 ] + +After receiving the location(s) of the DS server(s) in the +GETDEVINCEINFO, create the request for the clientid to such +server and indicate that the client is connecting to a DS. + +Signed-off-by: Olga Kornievskaia +Signed-off-by: Anna Schumaker +Stable-dep-of: 806a3bc421a1 ("NFSv4.1: fix pnfs MDS=DS session trunking") +Signed-off-by: Sasha Levin +--- + fs/nfs/nfs4client.c | 3 +++ + fs/nfs/nfs4proc.c | 4 ++++ + 2 files changed, 7 insertions(+) + +diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c +index d3051b051a564..d3e2b0867dc11 100644 +--- a/fs/nfs/nfs4client.c ++++ b/fs/nfs/nfs4client.c +@@ -231,6 +231,8 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init) + __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags); + __set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags); + ++ if (test_bit(NFS_CS_DS, &cl_init->init_flags)) ++ __set_bit(NFS_CS_DS, &clp->cl_flags); + /* + * Set up the connection to the server before we add add to the + * global list. +@@ -993,6 +995,7 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv, + if (mds_srv->flags & NFS_MOUNT_NORESVPORT) + __set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags); + ++ __set_bit(NFS_CS_DS, &cl_init.init_flags); + /* + * Set an authflavor equual to the MDS value. 
Use the MDS nfs_client + * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS +diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c +index 2dec0fed1ba16..acb1346da13e9 100644 +--- a/fs/nfs/nfs4proc.c ++++ b/fs/nfs/nfs4proc.c +@@ -8794,6 +8794,8 @@ nfs4_run_exchange_id(struct nfs_client *clp, const struct cred *cred, + #ifdef CONFIG_NFS_V4_1_MIGRATION + calldata->args.flags |= EXCHGID4_FLAG_SUPP_MOVED_MIGR; + #endif ++ if (test_bit(NFS_CS_DS, &clp->cl_flags)) ++ calldata->args.flags |= EXCHGID4_FLAG_USE_PNFS_DS; + msg.rpc_argp = &calldata->args; + msg.rpc_resp = &calldata->res; + task_setup_data.callback_data = calldata; +@@ -8871,6 +8873,8 @@ static int _nfs4_proc_exchange_id(struct nfs_client *clp, const struct cred *cre + /* Save the EXCHANGE_ID verifier session trunk tests */ + memcpy(clp->cl_confirm.data, argp->verifier.data, + sizeof(clp->cl_confirm.data)); ++ if (resp->flags & EXCHGID4_FLAG_USE_PNFS_DS) ++ set_bit(NFS_CS_DS, &clp->cl_flags); + out: + trace_nfs4_exchange_id(clp, status); + rpc_put_task(task); +-- +2.40.1 + diff --git a/queue-6.1/perf-build-update-build-rule-for-generated-files.patch b/queue-6.1/perf-build-update-build-rule-for-generated-files.patch new file mode 100644 index 00000000000..8e32f4c5f68 --- /dev/null +++ b/queue-6.1/perf-build-update-build-rule-for-generated-files.patch @@ -0,0 +1,87 @@ +From 085ba292175109908bc60f7980bbdbbf58c7128a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 27 Jul 2023 19:24:46 -0700 +Subject: perf build: Update build rule for generated files + +From: Namhyung Kim + +[ Upstream commit 7822a8913f4c51c7d1aff793b525d60c3384fb5b ] + +The bison and flex generate C files from the source (.y and .l) +files. When O= option is used, they are saved in a separate directory +but the default build rule assumes the .C files are in the source +directory. So it might read invalid file if there are generated files +from an old version. The same is true for the pmu-events files. 
+ +For example, the following command would cause a build failure: + + $ git checkout v6.3 + $ make -C tools/perf # build in the same directory + + $ git checkout v6.5-rc2 + $ mkdir build # create a build directory + $ make -C tools/perf O=build # build in a different directory but it + # refers files in the source directory + +Let's update the build rule to specify those cases explicitly to depend +on the files in the output directory. + +Note that it's not a complete fix and it needs the next patch for the +include path too. + +Fixes: 80eeb67fe577aa76 ("perf jevents: Program to convert JSON file") +Signed-off-by: Namhyung Kim +Cc: Adrian Hunter +Cc: Andi Kleen +Cc: Anup Sharma +Cc: Ian Rogers +Cc: Ingo Molnar +Cc: Jiri Olsa +Cc: Peter Zijlstra +Cc: stable@vger.kernel.org +Link: https://lore.kernel.org/r/20230728022447.1323563-1-namhyung@kernel.org +Signed-off-by: Arnaldo Carvalho de Melo +Signed-off-by: Sasha Levin +--- + tools/build/Makefile.build | 10 ++++++++++ + tools/perf/pmu-events/Build | 6 ++++++ + 2 files changed, 16 insertions(+) + +diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build +index 715092fc6a239..0f0aba16bdee7 100644 +--- a/tools/build/Makefile.build ++++ b/tools/build/Makefile.build +@@ -116,6 +116,16 @@ $(OUTPUT)%.s: %.c FORCE + $(call rule_mkdir) + $(call if_changed_dep,cc_s_c) + ++# bison and flex files are generated in the OUTPUT directory ++# so it needs a separate rule to depend on them properly ++$(OUTPUT)%-bison.o: $(OUTPUT)%-bison.c FORCE ++ $(call rule_mkdir) ++ $(call if_changed_dep,$(host)cc_o_c) ++ ++$(OUTPUT)%-flex.o: $(OUTPUT)%-flex.c FORCE ++ $(call rule_mkdir) ++ $(call if_changed_dep,$(host)cc_o_c) ++ + # Gather build data: + # obj-y - list of build objects + # subdir-y - list of directories to nest +diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build +index 04ef95174660b..fcb61b94f1306 100644 +--- a/tools/perf/pmu-events/Build ++++ b/tools/perf/pmu-events/Build +@@ -25,3 +25,9 @@ 
$(OUTPUT)pmu-events/pmu-events.c: $(JSON) $(JSON_TEST) $(JEVENTS_PY)
+ 	$(call rule_mkdir)
+ 	$(Q)$(call echo-cmd,gen)$(PYTHON) $(JEVENTS_PY) $(JEVENTS_ARCH) pmu-events/arch $@
+ endif
++
++# pmu-events.c file is generated in the OUTPUT directory so it needs a
++# separate rule to depend on it properly
++$(OUTPUT)pmu-events/pmu-events.o: $(PMU_EVENTS_C)
++	$(call rule_mkdir)
++	$(call if_changed_dep,cc_o_c)
+-- 
+2.40.1
+
diff --git a/queue-6.1/series b/queue-6.1/series
new file mode 100644
index 00000000000..75a08eb25eb
--- /dev/null
+++ b/queue-6.1/series
@@ -0,0 +1,19 @@
+nfs-fix-error-handling-for-o_direct-write-scheduling.patch
+nfs-fix-o_direct-locking-issues.patch
+nfs-more-o_direct-accounting-fixes-for-error-paths.patch
+nfs-use-the-correct-commit-info-in-nfs_join_page_gro.patch
+nfs-more-fixes-for-nfs_direct_write_reschedule_io.patch
+nfs-pnfs-report-einval-errors-from-connect-to-the-se.patch
+sunrpc-mark-the-cred-for-revalidation-if-the-server-.patch
+nfsv4.1-use-exchgid4_flag_use_pnfs_ds-for-ds-server.patch
+nfsv4.1-fix-pnfs-mds-ds-session-trunking.patch
+media-v4l-use-correct-dependency-for-camera-sensor-d.patch
+media-via-use-correct-dependency-for-camera-sensor-d.patch
+netfs-only-call-folio_start_fscache-one-time-for-eac.patch
+perf-build-update-build-rule-for-generated-files.patch
+dm-fix-a-race-condition-in-retrieve_deps.patch
+btrfs-improve-error-message-after-failure-to-add-del.patch
+btrfs-remove-bug-after-failure-to-insert-delayed-dir.patch
+ext4-replace-the-traditional-ternary-conditional-ope.patch
+ext4-move-setting-of-trimmed-bit-into-ext4_try_to_tr.patch
+ext4-do-not-let-fstrim-block-system-suspend.patch
diff --git a/queue-6.1/sunrpc-mark-the-cred-for-revalidation-if-the-server-.patch b/queue-6.1/sunrpc-mark-the-cred-for-revalidation-if-the-server-.patch
new file mode 100644
index 00000000000..766c28e3db3
--- /dev/null
+++ b/queue-6.1/sunrpc-mark-the-cred-for-revalidation-if-the-server-.patch
@@ -0,0 +1,35 @@
+From
1a1671cfccba1655dc11357a40366c008c7e845d Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 4 Sep 2023 12:50:09 -0400
+Subject: SUNRPC: Mark the cred for revalidation if the server rejects it
+
+From: Trond Myklebust
+
+[ Upstream commit 611fa42dfa9d2f3918ac5f4dd5705dfad81b323d ]
+
+If the server rejects the credential as being stale, or bad, then we
+should mark it for revalidation before retransmitting.
+
+Fixes: 7f5667a5f8c4 ("SUNRPC: Clean up rpc_verify_header()")
+Signed-off-by: Trond Myklebust
+Signed-off-by: Anna Schumaker
+Signed-off-by: Sasha Levin
+---
+ net/sunrpc/clnt.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
+index b0258507b236c..ff6728e41e044 100644
+--- a/net/sunrpc/clnt.c
++++ b/net/sunrpc/clnt.c
+@@ -2736,6 +2736,7 @@ rpc_decode_header(struct rpc_task *task, struct xdr_stream *xdr)
+ 	case rpc_autherr_rejectedverf:
+ 	case rpcsec_gsserr_credproblem:
+ 	case rpcsec_gsserr_ctxproblem:
++		rpcauth_invalcred(task);
+ 		if (!task->tk_cred_retry)
+ 			break;
+ 		task->tk_cred_retry--;
+-- 
+2.40.1
+