From: Sasha Levin Date: Tue, 10 Jan 2023 01:55:36 +0000 (-0500) Subject: Fixes for 5.15 X-Git-Tag: v5.15.87~44 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=4f69f8e7678747602d44bd1ba0ae1213f59dbd16;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for 5.15 Signed-off-by: Sasha Levin --- diff --git a/queue-5.15/asoc-intel-bytcr_rt5640-add-quirk-for-the-advantech-.patch b/queue-5.15/asoc-intel-bytcr_rt5640-add-quirk-for-the-advantech-.patch new file mode 100644 index 00000000000..aad3292d490 --- /dev/null +++ b/queue-5.15/asoc-intel-bytcr_rt5640-add-quirk-for-the-advantech-.patch @@ -0,0 +1,59 @@ +From e970b2d6b1628d1e38f505466c40d1513938a58b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 13 Dec 2022 13:32:46 +0100 +Subject: ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 + tablet + +From: Hans de Goede + +[ Upstream commit a1dec9d70b6ad97087b60b81d2492134a84208c6 ] + +The Advantech MICA-071 tablet deviates from the defaults for +a non CR Bay Trail based tablet in several ways: + +1. It uses an analog MIC on IN3 rather then using DMIC1 +2. It only has 1 speaker +3. It needs the OVCD current threshold to be set to 1500uA instead of + the default 2000uA to reliable differentiate between headphones vs + headsets + +Add a quirk with these settings for this tablet. + +Signed-off-by: Hans de Goede +Acked-by: Pierre-Louis Bossart +Link: https://lore.kernel.org/r/20221213123246.11226-1-hdegoede@redhat.com +Signed-off-by: Mark Brown +Signed-off-by: Sasha Levin +--- + sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ + 1 file changed, 15 insertions(+) + +diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c +index f9c82ebc552c..888e04c57757 100644 +--- a/sound/soc/intel/boards/bytcr_rt5640.c ++++ b/sound/soc/intel/boards/bytcr_rt5640.c +@@ -570,6 +570,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { + BYT_RT5640_SSP0_AIF1 | + BYT_RT5640_MCLK_EN), + }, ++ { ++ /* Advantech MICA-071 */ ++ .matches = { ++ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Advantech"), ++ DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "MICA-071"), ++ }, ++ /* OVCD Th = 1500uA to reliable detect head-phones vs -set */ ++ .driver_data = (void *)(BYT_RT5640_IN3_MAP | ++ BYT_RT5640_JD_SRC_JD2_IN4N | ++ BYT_RT5640_OVCD_TH_1500UA | ++ BYT_RT5640_OVCD_SF_0P75 | ++ BYT_RT5640_MONO_SPEAKER | ++ BYT_RT5640_DIFF_MIC | ++ BYT_RT5640_MCLK_EN), ++ }, + { + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ARCHOS"), +-- +2.35.1 + diff --git a/queue-5.15/bpf-pull-before-calling-skb_postpull_rcsum.patch b/queue-5.15/bpf-pull-before-calling-skb_postpull_rcsum.patch new file mode 100644 index 00000000000..02f1ac826a8 --- /dev/null +++ b/queue-5.15/bpf-pull-before-calling-skb_postpull_rcsum.patch @@ -0,0 +1,61 @@ +From 3ed2382793987a1ebd8ba00f4bf62108c6f2ad5d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 16:47:00 -0800 +Subject: bpf: pull before calling skb_postpull_rcsum() + +From: Jakub Kicinski + +[ Upstream commit 54c3f1a81421f85e60ae2eaae7be3727a09916ee ] + +Anand hit a BUG() when pulling off headers on egress to a SW tunnel. +We get to skb_checksum_help() with an invalid checksum offset +(commit d7ea0d9df2a6 ("net: remove two BUG() from skb_checksum_help()") +converted those BUGs to WARN_ONs()). +He points out oddness in how skb_postpull_rcsum() gets used. 
+Indeed looks like we should pull before "postpull", otherwise +the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not +be able to do its job: + + if (skb->ip_summed == CHECKSUM_PARTIAL && + skb_checksum_start_offset(skb) < 0) + skb->ip_summed = CHECKSUM_NONE; + +Reported-by: Anand Parthasarathy +Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") +Signed-off-by: Jakub Kicinski +Acked-by: Stanislav Fomichev +Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org +Signed-off-by: Martin KaFai Lau +Signed-off-by: Sasha Levin +--- + net/core/filter.c | 7 +++++-- + 1 file changed, 5 insertions(+), 2 deletions(-) + +diff --git a/net/core/filter.c b/net/core/filter.c +index 2da05622afbe..b2031148dd8b 100644 +--- a/net/core/filter.c ++++ b/net/core/filter.c +@@ -3182,15 +3182,18 @@ static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len) + + static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len) + { ++ void *old_data; ++ + /* skb_ensure_writable() is not needed here, as we're + * already working on an uncloned skb. + */ + if (unlikely(!pskb_may_pull(skb, off + len))) + return -ENOMEM; + +- skb_postpull_rcsum(skb, skb->data + off, len); +- memmove(skb->data + len, skb->data, off); ++ old_data = skb->data; + __skb_pull(skb, len); ++ skb_postpull_rcsum(skb, old_data + off, len); ++ memmove(skb->data, old_data, off); + + return 0; + } +-- +2.35.1 + diff --git a/queue-5.15/btrfs-check-superblock-to-ensure-the-fs-was-not-modi.patch b/queue-5.15/btrfs-check-superblock-to-ensure-the-fs-was-not-modi.patch new file mode 100644 index 00000000000..a43f2967a6e --- /dev/null +++ b/queue-5.15/btrfs-check-superblock-to-ensure-the-fs-was-not-modi.patch @@ -0,0 +1,254 @@ +From 365bb8b8c528592f7580d57c07585a3afbda7fae Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 24 Aug 2022 20:16:22 +0800 +Subject: btrfs: check superblock to ensure the fs was not modified at thaw + time + +From: Qu Wenruo + +[ Upstream commit a05d3c9153145283ce9c58a1d7a9056fbb85f6a1 ] + +[BACKGROUND] +There is an incident report that, one user hibernated the system, with +one btrfs on removable device still mounted. + +Then by some incident, the btrfs got mounted and modified by another +system/OS, then back to the hibernated system. + +After resuming from the hibernation, new write happened into the victim btrfs. + +Now the fs is completely broken, since the underlying btrfs is no longer +the same one before the hibernation, and the user lost their data due to +various transid mismatch. + +[REPRODUCER] +We can emulate the situation using the following small script: + + truncate -s 1G $dev + mkfs.btrfs -f $dev + mount $dev $mnt + fsstress -w -d $mnt -n 500 + sync + xfs_freeze -f $mnt + cp $dev $dev.backup + + # There is no way to mount the same cloned fs on the same system, + # as the conflicting fsid will be rejected by btrfs. + # Thus here we have to wipe the fs using a different btrfs. + mkfs.btrfs -f $dev.backup + + dd if=$dev.backup of=$dev bs=1M + xfs_freeze -u $mnt + fsstress -w -d $mnt -n 20 + umount $mnt + btrfs check $dev + +The final fsck will fail due to some tree blocks has incorrect fsid. + +This is enough to emulate the problem hit by the unfortunate user. + +[ENHANCEMENT] +Although such case should not be that common, it can still happen from +time to time. + +From the view of btrfs, we can detect any unexpected super block change, +and if there is any unexpected change, we just mark the fs read-only, +and thaw the fs. 
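+
+Concretely, the thaw-time check boils down to re-reading the primary
+super block of every device (bypassing the page cache) and comparing
+it against the in-memory state. A condensed sketch of the helper added
+in the hunks below (error reporting trimmed):
+
+	static int check_dev_super(struct btrfs_device *dev)
+	{
+		struct btrfs_super_block *sb;
+		int ret;
+
+		/* Re-read from the device, not from the page cache. */
+		sb = btrfs_read_dev_one_super(dev->bdev, 0, true);
+		if (IS_ERR(sb))
+			return PTR_ERR(sb);
+
+		/* Checks magic, fsid, csum type and friends. */
+		ret = btrfs_validate_super(dev->fs_info, sb, 0);
+		if (!ret && btrfs_super_generation(sb) !=
+			    dev->fs_info->last_trans_committed)
+			ret = -EUCLEAN; /* someone committed behind our back */
+
+		btrfs_release_disk_super(sb);
+		return ret;
+	}
+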
+ +By this we can limit the damage to minimal, and I hope no one would lose +their data by this anymore. + +Suggested-by: Goffredo Baroncelli +Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/ +Reviewed-by: Anand Jain +Signed-off-by: Qu Wenruo +Signed-off-by: David Sterba +Signed-off-by: Sasha Levin +--- + fs/btrfs/disk-io.c | 25 ++++++++++++++----- + fs/btrfs/disk-io.h | 4 +++- + fs/btrfs/super.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ + fs/btrfs/volumes.c | 2 +- + 4 files changed, 83 insertions(+), 8 deletions(-) + +diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c +index 2fd46093e5bb..6484f61c6fbb 100644 +--- a/fs/btrfs/disk-io.c ++++ b/fs/btrfs/disk-io.c +@@ -2491,8 +2491,8 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info) + * 1, 2 2nd and 3rd backup copy + * -1 skip bytenr check + */ +-static int validate_super(struct btrfs_fs_info *fs_info, +- struct btrfs_super_block *sb, int mirror_num) ++int btrfs_validate_super(struct btrfs_fs_info *fs_info, ++ struct btrfs_super_block *sb, int mirror_num) + { + u64 nodesize = btrfs_super_nodesize(sb); + u64 sectorsize = btrfs_super_sectorsize(sb); +@@ -2675,7 +2675,7 @@ static int validate_super(struct btrfs_fs_info *fs_info, + */ + static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info) + { +- return validate_super(fs_info, fs_info->super_copy, 0); ++ return btrfs_validate_super(fs_info, fs_info->super_copy, 0); + } + + /* +@@ -2689,7 +2689,7 @@ static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info, + { + int ret; + +- ret = validate_super(fs_info, sb, -1); ++ ret = btrfs_validate_super(fs_info, sb, -1); + if (ret < 0) + goto out; + if (!btrfs_supported_super_csum(btrfs_super_csum_type(sb))) { +@@ -3703,7 +3703,7 @@ static void btrfs_end_super_write(struct bio *bio) + } + + struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, +- int copy_num) ++ int copy_num, bool drop_cache) + { + struct btrfs_super_block *super; + struct page *page; +@@ -3721,6 +3721,19 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, + if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode)) + return ERR_PTR(-EINVAL); + ++ if (drop_cache) { ++ /* This should only be called with the primary sb. */ ++ ASSERT(copy_num == 0); ++ ++ /* ++ * Drop the page of the primary superblock, so later read will ++ * always read from the device. 
++ */ ++ invalidate_inode_pages2_range(mapping, ++ bytenr >> PAGE_SHIFT, ++ (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT); ++ } ++ + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS); + if (IS_ERR(page)) + return ERR_CAST(page); +@@ -3752,7 +3765,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev) + * later supers, using BTRFS_SUPER_MIRROR_MAX instead + */ + for (i = 0; i < 1; i++) { +- super = btrfs_read_dev_one_super(bdev, i); ++ super = btrfs_read_dev_one_super(bdev, i, false); + if (IS_ERR(super)) + continue; + +diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h +index 1b8fd3deafc9..9de0c39f63a2 100644 +--- a/fs/btrfs/disk-io.h ++++ b/fs/btrfs/disk-io.h +@@ -56,10 +56,12 @@ int __cold open_ctree(struct super_block *sb, + struct btrfs_fs_devices *fs_devices, + char *options); + void __cold close_ctree(struct btrfs_fs_info *fs_info); ++int btrfs_validate_super(struct btrfs_fs_info *fs_info, ++ struct btrfs_super_block *sb, int mirror_num); + int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors); + struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev); + struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, +- int copy_num); ++ int copy_num, bool drop_cache); + int btrfs_commit_super(struct btrfs_fs_info *fs_info); + struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, + struct btrfs_key *key); +diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c +index 61b84391be58..bde5ead01c24 100644 +--- a/fs/btrfs/super.c ++++ b/fs/btrfs/super.c +@@ -2497,11 +2497,71 @@ static int btrfs_freeze(struct super_block *sb) + return btrfs_commit_transaction(trans); + } + ++static int check_dev_super(struct btrfs_device *dev) ++{ ++ struct btrfs_fs_info *fs_info = dev->fs_info; ++ struct btrfs_super_block *sb; ++ int ret = 0; ++ ++ /* This should be called with fs still frozen. */ ++ ASSERT(test_bit(BTRFS_FS_FROZEN, &fs_info->flags)); ++ ++ /* Missing dev, no need to check. */ ++ if (!dev->bdev) ++ return 0; ++ ++ /* Only need to check the primary super block. */ ++ sb = btrfs_read_dev_one_super(dev->bdev, 0, true); ++ if (IS_ERR(sb)) ++ return PTR_ERR(sb); ++ ++ /* Btrfs_validate_super() includes fsid check against super->fsid. */ ++ ret = btrfs_validate_super(fs_info, sb, 0); ++ if (ret < 0) ++ goto out; ++ ++ if (btrfs_super_generation(sb) != fs_info->last_trans_committed) { ++ btrfs_err(fs_info, "transid mismatch, has %llu expect %llu", ++ btrfs_super_generation(sb), ++ fs_info->last_trans_committed); ++ ret = -EUCLEAN; ++ goto out; ++ } ++out: ++ btrfs_release_disk_super(sb); ++ return ret; ++} ++ + static int btrfs_unfreeze(struct super_block *sb) + { + struct btrfs_fs_info *fs_info = btrfs_sb(sb); ++ struct btrfs_device *device; ++ int ret = 0; + ++ /* ++ * Make sure the fs is not changed by accident (like hibernation then ++ * modified by other OS). ++ * If we found anything wrong, we mark the fs error immediately. ++ * ++ * And since the fs is frozen, no one can modify the fs yet, thus ++ * we don't need to hold device_list_mutex. ++ */ ++ list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) { ++ ret = check_dev_super(device); ++ if (ret < 0) { ++ btrfs_handle_fs_error(fs_info, ret, ++ "super block on devid %llu got modified unexpectedly", ++ device->devid); ++ break; ++ } ++ } + clear_bit(BTRFS_FS_FROZEN, &fs_info->flags); ++ ++ /* ++ * We still return 0, to allow VFS layer to unfreeze the fs even the ++ * above checks failed. 
Since the fs is either fine or read-only, we're ++ * safe to continue, without causing further damage. ++ */ + return 0; + } + +diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c +index 6b86a3cec04c..f01549b8c7c5 100644 +--- a/fs/btrfs/volumes.c ++++ b/fs/btrfs/volumes.c +@@ -2074,7 +2074,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, + struct page *page; + int ret; + +- disk_super = btrfs_read_dev_one_super(bdev, copy_num); ++ disk_super = btrfs_read_dev_one_super(bdev, copy_num, false); + if (IS_ERR(disk_super)) + continue; + +-- +2.35.1 + diff --git a/queue-5.15/btrfs-fix-an-error-handling-path-in-btrfs_defrag_lea.patch b/queue-5.15/btrfs-fix-an-error-handling-path-in-btrfs_defrag_lea.patch new file mode 100644 index 00000000000..072a449103a --- /dev/null +++ b/queue-5.15/btrfs-fix-an-error-handling-path-in-btrfs_defrag_lea.patch @@ -0,0 +1,45 @@ +From b16dcfb12f1cd471dda901c372b210a2f66682b7 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 8 Jan 2023 08:24:19 -0500 +Subject: btrfs: fix an error handling path in btrfs_defrag_leaves() + +[ Upstream commit db0a4a7b8e95f9312a59a67cbd5bc589f090e13d ] + +All error handling paths end to 'out', except this memory allocation +failure. + +This is spurious. So branch to the error handling path also in this case. +It will add a call to: + + memset(&root->defrag_progress, 0, + sizeof(root->defrag_progress)); + +Fixes: 6702ed490ca0 ("Btrfs: Add run time btree defrag, and an ioctl to force btree defrag") +Signed-off-by: Christophe JAILLET +Reviewed-by: David Sterba +Signed-off-by: David Sterba +Signed-off-by: Sasha Levin +--- + fs/btrfs/tree-defrag.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +diff --git a/fs/btrfs/tree-defrag.c b/fs/btrfs/tree-defrag.c +index 7c45d960b53c..259a3b5f9303 100644 +--- a/fs/btrfs/tree-defrag.c ++++ b/fs/btrfs/tree-defrag.c +@@ -39,8 +39,10 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, + goto out; + + path = btrfs_alloc_path(); +- if (!path) +- return -ENOMEM; ++ if (!path) { ++ ret = -ENOMEM; ++ goto out; ++ } + + level = btrfs_header_level(root->node); + +-- +2.35.1 + diff --git a/queue-5.15/caif-fix-memory-leak-in-cfctrl_linkup_request.patch b/queue-5.15/caif-fix-memory-leak-in-cfctrl_linkup_request.patch new file mode 100644 index 00000000000..a4b1e2a0788 --- /dev/null +++ b/queue-5.15/caif-fix-memory-leak-in-cfctrl_linkup_request.patch @@ -0,0 +1,47 @@ +From af77123c68c7504e243f9e496046b7364b0a5760 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 4 Jan 2023 14:51:46 +0800 +Subject: caif: fix memory leak in cfctrl_linkup_request() + +From: Zhengchao Shao + +[ Upstream commit fe69230f05897b3de758427b574fc98025dfc907 ] + +When linktype is unknown or kzalloc failed in cfctrl_linkup_request(), +pkt is not released. Add release process to error path. 
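+
+Schematically, every exit taken after the packet has been created now
+owns a matching cfpkt_destroy(); a trimmed sketch of the fixed flow
+(mirroring the hunk below):
+
+	default:
+		pr_warn("Request setup of bad link type = %d\n",
+			param->linktype);
+		cfpkt_destroy(pkt);		/* was leaked here */
+		return -EINVAL;
+	}
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
+	if (!req) {
+		cfpkt_destroy(pkt);		/* ... and here */
+		return -ENOMEM;
+	}
+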
+ +Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack") +Fixes: 8d545c8f958f ("caif: Disconnect without waiting for response") +Signed-off-by: Zhengchao Shao +Reviewed-by: Jiri Pirko +Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com +Signed-off-by: Paolo Abeni +Signed-off-by: Sasha Levin +--- + net/caif/cfctrl.c | 6 +++++- + 1 file changed, 5 insertions(+), 1 deletion(-) + +diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c +index 2809cbd6b7f7..d8cb4b2a076b 100644 +--- a/net/caif/cfctrl.c ++++ b/net/caif/cfctrl.c +@@ -269,11 +269,15 @@ int cfctrl_linkup_request(struct cflayer *layer, + default: + pr_warn("Request setup of bad link type = %d\n", + param->linktype); ++ cfpkt_destroy(pkt); + return -EINVAL; + } + req = kzalloc(sizeof(*req), GFP_KERNEL); +- if (!req) ++ if (!req) { ++ cfpkt_destroy(pkt); + return -ENOMEM; ++ } ++ + req->client_layer = user_layer; + req->cmd = CFCTRL_CMD_LINK_SETUP; + req->param = *param; +-- +2.35.1 + diff --git a/queue-5.15/ceph-switch-to-vfs_inode_has_locks-to-fix-file-lock-.patch b/queue-5.15/ceph-switch-to-vfs_inode_has_locks-to-fix-file-lock-.patch new file mode 100644 index 00000000000..46648f1702d --- /dev/null +++ b/queue-5.15/ceph-switch-to-vfs_inode_has_locks-to-fix-file-lock-.patch @@ -0,0 +1,85 @@ +From df1a1a1717f9305627520ed4b6e8ff313b78ecb7 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 17 Nov 2022 10:43:21 +0800 +Subject: ceph: switch to vfs_inode_has_locks() to fix file lock bug + +From: Xiubo Li + +[ Upstream commit 461ab10ef7e6ea9b41a0571a7fc6a72af9549a3c ] + +For the POSIX locks they are using the same owner, which is the +thread id. And multiple POSIX locks could be merged into single one, +so when checking whether the 'file' has locks may fail. + +For a file where some openers use locking and others don't is a +really odd usage pattern though. Locks are like stoplights -- they +only work if everyone pays attention to them. + +Just switch ceph_get_caps() to check whether any locks are set on +the inode. If there are POSIX/OFD/FLOCK locks on the file at the +time, we should set CHECK_FILELOCK, regardless of what fd was used +to set the lock. 
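+
+vfs_inode_has_locks() is the helper queued alongside this patch in this
+series; stripped of the flc_lock it takes around the list checks, it
+amounts to:
+
+	bool vfs_inode_has_locks(struct inode *inode)
+	{
+		struct file_lock_context *ctx;
+
+		ctx = smp_load_acquire(&inode->i_flctx);
+		/* any POSIX (incl. OFD) or flock lock on the inode? */
+		return ctx && (!list_empty(&ctx->flc_posix) ||
+			       !list_empty(&ctx->flc_flock));
+	}
+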
+ +Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks") +Signed-off-by: Xiubo Li +Reviewed-by: Jeff Layton +Reviewed-by: Ilya Dryomov +Signed-off-by: Ilya Dryomov +Signed-off-by: Sasha Levin +--- + fs/ceph/caps.c | 2 +- + fs/ceph/locks.c | 4 ---- + fs/ceph/super.h | 1 - + 3 files changed, 1 insertion(+), 6 deletions(-) + +diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c +index be96fe615bec..67b782b0a90a 100644 +--- a/fs/ceph/caps.c ++++ b/fs/ceph/caps.c +@@ -2872,7 +2872,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got + + while (true) { + flags &= CEPH_FILE_MODE_MASK; +- if (atomic_read(&fi->num_locks)) ++ if (vfs_inode_has_locks(inode)) + flags |= CHECK_FILELOCK; + _got = 0; + ret = try_get_cap_refs(inode, need, want, endoff, +diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c +index bdeb271f47d9..3e3b8be76b21 100644 +--- a/fs/ceph/locks.c ++++ b/fs/ceph/locks.c +@@ -32,18 +32,14 @@ void __init ceph_flock_init(void) + + static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src) + { +- struct ceph_file_info *fi = dst->fl_file->private_data; + struct inode *inode = file_inode(dst->fl_file); + atomic_inc(&ceph_inode(inode)->i_filelock_ref); +- atomic_inc(&fi->num_locks); + } + + static void ceph_fl_release_lock(struct file_lock *fl) + { +- struct ceph_file_info *fi = fl->fl_file->private_data; + struct inode *inode = file_inode(fl->fl_file); + struct ceph_inode_info *ci = ceph_inode(inode); +- atomic_dec(&fi->num_locks); + if (atomic_dec_and_test(&ci->i_filelock_ref)) { + /* clear error when all locks are released */ + spin_lock(&ci->i_ceph_lock); +diff --git a/fs/ceph/super.h b/fs/ceph/super.h +index 14f951cd5b61..8c9021d0f837 100644 +--- a/fs/ceph/super.h ++++ b/fs/ceph/super.h +@@ -773,7 +773,6 @@ struct ceph_file_info { + struct list_head rw_contexts; + + u32 filp_gen; +- atomic_t num_locks; + }; + + struct ceph_dir_file_info { +-- +2.35.1 + diff --git a/queue-5.15/drivers-net-bonding-bond_3ad-return-when-there-s-no-.patch b/queue-5.15/drivers-net-bonding-bond_3ad-return-when-there-s-no-.patch new file mode 100644 index 00000000000..bd0d4431ad1 --- /dev/null +++ b/queue-5.15/drivers-net-bonding-bond_3ad-return-when-there-s-no-.patch @@ -0,0 +1,39 @@ +From 690c014d0ee7c307708afcc0f56c2b6cca687dc5 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 2 Jan 2023 12:53:35 +0300 +Subject: drivers/net/bonding/bond_3ad: return when there's no aggregator + +From: Daniil Tatianin + +[ Upstream commit 9c807965483f42df1d053b7436eedd6cf28ece6f ] + +Otherwise we would dereference a NULL aggregator pointer when calling +__set_agg_ports_ready on the line below. + +Found by Linux Verification Center (linuxtesting.org) with the SVACE +static analysis tool. + +Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") +Signed-off-by: Daniil Tatianin +Reviewed-by: Jiri Pirko +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/bonding/bond_3ad.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c +index 8ad095c19f27..ff6d4e74a186 100644 +--- a/drivers/net/bonding/bond_3ad.c ++++ b/drivers/net/bonding/bond_3ad.c +@@ -1539,6 +1539,7 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr) + slave_err(bond->dev, port->slave->dev, + "Port %d did not find a suitable aggregator\n", + port->actor_port_number); ++ return; + } + } + /* if all aggregator's ports are READY_N == TRUE, set ready=TRUE +-- +2.35.1 + diff --git a/queue-5.15/drm-i915-migrate-don-t-check-the-scratch-page.patch b/queue-5.15/drm-i915-migrate-don-t-check-the-scratch-page.patch new file mode 100644 index 00000000000..5d1ec7e8ab3 --- /dev/null +++ b/queue-5.15/drm-i915-migrate-don-t-check-the-scratch-page.patch @@ -0,0 +1,58 @@ +From 619105275f5675150354f7fd7db762a309c91662 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 6 Dec 2021 11:25:36 +0000 +Subject: drm/i915/migrate: don't check the scratch page +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Matthew Auld + +[ Upstream commit 8eb7fcce34d16f77ac8efa80e8dfecec2503e8c5 ] + +The scratch page might not be allocated in LMEM(like on DG2), so instead +of using that as the deciding factor for where the paging structures +live, let's just query the pt before mapping it. + +Signed-off-by: Matthew Auld +Cc: Thomas Hellström +Cc: Ramalingam C +Reviewed-by: Ramalingam C +Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-1-matthew.auld@intel.com +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/i915/gt/intel_migrate.c | 4 +--- + 1 file changed, 1 insertion(+), 3 deletions(-) + +diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c +index 1dac21aa7e5c..aa05c26ff792 100644 +--- a/drivers/gpu/drm/i915/gt/intel_migrate.c ++++ b/drivers/gpu/drm/i915/gt/intel_migrate.c +@@ -13,7 +13,6 @@ + + struct insert_pte_data { + u64 offset; +- bool is_lmem; + }; + + #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ +@@ -40,7 +39,7 @@ static void insert_pte(struct i915_address_space *vm, + struct insert_pte_data *d = data; + + vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, +- d->is_lmem ? PTE_LM : 0); ++ i915_gem_object_is_lmem(pt->base) ? 
PTE_LM : 0); + d->offset += PAGE_SIZE; + } + +@@ -134,7 +133,6 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) + goto err_vm; + + /* Now allow the GPU to rewrite the PTE via its own ppGTT */ +- d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]); + vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d); + } + +-- +2.35.1 + diff --git a/queue-5.15/drm-i915-migrate-fix-length-calculation.patch b/queue-5.15/drm-i915-migrate-fix-length-calculation.patch new file mode 100644 index 00000000000..80dfacea308 --- /dev/null +++ b/queue-5.15/drm-i915-migrate-fix-length-calculation.patch @@ -0,0 +1,42 @@ +From c0bf242232f66d961668453c34c9312bd9be48cc Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 6 Dec 2021 11:25:38 +0000 +Subject: drm/i915/migrate: fix length calculation +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Matthew Auld + +[ Upstream commit 31d70749bfe110593fbe8bf45e7c7788c7d85035 ] + +No need to insert PTEs for the PTE window itself, also foreach expects a +length not an end offset, which could be gigantic here with a second +engine. + +Signed-off-by: Matthew Auld +Cc: Thomas Hellström +Cc: Ramalingam C +Reviewed-by: Ramalingam C +Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-3-matthew.auld@intel.com +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c +index fb7fe3a2b6c6..5b59a6effc20 100644 +--- a/drivers/gpu/drm/i915/gt/intel_migrate.c ++++ b/drivers/gpu/drm/i915/gt/intel_migrate.c +@@ -133,7 +133,7 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) + goto err_vm; + + /* Now allow the GPU to rewrite the PTE via its own ppGTT */ +- vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d); ++ vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d); + } + + return &vm->vm; +-- +2.35.1 + diff --git a/queue-5.15/drm-i915-migrate-fix-offset-calculation.patch b/queue-5.15/drm-i915-migrate-fix-offset-calculation.patch new file mode 100644 index 00000000000..8a546470799 --- /dev/null +++ b/queue-5.15/drm-i915-migrate-fix-offset-calculation.patch @@ -0,0 +1,44 @@ +From 8d9eb2f4e7710d886a242f7731ea85dda01a5e69 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 6 Dec 2021 11:25:37 +0000 +Subject: drm/i915/migrate: fix offset calculation +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Matthew Auld + +[ Upstream commit 08c7c122ad90799cc3ae674e7f29f236f91063ce ] + +Ensure we add the engine base only after we calculate the qword offset +into the PTE window. 
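+
+The engine base lives in bits 63:32, so it only survives if it is added
+after the page-index (>> 12) and qword (* 8) scaling. A worked example
+with engine instance 1 and a target offset of 0:
+
+	/* wrong: base added first, then scaled away */
+	offset = (u64)1 << 32;
+	offset >>= 12;			/* 0x100000 - base leaked into low bits */
+	offset *= sizeof(u64);		/* 0x800000 */
+
+	/* right: scale the address first, add the base last */
+	offset = 0;
+	offset >>= 12;
+	offset *= sizeof(u64);
+	offset += 2 * CHUNK_SZ;
+	offset += (u64)1 << 32;		/* bits 63:32 stay intact */
+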
+ +Signed-off-by: Matthew Auld +Cc: Thomas Hellström +Cc: Ramalingam C +Reviewed-by: Ramalingam C +Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-2-matthew.auld@intel.com +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c +index aa05c26ff792..fb7fe3a2b6c6 100644 +--- a/drivers/gpu/drm/i915/gt/intel_migrate.c ++++ b/drivers/gpu/drm/i915/gt/intel_migrate.c +@@ -279,10 +279,10 @@ static int emit_pte(struct i915_request *rq, + GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8); + + /* Compute the page directory offset for the target address range */ +- offset += (u64)rq->engine->instance << 32; + offset >>= 12; + offset *= sizeof(u64); + offset += 2 * CHUNK_SZ; ++ offset += (u64)rq->engine->instance << 32; + + cs = intel_ring_begin(rq, 6); + if (IS_ERR(cs)) +-- +2.35.1 + diff --git a/queue-5.15/drm-i915-unpin-on-error-in-intel_vgpu_shadow_mm_pin.patch b/queue-5.15/drm-i915-unpin-on-error-in-intel_vgpu_shadow_mm_pin.patch new file mode 100644 index 00000000000..0379081f278 --- /dev/null +++ b/queue-5.15/drm-i915-unpin-on-error-in-intel_vgpu_shadow_mm_pin.patch @@ -0,0 +1,36 @@ +From 44132230eddc282157bb222a6a6982cebfa121d9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 15 Nov 2022 16:15:18 +0300 +Subject: drm/i915: unpin on error in intel_vgpu_shadow_mm_pin() + +From: Dan Carpenter + +[ Upstream commit 3792fc508c095abd84b10ceae12bd773e61fdc36 ] + +Call intel_vgpu_unpin_mm() on this error path. + +Fixes: 418741480809 ("drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.") +Signed-off-by: Dan Carpenter +Signed-off-by: Zhenyu Wang +Link: http://patchwork.freedesktop.org/patch/msgid/Y3OQ5tgZIVxyQ/WV@kili +Reviewed-by: Zhenyu Wang +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/i915/gvt/scheduler.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c +index 1bb1be5c48c8..0291d42cfba8 100644 +--- a/drivers/gpu/drm/i915/gvt/scheduler.c ++++ b/drivers/gpu/drm/i915/gvt/scheduler.c +@@ -694,6 +694,7 @@ intel_vgpu_shadow_mm_pin(struct intel_vgpu_workload *workload) + + if (workload->shadow_mm->type != INTEL_GVT_MM_PPGTT || + !workload->shadow_mm->ppgtt_mm.shadowed) { ++ intel_vgpu_unpin_mm(workload->shadow_mm); + gvt_vgpu_err("workload shadow ppgtt isn't ready\n"); + return -EINVAL; + } +-- +2.35.1 + diff --git a/queue-5.15/drm-imx-ipuv3-plane-fix-overlay-plane-width.patch b/queue-5.15/drm-imx-ipuv3-plane-fix-overlay-plane-width.patch new file mode 100644 index 00000000000..c9d9ca17aad --- /dev/null +++ b/queue-5.15/drm-imx-ipuv3-plane-fix-overlay-plane-width.patch @@ -0,0 +1,82 @@ +From a2bf84de9ad6a8014c52c915b3e492ace07e3f19 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 8 Nov 2022 15:14:20 +0100 +Subject: drm/imx: ipuv3-plane: Fix overlay plane width + +From: Philipp Zabel + +[ Upstream commit 92d43bd3bc9728c1fb114d7011d46f5ea9489e28 ] + +ipu_src_rect_width() was introduced to support odd screen resolutions +such as 1366x768 by internally rounding up primary plane width to a +multiple of 8 and compensating with reduced horizontal blanking. +This also caused overlay plane width to be rounded up, which was not +intended. Fix overlay plane width by limiting the rounding up to the +primary plane. 
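+
+The width used to program the DMA channel thus becomes plane-dependent;
+the selection added in ipu_plane_atomic_update() below is effectively:
+
+	if (ipu_plane->dp_flow == IPU_DP_FLOW_SYNC_BG)
+		/* primary plane: keep the round-up to a multiple of 8 */
+		width = ipu_src_rect_width(new_state);
+	else
+		/* overlay planes: use the real source width */
+		width = drm_rect_width(&new_state->src) >> 16;
+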
+ +drm_rect_width(&new_state->src) >> 16 is the same value as +drm_rect_width(dst) because there is no plane scaling support. + +Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix") +Reviewed-by: Lucas Stach +Link: https://lore.kernel.org/r/20221108141420.176696-1-p.zabel@pengutronix.de +Signed-off-by: Philipp Zabel +Link: https://patchwork.freedesktop.org/patch/msgid/20221108141420.176696-1-p.zabel@pengutronix.de +Tested-by: Ian Ray +(cherry picked from commit 4333472f8d7befe62359fecb1083cd57a6e07bfc) +Signed-off-by: Philipp Zabel +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/imx/ipuv3-plane.c | 14 ++++++++------ + 1 file changed, 8 insertions(+), 6 deletions(-) + +diff --git a/drivers/gpu/drm/imx/ipuv3-plane.c b/drivers/gpu/drm/imx/ipuv3-plane.c +index 846c1aae69c8..924a66f53951 100644 +--- a/drivers/gpu/drm/imx/ipuv3-plane.c ++++ b/drivers/gpu/drm/imx/ipuv3-plane.c +@@ -619,6 +619,11 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, + break; + } + ++ if (ipu_plane->dp_flow == IPU_DP_FLOW_SYNC_BG) ++ width = ipu_src_rect_width(new_state); ++ else ++ width = drm_rect_width(&new_state->src) >> 16; ++ + eba = drm_plane_state_to_eba(new_state, 0); + + /* +@@ -627,8 +632,7 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, + */ + if (ipu_state->use_pre) { + axi_id = ipu_chan_assign_axi_id(ipu_plane->dma); +- ipu_prg_channel_configure(ipu_plane->ipu_ch, axi_id, +- ipu_src_rect_width(new_state), ++ ipu_prg_channel_configure(ipu_plane->ipu_ch, axi_id, width, + drm_rect_height(&new_state->src) >> 16, + fb->pitches[0], fb->format->format, + fb->modifier, &eba); +@@ -683,9 +687,8 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, + break; + } + +- ipu_dmfc_config_wait4eot(ipu_plane->dmfc, ALIGN(drm_rect_width(dst), 8)); ++ ipu_dmfc_config_wait4eot(ipu_plane->dmfc, width); + +- width = ipu_src_rect_width(new_state); + height = drm_rect_height(&new_state->src) >> 16; + info = drm_format_info(fb->format->format); + ipu_calculate_bursts(width, info->cpp[0], fb->pitches[0], +@@ -749,8 +752,7 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, + ipu_cpmem_set_burstsize(ipu_plane->ipu_ch, 16); + + ipu_cpmem_zero(ipu_plane->alpha_ch); +- ipu_cpmem_set_resolution(ipu_plane->alpha_ch, +- ipu_src_rect_width(new_state), ++ ipu_cpmem_set_resolution(ipu_plane->alpha_ch, width, + drm_rect_height(&new_state->src) >> 16); + ipu_cpmem_set_format_passthrough(ipu_plane->alpha_ch, 8); + ipu_cpmem_set_high_priority(ipu_plane->alpha_ch); +-- +2.35.1 + diff --git a/queue-5.15/drm-meson-reduce-the-fifo-lines-held-when-afbc-is-no.patch b/queue-5.15/drm-meson-reduce-the-fifo-lines-held-when-afbc-is-no.patch new file mode 100644 index 00000000000..8574ba88d2a --- /dev/null +++ b/queue-5.15/drm-meson-reduce-the-fifo-lines-held-when-afbc-is-no.patch @@ -0,0 +1,56 @@ +From 6e49e0034301262b77ad78e712f84fe8ccaf5c86 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 09:43:05 +0100 +Subject: drm/meson: Reduce the FIFO lines held when AFBC is not used + +From: Carlo Caione + +[ Upstream commit 3b754ed6d1cd90017e66e5cc16f3923e4a952ffc ] + +Having a bigger number of FIFO lines held after vsync is only useful to +SoCs using AFBC to give time to the AFBC decoder to be reset, configured +and enabled again. + +For SoCs not using AFBC this, on the contrary, is causing on some +displays issues and a few pixels vertical offset in the displayed image. 
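+
+The fix below therefore folds the hold value into the FIFO setup that
+is already conditional on the SoC, e.g.:
+
+	if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_G12A))
+		reg |= VIU_OSD_BURST_LENGTH_32 | VIU_OSD_HOLD_FIFO_LINES(31);
+	else
+		reg |= VIU_OSD_BURST_LENGTH_64 | VIU_OSD_HOLD_FIFO_LINES(4);
+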
+ +Conditionally increase the number of lines held after vsync only for +SoCs using AFBC, leaving the default value for all the others. + +Fixes: 24e0d4058eff ("drm/meson: hold 32 lines after vsync to give time for AFBC start") +Signed-off-by: Carlo Caione +Acked-by: Martin Blumenstingl +Acked-by: Neil Armstrong +[narmstrong: added fixes tag] +Signed-off-by: Neil Armstrong +Link: https://patchwork.freedesktop.org/patch/msgid/20221216-afbc_s905x-v1-0-033bebf780d9@baylibre.com +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/meson/meson_viu.c | 5 ++--- + 1 file changed, 2 insertions(+), 3 deletions(-) + +diff --git a/drivers/gpu/drm/meson/meson_viu.c b/drivers/gpu/drm/meson/meson_viu.c +index d4b907889a21..cd399b0b7181 100644 +--- a/drivers/gpu/drm/meson/meson_viu.c ++++ b/drivers/gpu/drm/meson/meson_viu.c +@@ -436,15 +436,14 @@ void meson_viu_init(struct meson_drm *priv) + + /* Initialize OSD1 fifo control register */ + reg = VIU_OSD_DDR_PRIORITY_URGENT | +- VIU_OSD_HOLD_FIFO_LINES(31) | + VIU_OSD_FIFO_DEPTH_VAL(32) | /* fifo_depth_val: 32*8=256 */ + VIU_OSD_WORDS_PER_BURST(4) | /* 4 words in 1 burst */ + VIU_OSD_FIFO_LIMITS(2); /* fifo_lim: 2*16=32 */ + + if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_G12A)) +- reg |= VIU_OSD_BURST_LENGTH_32; ++ reg |= (VIU_OSD_BURST_LENGTH_32 | VIU_OSD_HOLD_FIFO_LINES(31)); + else +- reg |= VIU_OSD_BURST_LENGTH_64; ++ reg |= (VIU_OSD_BURST_LENGTH_64 | VIU_OSD_HOLD_FIFO_LINES(4)); + + writel_relaxed(reg, priv->io_base + _REG(VIU_OSD1_FIFO_CTRL_STAT)); + writel_relaxed(reg, priv->io_base + _REG(VIU_OSD2_FIFO_CTRL_STAT)); +-- +2.35.1 + diff --git a/queue-5.15/drm-panfrost-fix-gem-handle-creation-ref-counting.patch b/queue-5.15/drm-panfrost-fix-gem-handle-creation-ref-counting.patch new file mode 100644 index 00000000000..b4a116fb58f --- /dev/null +++ b/queue-5.15/drm-panfrost-fix-gem-handle-creation-ref-counting.patch @@ -0,0 +1,138 @@ +From 7ecd0e5544767cf44db4a4a23d32b56835f268c3 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 14:01:30 +0000 +Subject: drm/panfrost: Fix GEM handle creation ref-counting + +From: Steven Price + +[ Upstream commit 4217c6ac817451d5116687f3cc6286220dc43d49 ] + +panfrost_gem_create_with_handle() previously returned a BO but with the +only reference being from the handle, which user space could in theory +guess and release, causing a use-after-free. Additionally if the call to +panfrost_gem_mapping_get() in panfrost_ioctl_create_bo() failed then +a(nother) reference on the BO was dropped. + +The _create_with_handle() is a problematic pattern, so ditch it and +instead create the handle in panfrost_ioctl_create_bo(). If the call to +panfrost_gem_mapping_get() fails then this means that user space has +indeed gone behind our back and freed the handle. In which case just +return an error code. 
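+
+The reworked ioctl (sketched here, full hunk below) holds its own
+reference across the whole operation and drops it exactly once at the
+end, so a racing GEM_CLOSE can no longer free the BO under us:
+
+	bo = panfrost_gem_create(dev, args->size, args->flags);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	ret = drm_gem_handle_create(file, &bo->base.base, &args->handle);
+	if (!ret) {
+		mapping = panfrost_gem_mapping_get(bo, priv);
+		if (mapping) {
+			args->offset = mapping->mmnode.start << PAGE_SHIFT;
+			panfrost_gem_mapping_put(mapping);
+		} else {
+			ret = -EINVAL;	/* handle already guessed and closed */
+		}
+	}
+	drm_gem_object_put(&bo->base.base);	/* drop the creation ref last */
+	return ret;
+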
+ +Reported-by: Rob Clark +Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") +Signed-off-by: Steven Price +Reviewed-by: Rob Clark +Link: https://patchwork.freedesktop.org/patch/msgid/20221219140130.410578-1-steven.price@arm.com +Signed-off-by: Sasha Levin +--- + drivers/gpu/drm/panfrost/panfrost_drv.c | 27 ++++++++++++++++--------- + drivers/gpu/drm/panfrost/panfrost_gem.c | 16 +-------------- + drivers/gpu/drm/panfrost/panfrost_gem.h | 5 +---- + 3 files changed, 20 insertions(+), 28 deletions(-) + +diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c +index e48e357ea4f1..4c271244092b 100644 +--- a/drivers/gpu/drm/panfrost/panfrost_drv.c ++++ b/drivers/gpu/drm/panfrost/panfrost_drv.c +@@ -82,6 +82,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, + struct panfrost_gem_object *bo; + struct drm_panfrost_create_bo *args = data; + struct panfrost_gem_mapping *mapping; ++ int ret; + + if (!args->size || args->pad || + (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP))) +@@ -92,21 +93,29 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, + !(args->flags & PANFROST_BO_NOEXEC)) + return -EINVAL; + +- bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags, +- &args->handle); ++ bo = panfrost_gem_create(dev, args->size, args->flags); + if (IS_ERR(bo)) + return PTR_ERR(bo); + ++ ret = drm_gem_handle_create(file, &bo->base.base, &args->handle); ++ if (ret) ++ goto out; ++ + mapping = panfrost_gem_mapping_get(bo, priv); +- if (!mapping) { +- drm_gem_object_put(&bo->base.base); +- return -EINVAL; ++ if (mapping) { ++ args->offset = mapping->mmnode.start << PAGE_SHIFT; ++ panfrost_gem_mapping_put(mapping); ++ } else { ++ /* This can only happen if the handle from ++ * drm_gem_handle_create() has already been guessed and freed ++ * by user space ++ */ ++ ret = -EINVAL; + } + +- args->offset = mapping->mmnode.start << PAGE_SHIFT; +- panfrost_gem_mapping_put(mapping); +- +- return 0; ++out: ++ drm_gem_object_put(&bo->base.base); ++ return ret; + } + + /** +diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c +index 6d9bdb9180cb..55e3a68ed28a 100644 +--- a/drivers/gpu/drm/panfrost/panfrost_gem.c ++++ b/drivers/gpu/drm/panfrost/panfrost_gem.c +@@ -234,12 +234,8 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t + } + + struct panfrost_gem_object * +-panfrost_gem_create_with_handle(struct drm_file *file_priv, +- struct drm_device *dev, size_t size, +- u32 flags, +- uint32_t *handle) ++panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags) + { +- int ret; + struct drm_gem_shmem_object *shmem; + struct panfrost_gem_object *bo; + +@@ -255,16 +251,6 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv, + bo->noexec = !!(flags & PANFROST_BO_NOEXEC); + bo->is_heap = !!(flags & PANFROST_BO_HEAP); + +- /* +- * Allocate an id of idr table where the obj is registered +- * and handle has the id what user can see. +- */ +- ret = drm_gem_handle_create(file_priv, &shmem->base, handle); +- /* drop reference from allocate - handle holds it now. 
*/ +- drm_gem_object_put(&shmem->base); +- if (ret) +- return ERR_PTR(ret); +- + return bo; + } + +diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h +index 8088d5fd8480..ad2877eeeccd 100644 +--- a/drivers/gpu/drm/panfrost/panfrost_gem.h ++++ b/drivers/gpu/drm/panfrost/panfrost_gem.h +@@ -69,10 +69,7 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev, + struct sg_table *sgt); + + struct panfrost_gem_object * +-panfrost_gem_create_with_handle(struct drm_file *file_priv, +- struct drm_device *dev, size_t size, +- u32 flags, +- uint32_t *handle); ++panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags); + + int panfrost_gem_open(struct drm_gem_object *obj, struct drm_file *file_priv); + void panfrost_gem_close(struct drm_gem_object *obj, +-- +2.35.1 + diff --git a/queue-5.15/ext4-correct-inconsistent-error-msg-in-nojournal-mod.patch b/queue-5.15/ext4-correct-inconsistent-error-msg-in-nojournal-mod.patch new file mode 100644 index 00000000000..89258f33089 --- /dev/null +++ b/queue-5.15/ext4-correct-inconsistent-error-msg-in-nojournal-mod.patch @@ -0,0 +1,55 @@ +From 32a8e0136c13e7cdf6183188dee2e50c3d69b6f6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 9 Nov 2022 15:43:43 +0800 +Subject: ext4: correct inconsistent error msg in nojournal mode + +From: Baokun Li + +[ Upstream commit 89481b5fa8c0640e62ba84c6020cee895f7ac643 ] + +When we used the journal_async_commit mounting option in nojournal mode, +the kernel told me that "can't mount with journal_checksum", was very +confusing. I find that when we mount with journal_async_commit, both the +JOURNAL_ASYNC_COMMIT and EXPLICIT_JOURNAL_CHECKSUM flags are set. However, +in the error branch, CHECKSUM is checked before ASYNC_COMMIT. As a result, +the above inconsistency occurs, and the ASYNC_COMMIT branch becomes dead +code that cannot be executed. Therefore, we exchange the positions of the +two judgments to make the error msg more accurate. 
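+
+The shadowing comes from the option parser: mounting with
+journal_async_commit in effect does
+
+	set_opt(sb, JOURNAL_ASYNC_COMMIT);
+	set_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM);
+
+so with the old order the EXPLICIT_JOURNAL_CHECKSUM test always matched
+first and reported an option the user never passed.
+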
+ +Signed-off-by: Baokun Li +Reviewed-by: Jan Kara +Link: https://lore.kernel.org/r/20221109074343.4184862-1-libaokun1@huawei.com +Signed-off-by: Theodore Ts'o +Cc: stable@kernel.org +Signed-off-by: Sasha Levin +--- + fs/ext4/super.c | 9 +++++---- + 1 file changed, 5 insertions(+), 4 deletions(-) + +diff --git a/fs/ext4/super.c b/fs/ext4/super.c +index fd7565707975..1bb2e902667d 100644 +--- a/fs/ext4/super.c ++++ b/fs/ext4/super.c +@@ -4667,14 +4667,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) + goto failed_mount3a; + } else { + /* Nojournal mode, all journal mount options are illegal */ +- if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { ++ if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { + ext4_msg(sb, KERN_ERR, "can't mount with " +- "journal_checksum, fs mounted w/o journal"); ++ "journal_async_commit, fs mounted w/o journal"); + goto failed_mount3a; + } +- if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { ++ ++ if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { + ext4_msg(sb, KERN_ERR, "can't mount with " +- "journal_async_commit, fs mounted w/o journal"); ++ "journal_checksum, fs mounted w/o journal"); + goto failed_mount3a; + } + if (sbi->s_commit_interval != JBD2_DEFAULT_MAX_COMMIT_AGE*HZ) { +-- +2.35.1 + diff --git a/queue-5.15/ext4-fix-deadlock-due-to-mbcache-entry-corruption.patch b/queue-5.15/ext4-fix-deadlock-due-to-mbcache-entry-corruption.patch new file mode 100644 index 00000000000..6a6e6408052 --- /dev/null +++ b/queue-5.15/ext4-fix-deadlock-due-to-mbcache-entry-corruption.patch @@ -0,0 +1,143 @@ +From a7462cd1035ae34a89e55937fdf0857f48cf8e42 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 23 Nov 2022 20:39:50 +0100 +Subject: ext4: fix deadlock due to mbcache entry corruption + +From: Jan Kara + +[ Upstream commit a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd ] + +When manipulating xattr blocks, we can deadlock infinitely looping +inside ext4_xattr_block_set() where we constantly keep finding xattr +block for reuse in mbcache but we are unable to reuse it because its +reference count is too big. This happens because cache entry for the +xattr block is marked as reusable (e_reusable set) although its +reference count is too big. When this inconsistency happens, this +inconsistent state is kept indefinitely and so ext4_xattr_block_set() +keeps retrying indefinitely. + +The inconsistent state is caused by non-atomic update of e_reusable bit. +e_reusable is part of a bitfield and e_reusable update can race with +update of e_referenced bit in the same bitfield resulting in loss of one +of the updates. Fix the problem by using atomic bitops instead. + +This bug has been around for many years, but it became *much* easier +to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr +blocks"). 
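+
+The lost update is the classic read-modify-write on adjacent bitfield
+members sharing one word; schematically:
+
+	struct mb_cache_entry {
+		...
+		u32 e_referenced:1;	/* set by the shrinker */
+		u32 e_reusable:1;	/* set/cleared by the xattr code */
+	};
+
+	/* CPU0                         CPU1
+	 * entry->e_referenced = 1;     entry->e_reusable = 0;
+	 *   load word                    load word
+	 *   set bit                      clear bit
+	 *   store word                   store word <- overwrites CPU0's bit
+	 */
+
+Turning the bitfield into an unsigned long e_flags updated with
+set_bit()/clear_bit() makes the two updates atomic with respect to each
+other.
+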
+ +Cc: stable@vger.kernel.org +Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries") +Fixes: 65f8b80053a1 ("ext4: fix race when reusing xattr blocks") +Reported-and-tested-by: Jeremi Piotrowski +Reported-by: Thilo Fromm +Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microsoft.com/ +Signed-off-by: Jan Kara +Reviewed-by: Andreas Dilger +Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz +Signed-off-by: Theodore Ts'o +Signed-off-by: Sasha Levin +--- + fs/ext4/xattr.c | 4 ++-- + fs/mbcache.c | 14 ++++++++------ + include/linux/mbcache.h | 9 +++++++-- + 3 files changed, 17 insertions(+), 10 deletions(-) + +diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c +index 5ac31d3baab4..b92da41e9640 100644 +--- a/fs/ext4/xattr.c ++++ b/fs/ext4/xattr.c +@@ -1281,7 +1281,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, + ce = mb_cache_entry_get(ea_block_cache, hash, + bh->b_blocknr); + if (ce) { +- ce->e_reusable = 1; ++ set_bit(MBE_REUSABLE_B, &ce->e_flags); + mb_cache_entry_put(ea_block_cache, ce); + } + } +@@ -2045,7 +2045,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, + } + BHDR(new_bh)->h_refcount = cpu_to_le32(ref); + if (ref == EXT4_XATTR_REFCOUNT_MAX) +- ce->e_reusable = 0; ++ clear_bit(MBE_REUSABLE_B, &ce->e_flags); + ea_bdebug(new_bh, "reusing; refcount now=%d", + ref); + ext4_xattr_block_csum_set(inode, new_bh); +diff --git a/fs/mbcache.c b/fs/mbcache.c +index 950f1829a7fd..7a12ae87c806 100644 +--- a/fs/mbcache.c ++++ b/fs/mbcache.c +@@ -94,8 +94,9 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + atomic_set(&entry->e_refcnt, 1); + entry->e_key = key; + entry->e_value = value; +- entry->e_reusable = reusable; +- entry->e_referenced = 0; ++ entry->e_flags = 0; ++ if (reusable) ++ set_bit(MBE_REUSABLE_B, &entry->e_flags); + head = mb_cache_entry_head(cache, key); + hlist_bl_lock(head); + hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { +@@ -162,7 +163,8 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, + while (node) { + entry = hlist_bl_entry(node, struct mb_cache_entry, + e_hash_list); +- if (entry->e_key == key && entry->e_reusable && ++ if (entry->e_key == key && ++ test_bit(MBE_REUSABLE_B, &entry->e_flags) && + atomic_inc_not_zero(&entry->e_refcnt)) + goto out; + node = node->next; +@@ -318,7 +320,7 @@ EXPORT_SYMBOL(mb_cache_entry_delete_or_get); + void mb_cache_entry_touch(struct mb_cache *cache, + struct mb_cache_entry *entry) + { +- entry->e_referenced = 1; ++ set_bit(MBE_REFERENCED_B, &entry->e_flags); + } + EXPORT_SYMBOL(mb_cache_entry_touch); + +@@ -343,9 +345,9 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, + entry = list_first_entry(&cache->c_list, + struct mb_cache_entry, e_list); + /* Drop initial hash reference if there is no user */ +- if (entry->e_referenced || ++ if (test_bit(MBE_REFERENCED_B, &entry->e_flags) || + atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) { +- entry->e_referenced = 0; ++ clear_bit(MBE_REFERENCED_B, &entry->e_flags); + list_move_tail(&entry->e_list, &cache->c_list); + continue; + } +diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h +index e9d5ece87794..591bc4cefe1d 100644 +--- a/include/linux/mbcache.h ++++ b/include/linux/mbcache.h +@@ -10,6 +10,12 @@ + + struct mb_cache; + ++/* Cache entry flags */ ++enum { ++ MBE_REFERENCED_B = 0, ++ MBE_REUSABLE_B ++}; ++ + struct mb_cache_entry { + /* List of entries in cache - protected by cache->c_list_lock */ + struct list_head e_list; +@@ 
-26,8 +32,7 @@ struct mb_cache_entry { + atomic_t e_refcnt; + /* Key in hash - stable during lifetime of the entry */ + u32 e_key; +- u32 e_referenced:1; +- u32 e_reusable:1; ++ unsigned long e_flags; + /* User provided value - stable during lifetime of the entry */ + u64 e_value; + }; +-- +2.35.1 + diff --git a/queue-5.15/ext4-goto-right-label-failed_mount3a.patch b/queue-5.15/ext4-goto-right-label-failed_mount3a.patch new file mode 100644 index 00000000000..3204c3e3ce1 --- /dev/null +++ b/queue-5.15/ext4-goto-right-label-failed_mount3a.patch @@ -0,0 +1,69 @@ +From 2c25b8d2e8d8d5697fb90d6a1cbb8b4cada60fcb Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 16 Sep 2022 22:15:12 +0800 +Subject: ext4: goto right label 'failed_mount3a' + +From: Jason Yan + +[ Upstream commit 43bd6f1b49b61f43de4d4e33661b8dbe8c911f14 ] + +Before these two branches neither loaded the journal nor created the +xattr cache. So the right label to goto is 'failed_mount3a'. Although +this did not cause any issues because the error handler validated if the +pointer is null. However this still made me confused when reading +the code. So it's still worth to modify to goto the right label. + +Signed-off-by: Jason Yan +Reviewed-by: Jan Kara +Reviewed-by: Ritesh Harjani (IBM) +Link: https://lore.kernel.org/r/20220916141527.1012715-2-yanaijie@huawei.com +Signed-off-by: Theodore Ts'o +Stable-dep-of: 89481b5fa8c0 ("ext4: correct inconsistent error msg in nojournal mode") +Signed-off-by: Sasha Levin +--- + fs/ext4/super.c | 10 +++++----- + 1 file changed, 5 insertions(+), 5 deletions(-) + +diff --git a/fs/ext4/super.c b/fs/ext4/super.c +index cdc2b1e6aa41..fd7565707975 100644 +--- a/fs/ext4/super.c ++++ b/fs/ext4/super.c +@@ -4664,30 +4664,30 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) + ext4_has_feature_journal_needs_recovery(sb)) { + ext4_msg(sb, KERN_ERR, "required journal recovery " + "suppressed and not mounted read-only"); +- goto failed_mount_wq; ++ goto failed_mount3a; + } else { + /* Nojournal mode, all journal mount options are illegal */ + if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { + ext4_msg(sb, KERN_ERR, "can't mount with " + "journal_checksum, fs mounted w/o journal"); +- goto failed_mount_wq; ++ goto failed_mount3a; + } + if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { + ext4_msg(sb, KERN_ERR, "can't mount with " + "journal_async_commit, fs mounted w/o journal"); +- goto failed_mount_wq; ++ goto failed_mount3a; + } + if (sbi->s_commit_interval != JBD2_DEFAULT_MAX_COMMIT_AGE*HZ) { + ext4_msg(sb, KERN_ERR, "can't mount with " + "commit=%lu, fs mounted w/o journal", + sbi->s_commit_interval / HZ); +- goto failed_mount_wq; ++ goto failed_mount3a; + } + if (EXT4_MOUNT_DATA_FLAGS & + (sbi->s_mount_opt ^ sbi->s_def_mount_opt)) { + ext4_msg(sb, KERN_ERR, "can't mount with " + "data=, fs mounted w/o journal"); +- goto failed_mount_wq; ++ goto failed_mount3a; + } + sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM; + clear_opt(sb, JOURNAL_CHECKSUM); +-- +2.35.1 + diff --git a/queue-5.15/filelock-new-helper-vfs_inode_has_locks.patch b/queue-5.15/filelock-new-helper-vfs_inode_has_locks.patch new file mode 100644 index 00000000000..7405d921b46 --- /dev/null +++ b/queue-5.15/filelock-new-helper-vfs_inode_has_locks.patch @@ -0,0 +1,89 @@ +From 3b7bab1da1226b7a3a979b1f000c1c1c760ff88b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 14 Nov 2022 08:33:09 -0500 +Subject: filelock: new helper: vfs_inode_has_locks + +From: Jeff Layton + +[ Upstream commit 
ab1ddef98a715eddb65309ffa83267e4e84a571e ] + +Ceph has a need to know whether a particular inode has any locks set on +it. It's currently tracking that by a num_locks field in its +filp->private_data, but that's problematic as it tries to decrement this +field when releasing locks and that can race with the file being torn +down. + +Add a new vfs_inode_has_locks helper that just returns whether any locks +are currently held on the inode. + +Reviewed-by: Xiubo Li +Reviewed-by: Christoph Hellwig +Signed-off-by: Jeff Layton +Stable-dep-of: 461ab10ef7e6 ("ceph: switch to vfs_inode_has_locks() to fix file lock bug") +Signed-off-by: Sasha Levin +--- + fs/locks.c | 23 +++++++++++++++++++++++ + include/linux/fs.h | 6 ++++++ + 2 files changed, 29 insertions(+) + +diff --git a/fs/locks.c b/fs/locks.c +index 3d6fb4ae847b..82a4487e95b3 100644 +--- a/fs/locks.c ++++ b/fs/locks.c +@@ -2703,6 +2703,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl) + } + EXPORT_SYMBOL_GPL(vfs_cancel_lock); + ++/** ++ * vfs_inode_has_locks - are any file locks held on @inode? ++ * @inode: inode to check for locks ++ * ++ * Return true if there are any FL_POSIX or FL_FLOCK locks currently ++ * set on @inode. ++ */ ++bool vfs_inode_has_locks(struct inode *inode) ++{ ++ struct file_lock_context *ctx; ++ bool ret; ++ ++ ctx = smp_load_acquire(&inode->i_flctx); ++ if (!ctx) ++ return false; ++ ++ spin_lock(&ctx->flc_lock); ++ ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock); ++ spin_unlock(&ctx->flc_lock); ++ return ret; ++} ++EXPORT_SYMBOL_GPL(vfs_inode_has_locks); ++ + #ifdef CONFIG_PROC_FS + #include + #include +diff --git a/include/linux/fs.h b/include/linux/fs.h +index 68fcf3ec9cf6..1e1ac116dd13 100644 +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -1195,6 +1195,7 @@ extern int locks_delete_block(struct file_lock *); + extern int vfs_test_lock(struct file *, struct file_lock *); + extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *); + extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl); ++bool vfs_inode_has_locks(struct inode *inode); + extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl); + extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type); + extern void lease_get_mtime(struct inode *, struct timespec64 *time); +@@ -1307,6 +1308,11 @@ static inline int vfs_cancel_lock(struct file *filp, struct file_lock *fl) + return 0; + } + ++static inline bool vfs_inode_has_locks(struct inode *inode) ++{ ++ return false; ++} ++ + static inline int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl) + { + return -ENOLCK; +-- +2.35.1 + diff --git a/queue-5.15/fs-ntfs3-don-t-hold-ni_lock-when-calling-truncate_se.patch b/queue-5.15/fs-ntfs3-don-t-hold-ni_lock-when-calling-truncate_se.patch new file mode 100644 index 00000000000..647c6d59b76 --- /dev/null +++ b/queue-5.15/fs-ntfs3-don-t-hold-ni_lock-when-calling-truncate_se.patch @@ -0,0 +1,51 @@ +From 3ba063b5b6536ed5c14d48f34039cf73377734ee Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 2 Jan 2023 23:05:33 +0900 +Subject: fs/ntfs3: don't hold ni_lock when calling truncate_setsize() + +From: Tetsuo Handa + +[ Upstream commit 0226635c304cfd5c9db9b78c259cb713819b057e ] + +syzbot is reporting hung task at do_user_addr_fault() [1], for there is +a silent deadlock between PG_locked bit and ni_lock lock. 
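+
+Roughly, the two paths take the same pair of "locks" in opposite order:
+
+	/* reader                        truncate
+	 * ------                        --------
+	 * folio_trylock()               ni_lock(ni)
+	 *   (PG_locked set)             truncate_setsize()
+	 * filemap_read_folio()            -> waits for PG_locked
+	 *   -> read path needs ni_lock
+	 */
+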
+ +Since filemap_update_page() calls filemap_read_folio() after calling +folio_trylock() which will set PG_locked bit, ntfs_truncate() must not +call truncate_setsize() which will wait for PG_locked bit to be cleared +when holding ni_lock lock. + +Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/ +Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae [1] +Reported-by: syzbot +Debugged-by: Linus Torvalds +Co-developed-by: Hillf Danton +Signed-off-by: Hillf Danton +Signed-off-by: Tetsuo Handa +Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") +Signed-off-by: Linus Torvalds +Signed-off-by: Sasha Levin +--- + fs/ntfs3/file.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c +index 7a678a5b1ca5..c526e0427f2b 100644 +--- a/fs/ntfs3/file.c ++++ b/fs/ntfs3/file.c +@@ -488,10 +488,10 @@ static int ntfs_truncate(struct inode *inode, loff_t new_size) + + new_valid = ntfs_up_block(sb, min_t(u64, ni->i_valid, new_size)); + +- ni_lock(ni); +- + truncate_setsize(inode, new_size); + ++ ni_lock(ni); ++ + down_write(&ni->file.run_lock); + err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run, new_size, + &new_valid, ni->mi.sbi->options->prealloc, NULL); +-- +2.35.1 + diff --git a/queue-5.15/gpio-sifive-fix-refcount-leak-in-sifive_gpio_probe.patch b/queue-5.15/gpio-sifive-fix-refcount-leak-in-sifive_gpio_probe.patch new file mode 100644 index 00000000000..b3680bed136 --- /dev/null +++ b/queue-5.15/gpio-sifive-fix-refcount-leak-in-sifive_gpio_probe.patch @@ -0,0 +1,36 @@ +From 5a3e595e47414d19a5c1166ffd952a3600c2c7b0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 2 Jan 2023 12:20:39 +0400 +Subject: gpio: sifive: Fix refcount leak in sifive_gpio_probe + +From: Miaoqian Lin + +[ Upstream commit 694175cd8a1643cde3acb45c9294bca44a8e08e9 ] + +of_irq_find_parent() returns a node pointer with refcount incremented, +We should use of_node_put() on it when not needed anymore. +Add missing of_node_put() to avoid refcount leak. + +Fixes: 96868dce644d ("gpio/sifive: Add GPIO driver for SiFive SoCs") +Signed-off-by: Miaoqian Lin +Signed-off-by: Bartosz Golaszewski +Signed-off-by: Sasha Levin +--- + drivers/gpio/gpio-sifive.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/gpio/gpio-sifive.c b/drivers/gpio/gpio-sifive.c +index 7d82388b4ab7..f50236e68e88 100644 +--- a/drivers/gpio/gpio-sifive.c ++++ b/drivers/gpio/gpio-sifive.c +@@ -209,6 +209,7 @@ static int sifive_gpio_probe(struct platform_device *pdev) + return -ENODEV; + } + parent = irq_find_host(irq_parent); ++ of_node_put(irq_parent); + if (!parent) { + dev_err(dev, "no IRQ parent domain\n"); + return -ENODEV; +-- +2.35.1 + diff --git a/queue-5.15/io_uring-check-for-valid-register-opcode-earlier.patch b/queue-5.15/io_uring-check-for-valid-register-opcode-earlier.patch new file mode 100644 index 00000000000..c7ca51fd4ae --- /dev/null +++ b/queue-5.15/io_uring-check-for-valid-register-opcode-earlier.patch @@ -0,0 +1,45 @@ +From 0639ac34c362fcfb9ffef38f2932ed3d38981dec Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 23 Dec 2022 06:37:08 -0700 +Subject: io_uring: check for valid register opcode earlier + +From: Jens Axboe + +[ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ] + +We only check the register opcode value inside the restricted ring +section, move it into the main io_uring_register() function instead +and check it up front. 
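+
+That is, the opcode is now rejected on syscall entry, before fdget()
+and before the ring lock is taken; sketched:
+
+	SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
+			void __user *, arg, unsigned int, nr_args)
+	{
+		if (opcode >= IORING_REGISTER_LAST)
+			return -EINVAL;	/* unknown opcode, reject up front */
+		...
+	}
+
+The restricted path can then rely on array_index_nospec() alone.
+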
+ +Signed-off-by: Jens Axboe +Signed-off-by: Sasha Levin +--- + io_uring/io_uring.c | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c +index eebbe8a6da0c..52a08632326a 100644 +--- a/io_uring/io_uring.c ++++ b/io_uring/io_uring.c +@@ -10895,8 +10895,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, + return -ENXIO; + + if (ctx->restricted) { +- if (opcode >= IORING_REGISTER_LAST) +- return -EINVAL; + opcode = array_index_nospec(opcode, IORING_REGISTER_LAST); + if (!test_bit(opcode, ctx->restrictions.register_op)) + return -EACCES; +@@ -11028,6 +11026,9 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, + long ret = -EBADF; + struct fd f; + ++ if (opcode >= IORING_REGISTER_LAST) ++ return -EINVAL; ++ + f = fdget(fd); + if (!f.file) + return -EBADF; +-- +2.35.1 + diff --git a/queue-5.15/mbcache-automatically-delete-entries-from-cache-on-f.patch b/queue-5.15/mbcache-automatically-delete-entries-from-cache-on-f.patch new file mode 100644 index 00000000000..3fcdff5072d --- /dev/null +++ b/queue-5.15/mbcache-automatically-delete-entries-from-cache-on-f.patch @@ -0,0 +1,274 @@ +From 14234b3988b436ae30fd64577c026b533e0e120a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 12 Jul 2022 12:54:29 +0200 +Subject: mbcache: automatically delete entries from cache on freeing + +From: Jan Kara + +[ Upstream commit 307af6c879377c1c63e71cbdd978201f9c7ee8df ] + +Use the fact that entries with elevated refcount are not removed from +the hash and just move removal of the entry from the hash to the entry +freeing time. When doing this we also change the generic code to hold +one reference to the cache entry, not two of them, which makes code +somewhat more obvious. 
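+
+The resulting lifetime rule is roughly the following (an illustrative
+sketch, not the verbatim mbcache code):
+
+    /* lookup: an entry is only handed out while it is still hashed */
+    if (entry->e_key == key && atomic_inc_not_zero(&entry->e_refcnt))
+        return entry;
+
+    /* put: dropping the last reference unhashes and frees the entry */
+    if (atomic_dec_return(&entry->e_refcnt) == 0)
+        __mb_cache_entry_free(cache, entry);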
+ +Signed-off-by: Jan Kara +Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz +Signed-off-by: Theodore Ts'o +Stable-dep-of: a44e84a9b776 ("ext4: fix deadlock due to mbcache entry corruption") +Signed-off-by: Sasha Levin +--- + fs/mbcache.c | 108 +++++++++++++++------------------------- + include/linux/mbcache.h | 24 ++++++--- + 2 files changed, 55 insertions(+), 77 deletions(-) + +diff --git a/fs/mbcache.c b/fs/mbcache.c +index 2010bc80a3f2..950f1829a7fd 100644 +--- a/fs/mbcache.c ++++ b/fs/mbcache.c +@@ -90,7 +90,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + return -ENOMEM; + + INIT_LIST_HEAD(&entry->e_list); +- /* One ref for hash, one ref returned */ ++ /* Initial hash reference */ + atomic_set(&entry->e_refcnt, 1); + entry->e_key = key; + entry->e_value = value; +@@ -106,21 +106,28 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + } + } + hlist_bl_add_head(&entry->e_hash_list, head); +- hlist_bl_unlock(head); +- ++ /* ++ * Add entry to LRU list before it can be found by ++ * mb_cache_entry_delete() to avoid races ++ */ + spin_lock(&cache->c_list_lock); + list_add_tail(&entry->e_list, &cache->c_list); +- /* Grab ref for LRU list */ +- atomic_inc(&entry->e_refcnt); + cache->c_entry_count++; + spin_unlock(&cache->c_list_lock); ++ hlist_bl_unlock(head); + + return 0; + } + EXPORT_SYMBOL(mb_cache_entry_create); + +-void __mb_cache_entry_free(struct mb_cache_entry *entry) ++void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry) + { ++ struct hlist_bl_head *head; ++ ++ head = mb_cache_entry_head(cache, entry->e_key); ++ hlist_bl_lock(head); ++ hlist_bl_del(&entry->e_hash_list); ++ hlist_bl_unlock(head); + kmem_cache_free(mb_entry_cache, entry); + } + EXPORT_SYMBOL(__mb_cache_entry_free); +@@ -134,7 +141,7 @@ EXPORT_SYMBOL(__mb_cache_entry_free); + */ + void mb_cache_entry_wait_unused(struct mb_cache_entry *entry) + { +- wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3); ++ wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 2); + } + EXPORT_SYMBOL(mb_cache_entry_wait_unused); + +@@ -155,10 +162,9 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, + while (node) { + entry = hlist_bl_entry(node, struct mb_cache_entry, + e_hash_list); +- if (entry->e_key == key && entry->e_reusable) { +- atomic_inc(&entry->e_refcnt); ++ if (entry->e_key == key && entry->e_reusable && ++ atomic_inc_not_zero(&entry->e_refcnt)) + goto out; +- } + node = node->next; + } + entry = NULL; +@@ -218,10 +224,9 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, + head = mb_cache_entry_head(cache, key); + hlist_bl_lock(head); + hlist_bl_for_each_entry(entry, node, head, e_hash_list) { +- if (entry->e_key == key && entry->e_value == value) { +- atomic_inc(&entry->e_refcnt); ++ if (entry->e_key == key && entry->e_value == value && ++ atomic_inc_not_zero(&entry->e_refcnt)) + goto out; +- } + } + entry = NULL; + out: +@@ -281,37 +286,25 @@ EXPORT_SYMBOL(mb_cache_entry_delete); + struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, + u32 key, u64 value) + { +- struct hlist_bl_node *node; +- struct hlist_bl_head *head; + struct mb_cache_entry *entry; + +- head = mb_cache_entry_head(cache, key); +- hlist_bl_lock(head); +- hlist_bl_for_each_entry(entry, node, head, e_hash_list) { +- if (entry->e_key == key && entry->e_value == value) { +- if (atomic_read(&entry->e_refcnt) > 2) { +- atomic_inc(&entry->e_refcnt); +- 
hlist_bl_unlock(head); +- return entry; +- } +- /* We keep hash list reference to keep entry alive */ +- hlist_bl_del_init(&entry->e_hash_list); +- hlist_bl_unlock(head); +- spin_lock(&cache->c_list_lock); +- if (!list_empty(&entry->e_list)) { +- list_del_init(&entry->e_list); +- if (!WARN_ONCE(cache->c_entry_count == 0, +- "mbcache: attempt to decrement c_entry_count past zero")) +- cache->c_entry_count--; +- atomic_dec(&entry->e_refcnt); +- } +- spin_unlock(&cache->c_list_lock); +- mb_cache_entry_put(cache, entry); +- return NULL; +- } +- } +- hlist_bl_unlock(head); ++ entry = mb_cache_entry_get(cache, key, value); ++ if (!entry) ++ return NULL; + ++ /* ++ * Drop the ref we got from mb_cache_entry_get() and the initial hash ++ * ref if we are the last user ++ */ ++ if (atomic_cmpxchg(&entry->e_refcnt, 2, 0) != 2) ++ return entry; ++ ++ spin_lock(&cache->c_list_lock); ++ if (!list_empty(&entry->e_list)) ++ list_del_init(&entry->e_list); ++ cache->c_entry_count--; ++ spin_unlock(&cache->c_list_lock); ++ __mb_cache_entry_free(cache, entry); + return NULL; + } + EXPORT_SYMBOL(mb_cache_entry_delete_or_get); +@@ -343,42 +336,24 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, + unsigned long nr_to_scan) + { + struct mb_cache_entry *entry; +- struct hlist_bl_head *head; + unsigned long shrunk = 0; + + spin_lock(&cache->c_list_lock); + while (nr_to_scan-- && !list_empty(&cache->c_list)) { + entry = list_first_entry(&cache->c_list, + struct mb_cache_entry, e_list); +- if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) { ++ /* Drop initial hash reference if there is no user */ ++ if (entry->e_referenced || ++ atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) { + entry->e_referenced = 0; + list_move_tail(&entry->e_list, &cache->c_list); + continue; + } + list_del_init(&entry->e_list); + cache->c_entry_count--; +- /* +- * We keep LRU list reference so that entry doesn't go away +- * from under us. +- */ + spin_unlock(&cache->c_list_lock); +- head = mb_cache_entry_head(cache, entry->e_key); +- hlist_bl_lock(head); +- /* Now a reliable check if the entry didn't get used... */ +- if (atomic_read(&entry->e_refcnt) > 2) { +- hlist_bl_unlock(head); +- spin_lock(&cache->c_list_lock); +- list_add_tail(&entry->e_list, &cache->c_list); +- cache->c_entry_count++; +- continue; +- } +- if (!hlist_bl_unhashed(&entry->e_hash_list)) { +- hlist_bl_del_init(&entry->e_hash_list); +- atomic_dec(&entry->e_refcnt); +- } +- hlist_bl_unlock(head); +- if (mb_cache_entry_put(cache, entry)) +- shrunk++; ++ __mb_cache_entry_free(cache, entry); ++ shrunk++; + cond_resched(); + spin_lock(&cache->c_list_lock); + } +@@ -470,11 +445,6 @@ void mb_cache_destroy(struct mb_cache *cache) + * point. + */ + list_for_each_entry_safe(entry, next, &cache->c_list, e_list) { +- if (!hlist_bl_unhashed(&entry->e_hash_list)) { +- hlist_bl_del_init(&entry->e_hash_list); +- atomic_dec(&entry->e_refcnt); +- } else +- WARN_ON(1); + list_del(&entry->e_list); + WARN_ON(atomic_read(&entry->e_refcnt) != 1); + mb_cache_entry_put(cache, entry); +diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h +index 8eca7f25c432..e9d5ece87794 100644 +--- a/include/linux/mbcache.h ++++ b/include/linux/mbcache.h +@@ -13,8 +13,16 @@ struct mb_cache; + struct mb_cache_entry { + /* List of entries in cache - protected by cache->c_list_lock */ + struct list_head e_list; +- /* Hash table list - protected by hash chain bitlock */ ++ /* ++ * Hash table list - protected by hash chain bitlock. 
The entry is ++ * guaranteed to be hashed while e_refcnt > 0. ++ */ + struct hlist_bl_node e_hash_list; ++ /* ++ * Entry refcount. Once it reaches zero, entry is unhashed and freed. ++ * While refcount > 0, the entry is guaranteed to stay in the hash and ++ * e.g. mb_cache_entry_try_delete() will fail. ++ */ + atomic_t e_refcnt; + /* Key in hash - stable during lifetime of the entry */ + u32 e_key; +@@ -29,20 +37,20 @@ void mb_cache_destroy(struct mb_cache *cache); + + int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + u64 value, bool reusable); +-void __mb_cache_entry_free(struct mb_cache_entry *entry); ++void __mb_cache_entry_free(struct mb_cache *cache, ++ struct mb_cache_entry *entry); + void mb_cache_entry_wait_unused(struct mb_cache_entry *entry); +-static inline int mb_cache_entry_put(struct mb_cache *cache, +- struct mb_cache_entry *entry) ++static inline void mb_cache_entry_put(struct mb_cache *cache, ++ struct mb_cache_entry *entry) + { + unsigned int cnt = atomic_dec_return(&entry->e_refcnt); + + if (cnt > 0) { +- if (cnt <= 3) ++ if (cnt <= 2) + wake_up_var(&entry->e_refcnt); +- return 0; ++ return; + } +- __mb_cache_entry_free(entry); +- return 1; ++ __mb_cache_entry_free(cache, entry); + } + + struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, +-- +2.35.1 + diff --git a/queue-5.15/net-amd-xgbe-add-missed-tasklet_kill.patch b/queue-5.15/net-amd-xgbe-add-missed-tasklet_kill.patch new file mode 100644 index 00000000000..c194933ce97 --- /dev/null +++ b/queue-5.15/net-amd-xgbe-add-missed-tasklet_kill.patch @@ -0,0 +1,71 @@ +From 1413a513eb77c55072c45a0dc569fe400c04ace2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 28 Dec 2022 16:14:47 +0800 +Subject: net: amd-xgbe: add missed tasklet_kill + +From: Jiguang Xiao + +[ Upstream commit d530ece70f16f912e1d1bfeea694246ab78b0a4b ] + +The driver does not call tasklet_kill in several places. +Add the calls to fix it. + +Fixes: 85b85c853401 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared") +Signed-off-by: Jiguang Xiao +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 3 +++ + drivers/net/ethernet/amd/xgbe/xgbe-i2c.c | 4 +++- + drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 4 +++- + 3 files changed, 9 insertions(+), 2 deletions(-) + +diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +index e6883d52d230..555db1871ec9 100644 +--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c ++++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +@@ -1064,6 +1064,9 @@ static void xgbe_free_irqs(struct xgbe_prv_data *pdata) + + devm_free_irq(pdata->dev, pdata->dev_irq, pdata); + ++ tasklet_kill(&pdata->tasklet_dev); ++ tasklet_kill(&pdata->tasklet_ecc); ++ + if (pdata->vdata->ecc_support && (pdata->dev_irq != pdata->ecc_irq)) + devm_free_irq(pdata->dev, pdata->ecc_irq, pdata); + +diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c +index 22d4fc547a0a..a9ccc4258ee5 100644 +--- a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c ++++ b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c +@@ -447,8 +447,10 @@ static void xgbe_i2c_stop(struct xgbe_prv_data *pdata) + xgbe_i2c_disable(pdata); + xgbe_i2c_clear_all_interrupts(pdata); + +- if (pdata->dev_irq != pdata->i2c_irq) ++ if (pdata->dev_irq != pdata->i2c_irq) { + devm_free_irq(pdata->dev, pdata->i2c_irq, pdata); ++ tasklet_kill(&pdata->tasklet_i2c); ++ } + } + + static int xgbe_i2c_start(struct xgbe_prv_data *pdata) +diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +index 4e97b4869522..0c5c1b155683 100644 +--- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c ++++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +@@ -1390,8 +1390,10 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata) + /* Disable auto-negotiation */ + xgbe_an_disable_all(pdata); + +- if (pdata->dev_irq != pdata->an_irq) ++ if (pdata->dev_irq != pdata->an_irq) { + devm_free_irq(pdata->dev, pdata->an_irq, pdata); ++ tasklet_kill(&pdata->tasklet_an); ++ } + + pdata->phy_if.phy_impl.stop(pdata); + +-- +2.35.1 + diff --git a/queue-5.15/net-dsa-mv88e6xxx-depend-on-ptp-conditionally.patch b/queue-5.15/net-dsa-mv88e6xxx-depend-on-ptp-conditionally.patch new file mode 100644 index 00000000000..298046a4dbc --- /dev/null +++ b/queue-5.15/net-dsa-mv88e6xxx-depend-on-ptp-conditionally.patch @@ -0,0 +1,55 @@ +From 7d185fb6cc74b4c84572cfd3955634ee2ff27950 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Dec 2022 22:34:05 +0800 +Subject: net: dsa: mv88e6xxx: depend on PTP conditionally + +From: Johnny S. Lee + +[ Upstream commit 30e725537546248bddc12eaac2fe0a258917f190 ] + +PTP hardware timestamping related objects are not linked when PTP +support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore +NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL +regardless of NET_DSA_MV88E6XXX_PTP. + +Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's +dependencies are met, making sure that it cannot be enabled when +NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m. + +In other words, this commit allows NET_DSA_MV88E6XXX to be built-in +while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is +prevented from being enabled. + +Fixes: e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") +Signed-off-by: Johnny S. Lee +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/dsa/mv88e6xxx/Kconfig | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig b/drivers/net/dsa/mv88e6xxx/Kconfig +index 7a2445a34eb7..e3181d5471df 100644 +--- a/drivers/net/dsa/mv88e6xxx/Kconfig ++++ b/drivers/net/dsa/mv88e6xxx/Kconfig +@@ -2,7 +2,6 @@ + config NET_DSA_MV88E6XXX + tristate "Marvell 88E6xxx Ethernet switch fabric support" + depends on NET_DSA +- depends on PTP_1588_CLOCK_OPTIONAL + select IRQ_DOMAIN + select NET_DSA_TAG_EDSA + select NET_DSA_TAG_DSA +@@ -13,7 +12,8 @@ config NET_DSA_MV88E6XXX + config NET_DSA_MV88E6XXX_PTP + bool "PTP support for Marvell 88E6xxx" + default n +- depends on NET_DSA_MV88E6XXX && PTP_1588_CLOCK ++ depends on (NET_DSA_MV88E6XXX = y && PTP_1588_CLOCK = y) || \ ++ (NET_DSA_MV88E6XXX = m && PTP_1588_CLOCK) + help + Say Y to enable PTP hardware timestamping on Marvell 88E6xxx switch + chips that support it. +-- +2.35.1 + diff --git a/queue-5.15/net-ena-account-for-the-number-of-processed-bytes-in.patch b/queue-5.15/net-ena-account-for-the-number-of-processed-bytes-in.patch new file mode 100644 index 00000000000..e23ff4369ea --- /dev/null +++ b/queue-5.15/net-ena-account-for-the-number-of-processed-bytes-in.patch @@ -0,0 +1,36 @@ +From 66d988f7b7b9a92d75a8e11a67ef1154e46619ad Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:07 +0000 +Subject: net: ena: Account for the number of processed bytes in XDP + +From: David Arinzon + +[ Upstream commit c7f5e34d906320fdc996afa616676161c029cc02 ] + +The size of packets that were forwarded or dropped by XDP wasn't added +to the total processed bytes statistic. + +Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") +Signed-off-by: Shay Agroskin +Signed-off-by: David Arinzon +Signed-off-by: David S. Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index da16f428e7fa..31afbd17e690 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -1729,6 +1729,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, + } + if (xdp_verdict != XDP_PASS) { + xdp_flags |= xdp_verdict; ++ total_len += ena_rx_ctx.ena_bufs[0].len; + res_budget--; + continue; + } +-- +2.35.1 + diff --git a/queue-5.15/net-ena-don-t-register-memory-info-on-xdp-exchange.patch b/queue-5.15/net-ena-don-t-register-memory-info-on-xdp-exchange.patch new file mode 100644 index 00000000000..646f3036bf9 --- /dev/null +++ b/queue-5.15/net-ena-don-t-register-memory-info-on-xdp-exchange.patch @@ -0,0 +1,50 @@ +From 22d4acc09fe1ba6e4da5773f370ebbab6014b18c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:06 +0000 +Subject: net: ena: Don't register memory info on XDP exchange + +From: David Arinzon + +[ Upstream commit 9c9e539956fa67efb8a65e32b72a853740b33445 ] + +Since the queues aren't destroyed when we only exchange XDP programs, +there's no need to re-register them again. + +Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") +Signed-off-by: Shay Agroskin +Signed-off-by: David Arinzon +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 8 +++++--- + 1 file changed, 5 insertions(+), 3 deletions(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index f032e58a4c3c..da16f428e7fa 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -516,16 +516,18 @@ static void ena_xdp_exchange_program_rx_in_range(struct ena_adapter *adapter, + struct bpf_prog *prog, + int first, int count) + { ++ struct bpf_prog *old_bpf_prog; + struct ena_ring *rx_ring; + int i = 0; + + for (i = first; i < count; i++) { + rx_ring = &adapter->rx_ring[i]; +- xchg(&rx_ring->xdp_bpf_prog, prog); +- if (prog) { ++ old_bpf_prog = xchg(&rx_ring->xdp_bpf_prog, prog); ++ ++ if (!old_bpf_prog && prog) { + ena_xdp_register_rxq_info(rx_ring); + rx_ring->rx_headroom = XDP_PACKET_HEADROOM; +- } else { ++ } else if (old_bpf_prog && !prog) { + ena_xdp_unregister_rxq_info(rx_ring); + rx_ring->rx_headroom = NET_SKB_PAD; + } +-- +2.35.1 + diff --git a/queue-5.15/net-ena-fix-rx_copybreak-value-update.patch b/queue-5.15/net-ena-fix-rx_copybreak-value-update.patch new file mode 100644 index 00000000000..062450364b3 --- /dev/null +++ b/queue-5.15/net-ena-fix-rx_copybreak-value-update.patch @@ -0,0 +1,94 @@ +From 146def65bf37c9fc19143046b87b7436cae0b8d2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:09 +0000 +Subject: net: ena: Fix rx_copybreak value update + +From: David Arinzon + +[ Upstream commit c7062aaee099f2f43d6f07a71744b44b94b94b34 ] + +Make the upper bound on rx_copybreak tighter, by +making sure it is smaller than the minimum of mtu and +ENA_PAGE_SIZE. With the current upper bound of mtu, +rx_copybreak can be larger than a page. Such large +rx_copybreak will not bring any performance benefit to +the user and therefore makes no sense. + +In addition, the value update was only reflected in +the adapter structure, but not applied for each ring, +causing it to not take effect. + +Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") +Signed-off-by: Osama Abboud +Signed-off-by: Arthur Kiyanovski +Signed-off-by: David Arinzon +Signed-off-by: David S. Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_ethtool.c | 6 +----- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 18 ++++++++++++++++++ + drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 ++ + 3 files changed, 21 insertions(+), 5 deletions(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c +index 13e745cf3781..413082f10dc1 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c ++++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c +@@ -880,11 +880,7 @@ static int ena_set_tunable(struct net_device *netdev, + switch (tuna->id) { + case ETHTOOL_RX_COPYBREAK: + len = *(u32 *)data; +- if (len > adapter->netdev->mtu) { +- ret = -EINVAL; +- break; +- } +- adapter->rx_copybreak = len; ++ ret = ena_set_rx_copybreak(adapter, len); + break; + default: + ret = -EINVAL; +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index 294f21a839cf..8f1b205e7333 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -2829,6 +2829,24 @@ int ena_update_queue_sizes(struct ena_adapter *adapter, + return dev_was_up ? 
ena_up(adapter) : 0; + } + ++int ena_set_rx_copybreak(struct ena_adapter *adapter, u32 rx_copybreak) ++{ ++ struct ena_ring *rx_ring; ++ int i; ++ ++ if (rx_copybreak > min_t(u16, adapter->netdev->mtu, ENA_PAGE_SIZE)) ++ return -EINVAL; ++ ++ adapter->rx_copybreak = rx_copybreak; ++ ++ for (i = 0; i < adapter->num_io_queues; i++) { ++ rx_ring = &adapter->rx_ring[i]; ++ rx_ring->rx_copybreak = rx_copybreak; ++ } ++ ++ return 0; ++} ++ + int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count) + { + struct ena_com_dev *ena_dev = adapter->ena_dev; +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h +index ada2f8faa33a..2b5eb573ff23 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h +@@ -404,6 +404,8 @@ int ena_update_queue_sizes(struct ena_adapter *adapter, + + int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count); + ++int ena_set_rx_copybreak(struct ena_adapter *adapter, u32 rx_copybreak); ++ + int ena_get_sset_count(struct net_device *netdev, int sset); + + enum ena_xdp_errors_t { +-- +2.35.1 + diff --git a/queue-5.15/net-ena-fix-toeplitz-initial-hash-value.patch b/queue-5.15/net-ena-fix-toeplitz-initial-hash-value.patch new file mode 100644 index 00000000000..3caad908594 --- /dev/null +++ b/queue-5.15/net-ena-fix-toeplitz-initial-hash-value.patch @@ -0,0 +1,72 @@ +From c8d2e8c195b9e54c0b0a251e16276f3096ad1b2d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:05 +0000 +Subject: net: ena: Fix toeplitz initial hash value + +From: David Arinzon + +[ Upstream commit 332b49ff637d6c1a75b971022a8b992cf3c57db1 ] + +On driver initialization, RSS hash initial value is set to zero, +instead of the default value. This happens because we pass NULL as +the RSS key parameter, which caused us to never initialize +the RSS hash value. + +This patch fixes it by making sure the initial value is set, no matter +what the value of the RSS key is. + +Fixes: 91a65b7d3ed8 ("net: ena: fix potential crash when rxfh key is NULL") +Signed-off-by: Nati Koler +Signed-off-by: David Arinzon +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_com.c | 29 +++++++---------------- + 1 file changed, 9 insertions(+), 20 deletions(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c +index ab413fc1f68e..f0faad149a3b 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_com.c ++++ b/drivers/net/ethernet/amazon/ena/ena_com.c +@@ -2392,29 +2392,18 @@ int ena_com_fill_hash_function(struct ena_com_dev *ena_dev, + return -EOPNOTSUPP; + } + +- switch (func) { +- case ENA_ADMIN_TOEPLITZ: +- if (key) { +- if (key_len != sizeof(hash_key->key)) { +- netdev_err(ena_dev->net_device, +- "key len (%u) doesn't equal the supported size (%zu)\n", +- key_len, sizeof(hash_key->key)); +- return -EINVAL; +- } +- memcpy(hash_key->key, key, key_len); +- rss->hash_init_val = init_val; +- hash_key->key_parts = key_len / sizeof(hash_key->key[0]); ++ if ((func == ENA_ADMIN_TOEPLITZ) && key) { ++ if (key_len != sizeof(hash_key->key)) { ++ netdev_err(ena_dev->net_device, ++ "key len (%u) doesn't equal the supported size (%zu)\n", ++ key_len, sizeof(hash_key->key)); ++ return -EINVAL; + } +- break; +- case ENA_ADMIN_CRC32: +- rss->hash_init_val = init_val; +- break; +- default: +- netdev_err(ena_dev->net_device, "Invalid hash function (%d)\n", +- func); +- return -EINVAL; ++ memcpy(hash_key->key, key, key_len); ++ hash_key->key_parts = key_len / sizeof(hash_key->key[0]); + } + ++ rss->hash_init_val = init_val; + old_func = rss->hash_func; + rss->hash_func = func; + rc = ena_com_set_hash_function(ena_dev); +-- +2.35.1 + diff --git a/queue-5.15/net-ena-set-default-value-for-rx-interrupt-moderatio.patch b/queue-5.15/net-ena-set-default-value-for-rx-interrupt-moderatio.patch new file mode 100644 index 00000000000..3f9f1e4e4dd --- /dev/null +++ b/queue-5.15/net-ena-set-default-value-for-rx-interrupt-moderatio.patch @@ -0,0 +1,42 @@ +From ecdf40feacba70bd3b8e6eeff4b7481689d5cc2a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:10 +0000 +Subject: net: ena: Set default value for RX interrupt moderation + +From: David Arinzon + +[ Upstream commit e712f3e4920b3a1a5e6b536827d118e14862896c ] + +RX ring can be NULL in XDP use cases where only TX queues +are configured. In this scenario, the RX interrupt moderation +value sent to the device remains in its default value of 0. + +In this change, setting the default value of the RX interrupt +moderation to be the same as of the TX. + +Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") +Signed-off-by: David Arinzon +Signed-off-by: David S. Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index 8f1b205e7333..b1533a45f645 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -1836,8 +1836,9 @@ static void ena_adjust_adaptive_rx_intr_moderation(struct ena_napi *ena_napi) + static void ena_unmask_interrupt(struct ena_ring *tx_ring, + struct ena_ring *rx_ring) + { ++ u32 rx_interval = tx_ring->smoothed_interval; + struct ena_eth_io_intr_reg intr_reg; +- u32 rx_interval = 0; ++ + /* Rx ring can be NULL when for XDP tx queues which don't have an + * accompanying rx_ring pair. 
+ */ +-- +2.35.1 + diff --git a/queue-5.15/net-ena-update-numa-tph-hint-register-upon-numa-node.patch b/queue-5.15/net-ena-update-numa-tph-hint-register-upon-numa-node.patch new file mode 100644 index 00000000000..a01d1e7a15d --- /dev/null +++ b/queue-5.15/net-ena-update-numa-tph-hint-register-upon-numa-node.patch @@ -0,0 +1,155 @@ +From c8840be9fe51bd37e8b99d5e3d7405a7c4784757 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:11 +0000 +Subject: net: ena: Update NUMA TPH hint register upon NUMA node update + +From: David Arinzon + +[ Upstream commit a8ee104f986e720cea52133885cc822d459398c7 ] + +The device supports a PCIe optimization hint, which indicates on +which NUMA the queue is currently processed. This hint is utilized +by PCIe in order to reduce its access time by accessing the +correct NUMA resources and maintaining cache coherence. + +The driver calls the register update for the hint (called TPH - +TLP Processing Hint) during the NAPI loop. + +Though the update is expected upon a NUMA change (when a queue +is moved from one NUMA to the other), the current logic performs +a register update when the queue is moved to a different CPU, +but the CPU is not necessarily in a different NUMA. + +The changes include: +1. Performing the TPH update only when the queue has switched +a NUMA node. +2. Moving the TPH update call to be triggered only when NAPI was +scheduled from interrupt context, as opposed to a busy-polling loop. +This is due to the fact that during busy-polling, the frequency +of CPU switches for a particular queue is significantly higher, +thus, the likelihood to switch NUMA is much higher. Therefore, +providing the frequent updates to the device upon a NUMA update +are unlikely to be beneficial. + +Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") +Signed-off-by: David Arinzon +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 27 +++++++++++++------- + drivers/net/ethernet/amazon/ena/ena_netdev.h | 6 +++-- + 2 files changed, 22 insertions(+), 11 deletions(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index b1533a45f645..23c9750850e9 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -684,6 +684,7 @@ static void ena_init_io_rings_common(struct ena_adapter *adapter, + ring->ena_dev = adapter->ena_dev; + ring->per_napi_packets = 0; + ring->cpu = 0; ++ ring->numa_node = 0; + ring->no_interrupt_event_cnt = 0; + u64_stats_init(&ring->syncp); + } +@@ -787,6 +788,7 @@ static int ena_setup_tx_resources(struct ena_adapter *adapter, int qid) + tx_ring->next_to_use = 0; + tx_ring->next_to_clean = 0; + tx_ring->cpu = ena_irq->cpu; ++ tx_ring->numa_node = node; + return 0; + + err_push_buf_intermediate_buf: +@@ -919,6 +921,7 @@ static int ena_setup_rx_resources(struct ena_adapter *adapter, + rx_ring->next_to_clean = 0; + rx_ring->next_to_use = 0; + rx_ring->cpu = ena_irq->cpu; ++ rx_ring->numa_node = node; + + return 0; + } +@@ -1876,20 +1879,27 @@ static void ena_update_ring_numa_node(struct ena_ring *tx_ring, + if (likely(tx_ring->cpu == cpu)) + goto out; + ++ tx_ring->cpu = cpu; ++ if (rx_ring) ++ rx_ring->cpu = cpu; ++ + numa_node = cpu_to_node(cpu); ++ ++ if (likely(tx_ring->numa_node == numa_node)) ++ goto out; ++ + put_cpu(); + + if (numa_node != NUMA_NO_NODE) { + ena_com_update_numa_node(tx_ring->ena_com_io_cq, numa_node); +- if (rx_ring) ++ tx_ring->numa_node = numa_node; ++ if (rx_ring) { ++ rx_ring->numa_node = numa_node; + ena_com_update_numa_node(rx_ring->ena_com_io_cq, + numa_node); ++ } + } + +- tx_ring->cpu = cpu; +- if (rx_ring) +- rx_ring->cpu = cpu; +- + return; + out: + put_cpu(); +@@ -2010,11 +2020,10 @@ static int ena_io_poll(struct napi_struct *napi, int budget) + if (ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev)) + ena_adjust_adaptive_rx_intr_moderation(ena_napi); + ++ ena_update_ring_numa_node(tx_ring, rx_ring); + ena_unmask_interrupt(tx_ring, rx_ring); + } + +- ena_update_ring_numa_node(tx_ring, rx_ring); +- + ret = rx_work_done; + } else { + ret = budget; +@@ -2401,7 +2410,7 @@ static int ena_create_io_tx_queue(struct ena_adapter *adapter, int qid) + ctx.mem_queue_type = ena_dev->tx_mem_queue_type; + ctx.msix_vector = msix_vector; + ctx.queue_size = tx_ring->ring_size; +- ctx.numa_node = cpu_to_node(tx_ring->cpu); ++ ctx.numa_node = tx_ring->numa_node; + + rc = ena_com_create_io_queue(ena_dev, &ctx); + if (rc) { +@@ -2469,7 +2478,7 @@ static int ena_create_io_rx_queue(struct ena_adapter *adapter, int qid) + ctx.mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; + ctx.msix_vector = msix_vector; + ctx.queue_size = rx_ring->ring_size; +- ctx.numa_node = cpu_to_node(rx_ring->cpu); ++ ctx.numa_node = rx_ring->numa_node; + + rc = ena_com_create_io_queue(ena_dev, &ctx); + if (rc) { +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h +index 2b5eb573ff23..bf2a39c91c00 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h +@@ -273,9 +273,11 @@ struct ena_ring { + bool disable_meta_caching; + u16 no_interrupt_event_cnt; + +- /* cpu for TPH */ ++ /* cpu and NUMA for TPH */ + int cpu; +- /* number of tx/rx_buffer_info's entries */ ++ int numa_node; ++ ++ /* number of 
tx/rx_buffer_info's entries */ + int ring_size; + + enum ena_admin_placement_policy_type tx_mem_queue_type; +-- +2.35.1 + diff --git a/queue-5.15/net-ena-use-bitmask-to-indicate-packet-redirection.patch b/queue-5.15/net-ena-use-bitmask-to-indicate-packet-redirection.patch new file mode 100644 index 00000000000..19ce0f939ae --- /dev/null +++ b/queue-5.15/net-ena-use-bitmask-to-indicate-packet-redirection.patch @@ -0,0 +1,193 @@ +From 193a3dfc5d2af36183b2ae561910d8ca0d0c9a3b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 07:30:08 +0000 +Subject: net: ena: Use bitmask to indicate packet redirection + +From: David Arinzon + +[ Upstream commit 59811faa2c54dbcf44d575b5a8f6e7077da88dc2 ] + +Redirecting packets with XDP Redirect is done in two phases: +1. A packet is passed by the driver to the kernel using + xdp_do_redirect(). +2. After finishing polling for new packets the driver lets the kernel + know that it can now process the redirected packet using + xdp_do_flush_map(). + The packets' redirection is handled in the napi context of the + queue that called xdp_do_redirect() + +To avoid calling xdp_do_flush_map() each time the driver first checks +whether any packets were redirected, using + xdp_flags |= xdp_verdict; +and + if (xdp_flags & XDP_REDIRECT) + xdp_do_flush_map() + +essentially treating XDP instructions as a bitmask, which isn't the case: + enum xdp_action { + XDP_ABORTED = 0, + XDP_DROP, + XDP_PASS, + XDP_TX, + XDP_REDIRECT, + }; + +Given the current possible values of xdp_action, the current design +doesn't have a bug (since XDP_REDIRECT = 100b), but it is still +flawed. + +This patch makes the driver use a bitmask instead, to avoid future +issues. + +Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation") +Signed-off-by: Shay Agroskin +Signed-off-by: David Arinzon +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/amazon/ena/ena_netdev.c | 26 ++++++++++++-------- + drivers/net/ethernet/amazon/ena/ena_netdev.h | 9 +++++++ + 2 files changed, 25 insertions(+), 10 deletions(-) + +diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c +index 31afbd17e690..294f21a839cf 100644 +--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c ++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c +@@ -378,9 +378,9 @@ static int ena_xdp_xmit(struct net_device *dev, int n, + + static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) + { ++ u32 verdict = ENA_XDP_PASS; + struct bpf_prog *xdp_prog; + struct ena_ring *xdp_ring; +- u32 verdict = XDP_PASS; + struct xdp_frame *xdpf; + u64 *xdp_stat; + +@@ -397,7 +397,7 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) + if (unlikely(!xdpf)) { + trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); + xdp_stat = &rx_ring->rx_stats.xdp_aborted; +- verdict = XDP_ABORTED; ++ verdict = ENA_XDP_DROP; + break; + } + +@@ -413,29 +413,35 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) + + spin_unlock(&xdp_ring->xdp_tx_lock); + xdp_stat = &rx_ring->rx_stats.xdp_tx; ++ verdict = ENA_XDP_TX; + break; + case XDP_REDIRECT: + if (likely(!xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog))) { + xdp_stat = &rx_ring->rx_stats.xdp_redirect; ++ verdict = ENA_XDP_REDIRECT; + break; + } + trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); + xdp_stat = &rx_ring->rx_stats.xdp_aborted; +- verdict = XDP_ABORTED; ++ verdict = ENA_XDP_DROP; + break; + case XDP_ABORTED: + trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); + xdp_stat = &rx_ring->rx_stats.xdp_aborted; ++ verdict = ENA_XDP_DROP; + break; + case XDP_DROP: + xdp_stat = &rx_ring->rx_stats.xdp_drop; ++ verdict = ENA_XDP_DROP; + break; + case XDP_PASS: + xdp_stat = &rx_ring->rx_stats.xdp_pass; ++ verdict = ENA_XDP_PASS; + break; + default: + bpf_warn_invalid_xdp_action(verdict); + xdp_stat = &rx_ring->rx_stats.xdp_invalid; ++ verdict = ENA_XDP_DROP; + } + + ena_increase_stat(xdp_stat, 1, &rx_ring->syncp); +@@ -1631,12 +1637,12 @@ static int ena_xdp_handle_buff(struct ena_ring *rx_ring, struct xdp_buff *xdp) + * we expect, then we simply drop it + */ + if (unlikely(rx_ring->ena_bufs[0].len > ENA_XDP_MAX_MTU)) +- return XDP_DROP; ++ return ENA_XDP_DROP; + + ret = ena_xdp_execute(rx_ring, xdp); + + /* The xdp program might expand the headers */ +- if (ret == XDP_PASS) { ++ if (ret == ENA_XDP_PASS) { + rx_info->page_offset = xdp->data - xdp->data_hard_start; + rx_ring->ena_bufs[0].len = xdp->data_end - xdp->data; + } +@@ -1675,7 +1681,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, + xdp_init_buff(&xdp, ENA_PAGE_SIZE, &rx_ring->xdp_rxq); + + do { +- xdp_verdict = XDP_PASS; ++ xdp_verdict = ENA_XDP_PASS; + skb = NULL; + ena_rx_ctx.ena_bufs = rx_ring->ena_bufs; + ena_rx_ctx.max_bufs = rx_ring->sgl_size; +@@ -1703,7 +1709,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, + xdp_verdict = ena_xdp_handle_buff(rx_ring, &xdp); + + /* allocate skb and fill it */ +- if (xdp_verdict == XDP_PASS) ++ if (xdp_verdict == ENA_XDP_PASS) + skb = ena_rx_skb(rx_ring, + rx_ring->ena_bufs, + ena_rx_ctx.descs, +@@ -1721,13 +1727,13 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, + /* Packets was passed for transmission, unmap it + * from RX side. 
+ */
+-		if (xdp_verdict == XDP_TX || xdp_verdict == XDP_REDIRECT) {
++		if (xdp_verdict & ENA_XDP_FORWARDED) {
+ 			ena_unmap_rx_buff(rx_ring,
+ 					  &rx_ring->rx_buffer_info[req_id]);
+ 			rx_ring->rx_buffer_info[req_id].page = NULL;
+ 		}
+ 	}
+-		if (xdp_verdict != XDP_PASS) {
++		if (xdp_verdict != ENA_XDP_PASS) {
+ 			xdp_flags |= xdp_verdict;
+ 			total_len += ena_rx_ctx.ena_bufs[0].len;
+ 			res_budget--;
+@@ -1773,7 +1779,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
+ 		ena_refill_rx_bufs(rx_ring, refill_required);
+ 	}
+
+-	if (xdp_flags & XDP_REDIRECT)
++	if (xdp_flags & ENA_XDP_REDIRECT)
+ 		xdp_do_flush_map();
+
+ 	return work_done;
+diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
+index 0c39fc2fa345..ada2f8faa33a 100644
+--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
++++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
+@@ -412,6 +412,15 @@ enum ena_xdp_errors_t {
+ 	ENA_XDP_NO_ENOUGH_QUEUES,
+ };
+
++enum ENA_XDP_ACTIONS {
++	ENA_XDP_PASS = 0,
++	ENA_XDP_TX = BIT(0),
++	ENA_XDP_REDIRECT = BIT(1),
++	ENA_XDP_DROP = BIT(2)
++};
++
++#define ENA_XDP_FORWARDED (ENA_XDP_TX | ENA_XDP_REDIRECT)
++
+ static inline bool ena_xdp_present(struct ena_adapter *adapter)
+ {
+ 	return !!adapter->xdp_bpf_prog;
+--
+2.35.1
+
diff --git a/queue-5.15/net-hns3-add-interrupts-re-initialization-while-doin.patch b/queue-5.15/net-hns3-add-interrupts-re-initialization-while-doin.patch
new file mode 100644
index 00000000000..6d3e91e0494
--- /dev/null
+++ b/queue-5.15/net-hns3-add-interrupts-re-initialization-while-doin.patch
@@ -0,0 +1,43 @@
+From 4617693309338fddf281f2da6254a5163431898b Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 22 Dec 2022 14:43:41 +0800
+Subject: net: hns3: add interrupts re-initialization while doing VF FLR
+
+From: Jie Wang
+
+[ Upstream commit 09e6b30eeb254f1818a008cace3547159e908dfd ]
+
+Currently the keep-alive message between the PF and VF may be lost, so
+the VF is marked as not alive in the PF. The VF then does not reset
+during the PF FLR reset process, which leaves the VF's allocated
+interrupt resources invalid and means the VF can no longer receive or
+respond to the PF.
+
+So this patch re-initializes the VF interrupts during VF FLR to recover
+the VF in the above cases.
+ +Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR") +Signed-off-by: Jie Wang +Signed-off-by: Hao Lan +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +index 21678c12afa2..3c1ff3313221 100644 +--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c ++++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +@@ -3258,7 +3258,8 @@ static int hclgevf_pci_reset(struct hclgevf_dev *hdev) + struct pci_dev *pdev = hdev->pdev; + int ret = 0; + +- if (hdev->reset_type == HNAE3_VF_FULL_RESET && ++ if ((hdev->reset_type == HNAE3_VF_FULL_RESET || ++ hdev->reset_type == HNAE3_FLR_RESET) && + test_bit(HCLGEVF_STATE_IRQ_INITED, &hdev->state)) { + hclgevf_misc_irq_uninit(hdev); + hclgevf_uninit_msi(hdev); +-- +2.35.1 + diff --git a/queue-5.15/net-hns3-extract-macro-to-simplify-ring-stats-update.patch b/queue-5.15/net-hns3-extract-macro-to-simplify-ring-stats-update.patch new file mode 100644 index 00000000000..ec16e8a43d9 --- /dev/null +++ b/queue-5.15/net-hns3-extract-macro-to-simplify-ring-stats-update.patch @@ -0,0 +1,374 @@ +From fdb6909e1bb660e312896bbbd5dbc2defeb551f5 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 2 Dec 2021 16:35:55 +0800 +Subject: net: hns3: extract macro to simplify ring stats update code + +From: Peng Li + +[ Upstream commit e6d72f6ac2ad4965491354d74b48e35a60abf298 ] + +As the code to update ring stats is alike for different ring stats +type, this patch extract macro to simplify ring stats update code. + +Signed-off-by: Peng Li +Signed-off-by: Guangbin Huang +Signed-off-by: David S. 
Miller +Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet") +Signed-off-by: Sasha Levin +--- + .../net/ethernet/hisilicon/hns3/hns3_enet.c | 123 +++++------------- + .../net/ethernet/hisilicon/hns3/hns3_enet.h | 7 + + 2 files changed, 38 insertions(+), 92 deletions(-) + +diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +index e9f2d51a8b7b..d06e2d0bae2e 100644 +--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c ++++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +@@ -1005,9 +1005,7 @@ static bool hns3_can_use_tx_bounce(struct hns3_enet_ring *ring, + return false; + + if (ALIGN(len, dma_get_cache_alignment()) > space) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_spare_full++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_spare_full); + return false; + } + +@@ -1024,9 +1022,7 @@ static bool hns3_can_use_tx_sgl(struct hns3_enet_ring *ring, + return false; + + if (space < HNS3_MAX_SGL_SIZE) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_spare_full++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_spare_full); + return false; + } + +@@ -1554,9 +1550,7 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring, + + ret = hns3_handle_vtags(ring, skb); + if (unlikely(ret < 0)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_vlan_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_vlan_err); + return ret; + } else if (ret == HNS3_INNER_VLAN_TAG) { + inner_vtag = skb_vlan_tag_get(skb); +@@ -1591,9 +1585,7 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring, + + ret = hns3_get_l4_protocol(skb, &ol4_proto, &il4_proto); + if (unlikely(ret < 0)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_l4_proto_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_l4_proto_err); + return ret; + } + +@@ -1601,18 +1593,14 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring, + &type_cs_vlan_tso, + &ol_type_vlan_len_msec); + if (unlikely(ret < 0)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_l2l3l4_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_l2l3l4_err); + return ret; + } + + ret = hns3_set_tso(skb, &paylen_ol4cs, &mss_hw_csum, + &type_cs_vlan_tso, &desc_cb->send_bytes); + if (unlikely(ret < 0)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_tso_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_tso_err); + return ret; + } + } +@@ -1705,9 +1693,7 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, + } + + if (unlikely(dma_mapping_error(dev, dma))) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.sw_err_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, sw_err_cnt); + return -ENOMEM; + } + +@@ -1853,9 +1839,7 @@ static int hns3_skb_linearize(struct hns3_enet_ring *ring, + * recursion level of over HNS3_MAX_RECURSION_LEVEL. 
+ */ + if (bd_num == UINT_MAX) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.over_max_recursion++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, over_max_recursion); + return -ENOMEM; + } + +@@ -1864,16 +1848,12 @@ static int hns3_skb_linearize(struct hns3_enet_ring *ring, + */ + if (skb->len > HNS3_MAX_TSO_SIZE || + (!skb_is_gso(skb) && skb->len > HNS3_MAX_NON_TSO_SIZE)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.hw_limitation++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, hw_limitation); + return -ENOMEM; + } + + if (__skb_linearize(skb)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.sw_err_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, sw_err_cnt); + return -ENOMEM; + } + +@@ -1903,9 +1883,7 @@ static int hns3_nic_maybe_stop_tx(struct hns3_enet_ring *ring, + + bd_num = hns3_tx_bd_count(skb->len); + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_copy++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_copy); + } + + out: +@@ -1925,9 +1903,7 @@ static int hns3_nic_maybe_stop_tx(struct hns3_enet_ring *ring, + return bd_num; + } + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_busy++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_busy); + + return -EBUSY; + } +@@ -2012,9 +1988,7 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num, + ring->pending_buf += num; + + if (!doorbell) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_more++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_more); + return; + } + +@@ -2064,9 +2038,7 @@ static int hns3_handle_tx_bounce(struct hns3_enet_ring *ring, + ret = skb_copy_bits(skb, 0, buf, size); + if (unlikely(ret < 0)) { + hns3_tx_spare_rollback(ring, cb_len); +- u64_stats_update_begin(&ring->syncp); +- ring->stats.copy_bits_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, copy_bits_err); + return ret; + } + +@@ -2089,9 +2061,8 @@ static int hns3_handle_tx_bounce(struct hns3_enet_ring *ring, + dma_sync_single_for_device(ring_to_dev(ring), dma, size, + DMA_TO_DEVICE); + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_bounce++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_bounce); ++ + return bd_num; + } + +@@ -2121,9 +2092,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, + nents = skb_to_sgvec(skb, sgt->sgl, 0, skb->len); + if (unlikely(nents < 0)) { + hns3_tx_spare_rollback(ring, cb_len); +- u64_stats_update_begin(&ring->syncp); +- ring->stats.skb2sgl_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, skb2sgl_err); + return -ENOMEM; + } + +@@ -2132,9 +2101,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, + DMA_TO_DEVICE); + if (unlikely(!sgt->nents)) { + hns3_tx_spare_rollback(ring, cb_len); +- u64_stats_update_begin(&ring->syncp); +- ring->stats.map_sg_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, map_sg_err); + return -ENOMEM; + } + +@@ -2146,10 +2113,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, + for (i = 0; i < sgt->nents; i++) + bd_num += hns3_fill_desc(ring, sg_dma_address(sgt->sgl + i), + sg_dma_len(sgt->sgl + i)); +- +- u64_stats_update_begin(&ring->syncp); +- ring->stats.tx_sgl++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, tx_sgl); + + return bd_num; + } +@@ -2188,9 +2152,7 @@ netdev_tx_t 
hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev) + if (skb_put_padto(skb, HNS3_MIN_TX_LEN)) { + hns3_tx_doorbell(ring, 0, !netdev_xmit_more()); + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.sw_err_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, sw_err_cnt); + + return NETDEV_TX_OK; + } +@@ -3522,17 +3484,13 @@ static bool hns3_nic_alloc_rx_buffers(struct hns3_enet_ring *ring, + for (i = 0; i < cleand_count; i++) { + desc_cb = &ring->desc_cb[ring->next_to_use]; + if (desc_cb->reuse_flag) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.reuse_pg_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, reuse_pg_cnt); + + hns3_reuse_buffer(ring, ring->next_to_use); + } else { + ret = hns3_alloc_and_map_buffer(ring, &res_cbs); + if (ret) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.sw_err_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, sw_err_cnt); + + hns3_rl_err(ring_to_netdev(ring), + "alloc rx buffer failed: %d\n", +@@ -3544,9 +3502,7 @@ static bool hns3_nic_alloc_rx_buffers(struct hns3_enet_ring *ring, + } + hns3_replace_buffer(ring, ring->next_to_use, &res_cbs); + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.non_reuse_pg++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, non_reuse_pg); + } + + ring_ptr_move_fw(ring, next_to_use); +@@ -3573,9 +3529,7 @@ static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, + void *frag = napi_alloc_frag(frag_size); + + if (unlikely(!frag)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.frag_alloc_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, frag_alloc_err); + + hns3_rl_err(ring_to_netdev(ring), + "failed to allocate rx frag\n"); +@@ -3587,9 +3541,7 @@ static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, + skb_add_rx_frag(skb, i, virt_to_page(frag), + offset_in_page(frag), frag_size, frag_size); + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.frag_alloc++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, frag_alloc); + return 0; + } + +@@ -3722,9 +3674,7 @@ static bool hns3_checksum_complete(struct hns3_enet_ring *ring, + hns3_rx_ptype_tbl[ptype].ip_summed != CHECKSUM_COMPLETE) + return false; + +- u64_stats_update_begin(&ring->syncp); +- ring->stats.csum_complete++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, csum_complete); + skb->ip_summed = CHECKSUM_COMPLETE; + skb->csum = csum_unfold((__force __sum16)csum); + +@@ -3798,9 +3748,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb, + if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) | + BIT(HNS3_RXD_OL3E_B) | + BIT(HNS3_RXD_OL4E_B)))) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.l3l4_csum_err++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, l3l4_csum_err); + + return; + } +@@ -3891,10 +3839,7 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length, + skb = ring->skb; + if (unlikely(!skb)) { + hns3_rl_err(netdev, "alloc rx skb fail\n"); +- +- u64_stats_update_begin(&ring->syncp); +- ring->stats.sw_err_cnt++; +- u64_stats_update_end(&ring->syncp); ++ hns3_ring_stats_update(ring, sw_err_cnt); + + return -ENOMEM; + } +@@ -3925,9 +3870,7 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length, + if (ring->page_pool) + skb_mark_for_recycle(skb); + +- u64_stats_update_begin(&ring->syncp); +- 
ring->stats.seg_pkt_cnt++;
+-	u64_stats_update_end(&ring->syncp);
++	hns3_ring_stats_update(ring, seg_pkt_cnt);
+
+ 	ring->pull_len = eth_get_headlen(netdev, va, HNS3_RX_HEAD_SIZE);
+ 	__skb_put(skb, ring->pull_len);
+@@ -4119,9 +4062,7 @@ static int hns3_handle_bdinfo(struct hns3_enet_ring *ring, struct sk_buff *skb)
+ 	ret = hns3_set_gro_and_checksum(ring, skb, l234info,
+ 					bd_base_info, ol_info, csum);
+ 	if (unlikely(ret)) {
+-		u64_stats_update_begin(&ring->syncp);
+-		ring->stats.rx_err_cnt++;
+-		u64_stats_update_end(&ring->syncp);
++		hns3_ring_stats_update(ring, rx_err_cnt);
+ 		return ret;
+ 	}
+
+@@ -5333,9 +5274,7 @@ static int hns3_clear_rx_ring(struct hns3_enet_ring *ring)
+ 		if (!ring->desc_cb[ring->next_to_use].reuse_flag) {
+ 			ret = hns3_alloc_and_map_buffer(ring, &res_cbs);
+ 			if (ret) {
+-				u64_stats_update_begin(&ring->syncp);
+-				ring->stats.sw_err_cnt++;
+-				u64_stats_update_end(&ring->syncp);
++				hns3_ring_stats_update(ring, sw_err_cnt);
+ 				/* if alloc new buffer fail, exit directly
+ 				 * and reclear in up flow.
+ 				 */
+diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+index f09a61d9c626..91b656adaacb 100644
+--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
++++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+@@ -654,6 +654,13 @@ static inline bool hns3_nic_resetting(struct net_device *netdev)
+
+ #define hns3_buf_size(_ring) ((_ring)->buf_size)
+
++#define hns3_ring_stats_update(ring, cnt) do { \
++	typeof(ring) (tmp) = (ring); \
++	u64_stats_update_begin(&(tmp)->syncp); \
++	((tmp)->stats.cnt)++; \
++	u64_stats_update_end(&(tmp)->syncp); \
++} while (0) \
++
+ static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring)
+ {
+ #if (PAGE_SIZE < 8192)
+--
+2.35.1
+
diff --git a/queue-5.15/net-hns3-fix-miss-l3e-checking-for-rx-packet.patch b/queue-5.15/net-hns3-fix-miss-l3e-checking-for-rx-packet.patch
new file mode 100644
index 00000000000..478a860709a
--- /dev/null
+++ b/queue-5.15/net-hns3-fix-miss-l3e-checking-for-rx-packet.patch
@@ -0,0 +1,69 @@
+From 6d980f9e299ff331a05eae4d83dcf181b3817eb1 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 22 Dec 2022 14:43:42 +0800
+Subject: net: hns3: fix miss L3E checking for rx packet
+
+From: Jian Shen
+
+[ Upstream commit 7d89b53cea1a702f97117fb4361523519bb1e52c ]
+
+For devices supporting the RXD advanced layout, the driver returns
+directly once the hardware has finished the checksum calculation.
+This skips the L3E error checking for IP packets. Fix it.
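+
+Roughly, the fixed receive path keeps the error-bit check even when
+the hardware already produced a full checksum (a sketch of the flow;
+the L3E/L4E/OL3E/OL4E names stand in for the HNS3_RXD_*_B bits, see
+the diff below for the real code):
+
+    hns3_checksum_complete(ring, skb, ptype, csum); /* may set CHECKSUM_COMPLETE */
+
+    if (l234info & (L3E | L4E | OL3E | OL4E)) {
+        skb->ip_summed = CHECKSUM_NONE; /* hw flagged an error */
+        hns3_ring_stats_update(ring, l3l4_csum_err);
+        return;
+    }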
+
+Fixes: 1ddc028ac849 ("net: hns3: refactor out RX completion checksum")
+Signed-off-by: Jian Shen
+Signed-off-by: Hao Lan
+Signed-off-by: Jakub Kicinski
+Signed-off-by: Sasha Levin
+---
+ drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 ++++------
+ 1 file changed, 4 insertions(+), 6 deletions(-)
+
+diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+index d06e2d0bae2e..822193b0d709 100644
+--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
++++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+@@ -3667,18 +3667,16 @@ static int hns3_gro_complete(struct sk_buff *skb, u32 l234info)
+ 	return 0;
+ }
+
+-static bool hns3_checksum_complete(struct hns3_enet_ring *ring,
++static void hns3_checksum_complete(struct hns3_enet_ring *ring,
+ 			struct sk_buff *skb, u32 ptype, u16 csum)
+ {
+ 	if (ptype == HNS3_INVALID_PTYPE ||
+ 	    hns3_rx_ptype_tbl[ptype].ip_summed != CHECKSUM_COMPLETE)
+-		return false;
++		return;
+
+ 	hns3_ring_stats_update(ring, csum_complete);
+ 	skb->ip_summed = CHECKSUM_COMPLETE;
+ 	skb->csum = csum_unfold((__force __sum16)csum);
+-
+-	return true;
+ }
+
+ static void hns3_rx_handle_csum(struct sk_buff *skb, u32 l234info,
+@@ -3738,8 +3736,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb,
+ 	ptype = hnae3_get_field(ol_info, HNS3_RXD_PTYPE_M,
+ 				HNS3_RXD_PTYPE_S);
+
+-	if (hns3_checksum_complete(ring, skb, ptype, csum))
+-		return;
++	hns3_checksum_complete(ring, skb, ptype, csum);
+
+ 	/* check if hardware has done checksum */
+ 	if (!(bd_base_info & BIT(HNS3_RXD_L3L4P_B)))
+@@ -3748,6 +3745,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb,
+ 	if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) |
+ 				 BIT(HNS3_RXD_OL3E_B) |
+ 				 BIT(HNS3_RXD_OL4E_B)))) {
++		skb->ip_summed = CHECKSUM_NONE;
+ 		hns3_ring_stats_update(ring, l3l4_csum_err);
+
+ 		return;
+--
+2.35.1
+
diff --git a/queue-5.15/net-hns3-fix-vf-promisc-mode-not-update-when-mac-tab.patch b/queue-5.15/net-hns3-fix-vf-promisc-mode-not-update-when-mac-tab.patch
new file mode 100644
index 00000000000..ed1f9b3b3f4
--- /dev/null
+++ b/queue-5.15/net-hns3-fix-vf-promisc-mode-not-update-when-mac-tab.patch
@@ -0,0 +1,134 @@
+From f6286801d8cc0213b4179873d1dd72b04df42eae Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 22 Dec 2022 14:43:43 +0800
+Subject: net: hns3: fix VF promisc mode not update when mac table full
+
+From: Jian Shen
+
+[ Upstream commit 8ee57c7b8406c7aa8ca31e014440c87c6383f429 ]
+
+Currently, the driver misses setting the HCLGE_VPORT_STATE_PROMISC_CHANGE
+flag for the VF when vport->overflow_promisc_flags changes. So the VF
+won't check whether to update the promisc mode in this case. Add the
+missing flag setting.
+ +Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously") +Signed-off-by: Jian Shen +Signed-off-by: Hao Lan +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + .../hisilicon/hns3/hns3pf/hclge_main.c | 75 +++++++++++-------- + 1 file changed, 43 insertions(+), 32 deletions(-) + +diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +index 2102b38b9c35..f4d58fcdba27 100644 +--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c ++++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +@@ -12825,60 +12825,71 @@ static int hclge_gro_en(struct hnae3_handle *handle, bool enable) + return ret; + } + +-static void hclge_sync_promisc_mode(struct hclge_dev *hdev) ++static int hclge_sync_vport_promisc_mode(struct hclge_vport *vport) + { +- struct hclge_vport *vport = &hdev->vport[0]; + struct hnae3_handle *handle = &vport->nic; ++ struct hclge_dev *hdev = vport->back; ++ bool uc_en = false; ++ bool mc_en = false; + u8 tmp_flags; ++ bool bc_en; + int ret; +- u16 i; + + if (vport->last_promisc_flags != vport->overflow_promisc_flags) { + set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state); + vport->last_promisc_flags = vport->overflow_promisc_flags; + } + +- if (test_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state)) { ++ if (!test_and_clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, ++ &vport->state)) ++ return 0; ++ ++ /* for PF */ ++ if (!vport->vport_id) { + tmp_flags = handle->netdev_flags | vport->last_promisc_flags; + ret = hclge_set_promisc_mode(handle, tmp_flags & HNAE3_UPE, + tmp_flags & HNAE3_MPE); +- if (!ret) { +- clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, +- &vport->state); ++ if (!ret) + set_bit(HCLGE_VPORT_STATE_VLAN_FLTR_CHANGE, + &vport->state); +- } ++ else ++ set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, ++ &vport->state); ++ return ret; + } + +- for (i = 1; i < hdev->num_alloc_vport; i++) { +- bool uc_en = false; +- bool mc_en = false; +- bool bc_en; ++ /* for VF */ ++ if (vport->vf_info.trusted) { ++ uc_en = vport->vf_info.request_uc_en > 0 || ++ vport->overflow_promisc_flags & HNAE3_OVERFLOW_UPE; ++ mc_en = vport->vf_info.request_mc_en > 0 || ++ vport->overflow_promisc_flags & HNAE3_OVERFLOW_MPE; ++ } ++ bc_en = vport->vf_info.request_bc_en > 0; + +- vport = &hdev->vport[i]; ++ ret = hclge_cmd_set_promisc_mode(hdev, vport->vport_id, uc_en, ++ mc_en, bc_en); ++ if (ret) { ++ set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state); ++ return ret; ++ } ++ hclge_set_vport_vlan_fltr_change(vport); + +- if (!test_and_clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, +- &vport->state)) +- continue; ++ return 0; ++} + +- if (vport->vf_info.trusted) { +- uc_en = vport->vf_info.request_uc_en > 0 || +- vport->overflow_promisc_flags & +- HNAE3_OVERFLOW_UPE; +- mc_en = vport->vf_info.request_mc_en > 0 || +- vport->overflow_promisc_flags & +- HNAE3_OVERFLOW_MPE; +- } +- bc_en = vport->vf_info.request_bc_en > 0; ++static void hclge_sync_promisc_mode(struct hclge_dev *hdev) ++{ ++ struct hclge_vport *vport; ++ int ret; ++ u16 i; + +- ret = hclge_cmd_set_promisc_mode(hdev, vport->vport_id, uc_en, +- mc_en, bc_en); +- if (ret) { +- set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, +- &vport->state); ++ for (i = 0; i < hdev->num_alloc_vport; i++) { ++ vport = &hdev->vport[i]; ++ ++ ret = hclge_sync_vport_promisc_mode(vport); ++ if (ret) + return; +- } +- hclge_set_vport_vlan_fltr_change(vport); + } + } + +-- +2.35.1 + diff --git a/queue-5.15/net-hns3-refactor-hns3_nic_reuse_page.patch 
b/queue-5.15/net-hns3-refactor-hns3_nic_reuse_page.patch new file mode 100644 index 00000000000..622b77a2acf --- /dev/null +++ b/queue-5.15/net-hns3-refactor-hns3_nic_reuse_page.patch @@ -0,0 +1,105 @@ +From e61e573eed74d10a6a5d0b37c7ab1034f9e92f45 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 29 Nov 2021 22:00:19 +0800 +Subject: net: hns3: refactor hns3_nic_reuse_page() + +From: Hao Chen + +[ Upstream commit e74a726da2c4dcedb8b0631f423d0044c7901a20 ] + +Split rx copybreak handle into a separate function from function +hns3_nic_reuse_page() to improve code simplicity. + +Signed-off-by: Hao Chen +Signed-off-by: Guangbin Huang +Signed-off-by: David S. Miller +Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet") +Signed-off-by: Sasha Levin +--- + .../net/ethernet/hisilicon/hns3/hns3_enet.c | 55 ++++++++++++------- + 1 file changed, 35 insertions(+), 20 deletions(-) + +diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +index 818a028703c6..e9f2d51a8b7b 100644 +--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c ++++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +@@ -3561,6 +3561,38 @@ static bool hns3_can_reuse_page(struct hns3_desc_cb *cb) + return page_count(cb->priv) == cb->pagecnt_bias; + } + ++static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, ++ struct hns3_enet_ring *ring, ++ int pull_len, ++ struct hns3_desc_cb *desc_cb) ++{ ++ struct hns3_desc *desc = &ring->desc[ring->next_to_clean]; ++ u32 frag_offset = desc_cb->page_offset + pull_len; ++ int size = le16_to_cpu(desc->rx.size); ++ u32 frag_size = size - pull_len; ++ void *frag = napi_alloc_frag(frag_size); ++ ++ if (unlikely(!frag)) { ++ u64_stats_update_begin(&ring->syncp); ++ ring->stats.frag_alloc_err++; ++ u64_stats_update_end(&ring->syncp); ++ ++ hns3_rl_err(ring_to_netdev(ring), ++ "failed to allocate rx frag\n"); ++ return -ENOMEM; ++ } ++ ++ desc_cb->reuse_flag = 1; ++ memcpy(frag, desc_cb->buf + frag_offset, frag_size); ++ skb_add_rx_frag(skb, i, virt_to_page(frag), ++ offset_in_page(frag), frag_size, frag_size); ++ ++ u64_stats_update_begin(&ring->syncp); ++ ring->stats.frag_alloc++; ++ u64_stats_update_end(&ring->syncp); ++ return 0; ++} ++ + static void hns3_nic_reuse_page(struct sk_buff *skb, int i, + struct hns3_enet_ring *ring, int pull_len, + struct hns3_desc_cb *desc_cb) +@@ -3570,6 +3602,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i, + int size = le16_to_cpu(desc->rx.size); + u32 truesize = hns3_buf_size(ring); + u32 frag_size = size - pull_len; ++ int ret = 0; + bool reused; + + if (ring->page_pool) { +@@ -3604,27 +3637,9 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i, + desc_cb->page_offset = 0; + desc_cb->reuse_flag = 1; + } else if (frag_size <= ring->rx_copybreak) { +- void *frag = napi_alloc_frag(frag_size); +- +- if (unlikely(!frag)) { +- u64_stats_update_begin(&ring->syncp); +- ring->stats.frag_alloc_err++; +- u64_stats_update_end(&ring->syncp); +- +- hns3_rl_err(ring_to_netdev(ring), +- "failed to allocate rx frag\n"); ++ ret = hns3_handle_rx_copybreak(skb, i, ring, pull_len, desc_cb); ++ if (ret) + goto out; +- } +- +- desc_cb->reuse_flag = 1; +- memcpy(frag, desc_cb->buf + frag_offset, frag_size); +- skb_add_rx_frag(skb, i, virt_to_page(frag), +- offset_in_page(frag), frag_size, frag_size); +- +- u64_stats_update_begin(&ring->syncp); +- ring->stats.frag_alloc++; +- u64_stats_update_end(&ring->syncp); +- return; + } + + out: +-- +2.35.1 + diff --git 
a/queue-5.15/net-mlx5-add-forgotten-cleanup-calls-into-mlx5_init_.patch b/queue-5.15/net-mlx5-add-forgotten-cleanup-calls-into-mlx5_init_.patch
new file mode 100644
index 00000000000..da4084f3b9d
--- /dev/null
+++ b/queue-5.15/net-mlx5-add-forgotten-cleanup-calls-into-mlx5_init_.patch
@@ -0,0 +1,39 @@
+From c9289bcefac1179fa54c89a227d471c91ca7aa7c Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 18 Oct 2022 12:51:52 +0200
+Subject: net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error
+ path
+
+From: Jiri Pirko
+
+[ Upstream commit 2a35b2c2e6a252eda2134aae6a756861d9299531 ]
+
+There are two cleanup calls missing in the mlx5_init_once() error path.
+Add them, making the error path flow the same as mlx5_cleanup_once().
+
+Fixes: 52ec462eca9b ("net/mlx5: Add reserved-gids support")
+Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section")
+Signed-off-by: Jiri Pirko
+Signed-off-by: Saeed Mahameed
+Signed-off-by: Sasha Levin
+---
+ drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
+index 19c11d33f4b6..145e56f5eeee 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
+@@ -928,6 +928,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
+ err_tables_cleanup:
+ mlx5_geneve_destroy(dev->geneve);
+ mlx5_vxlan_destroy(dev->vxlan);
++ mlx5_cleanup_clock(dev);
++ mlx5_cleanup_reserved_gids(dev);
+ mlx5_cq_debugfs_cleanup(dev);
+ mlx5_fw_reset_cleanup(dev);
+ err_events_cleanup:
+--
+2.35.1
+
diff --git a/queue-5.15/net-mlx5-avoid-recovery-in-probe-flows.patch b/queue-5.15/net-mlx5-avoid-recovery-in-probe-flows.patch
new file mode 100644
index 00000000000..7a95bdf5103
--- /dev/null
+++ b/queue-5.15/net-mlx5-avoid-recovery-in-probe-flows.patch
@@ -0,0 +1,49 @@
+From 05e16f2b4f0ed1e8fa6a2d17738dd0cdee0f4710 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 24 Nov 2022 13:34:12 +0200
+Subject: net/mlx5: Avoid recovery in probe flows
+
+From: Shay Drory
+
+[ Upstream commit 9078e843efec530f279a155f262793c58b0746bd ]
+
+Currently, recovery is done without considering whether the device is
+still in its probe flow. This may lead to recovery starting before the
+device has finished probing successfully, e.g. while mlx5_init_one()
+is running. The recovery flow uses functionality that is loaded only
+by mlx5_init_one(), so there is no point in running recovery before
+mlx5_init_one() has finished successfully.
+
+Fix it by waiting for the probe flow to finish and checking whether
+the device is probed before trying to perform recovery.
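+
+A minimal, self-contained C sketch of the guard (illustrative only;
+the real code uses intf_state_mutex and the MLX5_DROP_NEW_HEALTH_WORK
+flag rather than these simplified names):
+
+	#include <pthread.h>
+	#include <stdbool.h>
+
+	struct dev_state {
+		pthread_mutex_t lock;
+		bool probed;
+	};
+
+	/* Recovery work bails out while the device is not probed. */
+	static void fw_fatal_work(struct dev_state *d)
+	{
+		pthread_mutex_lock(&d->lock);
+		if (!d->probed) {
+			pthread_mutex_unlock(&d->lock);
+			return;
+		}
+		pthread_mutex_unlock(&d->lock);
+		/* ... enter error state and attempt recovery ... */
+	}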
+ +Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") +Signed-off-by: Shay Drory +Reviewed-by: Moshe Shemesh +Signed-off-by: Saeed Mahameed +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++ + 1 file changed, 6 insertions(+) + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c +index 037e18dd4be0..3dceab45986d 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c +@@ -614,6 +614,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work) + priv = container_of(health, struct mlx5_priv, health); + dev = container_of(priv, struct mlx5_core_dev, priv); + ++ mutex_lock(&dev->intf_state_mutex); ++ if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) { ++ mlx5_core_err(dev, "health works are not permitted at this stage\n"); ++ return; ++ } ++ mutex_unlock(&dev->intf_state_mutex); + enter_error_state(dev, false); + if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) { + if (mlx5_health_try_recover(dev)) +-- +2.35.1 + diff --git a/queue-5.15/net-mlx5-e-switch-properly-handle-ingress-tagged-pac.patch b/queue-5.15/net-mlx5-e-switch-properly-handle-ingress-tagged-pac.patch new file mode 100644 index 00000000000..89af1fea080 --- /dev/null +++ b/queue-5.15/net-mlx5-e-switch-properly-handle-ingress-tagged-pac.patch @@ -0,0 +1,261 @@ +From 4abe4be36b0cce3d3d78c150f9e8a5748d55514f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 12 Dec 2022 10:42:15 +0200 +Subject: net/mlx5: E-Switch, properly handle ingress tagged packets on VST + +From: Moshe Shemesh + +[ Upstream commit 1f0ae22ab470946143485a02cc1cd7e05c0f9120 ] + +Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already +present in the frame. Previous VST mode behavior was to drop packets or +override existing tag, depending on the device version. + +In this patch we fix this behavior by correctly building the HW steering +rule with a push vlan action, or for older devices we ask the FW to stack +the vlan when a vlan is already present. 
+ +Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") +Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode") +Signed-off-by: Moshe Shemesh +Reviewed-by: Mark Bloch +Signed-off-by: Saeed Mahameed +Signed-off-by: Sasha Levin +--- + .../mellanox/mlx5/core/esw/acl/egress_lgcy.c | 7 +++- + .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 33 ++++++++++++++++--- + .../net/ethernet/mellanox/mlx5/core/eswitch.c | 30 ++++++++++++----- + .../net/ethernet/mellanox/mlx5/core/eswitch.h | 6 ++++ + include/linux/mlx5/device.h | 5 +++ + include/linux/mlx5/mlx5_ifc.h | 3 +- + 6 files changed, 68 insertions(+), 16 deletions(-) + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c +index 60a73990017c..6b4c9ffad95b 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c +@@ -67,6 +67,7 @@ static void esw_acl_egress_lgcy_groups_destroy(struct mlx5_vport *vport) + int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, + struct mlx5_vport *vport) + { ++ bool vst_mode_steering = esw_vst_mode_is_steering(esw); + struct mlx5_flow_destination drop_ctr_dst = {}; + struct mlx5_flow_destination *dst = NULL; + struct mlx5_fc *drop_counter = NULL; +@@ -77,6 +78,7 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, + */ + int table_size = 2; + int dest_num = 0; ++ int actions_flag; + int err = 0; + + if (vport->egress.legacy.drop_counter) { +@@ -119,8 +121,11 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, + vport->vport, vport->info.vlan, vport->info.qos); + + /* Allowed vlan rule */ ++ actions_flag = MLX5_FLOW_CONTEXT_ACTION_ALLOW; ++ if (vst_mode_steering) ++ actions_flag |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP; + err = esw_egress_acl_vlan_create(esw, vport, NULL, vport->info.vlan, +- MLX5_FLOW_CONTEXT_ACTION_ALLOW); ++ actions_flag); + if (err) + goto out; + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c +index b1a5199260f6..093ed86a0acd 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c +@@ -139,11 +139,14 @@ static void esw_acl_ingress_lgcy_groups_destroy(struct mlx5_vport *vport) + int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + struct mlx5_vport *vport) + { ++ bool vst_mode_steering = esw_vst_mode_is_steering(esw); + struct mlx5_flow_destination drop_ctr_dst = {}; + struct mlx5_flow_destination *dst = NULL; + struct mlx5_flow_act flow_act = {}; + struct mlx5_flow_spec *spec = NULL; + struct mlx5_fc *counter = NULL; ++ bool vst_check_cvlan = false; ++ bool vst_push_cvlan = false; + /* The ingress acl table contains 4 groups + * (2 active rules at the same time - + * 1 allow rule from one of the first 3 groups. 
+@@ -203,7 +206,26 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + goto out; + } + +- if (vport->info.vlan || vport->info.qos) ++ if ((vport->info.vlan || vport->info.qos)) { ++ if (vst_mode_steering) ++ vst_push_cvlan = true; ++ else if (!MLX5_CAP_ESW(esw->dev, vport_cvlan_insert_always)) ++ vst_check_cvlan = true; ++ } ++ ++ if (vst_check_cvlan || vport->info.spoofchk) ++ spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; ++ ++ /* Create ingress allow rule */ ++ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW; ++ if (vst_push_cvlan) { ++ flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH; ++ flow_act.vlan[0].prio = vport->info.qos; ++ flow_act.vlan[0].vid = vport->info.vlan; ++ flow_act.vlan[0].ethtype = ETH_P_8021Q; ++ } ++ ++ if (vst_check_cvlan) + MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, + outer_headers.cvlan_tag); + +@@ -218,9 +240,6 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + ether_addr_copy(smac_v, vport->info.mac); + } + +- /* Create ingress allow rule */ +- spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; +- flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW; + vport->ingress.allow_rule = mlx5_add_flow_rules(vport->ingress.acl, spec, + &flow_act, NULL, 0); + if (IS_ERR(vport->ingress.allow_rule)) { +@@ -232,6 +251,9 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + goto out; + } + ++ if (!vst_check_cvlan && !vport->info.spoofchk) ++ goto out; ++ + memset(&flow_act, 0, sizeof(flow_act)); + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP; + /* Attach drop flow counter */ +@@ -257,7 +279,8 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, + return 0; + + out: +- esw_acl_ingress_lgcy_cleanup(esw, vport); ++ if (err) ++ esw_acl_ingress_lgcy_cleanup(esw, vport); + kvfree(spec); + return err; + } +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +index 51a8cecc4a7c..2b9278002354 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +@@ -160,10 +160,17 @@ static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u16 vport, + esw_vport_context.vport_cvlan_strip, 1); + + if (set_flags & SET_VLAN_INSERT) { +- /* insert only if no vlan in packet */ +- MLX5_SET(modify_esw_vport_context_in, in, +- esw_vport_context.vport_cvlan_insert, 1); +- ++ if (MLX5_CAP_ESW(dev, vport_cvlan_insert_always)) { ++ /* insert either if vlan exist in packet or not */ ++ MLX5_SET(modify_esw_vport_context_in, in, ++ esw_vport_context.vport_cvlan_insert, ++ MLX5_VPORT_CVLAN_INSERT_ALWAYS); ++ } else { ++ /* insert only if no vlan in packet */ ++ MLX5_SET(modify_esw_vport_context_in, in, ++ esw_vport_context.vport_cvlan_insert, ++ MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN); ++ } + MLX5_SET(modify_esw_vport_context_in, in, + esw_vport_context.cvlan_pcp, qos); + MLX5_SET(modify_esw_vport_context_in, in, +@@ -773,6 +780,7 @@ static void esw_vport_cleanup_acl(struct mlx5_eswitch *esw, + + static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + { ++ bool vst_mode_steering = esw_vst_mode_is_steering(esw); + u16 vport_num = vport->vport; + int flags; + int err; +@@ -802,8 +810,9 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) + + flags = (vport->info.vlan || vport->info.qos) ? 
+ SET_VLAN_STRIP | SET_VLAN_INSERT : 0; +- modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan, +- vport->info.qos, flags); ++ if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering) ++ modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan, ++ vport->info.qos, flags); + + return 0; + } +@@ -1846,6 +1855,7 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, + u16 vport, u16 vlan, u8 qos, u8 set_flags) + { + struct mlx5_vport *evport = mlx5_eswitch_get_vport(esw, vport); ++ bool vst_mode_steering = esw_vst_mode_is_steering(esw); + int err = 0; + + if (IS_ERR(evport)) +@@ -1853,9 +1863,11 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, + if (vlan > 4095 || qos > 7) + return -EINVAL; + +- err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags); +- if (err) +- return err; ++ if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering) { ++ err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags); ++ if (err) ++ return err; ++ } + + evport->info.vlan = vlan; + evport->info.qos = qos; +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +index 2c7444101bb9..0e2c9e6fccb6 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +@@ -505,6 +505,12 @@ static inline bool mlx5_esw_qos_enabled(struct mlx5_eswitch *esw) + return esw->qos.enabled; + } + ++static inline bool esw_vst_mode_is_steering(struct mlx5_eswitch *esw) ++{ ++ return (MLX5_CAP_ESW_EGRESS_ACL(esw->dev, pop_vlan) && ++ MLX5_CAP_ESW_INGRESS_ACL(esw->dev, push_vlan)); ++} ++ + static inline bool mlx5_eswitch_vlan_actions_supported(struct mlx5_core_dev *dev, + u8 vlan_depth) + { +diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h +index 66eaf0aa7f69..3e72133545ca 100644 +--- a/include/linux/mlx5/device.h ++++ b/include/linux/mlx5/device.h +@@ -1074,6 +1074,11 @@ enum { + MLX5_VPORT_ADMIN_STATE_AUTO = 0x2, + }; + ++enum { ++ MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN = 0x1, ++ MLX5_VPORT_CVLAN_INSERT_ALWAYS = 0x3, ++}; ++ + enum { + MLX5_L3_PROT_TYPE_IPV4 = 0, + MLX5_L3_PROT_TYPE_IPV6 = 1, +diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h +index cd9d1c95129e..49ea0004109e 100644 +--- a/include/linux/mlx5/mlx5_ifc.h ++++ b/include/linux/mlx5/mlx5_ifc.h +@@ -822,7 +822,8 @@ struct mlx5_ifc_e_switch_cap_bits { + u8 vport_svlan_insert[0x1]; + u8 vport_cvlan_insert_if_not_exist[0x1]; + u8 vport_cvlan_insert_overwrite[0x1]; +- u8 reserved_at_5[0x2]; ++ u8 reserved_at_5[0x1]; ++ u8 vport_cvlan_insert_always[0x1]; + u8 esw_shared_ingress_acl[0x1]; + u8 esw_uplink_ingress_acl[0x1]; + u8 root_ft_on_other_esw[0x1]; +-- +2.35.1 + diff --git a/queue-5.15/net-mlx5e-always-clear-dest-encap-in-neigh-update-de.patch b/queue-5.15/net-mlx5e-always-clear-dest-encap-in-neigh-update-de.patch new file mode 100644 index 00000000000..c6c08d92031 --- /dev/null +++ b/queue-5.15/net-mlx5e-always-clear-dest-encap-in-neigh-update-de.patch @@ -0,0 +1,54 @@ +From 832690e10d6c21471c98482c33b67ca6c5e6d54f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 5 Dec 2022 09:22:50 +0800 +Subject: net/mlx5e: Always clear dest encap in neigh-update-del + +From: Chris Mi + +[ Upstream commit 2951b2e142ecf6e0115df785ba91e91b6da74602 ] + +The cited commit introduced a bug for multiple encapsulations flow. +If one dest encap becomes invalid, the flow is set slow path flag. 
+But when the other dest encaps become invalid, they are not cleared,
+because the flow already has the slow path flag set. When
+neigh-update-add runs, it will then use an invalid encap.
+
+Fix it by checking the slow path flag after clearing the dest encap.
+
+Fixes: 9a5f9cc794e1 ("net/mlx5e: Fix possible use-after-free deleting fdb rule")
+Signed-off-by: Chris Mi
+Reviewed-by: Roi Dayan
+Signed-off-by: Saeed Mahameed
+Signed-off-by: Sasha Levin
+---
+ .../net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
+index 3b63d9c20580..a8d7f07ee2ca 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c
+@@ -188,12 +188,19 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv,
+ int err;
+
+ list_for_each_entry(flow, flow_list, tmp_list) {
+- if (!mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, SLOW))
++ if (!mlx5e_is_offloaded_flow(flow))
+ continue;
+ attr = flow->attr;
+ esw_attr = attr->esw_attr;
+ spec = &attr->parse_attr->spec;
+
++ /* Clear pkt_reformat before checking slow path flag. Because
++ * in next iteration, the same flow is already set slow path
++ * flag, but still need to clear the pkt_reformat.
++ */
++ if (flow_flag_test(flow, SLOW))
++ continue;
++
+ /* update from encap rule to slow path rule */
+ rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec);
+ /* mark the flow's encap dest as non-valid */
+--
+2.35.1
+
diff --git a/queue-5.15/net-mlx5e-fix-hw-mtu-initializing-at-xdp-sq-allocati.patch b/queue-5.15/net-mlx5e-fix-hw-mtu-initializing-at-xdp-sq-allocati.patch
new file mode 100644
index 00000000000..9d2ccaf8d68
--- /dev/null
+++ b/queue-5.15/net-mlx5e-fix-hw-mtu-initializing-at-xdp-sq-allocati.patch
@@ -0,0 +1,48 @@
+From 58e76fdfc2426354183126d9dc8e4bb2981fa6d3 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 14 Dec 2022 16:02:57 +0200
+Subject: net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
+
+From: Adham Faris
+
+[ Upstream commit 1e267ab88dc44c48f556218f7b7f14c76f7aa066 ]
+
+The current xdp xmit functions (mlx5e_xmit_xdp_frame_mpwqe or
+mlx5e_xmit_xdp_frame) validate the xdp packet length by comparing it
+to the hw mtu (configured at xdp sq allocation) before xmiting it.
+This check does not account for the ethernet FCS length (calculated
+and filled by the nic). Hence, when we try to send packets with
+length > (hw-mtu - ethernet-fcs-size), the device port drops them and
+tx_errors_phy is incremented. The desired behavior is to catch these
+packets and drop them in the driver.
+
+Fix this behavior in the XDP SQ allocation function (mlx5e_alloc_xdpsq)
+by subtracting the ethernet FCS size (4 bytes) from the current hw mtu
+value, since the ethernet FCS is calculated and written to ethernet
+frames by the nic.
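+
+A self-contained C sketch of the resulting length check (illustrative
+only, simplified from the driver logic):
+
+	#include <stdbool.h>
+
+	#define ETH_FCS_LEN 4	/* appended to the frame by the NIC */
+
+	/* The budget advertised to XDP must leave room for the FCS. */
+	static unsigned int xdpsq_hw_mtu(unsigned int hw_mtu)
+	{
+		return hw_mtu - ETH_FCS_LEN;
+	}
+
+	static bool xdp_frame_fits(unsigned int len, unsigned int hw_mtu)
+	{
+		return len <= xdpsq_hw_mtu(hw_mtu);
+	}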
+ +Fixes: d8bec2b29a82 ("net/mlx5e: Support bpf_xdp_adjust_head()") +Signed-off-by: Adham Faris +Reviewed-by: Tariq Toukan +Signed-off-by: Saeed Mahameed +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +index c1c4f380803a..be19f5cf9d15 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +@@ -977,7 +977,7 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, + sq->channel = c; + sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; + sq->min_inline_mode = params->tx_min_inline_mode; +- sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); ++ sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN; + sq->xsk_pool = xsk_pool; + + sq->stats = sq->xsk_pool ? +-- +2.35.1 + diff --git a/queue-5.15/net-mlx5e-ipoib-don-t-allow-cqe-compression-to-be-tu.patch b/queue-5.15/net-mlx5e-ipoib-don-t-allow-cqe-compression-to-be-tu.patch new file mode 100644 index 00000000000..efa5d59ee5a --- /dev/null +++ b/queue-5.15/net-mlx5e-ipoib-don-t-allow-cqe-compression-to-be-tu.patch @@ -0,0 +1,45 @@ +From df6d34b51816db24f8e7948e7e64633a1f6e00dc Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 28 Nov 2022 15:24:21 +0200 +Subject: net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by + default + +From: Dragos Tatulea + +[ Upstream commit b12d581e83e3ae1080c32ab83f123005bd89a840 ] + +mlx5e_build_nic_params will turn CQE compression on if the hardware +capability is enabled and the slow_pci_heuristic condition is detected. +As IPoIB doesn't support CQE compression, make sure to disable the +feature in the IPoIB profile init. + +Please note that the feature is not exposed to the user for IPoIB +interfaces, so it can't be subsequently turned on. 
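+
+Illustrative sketch only (plain C, simplified fields, not part of the
+upstream commit) of why forcing the default off is enough to keep the
+feature off:
+
+	#include <stdbool.h>
+
+	struct nic_params {
+		bool rx_cqe_compress_def;	/* default value */
+		bool rx_cqe_compress;		/* active pflag  */
+	};
+
+	/* Profile init clears both the default and the active flag;
+	 * since the pflag is never exposed for IPoIB, nothing can
+	 * turn it back on afterwards.
+	 */
+	static void ipoib_fixup_params(struct nic_params *p)
+	{
+		p->rx_cqe_compress_def = false;
+		p->rx_cqe_compress = p->rx_cqe_compress_def;
+	}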
+
+Fixes: b797a684b0dd ("net/mlx5e: Enable CQE compression when PCI is slower than link")
+Signed-off-by: Dragos Tatulea
+Reviewed-by: Gal Pressman
+Signed-off-by: Saeed Mahameed
+Signed-off-by: Sasha Levin
+---
+ drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+index cfde0a45b8b8..10940b8dc83e 100644
+--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
++++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+@@ -70,6 +70,10 @@ static void mlx5i_build_nic_params(struct mlx5_core_dev *mdev,
+ params->packet_merge.type = MLX5E_PACKET_MERGE_NONE;
+ params->hard_mtu = MLX5_IB_GRH_BYTES + MLX5_IPOIB_HARD_LEN;
+ params->tunneled_offload_en = false;
++
++ /* CQE compression is not supported for IPoIB */
++ params->rx_cqe_compress_def = false;
++ MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS, params->rx_cqe_compress_def);
+ }
+
+ /* Called directly after IPoIB netdevice was created to initialize SW structs */
+--
+2.35.1
+
diff --git a/queue-5.15/net-mlx5e-tc-refactor-mlx5e_tc_add_flow_mod_hdr-to-g.patch b/queue-5.15/net-mlx5e-tc-refactor-mlx5e_tc_add_flow_mod_hdr-to-g.patch
new file mode 100644
index 00000000000..eb453f71f50
--- /dev/null
+++ b/queue-5.15/net-mlx5e-tc-refactor-mlx5e_tc_add_flow_mod_hdr-to-g.patch
@@ -0,0 +1,93 @@
+From c19f012f0f102568ee5fa886f8c96647b712917c Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 25 Nov 2021 14:32:58 +0200
+Subject: net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
+
+From: Roi Dayan
+
+[ Upstream commit ff99316700799b84e842f819a44db608557bae3e ]
+
+In a later commit we are going to instantiate multiple attr instances
+per flow instead of a single attr. Make sure
+mlx5e_tc_add_flow_mod_hdr() uses the correct attr and not flow->attr.
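+
+Illustrative-only C sketch of the signature change (simplified types,
+not part of the upstream commit):
+
+	struct attr { int modify_hdr; };
+	struct flow { struct attr *attr; };
+
+	/* The helper now acts on the attr passed by the caller instead
+	 * of reaching for flow->attr, so any of a flow's (eventually
+	 * multiple) attrs can be used.
+	 */
+	static int add_flow_mod_hdr(struct flow *f, struct attr *a)
+	{
+		(void)f;		/* flow no longer implies the attr */
+		a->modify_hdr = 1;	/* operate on the attr passed in   */
+		return 0;
+	}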
+ +Signed-off-by: Roi Dayan +Reviewed-by: Oz Shlomo +Signed-off-by: Saeed Mahameed +Stable-dep-of: 2951b2e142ec ("net/mlx5e: Always clear dest encap in neigh-update-del") +Signed-off-by: Sasha Levin +--- + .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 2 +- + drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++++------ + drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 4 ++-- + 3 files changed, 9 insertions(+), 9 deletions(-) + +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +index 700c463ea367..3b63d9c20580 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +@@ -1342,7 +1342,7 @@ static void mlx5e_reoffload_encap(struct mlx5e_priv *priv, + continue; + } + +- err = mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); ++ err = mlx5e_tc_add_flow_mod_hdr(priv, flow, attr); + if (err) { + mlx5_core_warn(priv->mdev, "Failed to update flow mod_hdr err=%d", + err); +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +index 843c8435387f..8f2f99689aba 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +@@ -1342,10 +1342,10 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro + } + + int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, +- struct mlx5e_tc_flow_parse_attr *parse_attr, +- struct mlx5e_tc_flow *flow) ++ struct mlx5e_tc_flow *flow, ++ struct mlx5_flow_attr *attr) + { +- struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts = &parse_attr->mod_hdr_acts; ++ struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts = &attr->parse_attr->mod_hdr_acts; + struct mlx5_modify_hdr *mod_hdr; + + mod_hdr = mlx5_modify_header_alloc(priv->mdev, +@@ -1355,8 +1355,8 @@ int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, + if (IS_ERR(mod_hdr)) + return PTR_ERR(mod_hdr); + +- WARN_ON(flow->attr->modify_hdr); +- flow->attr->modify_hdr = mod_hdr; ++ WARN_ON(attr->modify_hdr); ++ attr->modify_hdr = mod_hdr; + + return 0; + } +@@ -1457,7 +1457,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, + if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR && + !(attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR)) { + if (vf_tun) { +- err = mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); ++ err = mlx5e_tc_add_flow_mod_hdr(priv, flow, attr); + if (err) + goto err_out; + } else { +diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +index 1a4cd882f0fb..f48af82781f8 100644 +--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h ++++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +@@ -241,8 +241,8 @@ int mlx5e_tc_match_to_reg_set_and_get_id(struct mlx5_core_dev *mdev, + u32 data); + + int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, +- struct mlx5e_tc_flow_parse_attr *parse_attr, +- struct mlx5e_tc_flow *flow); ++ struct mlx5e_tc_flow *flow, ++ struct mlx5_flow_attr *attr); + + int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev, + int namespace, +-- +2.35.1 + diff --git a/queue-5.15/net-phy-xgmiitorgmii-fix-refcount-leak-in-xgmiitorgm.patch b/queue-5.15/net-phy-xgmiitorgmii-fix-refcount-leak-in-xgmiitorgm.patch new file mode 100644 index 00000000000..b0aa2c00145 --- /dev/null +++ b/queue-5.15/net-phy-xgmiitorgmii-fix-refcount-leak-in-xgmiitorgm.patch @@ -0,0 +1,35 @@ +From 3975799d05c51e7cc5969695f782f66cc5d950a6 Mon Sep 17 00:00:00 2001 +From: Sasha 
Levin
+Date: Thu, 29 Dec 2022 10:29:25 +0400
+Subject: net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe
+
+From: Miaoqian Lin
+
+[ Upstream commit d039535850ee47079d59527e96be18d8e0daa84b ]
+
+of_phy_find_device() returns a device node with its refcount
+incremented. Call put_device() to release it when it is no longer
+needed.
+
+Fixes: ab4e6ee578e8 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing")
+Signed-off-by: Miaoqian Lin
+Signed-off-by: David S. Miller
+Signed-off-by: Sasha Levin
+---
+ drivers/net/phy/xilinx_gmii2rgmii.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c b/drivers/net/phy/xilinx_gmii2rgmii.c
+index 8dcb49ed1f3d..7fd9fe6a602b 100644
+--- a/drivers/net/phy/xilinx_gmii2rgmii.c
++++ b/drivers/net/phy/xilinx_gmii2rgmii.c
+@@ -105,6 +105,7 @@ static int xgmiitorgmii_probe(struct mdio_device *mdiodev)
+
+ if (!priv->phy_dev->drv) {
+ dev_info(dev, "Attached phy not ready\n");
++ put_device(&priv->phy_dev->mdio.dev);
+ return -EPROBE_DEFER;
+ }
+
+--
+2.35.1
+
diff --git a/queue-5.15/net-sched-atm-dont-intepret-cls-results-when-asked-t.patch b/queue-5.15/net-sched-atm-dont-intepret-cls-results-when-asked-t.patch
new file mode 100644
index 00000000000..b7d06d49666
--- /dev/null
+++ b/queue-5.15/net-sched-atm-dont-intepret-cls-results-when-asked-t.patch
@@ -0,0 +1,42 @@
+From df428364ce260d3e549a06999370ef16a0e69a62 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Sun, 1 Jan 2023 16:57:43 -0500
+Subject: net: sched: atm: dont intepret cls results when asked to drop
+
+From: Jamal Hadi Salim
+
+[ Upstream commit a2965c7be0522eaa18808684b7b82b248515511b ]
+
+If asked to drop a packet via TC_ACT_SHOT, it is unsafe to assume that
+res.class contains a valid pointer.
+Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent")
+
+Signed-off-by: Jamal Hadi Salim
+Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + net/sched/sch_atm.c | 5 ++++- + 1 file changed, 4 insertions(+), 1 deletion(-) + +diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c +index 70fe1c5e44ad..33737169cc2d 100644 +--- a/net/sched/sch_atm.c ++++ b/net/sched/sch_atm.c +@@ -397,10 +397,13 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, + result = tcf_classify(skb, NULL, fl, &res, true); + if (result < 0) + continue; ++ if (result == TC_ACT_SHOT) ++ goto done; ++ + flow = (struct atm_flow_data *)res.class; + if (!flow) + flow = lookup_flow(sch, res.classid); +- goto done; ++ goto drop; + } + } + flow = NULL; +-- +2.35.1 + diff --git a/queue-5.15/net-sched-cbq-dont-intepret-cls-results-when-asked-t.patch b/queue-5.15/net-sched-cbq-dont-intepret-cls-results-when-asked-t.patch new file mode 100644 index 00000000000..3e0641e7977 --- /dev/null +++ b/queue-5.15/net-sched-cbq-dont-intepret-cls-results-when-asked-t.patch @@ -0,0 +1,147 @@ +From 246a465d74c4fe2cae5764964db64fe634785eda Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 1 Jan 2023 16:57:44 -0500 +Subject: net: sched: cbq: dont intepret cls results when asked to drop + +From: Jamal Hadi Salim + +[ Upstream commit caa4b35b4317d5147b3ab0fbdc9c075c7d2e9c12 ] + +If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that +res.class contains a valid pointer + +Sample splat reported by Kyle Zeng + +[ 5.405624] 0: reclassify loop, rule prio 0, protocol 800 +[ 5.406326] ================================================================== +[ 5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0 +[ 5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299 +[ 5.408731] +[ 5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15 +[ 5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), +BIOS 1.15.0-1 04/01/2014 +[ 5.410439] Call Trace: +[ 5.410764] dump_stack+0x87/0xcd +[ 5.411153] print_address_description+0x7a/0x6b0 +[ 5.411687] ? vprintk_func+0xb9/0xc0 +[ 5.411905] ? printk+0x76/0x96 +[ 5.412110] ? cbq_enqueue+0x54b/0xea0 +[ 5.412323] kasan_report+0x17d/0x220 +[ 5.412591] ? cbq_enqueue+0x54b/0xea0 +[ 5.412803] __asan_report_load1_noabort+0x10/0x20 +[ 5.413119] cbq_enqueue+0x54b/0xea0 +[ 5.413400] ? __kasan_check_write+0x10/0x20 +[ 5.413679] __dev_queue_xmit+0x9c0/0x1db0 +[ 5.413922] dev_queue_xmit+0xc/0x10 +[ 5.414136] ip_finish_output2+0x8bc/0xcd0 +[ 5.414436] __ip_finish_output+0x472/0x7a0 +[ 5.414692] ip_finish_output+0x5c/0x190 +[ 5.414940] ip_output+0x2d8/0x3c0 +[ 5.415150] ? ip_mc_finish_output+0x320/0x320 +[ 5.415429] __ip_queue_xmit+0x753/0x1760 +[ 5.415664] ip_queue_xmit+0x47/0x60 +[ 5.415874] __tcp_transmit_skb+0x1ef9/0x34c0 +[ 5.416129] tcp_connect+0x1f5e/0x4cb0 +[ 5.416347] tcp_v4_connect+0xc8d/0x18c0 +[ 5.416577] __inet_stream_connect+0x1ae/0xb40 +[ 5.416836] ? local_bh_enable+0x11/0x20 +[ 5.417066] ? lock_sock_nested+0x175/0x1d0 +[ 5.417309] inet_stream_connect+0x5d/0x90 +[ 5.417548] ? 
__inet_stream_connect+0xb40/0xb40 +[ 5.417817] __sys_connect+0x260/0x2b0 +[ 5.418037] __x64_sys_connect+0x76/0x80 +[ 5.418267] do_syscall_64+0x31/0x50 +[ 5.418477] entry_SYSCALL_64_after_hwframe+0x61/0xc6 +[ 5.418770] RIP: 0033:0x473bb7 +[ 5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 +00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 +00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 +24 89 +[ 5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: +000000000000002a +[ 5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7 +[ 5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007 +[ 5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004 +[ 5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 +[ 5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002 +[ 5.422471] +[ 5.422562] Allocated by task 299: +[ 5.422782] __kasan_kmalloc+0x12d/0x160 +[ 5.423007] kasan_kmalloc+0x5/0x10 +[ 5.423208] kmem_cache_alloc_trace+0x201/0x2e0 +[ 5.423492] tcf_proto_create+0x65/0x290 +[ 5.423721] tc_new_tfilter+0x137e/0x1830 +[ 5.423957] rtnetlink_rcv_msg+0x730/0x9f0 +[ 5.424197] netlink_rcv_skb+0x166/0x300 +[ 5.424428] rtnetlink_rcv+0x11/0x20 +[ 5.424639] netlink_unicast+0x673/0x860 +[ 5.424870] netlink_sendmsg+0x6af/0x9f0 +[ 5.425100] __sys_sendto+0x58d/0x5a0 +[ 5.425315] __x64_sys_sendto+0xda/0xf0 +[ 5.425539] do_syscall_64+0x31/0x50 +[ 5.425764] entry_SYSCALL_64_after_hwframe+0x61/0xc6 +[ 5.426065] +[ 5.426157] The buggy address belongs to the object at ffff88800e312200 +[ 5.426157] which belongs to the cache kmalloc-128 of size 128 +[ 5.426955] The buggy address is located 42 bytes to the right of +[ 5.426955] 128-byte region [ffff88800e312200, ffff88800e312280) +[ 5.427688] The buggy address belongs to the page: +[ 5.427992] page:000000009875fabc refcount:1 mapcount:0 +mapping:0000000000000000 index:0x0 pfn:0xe312 +[ 5.428562] flags: 0x100000000000200(slab) +[ 5.428812] raw: 0100000000000200 dead000000000100 dead000000000122 +ffff888007843680 +[ 5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff +ffff88800e312401 +[ 5.429875] page dumped because: kasan: bad access detected +[ 5.430214] page->mem_cgroup:ffff88800e312401 +[ 5.430471] +[ 5.430564] Memory state around the buggy address: +[ 5.430846] ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc +fc fc fc fc +[ 5.431267] ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00 +00 00 00 fc +[ 5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc +fc fc fc fc +[ 5.432123] ^ +[ 5.432391] ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00 +00 00 00 fc +[ 5.432810] ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc +fc fc fc fc +[ 5.433229] ================================================================== +[ 5.433648] Disabling lock debugging due to kernel taint + +Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") +Reported-by: Kyle Zeng +Signed-off-by: Jamal Hadi Salim +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + net/sched/sch_cbq.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c +index fd7e10567371..46b3dd71777d 100644 +--- a/net/sched/sch_cbq.c ++++ b/net/sched/sch_cbq.c +@@ -231,6 +231,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) + result = tcf_classify(skb, NULL, fl, &res, true); + if (!fl || result < 0) + goto fallback; ++ if (result == TC_ACT_SHOT) ++ return NULL; + + cl = (void *)res.class; + if (!cl) { +@@ -251,8 +253,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) + case TC_ACT_TRAP: + *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; + fallthrough; +- case TC_ACT_SHOT: +- return NULL; + case TC_ACT_RECLASSIFY: + return cbq_reclassify(skb, cl); + } +-- +2.35.1 + diff --git a/queue-5.15/net-sched-fix-memory-leak-in-tcindex_set_parms.patch b/queue-5.15/net-sched-fix-memory-leak-in-tcindex_set_parms.patch new file mode 100644 index 00000000000..21bcd7aaeb2 --- /dev/null +++ b/queue-5.15/net-sched-fix-memory-leak-in-tcindex_set_parms.patch @@ -0,0 +1,150 @@ +From 1ec8a17054a30fe7fe0405e46c24de60de5b1cc1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Dec 2022 11:51:19 +0800 +Subject: net: sched: fix memory leak in tcindex_set_parms + +From: Hawkins Jiawei + +[ Upstream commit 399ab7fe0fa0d846881685fd4e57e9a8ef7559f7 ] + +Syzkaller reports a memory leak as follows: +==================================== +BUG: memory leak +unreferenced object 0xffff88810c287f00 (size 256): + comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s) + hex dump (first 32 bytes): + 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ + backtrace: + [] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046 + [] kmalloc include/linux/slab.h:576 [inline] + [] kmalloc_array include/linux/slab.h:627 [inline] + [] kcalloc include/linux/slab.h:659 [inline] + [] tcf_exts_init include/net/pkt_cls.h:250 [inline] + [] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342 + [] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553 + [] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147 + [] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082 + [] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540 + [] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] + [] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345 + [] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921 + [] sock_sendmsg_nosec net/socket.c:714 [inline] + [] sock_sendmsg+0x56/0x80 net/socket.c:734 + [] ____sys_sendmsg+0x178/0x410 net/socket.c:2482 + [] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536 + [] __sys_sendmmsg+0x105/0x330 net/socket.c:2622 + [] __do_sys_sendmmsg net/socket.c:2651 [inline] + [] __se_sys_sendmmsg net/socket.c:2648 [inline] + [] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648 + [] do_syscall_x64 arch/x86/entry/common.c:50 [inline] + [] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 + [] entry_SYSCALL_64_after_hwframe+0x63/0xcd +==================================== + +Kernel uses tcindex_change() to change an existing +filter properties. + +Yet the problem is that, during the process of changing, +if `old_r` is retrieved from `p->perfect`, then +kernel uses tcindex_alloc_perfect_hash() to newly +allocate filter results, uses tcindex_filter_result_init() +to clear the old filter result, without destroying +its tcf_exts structure, which triggers the above memory leak. 
+
+To be more specific, there are only two sources for `old_r`,
+according to tcindex_lookup(): `old_r` is retrieved from
+`p->perfect`, or `old_r` is retrieved from `p->h`.
+
+ * If `old_r` is retrieved from `p->perfect`, the kernel uses
+tcindex_alloc_perfect_hash() to newly allocate the filter results.
+Then `r` is assigned `cp->perfect + handle`, which is newly
+allocated. So the condition `old_r && old_r != r` is true in this
+situation, and the kernel uses tcindex_filter_result_init() to clear
+the old filter result, without destroying its tcf_exts structure.
+
+ * If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL
+according to tcindex_lookup(). Considering that `cp->h` is directly
+copied from `p->h` and `p->perfect` is NULL, `r` is assigned
+`tcindex_lookup(cp, handle)`, whose value should be the same as
+`old_r`, so the condition `old_r && old_r != r` is false in this
+situation, and the kernel does not use tcindex_filter_result_init()
+to clear the old filter result.
+
+So only when `old_r` is retrieved from `p->perfect` does the kernel
+use tcindex_filter_result_init() to clear the old filter result,
+which triggers the above memory leak.
+
+Considering that there already exists a tc_filter_wq workqueue to
+destroy the old tcindex_data by tcindex_partial_destroy_work() at
+the end of tcindex_set_parms(), this patch solves the memory leak by
+removing the old filter result clearing part and delegating it to
+the tc_filter_wq workqueue.
+
+Note that this patch doesn't introduce any other issues. If `old_r`
+is retrieved from `p->perfect`, this patch just delegates the old
+filter result clearing to the tc_filter_wq workqueue; if `old_r` is
+retrieved from `p->h`, the kernel doesn't reach the old filter
+result clearing part, so removing this part has no effect.
+
+[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni
+and Dmitry Vyukov]
+
+Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()")
+Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/
+Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
+Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com
+Cc: Cong Wang
+Cc: Jakub Kicinski
+Cc: Paolo Abeni
+Cc: Dmitry Vyukov
+Acked-by: Paolo Abeni
+Signed-off-by: Hawkins Jiawei
+Signed-off-by: David S. Miller
+Signed-off-by: Sasha Levin
+---
+ net/sched/cls_tcindex.c | 12 ++----------
+ 1 file changed, 2 insertions(+), 10 deletions(-)
+
+diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
+index 742c7d49a958..8d1ef858db87 100644
+--- a/net/sched/cls_tcindex.c
++++ b/net/sched/cls_tcindex.c
+@@ -332,7 +332,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
+ struct tcindex_filter_result *r, struct nlattr **tb,
+ struct nlattr *est, u32 flags, struct netlink_ext_ack *extack)
+ {
+- struct tcindex_filter_result new_filter_result, *old_r = r;
++ struct tcindex_filter_result new_filter_result;
+ struct tcindex_data *cp = NULL, *oldp;
+ struct tcindex_filter *f = NULL; /* make gcc behave */
+ struct tcf_result cr = {};
+@@ -401,7 +401,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
+ err = tcindex_filter_result_init(&new_filter_result, cp, net);
+ if (err < 0)
+ goto errout_alloc;
+- if (old_r)
++ if (r)
+ cr = r->res;
+
+ err = -EBUSY;
+@@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
+ tcf_bind_filter(tp, &cr, base);
+ }
+
+- if (old_r && old_r != r) {
+- err = tcindex_filter_result_init(old_r, cp, net);
+- if (err < 0) {
+- kfree(f);
+- goto errout_alloc;
+- }
+- }
+-
+ oldp = p;
+ r->res = cr;
+ tcf_exts_change(&r->exts, &e);
+--
+2.35.1
+
diff --git a/queue-5.15/net-sparx5-fix-reading-of-the-mac-address.patch b/queue-5.15/net-sparx5-fix-reading-of-the-mac-address.patch
new file mode 100644
index 00000000000..f5fb9395804
--- /dev/null
+++ b/queue-5.15/net-sparx5-fix-reading-of-the-mac-address.patch
@@ -0,0 +1,40 @@
+From cda4a152044e1a626ed102b101ddbb22e8c1ca9c Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Mon, 2 Jan 2023 13:12:15 +0100
+Subject: net: sparx5: Fix reading of the MAC address
+
+From: Horatiu Vultur
+
+[ Upstream commit 588ab2dc25f60efeb516b4abedb6c551949cc185 ]
+
+There is an issue with the checking of the return value of
+'of_get_mac_address', which returns 0 on success and a negative value
+on failure. The driver interpreted the result the opposite way.
+Therefore, if a MAC address was defined in the DT, the driver
+generated a random MAC address; otherwise it would use address 0.
+Fix this by correctly checking the return value of
+'of_get_mac_address'.
+
+Fixes: b74ef9f9cb91 ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()")
+Signed-off-by: Horatiu Vultur
+Signed-off-by: David S. Miller
+Signed-off-by: Sasha Levin
+---
+ drivers/net/ethernet/microchip/sparx5/sparx5_main.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
+index 0463f20da17b..174d89ee6374 100644
+--- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
++++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c
+@@ -779,7 +779,7 @@ static int mchp_sparx5_probe(struct platform_device *pdev)
+ if (err)
+ goto cleanup_config;
+
+- if (!of_get_mac_address(np, sparx5->base_mac)) {
++ if (of_get_mac_address(np, sparx5->base_mac)) {
+ dev_info(sparx5->dev, "MAC addr was not set, use random MAC\n");
+ eth_random_addr(sparx5->base_mac);
+ sparx5->base_mac[5] = 0;
+--
+2.35.1
+
diff --git a/queue-5.15/netfilter-ipset-fix-hash-net-port-net-hang-with-0-su.patch b/queue-5.15/netfilter-ipset-fix-hash-net-port-net-hang-with-0-su.patch
new file mode 100644
index 00000000000..fbc58bd841e
--- /dev/null
+++ b/queue-5.15/netfilter-ipset-fix-hash-net-port-net-hang-with-0-su.patch
@@ -0,0 +1,109 @@
+From 343074436299a289e1c2449f38f32afc8e62b885 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Fri, 30 Dec 2022 13:24:37 +0100
+Subject: netfilter: ipset: fix hash:net,port,net hang with /0 subnet
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Jozsef Kadlecsik
+
+[ Upstream commit a31d47be64b9b74f8cfedffe03e0a8a1f9e51f23 ]
+
+The hash:net,port,net set type supports /0 subnets. However, the patch
+commit 5f7b51bf09baca8e titled "netfilter: ipset: Limit the maximal range
+of consecutive elements to add/delete" did not take it into account and
+resulted in an endless loop. The bug is actually older, but patch
+5f7b51bf09baca8e brings it out earlier.
+
+Handle /0 subnets properly in the hash:net,port,net set type.
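+
+A self-contained C sketch of the special case (illustrative only;
+the in-tree helper is hash_netportnet4_range_to_cidr()):
+
+	#include <stdint.h>
+
+	/* A /0 mask cannot be expressed by the generic range-to-cidr
+	 * walk, so the full range is special-cased; without this the
+	 * loop over the range never terminates.
+	 */
+	static uint32_t range_to_cidr(uint32_t from, uint32_t to,
+				      uint8_t *cidr)
+	{
+		if (from == 0 && to == UINT32_MAX) {
+			*cidr = 0;
+			return to;
+		}
+		/* ... fall back to ip_set_range_to_cidr() here ... */
+		*cidr = 32;
+		return from;
+	}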
+ +Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") +Reported-by: Марк Коренберг +Signed-off-by: Jozsef Kadlecsik +Signed-off-by: Pablo Neira Ayuso +Signed-off-by: Sasha Levin +--- + net/netfilter/ipset/ip_set_hash_netportnet.c | 40 ++++++++++---------- + 1 file changed, 21 insertions(+), 19 deletions(-) + +diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c +index 19bcdb3141f6..005a7ce87217 100644 +--- a/net/netfilter/ipset/ip_set_hash_netportnet.c ++++ b/net/netfilter/ipset/ip_set_hash_netportnet.c +@@ -173,17 +173,26 @@ hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb, + return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); + } + ++static u32 ++hash_netportnet4_range_to_cidr(u32 from, u32 to, u8 *cidr) ++{ ++ if (from == 0 && to == UINT_MAX) { ++ *cidr = 0; ++ return to; ++ } ++ return ip_set_range_to_cidr(from, to, cidr); ++} ++ + static int + hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_netportnet4 *h = set->data; ++ struct hash_netportnet4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netportnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, p = 0, port, port_to; +- u32 ip2_from = 0, ip2_to = 0, ip2, ipn; +- u64 n = 0, m = 0; ++ u32 ip2_from = 0, ip2_to = 0, ip2, i = 0; + bool with_ports = false; + int ret; + +@@ -285,19 +294,6 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + } else { + ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]); + } +- ipn = ip; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr[0]); +- n++; +- } while (ipn++ < ip_to); +- ipn = ip2_from; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip2_to, &e.cidr[1]); +- m++; +- } while (ipn++ < ip2_to); +- +- if (n*m*(port_to - port + 1) > IPSET_MAX_RANGE) +- return -ERANGE; + + if (retried) { + ip = ntohl(h->next.ip[0]); +@@ -310,13 +306,19 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + + do { + e.ip[0] = htonl(ip); +- ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr[0]); ++ ip = hash_netportnet4_range_to_cidr(ip, ip_to, &e.cidr[0]); + for (; p <= port_to; p++) { + e.port = htons(p); + do { ++ i++; + e.ip[1] = htonl(ip2); +- ip2 = ip_set_range_to_cidr(ip2, ip2_to, +- &e.cidr[1]); ++ if (i > IPSET_MAX_RANGE) { ++ hash_netportnet4_data_next(&h->next, ++ &e); ++ return -ERANGE; ++ } ++ ip2 = hash_netportnet4_range_to_cidr(ip2, ++ ip2_to, &e.cidr[1]); + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) + return ret; +-- +2.35.1 + diff --git a/queue-5.15/netfilter-ipset-rework-long-task-execution-when-addi.patch b/queue-5.15/netfilter-ipset-rework-long-task-execution-when-addi.patch new file mode 100644 index 00000000000..e61fbc77d89 --- /dev/null +++ b/queue-5.15/netfilter-ipset-rework-long-task-execution-when-addi.patch @@ -0,0 +1,462 @@ +From 668fc0babfd539b816162421aa4f25d5dcb68d7f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 30 Dec 2022 13:24:38 +0100 +Subject: netfilter: ipset: Rework long task execution when adding/deleting + entries + +From: Jozsef Kadlecsik + +[ Upstream commit 5e29dc36bd5e2166b834ceb19990d9e68a734d7d ] + +When adding/deleting large number of elements in one step in ipset, it can +take a reasonable amount of time and can result in soft lockup errors. 
The +patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of +consecutive elements to add/delete") tried to fix it by limiting the max +elements to process at all. However it was not enough, it is still possible +that we get hung tasks. Lowering the limit is not reasonable, so the +approach in this patch is as follows: rely on the method used at resizing +sets and save the state when we reach a smaller internal batch limit, +unlock/lock and proceed from the saved state. Thus we can avoid long +continuous tasks and at the same time removed the limit to add/delete large +number of elements in one step. + +The nfnl mutex is held during the whole operation which prevents one to +issue other ipset commands in parallel. + +Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") +Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com +Signed-off-by: Jozsef Kadlecsik +Signed-off-by: Pablo Neira Ayuso +Signed-off-by: Sasha Levin +--- + include/linux/netfilter/ipset/ip_set.h | 2 +- + net/netfilter/ipset/ip_set_core.c | 7 ++++--- + net/netfilter/ipset/ip_set_hash_ip.c | 14 ++++++------- + net/netfilter/ipset/ip_set_hash_ipmark.c | 13 ++++++------ + net/netfilter/ipset/ip_set_hash_ipport.c | 13 ++++++------ + net/netfilter/ipset/ip_set_hash_ipportip.c | 13 ++++++------ + net/netfilter/ipset/ip_set_hash_ipportnet.c | 13 +++++++----- + net/netfilter/ipset/ip_set_hash_net.c | 17 +++++++-------- + net/netfilter/ipset/ip_set_hash_netiface.c | 15 ++++++-------- + net/netfilter/ipset/ip_set_hash_netnet.c | 23 +++++++-------------- + net/netfilter/ipset/ip_set_hash_netport.c | 19 +++++++---------- + 11 files changed, 68 insertions(+), 81 deletions(-) + +diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h +index ada1296c87d5..72f5ebc5c97a 100644 +--- a/include/linux/netfilter/ipset/ip_set.h ++++ b/include/linux/netfilter/ipset/ip_set.h +@@ -197,7 +197,7 @@ struct ip_set_region { + }; + + /* Max range where every element is added/deleted in one step */ +-#define IPSET_MAX_RANGE (1<<20) ++#define IPSET_MAX_RANGE (1<<14) + + /* The max revision number supported by any set type + 1 */ + #define IPSET_REVISION_MAX 9 +diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c +index 16ae92054baa..ae061b27e446 100644 +--- a/net/netfilter/ipset/ip_set_core.c ++++ b/net/netfilter/ipset/ip_set_core.c +@@ -1698,9 +1698,10 @@ call_ad(struct net *net, struct sock *ctnl, struct sk_buff *skb, + ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried); + ip_set_unlock(set); + retried = true; +- } while (ret == -EAGAIN && +- set->variant->resize && +- (ret = set->variant->resize(set, retried)) == 0); ++ } while (ret == -ERANGE || ++ (ret == -EAGAIN && ++ set->variant->resize && ++ (ret = set->variant->resize(set, retried)) == 0)); + + if (!ret || (ret == -IPSET_ERR_EXIST && eexist)) + return 0; +diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c +index 75d556d71652..24adcdd7a0b1 100644 +--- a/net/netfilter/ipset/ip_set_hash_ip.c ++++ b/net/netfilter/ipset/ip_set_hash_ip.c +@@ -98,11 +98,11 @@ static int + hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_ip4 *h = set->data; ++ struct hash_ip4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ip4_elem e = { 0 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip = 0, ip_to = 0, 
hosts; ++ u32 ip = 0, ip_to = 0, hosts, i = 0; + int ret = 0; + + if (tb[IPSET_ATTR_LINENO]) +@@ -147,14 +147,14 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[], + + hosts = h->netmask == 32 ? 1 : 2 << (32 - h->netmask - 1); + +- /* 64bit division is not allowed on 32bit */ +- if (((u64)ip_to - ip + 1) >> (32 - h->netmask) > IPSET_MAX_RANGE) +- return -ERANGE; +- + if (retried) + ip = ntohl(h->next.ip); +- for (; ip <= ip_to;) { ++ for (; ip <= ip_to; i++) { + e.ip = htonl(ip); ++ if (i > IPSET_MAX_RANGE) { ++ hash_ip4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) + return ret; +diff --git a/net/netfilter/ipset/ip_set_hash_ipmark.c b/net/netfilter/ipset/ip_set_hash_ipmark.c +index 153de3457423..a22ec1a6f6ec 100644 +--- a/net/netfilter/ipset/ip_set_hash_ipmark.c ++++ b/net/netfilter/ipset/ip_set_hash_ipmark.c +@@ -97,11 +97,11 @@ static int + hash_ipmark4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_ipmark4 *h = set->data; ++ struct hash_ipmark4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ipmark4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip, ip_to = 0; ++ u32 ip, ip_to = 0, i = 0; + int ret; + + if (tb[IPSET_ATTR_LINENO]) +@@ -148,13 +148,14 @@ hash_ipmark4_uadt(struct ip_set *set, struct nlattr *tb[], + ip_set_mask_from_to(ip, ip_to, cidr); + } + +- if (((u64)ip_to - ip + 1) > IPSET_MAX_RANGE) +- return -ERANGE; +- + if (retried) + ip = ntohl(h->next.ip); +- for (; ip <= ip_to; ip++) { ++ for (; ip <= ip_to; ip++, i++) { + e.ip = htonl(ip); ++ if (i > IPSET_MAX_RANGE) { ++ hash_ipmark4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c +index 7303138e46be..10481760a9b2 100644 +--- a/net/netfilter/ipset/ip_set_hash_ipport.c ++++ b/net/netfilter/ipset/ip_set_hash_ipport.c +@@ -105,11 +105,11 @@ static int + hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_ipport4 *h = set->data; ++ struct hash_ipport4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ipport4_elem e = { .ip = 0 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip, ip_to = 0, p = 0, port, port_to; ++ u32 ip, ip_to = 0, p = 0, port, port_to, i = 0; + bool with_ports = false; + int ret; + +@@ -173,17 +173,18 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], + swap(port, port_to); + } + +- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) +- return -ERANGE; +- + if (retried) + ip = ntohl(h->next.ip); + for (; ip <= ip_to; ip++) { + p = retried && ip == ntohl(h->next.ip) ? 
ntohs(h->next.port) + : port; +- for (; p <= port_to; p++) { ++ for (; p <= port_to; p++, i++) { + e.ip = htonl(ip); + e.port = htons(p); ++ if (i > IPSET_MAX_RANGE) { ++ hash_ipport4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c +index 334fb1ad0e86..39a01934b153 100644 +--- a/net/netfilter/ipset/ip_set_hash_ipportip.c ++++ b/net/netfilter/ipset/ip_set_hash_ipportip.c +@@ -108,11 +108,11 @@ static int + hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_ipportip4 *h = set->data; ++ struct hash_ipportip4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ipportip4_elem e = { .ip = 0 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip, ip_to = 0, p = 0, port, port_to; ++ u32 ip, ip_to = 0, p = 0, port, port_to, i = 0; + bool with_ports = false; + int ret; + +@@ -180,17 +180,18 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], + swap(port, port_to); + } + +- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) +- return -ERANGE; +- + if (retried) + ip = ntohl(h->next.ip); + for (; ip <= ip_to; ip++) { + p = retried && ip == ntohl(h->next.ip) ? ntohs(h->next.port) + : port; +- for (; p <= port_to; p++) { ++ for (; p <= port_to; p++, i++) { + e.ip = htonl(ip); + e.port = htons(p); ++ if (i > IPSET_MAX_RANGE) { ++ hash_ipportip4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c +index 7df94f437f60..5c6de605a9fb 100644 +--- a/net/netfilter/ipset/ip_set_hash_ipportnet.c ++++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c +@@ -160,12 +160,12 @@ static int + hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_ipportnet4 *h = set->data; ++ struct hash_ipportnet4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ipportnet4_elem e = { .cidr = HOST_MASK - 1 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0, p = 0, port, port_to; +- u32 ip2_from = 0, ip2_to = 0, ip2; ++ u32 ip2_from = 0, ip2_to = 0, ip2, i = 0; + bool with_ports = false; + u8 cidr; + int ret; +@@ -253,9 +253,6 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + swap(port, port_to); + } + +- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) +- return -ERANGE; +- + ip2_to = ip2_from; + if (tb[IPSET_ATTR_IP2_TO]) { + ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2_TO], &ip2_to); +@@ -282,9 +279,15 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], + for (; p <= port_to; p++) { + e.port = htons(p); + do { ++ i++; + e.ip2 = htonl(ip2); + ip2 = ip_set_range_to_cidr(ip2, ip2_to, &cidr); + e.cidr = cidr - 1; ++ if (i > IPSET_MAX_RANGE) { ++ hash_ipportnet4_data_next(&h->next, ++ &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c +index 1422739d9aa2..ce0a9ce5a91f 100644 +--- a/net/netfilter/ipset/ip_set_hash_net.c ++++ b/net/netfilter/ipset/ip_set_hash_net.c +@@ -136,11 +136,11 @@ 
static int + hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_net4 *h = set->data; ++ struct hash_net4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_net4_elem e = { .cidr = HOST_MASK }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip = 0, ip_to = 0, ipn, n = 0; ++ u32 ip = 0, ip_to = 0, i = 0; + int ret; + + if (tb[IPSET_ATTR_LINENO]) +@@ -188,19 +188,16 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], + if (ip + UINT_MAX == ip_to) + return -IPSET_ERR_HASH_RANGE; + } +- ipn = ip; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr); +- n++; +- } while (ipn++ < ip_to); +- +- if (n > IPSET_MAX_RANGE) +- return -ERANGE; + + if (retried) + ip = ntohl(h->next.ip); + do { ++ i++; + e.ip = htonl(ip); ++ if (i > IPSET_MAX_RANGE) { ++ hash_net4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr); + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c +index 9810f5bf63f5..031073286236 100644 +--- a/net/netfilter/ipset/ip_set_hash_netiface.c ++++ b/net/netfilter/ipset/ip_set_hash_netiface.c +@@ -202,7 +202,7 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netiface4_elem e = { .cidr = HOST_MASK, .elem = 1 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 ip = 0, ip_to = 0, ipn, n = 0; ++ u32 ip = 0, ip_to = 0, i = 0; + int ret; + + if (tb[IPSET_ATTR_LINENO]) +@@ -256,19 +256,16 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], + } else { + ip_set_mask_from_to(ip, ip_to, e.cidr); + } +- ipn = ip; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr); +- n++; +- } while (ipn++ < ip_to); +- +- if (n > IPSET_MAX_RANGE) +- return -ERANGE; + + if (retried) + ip = ntohl(h->next.ip); + do { ++ i++; + e.ip = htonl(ip); ++ if (i > IPSET_MAX_RANGE) { ++ hash_netiface4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr); + ret = adtfn(set, &e, &ext, &ext, flags); + +diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c +index 3d09eefe998a..c07b70bf32db 100644 +--- a/net/netfilter/ipset/ip_set_hash_netnet.c ++++ b/net/netfilter/ipset/ip_set_hash_netnet.c +@@ -163,13 +163,12 @@ static int + hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_netnet4 *h = set->data; ++ struct hash_netnet4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netnet4_elem e = { }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); + u32 ip = 0, ip_to = 0; +- u32 ip2 = 0, ip2_from = 0, ip2_to = 0, ipn; +- u64 n = 0, m = 0; ++ u32 ip2 = 0, ip2_from = 0, ip2_to = 0, i = 0; + int ret; + + if (tb[IPSET_ATTR_LINENO]) +@@ -245,19 +244,6 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], + } else { + ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]); + } +- ipn = ip; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr[0]); +- n++; +- } while (ipn++ < ip_to); +- ipn = ip2_from; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip2_to, &e.cidr[1]); +- m++; +- } while (ipn++ < ip2_to); +- +- if (n*m > IPSET_MAX_RANGE) +- return -ERANGE; + + if (retried) { + ip = ntohl(h->next.ip[0]); +@@ -270,7 +256,12 @@ 
hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], + e.ip[0] = htonl(ip); + ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr[0]); + do { ++ i++; + e.ip[1] = htonl(ip2); ++ if (i > IPSET_MAX_RANGE) { ++ hash_netnet4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ip2 = ip_set_range_to_cidr(ip2, ip2_to, &e.cidr[1]); + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) +diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c +index 09cf72eb37f8..d1a0628df4ef 100644 +--- a/net/netfilter/ipset/ip_set_hash_netport.c ++++ b/net/netfilter/ipset/ip_set_hash_netport.c +@@ -154,12 +154,11 @@ static int + hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], + enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) + { +- const struct hash_netport4 *h = set->data; ++ struct hash_netport4 *h = set->data; + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_netport4_elem e = { .cidr = HOST_MASK - 1 }; + struct ip_set_ext ext = IP_SET_INIT_UEXT(set); +- u32 port, port_to, p = 0, ip = 0, ip_to = 0, ipn; +- u64 n = 0; ++ u32 port, port_to, p = 0, ip = 0, ip_to = 0, i = 0; + bool with_ports = false; + u8 cidr; + int ret; +@@ -236,14 +235,6 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], + } else { + ip_set_mask_from_to(ip, ip_to, e.cidr + 1); + } +- ipn = ip; +- do { +- ipn = ip_set_range_to_cidr(ipn, ip_to, &cidr); +- n++; +- } while (ipn++ < ip_to); +- +- if (n*(port_to - port + 1) > IPSET_MAX_RANGE) +- return -ERANGE; + + if (retried) { + ip = ntohl(h->next.ip); +@@ -255,8 +246,12 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], + e.ip = htonl(ip); + ip = ip_set_range_to_cidr(ip, ip_to, &cidr); + e.cidr = cidr - 1; +- for (; p <= port_to; p++) { ++ for (; p <= port_to; p++, i++) { + e.port = htons(p); ++ if (i > IPSET_MAX_RANGE) { ++ hash_netport4_data_next(&h->next, &e); ++ return -ERANGE; ++ } + ret = adtfn(set, &e, &ext, &ext, flags); + if (ret && !ip_set_eexist(ret, flags)) + return ret; +-- +2.35.1 + diff --git a/queue-5.15/netfilter-nf_tables-add-function-to-create-set-state.patch b/queue-5.15/netfilter-nf_tables-add-function-to-create-set-state.patch new file mode 100644 index 00000000000..f54f91ea25d --- /dev/null +++ b/queue-5.15/netfilter-nf_tables-add-function-to-create-set-state.patch @@ -0,0 +1,185 @@ +From 633c320f8cc2e3423c133c2c427ca7629a72e762 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 18:00:10 +0100 +Subject: netfilter: nf_tables: add function to create set stateful expressions + +From: Pablo Neira Ayuso + +[ Upstream commit a8fe4154fa5a1bae590b243ed60f871e5a5e1378 ] + +Add a helper function to allocate and initialize the stateful expressions +that are defined in a set. + +This patch allows to reuse this code from the set update path, to check +that type of the update matches the existing set in the kernel. 
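+
+For illustration, a condensed sketch (not a drop-in snippet; locals are
+simplified) of the calling convention the new helper enables in both the
+create and the update paths: allocate all expressions up front, then
+release every allocated one with nft_expr_destroy() before returning:
+
+	struct nft_expr *exprs[NFT_SET_EXPR_MAX] = {};
+	int num_exprs = 0, err, i;
+
+	err = nft_set_expr_alloc(&ctx, set, nla, exprs, &num_exprs, flags);
+	if (err < 0)
+		return err;
+	/* ... type checks against the existing set can go here ... */
+	for (i = 0; i < num_exprs; i++)
+		nft_expr_destroy(&ctx, exprs[i]);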
+ +Signed-off-by: Pablo Neira Ayuso +Stable-dep-of: f6594c372afd ("netfilter: nf_tables: perform type checking for existing sets") +Signed-off-by: Sasha Levin +--- + net/netfilter/nf_tables_api.c | 106 ++++++++++++++++++++++------------ + 1 file changed, 68 insertions(+), 38 deletions(-) + +diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c +index dd19726a9ac9..f892a926fb58 100644 +--- a/net/netfilter/nf_tables_api.c ++++ b/net/netfilter/nf_tables_api.c +@@ -4243,6 +4243,59 @@ static int nf_tables_set_desc_parse(struct nft_set_desc *desc, + return err; + } + ++static int nft_set_expr_alloc(struct nft_ctx *ctx, struct nft_set *set, ++ const struct nlattr * const *nla, ++ struct nft_expr **exprs, int *num_exprs, ++ u32 flags) ++{ ++ struct nft_expr *expr; ++ int err, i; ++ ++ if (nla[NFTA_SET_EXPR]) { ++ expr = nft_set_elem_expr_alloc(ctx, set, nla[NFTA_SET_EXPR]); ++ if (IS_ERR(expr)) { ++ err = PTR_ERR(expr); ++ goto err_set_expr_alloc; ++ } ++ exprs[0] = expr; ++ (*num_exprs)++; ++ } else if (nla[NFTA_SET_EXPRESSIONS]) { ++ struct nlattr *tmp; ++ int left; ++ ++ if (!(flags & NFT_SET_EXPR)) { ++ err = -EINVAL; ++ goto err_set_expr_alloc; ++ } ++ i = 0; ++ nla_for_each_nested(tmp, nla[NFTA_SET_EXPRESSIONS], left) { ++ if (i == NFT_SET_EXPR_MAX) { ++ err = -E2BIG; ++ goto err_set_expr_alloc; ++ } ++ if (nla_type(tmp) != NFTA_LIST_ELEM) { ++ err = -EINVAL; ++ goto err_set_expr_alloc; ++ } ++ expr = nft_set_elem_expr_alloc(ctx, set, tmp); ++ if (IS_ERR(expr)) { ++ err = PTR_ERR(expr); ++ goto err_set_expr_alloc; ++ } ++ exprs[i++] = expr; ++ (*num_exprs)++; ++ } ++ } ++ ++ return 0; ++ ++err_set_expr_alloc: ++ for (i = 0; i < *num_exprs; i++) ++ nft_expr_destroy(ctx, exprs[i]); ++ ++ return err; ++} ++ + static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + const struct nlattr * const nla[]) + { +@@ -4250,7 +4303,6 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + u8 genmask = nft_genmask_next(info->net); + u8 family = info->nfmsg->nfgen_family; + const struct nft_set_ops *ops; +- struct nft_expr *expr = NULL; + struct net *net = info->net; + struct nft_set_desc desc; + struct nft_table *table; +@@ -4258,6 +4310,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + struct nft_set *set; + struct nft_ctx ctx; + size_t alloc_size; ++ int num_exprs = 0; + char *name; + int err, i; + u16 udlen; +@@ -4384,6 +4437,8 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + return PTR_ERR(set); + } + } else { ++ struct nft_expr *exprs[NFT_SET_EXPR_MAX] = {}; ++ + if (info->nlh->nlmsg_flags & NLM_F_EXCL) { + NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]); + return -EEXIST; +@@ -4391,6 +4446,13 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + if (info->nlh->nlmsg_flags & NLM_F_REPLACE) + return -EOPNOTSUPP; + ++ err = nft_set_expr_alloc(&ctx, set, nla, exprs, &num_exprs, flags); ++ if (err < 0) ++ return err; ++ ++ for (i = 0; i < num_exprs; i++) ++ nft_expr_destroy(&ctx, exprs[i]); ++ + return 0; + } + +@@ -4458,43 +4520,11 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + if (err < 0) + goto err_set_init; + +- if (nla[NFTA_SET_EXPR]) { +- expr = nft_set_elem_expr_alloc(&ctx, set, nla[NFTA_SET_EXPR]); +- if (IS_ERR(expr)) { +- err = PTR_ERR(expr); +- goto err_set_expr_alloc; +- } +- set->exprs[0] = expr; +- set->num_exprs++; +- } else if (nla[NFTA_SET_EXPRESSIONS]) { +- struct nft_expr *expr; +- 
struct nlattr *tmp; +- int left; +- +- if (!(flags & NFT_SET_EXPR)) { +- err = -EINVAL; +- goto err_set_expr_alloc; +- } +- i = 0; +- nla_for_each_nested(tmp, nla[NFTA_SET_EXPRESSIONS], left) { +- if (i == NFT_SET_EXPR_MAX) { +- err = -E2BIG; +- goto err_set_expr_alloc; +- } +- if (nla_type(tmp) != NFTA_LIST_ELEM) { +- err = -EINVAL; +- goto err_set_expr_alloc; +- } +- expr = nft_set_elem_expr_alloc(&ctx, set, tmp); +- if (IS_ERR(expr)) { +- err = PTR_ERR(expr); +- goto err_set_expr_alloc; +- } +- set->exprs[i++] = expr; +- set->num_exprs++; +- } +- } ++ err = nft_set_expr_alloc(&ctx, set, nla, set->exprs, &num_exprs, flags); ++ if (err < 0) ++ goto err_set_destroy; + ++ set->num_exprs = num_exprs; + set->handle = nf_tables_alloc_handle(table); + + err = nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set); +@@ -4508,7 +4538,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + err_set_expr_alloc: + for (i = 0; i < set->num_exprs; i++) + nft_expr_destroy(&ctx, set->exprs[i]); +- ++err_set_destroy: + ops->destroy(set); + err_set_init: + kfree(set->name); +-- +2.35.1 + diff --git a/queue-5.15/netfilter-nf_tables-consolidate-set-description.patch b/queue-5.15/netfilter-nf_tables-consolidate-set-description.patch new file mode 100644 index 00000000000..41f3e27f4d1 --- /dev/null +++ b/queue-5.15/netfilter-nf_tables-consolidate-set-description.patch @@ -0,0 +1,225 @@ +From 4852ad83332a627dbd89700e6769aa1553d78527 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 20:07:52 +0100 +Subject: netfilter: nf_tables: consolidate set description + +From: Pablo Neira Ayuso + +[ Upstream commit bed4a63ea4ae77cfe5aae004ef87379f0655260a ] + +Add the following fields to the set description: + +- key type +- data type +- object type +- policy +- gc_int: garbage collection interval) +- timeout: element timeout + +This prepares for stricter set type checks on updates in a follow up +patch. 
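+
+As a rough sketch (defaults taken from the code below, control flow
+elided), nf_tables_newset() now seeds one descriptor and fills it from
+the netlink attributes instead of carrying separate local variables:
+
+	struct nft_set_desc desc;
+
+	memset(&desc, 0, sizeof(desc));
+	desc.ktype  = NFT_DATA_VALUE;		/* default key type */
+	desc.policy = NFT_SET_POL_PERFORMANCE;	/* default policy */
+	/* desc.dtype, desc.objtype, desc.timeout and desc.gc_int are
+	 * then overridden from the NFTA_SET_* attributes, if given. */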
+ +Signed-off-by: Pablo Neira Ayuso +Stable-dep-of: f6594c372afd ("netfilter: nf_tables: perform type checking for existing sets") +Signed-off-by: Sasha Levin +--- + include/net/netfilter/nf_tables.h | 12 +++++++ + net/netfilter/nf_tables_api.c | 58 +++++++++++++++---------------- + 2 files changed, 40 insertions(+), 30 deletions(-) + +diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h +index 53746494eb84..5377dbfba120 100644 +--- a/include/net/netfilter/nf_tables.h ++++ b/include/net/netfilter/nf_tables.h +@@ -283,17 +283,29 @@ struct nft_set_iter { + /** + * struct nft_set_desc - description of set elements + * ++ * @ktype: key type + * @klen: key length ++ * @dtype: data type + * @dlen: data length ++ * @objtype: object type ++ * @flags: flags + * @size: number of set elements ++ * @policy: set policy ++ * @gc_int: garbage collector interval + * @field_len: length of each field in concatenation, bytes + * @field_count: number of concatenated fields in element + * @expr: set must support for expressions + */ + struct nft_set_desc { ++ u32 ktype; + unsigned int klen; ++ u32 dtype; + unsigned int dlen; ++ u32 objtype; + unsigned int size; ++ u32 policy; ++ u32 gc_int; ++ u64 timeout; + u8 field_len[NFT_REG32_COUNT]; + u8 field_count; + bool expr; +diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c +index 3fac57d66dda..dd19726a9ac9 100644 +--- a/net/netfilter/nf_tables_api.c ++++ b/net/netfilter/nf_tables_api.c +@@ -3635,8 +3635,7 @@ static bool nft_set_ops_candidate(const struct nft_set_type *type, u32 flags) + static const struct nft_set_ops * + nft_select_set_ops(const struct nft_ctx *ctx, + const struct nlattr * const nla[], +- const struct nft_set_desc *desc, +- enum nft_set_policies policy) ++ const struct nft_set_desc *desc) + { + struct nftables_pernet *nft_net = nft_pernet(ctx->net); + const struct nft_set_ops *ops, *bops; +@@ -3665,7 +3664,7 @@ nft_select_set_ops(const struct nft_ctx *ctx, + if (!ops->estimate(desc, flags, &est)) + continue; + +- switch (policy) { ++ switch (desc->policy) { + case NFT_SET_POL_PERFORMANCE: + if (est.lookup < best.lookup) + break; +@@ -4247,7 +4246,6 @@ static int nf_tables_set_desc_parse(struct nft_set_desc *desc, + static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + const struct nlattr * const nla[]) + { +- u32 ktype, dtype, flags, policy, gc_int, objtype; + struct netlink_ext_ack *extack = info->extack; + u8 genmask = nft_genmask_next(info->net); + u8 family = info->nfmsg->nfgen_family; +@@ -4260,10 +4258,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + struct nft_set *set; + struct nft_ctx ctx; + size_t alloc_size; +- u64 timeout; + char *name; + int err, i; + u16 udlen; ++ u32 flags; + u64 size; + + if (nla[NFTA_SET_TABLE] == NULL || +@@ -4274,10 +4272,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + + memset(&desc, 0, sizeof(desc)); + +- ktype = NFT_DATA_VALUE; ++ desc.ktype = NFT_DATA_VALUE; + if (nla[NFTA_SET_KEY_TYPE] != NULL) { +- ktype = ntohl(nla_get_be32(nla[NFTA_SET_KEY_TYPE])); +- if ((ktype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK) ++ desc.ktype = ntohl(nla_get_be32(nla[NFTA_SET_KEY_TYPE])); ++ if ((desc.ktype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK) + return -EINVAL; + } + +@@ -4302,17 +4300,17 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + return -EOPNOTSUPP; + } + +- dtype = 0; ++ desc.dtype = 0; + if 
(nla[NFTA_SET_DATA_TYPE] != NULL) { + if (!(flags & NFT_SET_MAP)) + return -EINVAL; + +- dtype = ntohl(nla_get_be32(nla[NFTA_SET_DATA_TYPE])); +- if ((dtype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK && +- dtype != NFT_DATA_VERDICT) ++ desc.dtype = ntohl(nla_get_be32(nla[NFTA_SET_DATA_TYPE])); ++ if ((desc.dtype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK && ++ desc.dtype != NFT_DATA_VERDICT) + return -EINVAL; + +- if (dtype != NFT_DATA_VERDICT) { ++ if (desc.dtype != NFT_DATA_VERDICT) { + if (nla[NFTA_SET_DATA_LEN] == NULL) + return -EINVAL; + desc.dlen = ntohl(nla_get_be32(nla[NFTA_SET_DATA_LEN])); +@@ -4327,34 +4325,34 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + if (!(flags & NFT_SET_OBJECT)) + return -EINVAL; + +- objtype = ntohl(nla_get_be32(nla[NFTA_SET_OBJ_TYPE])); +- if (objtype == NFT_OBJECT_UNSPEC || +- objtype > NFT_OBJECT_MAX) ++ desc.objtype = ntohl(nla_get_be32(nla[NFTA_SET_OBJ_TYPE])); ++ if (desc.objtype == NFT_OBJECT_UNSPEC || ++ desc.objtype > NFT_OBJECT_MAX) + return -EOPNOTSUPP; + } else if (flags & NFT_SET_OBJECT) + return -EINVAL; + else +- objtype = NFT_OBJECT_UNSPEC; ++ desc.objtype = NFT_OBJECT_UNSPEC; + +- timeout = 0; ++ desc.timeout = 0; + if (nla[NFTA_SET_TIMEOUT] != NULL) { + if (!(flags & NFT_SET_TIMEOUT)) + return -EINVAL; + +- err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &timeout); ++ err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &desc.timeout); + if (err) + return err; + } +- gc_int = 0; ++ desc.gc_int = 0; + if (nla[NFTA_SET_GC_INTERVAL] != NULL) { + if (!(flags & NFT_SET_TIMEOUT)) + return -EINVAL; +- gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL])); ++ desc.gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL])); + } + +- policy = NFT_SET_POL_PERFORMANCE; ++ desc.policy = NFT_SET_POL_PERFORMANCE; + if (nla[NFTA_SET_POLICY] != NULL) +- policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY])); ++ desc.policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY])); + + if (nla[NFTA_SET_DESC] != NULL) { + err = nf_tables_set_desc_parse(&desc, nla[NFTA_SET_DESC]); +@@ -4399,7 +4397,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) + return -ENOENT; + +- ops = nft_select_set_ops(&ctx, nla, &desc, policy); ++ ops = nft_select_set_ops(&ctx, nla, &desc); + if (IS_ERR(ops)) + return PTR_ERR(ops); + +@@ -4439,18 +4437,18 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + set->table = table; + write_pnet(&set->net, net); + set->ops = ops; +- set->ktype = ktype; ++ set->ktype = desc.ktype; + set->klen = desc.klen; +- set->dtype = dtype; +- set->objtype = objtype; ++ set->dtype = desc.dtype; ++ set->objtype = desc.objtype; + set->dlen = desc.dlen; + set->flags = flags; + set->size = desc.size; +- set->policy = policy; ++ set->policy = desc.policy; + set->udlen = udlen; + set->udata = udata; +- set->timeout = timeout; +- set->gc_int = gc_int; ++ set->timeout = desc.timeout; ++ set->gc_int = desc.gc_int; + + set->field_count = desc.field_count; + for (i = 0; i < desc.field_count; i++) +-- +2.35.1 + diff --git a/queue-5.15/netfilter-nf_tables-honor-set-timeout-and-garbage-co.patch b/queue-5.15/netfilter-nf_tables-honor-set-timeout-and-garbage-co.patch new file mode 100644 index 00000000000..52e226d45aa --- /dev/null +++ b/queue-5.15/netfilter-nf_tables-honor-set-timeout-and-garbage-co.patch @@ -0,0 +1,209 @@ +From 833005df9947d2461980481836bad581ec72aa9e Mon Sep 17 00:00:00 2001 +From: Sasha Levin 
+Date: Mon, 19 Dec 2022 20:10:12 +0100 +Subject: netfilter: nf_tables: honor set timeout and garbage collection + updates + +From: Pablo Neira Ayuso + +[ Upstream commit 123b99619cca94bdca0bf7bde9abe28f0a0dfe06 ] + +Set timeout and garbage collection interval updates are ignored on +updates. Add transaction to update global set element timeout and +garbage collection interval. + +Fixes: 96518518cc41 ("netfilter: add nftables") +Suggested-by: Florian Westphal +Signed-off-by: Pablo Neira Ayuso +Signed-off-by: Sasha Levin +--- + include/net/netfilter/nf_tables.h | 13 ++++++- + net/netfilter/nf_tables_api.c | 63 ++++++++++++++++++++++--------- + 2 files changed, 57 insertions(+), 19 deletions(-) + +diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h +index 5377dbfba120..80df8ff5e675 100644 +--- a/include/net/netfilter/nf_tables.h ++++ b/include/net/netfilter/nf_tables.h +@@ -562,7 +562,9 @@ void *nft_set_catchall_gc(const struct nft_set *set); + + static inline unsigned long nft_set_gc_interval(const struct nft_set *set) + { +- return set->gc_int ? msecs_to_jiffies(set->gc_int) : HZ; ++ u32 gc_int = READ_ONCE(set->gc_int); ++ ++ return gc_int ? msecs_to_jiffies(gc_int) : HZ; + } + + /** +@@ -1511,6 +1513,9 @@ struct nft_trans_rule { + struct nft_trans_set { + struct nft_set *set; + u32 set_id; ++ u32 gc_int; ++ u64 timeout; ++ bool update; + bool bound; + }; + +@@ -1520,6 +1525,12 @@ struct nft_trans_set { + (((struct nft_trans_set *)trans->data)->set_id) + #define nft_trans_set_bound(trans) \ + (((struct nft_trans_set *)trans->data)->bound) ++#define nft_trans_set_update(trans) \ ++ (((struct nft_trans_set *)trans->data)->update) ++#define nft_trans_set_timeout(trans) \ ++ (((struct nft_trans_set *)trans->data)->timeout) ++#define nft_trans_set_gc_int(trans) \ ++ (((struct nft_trans_set *)trans->data)->gc_int) + + struct nft_trans_chain { + bool update; +diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c +index 82fe54b64714..81bd13b3d8fd 100644 +--- a/net/netfilter/nf_tables_api.c ++++ b/net/netfilter/nf_tables_api.c +@@ -465,8 +465,9 @@ static int nft_delrule_by_chain(struct nft_ctx *ctx) + return 0; + } + +-static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, +- struct nft_set *set) ++static int __nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, ++ struct nft_set *set, ++ const struct nft_set_desc *desc) + { + struct nft_trans *trans; + +@@ -474,17 +475,28 @@ static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, + if (trans == NULL) + return -ENOMEM; + +- if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] != NULL) { ++ if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] && !desc) { + nft_trans_set_id(trans) = + ntohl(nla_get_be32(ctx->nla[NFTA_SET_ID])); + nft_activate_next(ctx->net, set); + } + nft_trans_set(trans) = set; ++ if (desc) { ++ nft_trans_set_update(trans) = true; ++ nft_trans_set_gc_int(trans) = desc->gc_int; ++ nft_trans_set_timeout(trans) = desc->timeout; ++ } + nft_trans_commit_list_add_tail(ctx->net, trans); + + return 0; + } + ++static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, ++ struct nft_set *set) ++{ ++ return __nft_trans_set_add(ctx, msg_type, set, NULL); ++} ++ + static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set) + { + int err; +@@ -3899,8 +3911,10 @@ static int nf_tables_fill_set_concat(struct sk_buff *skb, + static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, + const struct nft_set *set, u16 
event, u16 flags) + { +- struct nlmsghdr *nlh; ++ u64 timeout = READ_ONCE(set->timeout); ++ u32 gc_int = READ_ONCE(set->gc_int); + u32 portid = ctx->portid; ++ struct nlmsghdr *nlh; + struct nlattr *nest; + u32 seq = ctx->seq; + int i; +@@ -3936,13 +3950,13 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, + nla_put_be32(skb, NFTA_SET_OBJ_TYPE, htonl(set->objtype))) + goto nla_put_failure; + +- if (set->timeout && ++ if (timeout && + nla_put_be64(skb, NFTA_SET_TIMEOUT, +- nf_jiffies64_to_msecs(set->timeout), ++ nf_jiffies64_to_msecs(timeout), + NFTA_SET_PAD)) + goto nla_put_failure; +- if (set->gc_int && +- nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(set->gc_int))) ++ if (gc_int && ++ nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(gc_int))) + goto nla_put_failure; + + if (set->policy != NFT_SET_POL_PERFORMANCE) { +@@ -4487,7 +4501,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + for (i = 0; i < num_exprs; i++) + nft_expr_destroy(&ctx, exprs[i]); + +- return err; ++ if (err < 0) ++ return err; ++ ++ return __nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set, &desc); + } + + if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) +@@ -5877,7 +5894,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, + return err; + } else if (set->flags & NFT_SET_TIMEOUT && + !(flags & NFT_SET_ELEM_INTERVAL_END)) { +- timeout = set->timeout; ++ timeout = READ_ONCE(set->timeout); + } + + expiration = 0; +@@ -5978,7 +5995,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, + if (err < 0) + goto err_parse_key_end; + +- if (timeout != set->timeout) { ++ if (timeout != READ_ONCE(set->timeout)) { + err = nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT); + if (err < 0) + goto err_parse_key_end; +@@ -8833,14 +8850,20 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) + nft_flow_rule_destroy(nft_trans_flow_rule(trans)); + break; + case NFT_MSG_NEWSET: +- nft_clear(net, nft_trans_set(trans)); +- /* This avoids hitting -EBUSY when deleting the table +- * from the transaction. +- */ +- if (nft_set_is_anonymous(nft_trans_set(trans)) && +- !list_empty(&nft_trans_set(trans)->bindings)) +- trans->ctx.table->use--; ++ if (nft_trans_set_update(trans)) { ++ struct nft_set *set = nft_trans_set(trans); + ++ WRITE_ONCE(set->timeout, nft_trans_set_timeout(trans)); ++ WRITE_ONCE(set->gc_int, nft_trans_set_gc_int(trans)); ++ } else { ++ nft_clear(net, nft_trans_set(trans)); ++ /* This avoids hitting -EBUSY when deleting the table ++ * from the transaction. 
++ */ ++ if (nft_set_is_anonymous(nft_trans_set(trans)) && ++ !list_empty(&nft_trans_set(trans)->bindings)) ++ trans->ctx.table->use--; ++ } + nf_tables_set_notify(&trans->ctx, nft_trans_set(trans), + NFT_MSG_NEWSET, GFP_KERNEL); + nft_trans_destroy(trans); +@@ -9062,6 +9085,10 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) + nft_trans_destroy(trans); + break; + case NFT_MSG_NEWSET: ++ if (nft_trans_set_update(trans)) { ++ nft_trans_destroy(trans); ++ break; ++ } + trans->ctx.table->use--; + if (nft_trans_set_bound(trans)) { + nft_trans_destroy(trans); +-- +2.35.1 + diff --git a/queue-5.15/netfilter-nf_tables-perform-type-checking-for-existi.patch b/queue-5.15/netfilter-nf_tables-perform-type-checking-for-existi.patch new file mode 100644 index 00000000000..5068dd12607 --- /dev/null +++ b/queue-5.15/netfilter-nf_tables-perform-type-checking-for-existi.patch @@ -0,0 +1,89 @@ +From 805b7136e19d1fe7fa686df1caa1f4cbb24855a2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 19 Dec 2022 20:09:00 +0100 +Subject: netfilter: nf_tables: perform type checking for existing sets + +From: Pablo Neira Ayuso + +[ Upstream commit f6594c372afd5cec8b1e9ee9ea8f8819d59c6fb1 ] + +If a ruleset declares a set name that matches an existing set in the +kernel, then validate that this declaration really refers to the same +set, otherwise bail out with EEXIST. + +Currently, the kernel reports success when adding a set that already +exists in the kernel. This usually results in EINVAL errors at a later +stage, when the user adds elements to the set, if the set declaration +mismatches the existing set representation in the kernel. + +Add a new function to check that the set declaration really refers to +the same existing set in the kernel. 
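+
+Conceptually, the new helper compares the declaration against the set
+object already in the kernel; a condensed fragment of the check added
+below:
+
+	if (set->ktype != desc->ktype || set->dtype != desc->dtype ||
+	    set->flags != flags || set->klen != desc->klen ||
+	    set->dlen != desc->dlen)
+		return false;	/* mismatch, caller bails out with EEXIST */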
+ +Fixes: 96518518cc41 ("netfilter: add nftables") +Reported-by: Florian Westphal +Signed-off-by: Pablo Neira Ayuso +Signed-off-by: Sasha Levin +--- + net/netfilter/nf_tables_api.c | 36 ++++++++++++++++++++++++++++++++++- + 1 file changed, 35 insertions(+), 1 deletion(-) + +diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c +index f892a926fb58..82fe54b64714 100644 +--- a/net/netfilter/nf_tables_api.c ++++ b/net/netfilter/nf_tables_api.c +@@ -4296,6 +4296,34 @@ static int nft_set_expr_alloc(struct nft_ctx *ctx, struct nft_set *set, + return err; + } + ++static bool nft_set_is_same(const struct nft_set *set, ++ const struct nft_set_desc *desc, ++ struct nft_expr *exprs[], u32 num_exprs, u32 flags) ++{ ++ int i; ++ ++ if (set->ktype != desc->ktype || ++ set->dtype != desc->dtype || ++ set->flags != flags || ++ set->klen != desc->klen || ++ set->dlen != desc->dlen || ++ set->field_count != desc->field_count || ++ set->num_exprs != num_exprs) ++ return false; ++ ++ for (i = 0; i < desc->field_count; i++) { ++ if (set->field_len[i] != desc->field_len[i]) ++ return false; ++ } ++ ++ for (i = 0; i < num_exprs; i++) { ++ if (set->exprs[i]->ops != exprs[i]->ops) ++ return false; ++ } ++ ++ return true; ++} ++ + static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + const struct nlattr * const nla[]) + { +@@ -4450,10 +4478,16 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, + if (err < 0) + return err; + ++ err = 0; ++ if (!nft_set_is_same(set, &desc, exprs, num_exprs, flags)) { ++ NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]); ++ err = -EEXIST; ++ } ++ + for (i = 0; i < num_exprs; i++) + nft_expr_destroy(&ctx, exprs[i]); + +- return 0; ++ return err; + } + + if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) +-- +2.35.1 + diff --git a/queue-5.15/nfc-fix-potential-resource-leaks.patch b/queue-5.15/nfc-fix-potential-resource-leaks.patch new file mode 100644 index 00000000000..a17def5e747 --- /dev/null +++ b/queue-5.15/nfc-fix-potential-resource-leaks.patch @@ -0,0 +1,127 @@ +From 38a839eaa02703050fffb6d52cb3fe71c14ef38e Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 23 Dec 2022 11:37:18 +0400 +Subject: nfc: Fix potential resource leaks + +From: Miaoqian Lin + +[ Upstream commit df49908f3c52d211aea5e2a14a93bbe67a2cb3af ] + +nfc_get_device() take reference for the device, add missing +nfc_put_device() to release it when not need anymore. +Also fix the style warnning by use error EOPNOTSUPP instead of +ENOTSUPP. + +Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation") +Fixes: 29e76924cf08 ("nfc: netlink: Add capability to reply to vendor_cmd with data") +Signed-off-by: Miaoqian Lin +Signed-off-by: David S. 
Miller +Signed-off-by: Sasha Levin +--- + net/nfc/netlink.c | 52 ++++++++++++++++++++++++++++++++++------------- + 1 file changed, 38 insertions(+), 14 deletions(-) + +diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c +index a207f0b8137b..d928d5a24bbc 100644 +--- a/net/nfc/netlink.c ++++ b/net/nfc/netlink.c +@@ -1497,6 +1497,7 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info) + u32 dev_idx, se_idx; + u8 *apdu; + size_t apdu_len; ++ int rc; + + if (!info->attrs[NFC_ATTR_DEVICE_INDEX] || + !info->attrs[NFC_ATTR_SE_INDEX] || +@@ -1510,25 +1511,37 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info) + if (!dev) + return -ENODEV; + +- if (!dev->ops || !dev->ops->se_io) +- return -ENOTSUPP; ++ if (!dev->ops || !dev->ops->se_io) { ++ rc = -EOPNOTSUPP; ++ goto put_dev; ++ } + + apdu_len = nla_len(info->attrs[NFC_ATTR_SE_APDU]); +- if (apdu_len == 0) +- return -EINVAL; ++ if (apdu_len == 0) { ++ rc = -EINVAL; ++ goto put_dev; ++ } + + apdu = nla_data(info->attrs[NFC_ATTR_SE_APDU]); +- if (!apdu) +- return -EINVAL; ++ if (!apdu) { ++ rc = -EINVAL; ++ goto put_dev; ++ } + + ctx = kzalloc(sizeof(struct se_io_ctx), GFP_KERNEL); +- if (!ctx) +- return -ENOMEM; ++ if (!ctx) { ++ rc = -ENOMEM; ++ goto put_dev; ++ } + + ctx->dev_idx = dev_idx; + ctx->se_idx = se_idx; + +- return nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx); ++ rc = nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx); ++ ++put_dev: ++ nfc_put_device(dev); ++ return rc; + } + + static int nfc_genl_vendor_cmd(struct sk_buff *skb, +@@ -1551,14 +1564,21 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb, + subcmd = nla_get_u32(info->attrs[NFC_ATTR_VENDOR_SUBCMD]); + + dev = nfc_get_device(dev_idx); +- if (!dev || !dev->vendor_cmds || !dev->n_vendor_cmds) ++ if (!dev) + return -ENODEV; + ++ if (!dev->vendor_cmds || !dev->n_vendor_cmds) { ++ err = -ENODEV; ++ goto put_dev; ++ } ++ + if (info->attrs[NFC_ATTR_VENDOR_DATA]) { + data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]); + data_len = nla_len(info->attrs[NFC_ATTR_VENDOR_DATA]); +- if (data_len == 0) +- return -EINVAL; ++ if (data_len == 0) { ++ err = -EINVAL; ++ goto put_dev; ++ } + } else { + data = NULL; + data_len = 0; +@@ -1573,10 +1593,14 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb, + dev->cur_cmd_info = info; + err = cmd->doit(dev, data, data_len); + dev->cur_cmd_info = NULL; +- return err; ++ goto put_dev; + } + +- return -EOPNOTSUPP; ++ err = -EOPNOTSUPP; ++ ++put_dev: ++ nfc_put_device(dev); ++ return err; + } + + /* message building helper */ +-- +2.35.1 + diff --git a/queue-5.15/nfsd-shut-down-the-nfsv4-state-objects-before-the-fi.patch b/queue-5.15/nfsd-shut-down-the-nfsv4-state-objects-before-the-fi.patch new file mode 100644 index 00000000000..e9b16b4c2f6 --- /dev/null +++ b/queue-5.15/nfsd-shut-down-the-nfsv4-state-objects-before-the-fi.patch @@ -0,0 +1,42 @@ +From 6d9d3d637d3fdb4042ca70491d0995c102ae8335 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Dec 2022 09:51:30 -0500 +Subject: nfsd: shut down the NFSv4 state objects before the filecache + +From: Jeff Layton + +[ Upstream commit 789e1e10f214c00ca18fc6610824c5b9876ba5f2 ] + +Currently, we shut down the filecache before trying to clean up the +stateids that depend on it. This leads to the kernel trying to free an +nfsd_file twice, and a refcount overput on the nf_mark. + +Change the shutdown procedure to tear down all of the stateids prior +to shutting down the filecache. 
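+
+That is, the teardown order in nfsd_shutdown_net() becomes:
+
+	nfs4_state_shutdown_net(net);		/* drop stateids first */
+	nfsd_file_cache_shutdown_net(net);	/* then the filecache */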
+ +Reported-and-tested-by: Wang Yugui +Signed-off-by: Jeff Layton +Fixes: 5e113224c17e ("nfsd: nfsd_file cache entries should be per net namespace") +Signed-off-by: Chuck Lever +Signed-off-by: Sasha Levin +--- + fs/nfsd/nfssvc.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c +index ccb59e91011b..373695cc62a7 100644 +--- a/fs/nfsd/nfssvc.c ++++ b/fs/nfsd/nfssvc.c +@@ -425,8 +425,8 @@ static void nfsd_shutdown_net(struct net *net) + { + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + +- nfsd_file_cache_shutdown_net(net); + nfs4_state_shutdown_net(net); ++ nfsd_file_cache_shutdown_net(net); + if (nn->lockd_up) { + lockd_down(net); + nn->lockd_up = false; +-- +2.35.1 + diff --git a/queue-5.15/nvme-also-return-i-o-command-effects-from-nvme_comma.patch b/queue-5.15/nvme-also-return-i-o-command-effects-from-nvme_comma.patch new file mode 100644 index 00000000000..115da93c470 --- /dev/null +++ b/queue-5.15/nvme-also-return-i-o-command-effects-from-nvme_comma.patch @@ -0,0 +1,81 @@ +From be6f4436a3bf98a815cd6e12fee62282d00506ee Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 21 Dec 2022 10:12:17 +0100 +Subject: nvme: also return I/O command effects from nvme_command_effects + +From: Christoph Hellwig + +[ Upstream commit 831ed60c2aca2d7c517b2da22897a90224a97d27 ] + +To be able to use the Commands Supported and Effects Log for allowing +unprivileged passtrough, it needs to be corretly reported for I/O +commands as well. Return the I/O command effects from +nvme_command_effects, and also add a default list of effects for the +NVM command set. For other command sets, the Commands Supported and +Effects log is required to be present already. + +Signed-off-by: Christoph Hellwig +Reviewed-by: Keith Busch +Reviewed-by: Kanchan Joshi +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/core.c | 32 ++++++++++++++++++++++++++------ + 1 file changed, 26 insertions(+), 6 deletions(-) + +diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c +index 2d5b5e0fb66a..672f53d5651a 100644 +--- a/drivers/nvme/host/core.c ++++ b/drivers/nvme/host/core.c +@@ -1113,6 +1113,18 @@ static u32 nvme_known_admin_effects(u8 opcode) + return 0; + } + ++static u32 nvme_known_nvm_effects(u8 opcode) ++{ ++ switch (opcode) { ++ case nvme_cmd_write: ++ case nvme_cmd_write_zeroes: ++ case nvme_cmd_write_uncor: ++ return NVME_CMD_EFFECTS_LBCC; ++ default: ++ return 0; ++ } ++} ++ + u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode) + { + u32 effects = 0; +@@ -1120,16 +1132,24 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode) + if (ns) { + if (ns->head->effects) + effects = le32_to_cpu(ns->head->effects->iocs[opcode]); ++ if (ns->head->ids.csi == NVME_CAP_CSS_NVM) ++ effects |= nvme_known_nvm_effects(opcode); + if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC)) + dev_warn_once(ctrl->device, +- "IO command:%02x has unhandled effects:%08x\n", ++ "IO command:%02x has unusual effects:%08x\n", + opcode, effects); +- return 0; +- } + +- if (ctrl->effects) +- effects = le32_to_cpu(ctrl->effects->acs[opcode]); +- effects |= nvme_known_admin_effects(opcode); ++ /* ++ * NVME_CMD_EFFECTS_CSE_MASK causes a freeze all I/O queues, ++ * which would deadlock when done on an I/O command. Note that ++ * We already warn about an unusual effect above. 
++ */ ++ effects &= ~NVME_CMD_EFFECTS_CSE_MASK; ++ } else { ++ if (ctrl->effects) ++ effects = le32_to_cpu(ctrl->effects->acs[opcode]); ++ effects |= nvme_known_admin_effects(opcode); ++ } + + return effects; + } +-- +2.35.1 + diff --git a/queue-5.15/nvme-fix-multipath-crash-caused-by-flush-request-whe.patch b/queue-5.15/nvme-fix-multipath-crash-caused-by-flush-request-whe.patch new file mode 100644 index 00000000000..7147696714d --- /dev/null +++ b/queue-5.15/nvme-fix-multipath-crash-caused-by-flush-request-whe.patch @@ -0,0 +1,81 @@ +From 19cf32187a416461aaf06025dd4a62fe4dd21bc1 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Dec 2022 09:57:21 +0800 +Subject: nvme: fix multipath crash caused by flush request when blktrace is + enabled + +From: Yanjun Zhang + +[ Upstream commit 3659fb5ac29a5e6102bebe494ac789fd47fb78f4 ] + +The flush request initialized by blk_kick_flush has NULL bio, +and it may be dealt with nvme_end_req during io completion. +When blktrace is enabled, nvme_trace_bio_complete with multipath +activated trying to access NULL pointer bio from flush request +results in the following crash: + +[ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a +[ 2517.835213] #PF: supervisor read access in kernel mode +[ 2517.838724] #PF: error_code(0x0000) - not-present page +[ 2517.842222] PGD 7b2d51067 P4D 0 +[ 2517.845684] Oops: 0000 [#1] SMP NOPTI +[ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1 +[ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022 +[ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] +[ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30 +[ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba +[ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286 +[ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000 +[ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000 +[ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000 +[ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8 +[ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018 +[ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000 +[ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 +[ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0 +[ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 +[ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 +[ 2517.913761] PKRU: 55555554 +[ 2517.917558] Call Trace: +[ 2517.921294] +[ 2517.924982] nvme_complete_rq+0x1c3/0x1e0 [nvme_core] +[ 2517.928715] nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp] +[ 2517.932442] nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp] +[ 2517.936137] ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp] +[ 2517.939830] tcp_read_sock+0x9c/0x260 +[ 2517.943486] nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp] +[ 2517.947173] nvme_tcp_io_work+0x64/0x90 [nvme_tcp] +[ 2517.950834] process_one_work+0x1e8/0x390 +[ 2517.954473] worker_thread+0x53/0x3c0 +[ 2517.958069] ? process_one_work+0x390/0x390 +[ 2517.961655] kthread+0x10c/0x130 +[ 2517.965211] ? 
set_kthread_struct+0x40/0x40 +[ 2517.968760] ret_from_fork+0x1f/0x30 +[ 2517.972285] + +To avoid this situation, add a NULL check for req->bio before +calling trace_block_bio_complete. + +Signed-off-by: Yanjun Zhang +Signed-off-by: Christoph Hellwig +Signed-off-by: Sasha Levin +--- + drivers/nvme/host/nvme.h | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h +index 7f52b2b179b8..39ca48babbe8 100644 +--- a/drivers/nvme/host/nvme.h ++++ b/drivers/nvme/host/nvme.h +@@ -799,7 +799,7 @@ static inline void nvme_trace_bio_complete(struct request *req) + { + struct nvme_ns *ns = req->q->queuedata; + +- if (req->cmd_flags & REQ_NVME_MPATH) ++ if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) + trace_block_bio_complete(ns->head->disk->queue, req->bio); + } + +-- +2.35.1 + diff --git a/queue-5.15/nvmet-use-nvme_cmd_effects_csupp-instead-of-open-cod.patch b/queue-5.15/nvmet-use-nvme_cmd_effects_csupp-instead-of-open-cod.patch new file mode 100644 index 00000000000..24d06dcd5cc --- /dev/null +++ b/queue-5.15/nvmet-use-nvme_cmd_effects_csupp-instead-of-open-cod.patch @@ -0,0 +1,75 @@ +From 2ef291b19f12de0dba0c01ea342afe0d0239b055 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 12 Dec 2022 15:20:04 +0100 +Subject: nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it + +From: Christoph Hellwig + +[ Upstream commit 61f37154c599cf9f2f84dcbd9be842f8645a7099 ] + +Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a +single value to multiple array entries instead of repeated assignments. + +Signed-off-by: Christoph Hellwig +Reviewed-by: Keith Busch +Reviewed-by: Sagi Grimberg +Reviewed-by: Kanchan Joshi +Reviewed-by: Chaitanya Kulkarni +Signed-off-by: Sasha Levin +--- + drivers/nvme/target/admin-cmd.c | 35 ++++++++++++++++++--------------- + 1 file changed, 19 insertions(+), 16 deletions(-) + +diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c +index 52bb262d267a..bf78c58ed41d 100644 +--- a/drivers/nvme/target/admin-cmd.c ++++ b/drivers/nvme/target/admin-cmd.c +@@ -164,26 +164,29 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req) + + static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log) + { +- log->acs[nvme_admin_get_log_page] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_identify] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_abort_cmd] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_set_features] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_get_features] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_async_event] = cpu_to_le32(1 << 0); +- log->acs[nvme_admin_keep_alive] = cpu_to_le32(1 << 0); +- +- log->iocs[nvme_cmd_read] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_write] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_flush] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_dsm] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_write_zeroes] = cpu_to_le32(1 << 0); ++ log->acs[nvme_admin_get_log_page] = ++ log->acs[nvme_admin_identify] = ++ log->acs[nvme_admin_abort_cmd] = ++ log->acs[nvme_admin_set_features] = ++ log->acs[nvme_admin_get_features] = ++ log->acs[nvme_admin_async_event] = ++ log->acs[nvme_admin_keep_alive] = ++ cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); ++ ++ log->iocs[nvme_cmd_read] = ++ log->iocs[nvme_cmd_write] = ++ log->iocs[nvme_cmd_flush] = ++ log->iocs[nvme_cmd_dsm] = ++ log->iocs[nvme_cmd_write_zeroes] = ++ cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); + } + + static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log) + { +- 
log->iocs[nvme_cmd_zone_append] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_zone_mgmt_send] = cpu_to_le32(1 << 0); +- log->iocs[nvme_cmd_zone_mgmt_recv] = cpu_to_le32(1 << 0); ++ log->iocs[nvme_cmd_zone_append] = ++ log->iocs[nvme_cmd_zone_mgmt_send] = ++ log->iocs[nvme_cmd_zone_mgmt_recv] = ++ cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); + } + + static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req) +-- +2.35.1 + diff --git a/queue-5.15/octeontx2-pf-fix-lmtst-id-used-in-aura-free.patch b/queue-5.15/octeontx2-pf-fix-lmtst-id-used-in-aura-free.patch new file mode 100644 index 00000000000..ae8c3eaf49a --- /dev/null +++ b/queue-5.15/octeontx2-pf-fix-lmtst-id-used-in-aura-free.patch @@ -0,0 +1,111 @@ +From b0524b6914ccd85758732f881ce18f0cee803ade Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Jan 2023 09:20:12 +0530 +Subject: octeontx2-pf: Fix lmtst ID used in aura free + +From: Geetha sowjanya + +[ Upstream commit 4af1b64f80fbe1275fb02c5f1c0cef099a4a231f ] + +Current code uses per_cpu pointer to get the lmtst_id mapped to +the core on which aura_free() is executed. Using per_cpu pointer +without preemption disable causing mismatch between lmtst_id and +core on which pointer gets freed. This patch fixes the issue by +disabling preemption around aura_free. + +Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") +Signed-off-by: Sunil Goutham +Signed-off-by: Geetha sowjanya +Signed-off-by: David S. Miller +Signed-off-by: Sasha Levin +--- + .../marvell/octeontx2/nic/otx2_common.c | 30 +++++++++++++------ + 1 file changed, 21 insertions(+), 9 deletions(-) + +diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +index e14624caddc6..f6306eedd59b 100644 +--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c ++++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +@@ -962,6 +962,7 @@ static void otx2_pool_refill_task(struct work_struct *work) + rbpool = cq->rbpool; + free_ptrs = cq->pool_ptrs; + ++ get_cpu(); + while (cq->pool_ptrs) { + if (otx2_alloc_rbuf(pfvf, rbpool, &bufptr)) { + /* Schedule a WQ if we fails to free atleast half of the +@@ -981,6 +982,7 @@ static void otx2_pool_refill_task(struct work_struct *work) + pfvf->hw_ops->aura_freeptr(pfvf, qidx, bufptr + OTX2_HEAD_ROOM); + cq->pool_ptrs--; + } ++ put_cpu(); + cq->refill_task_sched = false; + } + +@@ -1314,6 +1316,7 @@ int otx2_sq_aura_pool_init(struct otx2_nic *pfvf) + if (err) + goto fail; + ++ get_cpu(); + /* Allocate pointers and free them to aura/pool */ + for (qidx = 0; qidx < hw->tx_queues; qidx++) { + pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, qidx); +@@ -1322,18 +1325,24 @@ int otx2_sq_aura_pool_init(struct otx2_nic *pfvf) + sq = &qset->sq[qidx]; + sq->sqb_count = 0; + sq->sqb_ptrs = kcalloc(num_sqbs, sizeof(*sq->sqb_ptrs), GFP_KERNEL); +- if (!sq->sqb_ptrs) +- return -ENOMEM; ++ if (!sq->sqb_ptrs) { ++ err = -ENOMEM; ++ goto err_mem; ++ } + + for (ptr = 0; ptr < num_sqbs; ptr++) { +- if (otx2_alloc_rbuf(pfvf, pool, &bufptr)) +- return -ENOMEM; ++ err = otx2_alloc_rbuf(pfvf, pool, &bufptr); ++ if (err) ++ goto err_mem; + pfvf->hw_ops->aura_freeptr(pfvf, pool_id, bufptr); + sq->sqb_ptrs[sq->sqb_count++] = (u64)bufptr; + } + } + +- return 0; ++err_mem: ++ put_cpu(); ++ return err ? 
-ENOMEM : 0; ++ + fail: + otx2_mbox_reset(&pfvf->mbox.mbox, 0); + otx2_aura_pool_free(pfvf); +@@ -1372,18 +1381,21 @@ int otx2_rq_aura_pool_init(struct otx2_nic *pfvf) + if (err) + goto fail; + ++ get_cpu(); + /* Allocate pointers and free them to aura/pool */ + for (pool_id = 0; pool_id < hw->rqpool_cnt; pool_id++) { + pool = &pfvf->qset.pool[pool_id]; + for (ptr = 0; ptr < num_ptrs; ptr++) { +- if (otx2_alloc_rbuf(pfvf, pool, &bufptr)) +- return -ENOMEM; ++ err = otx2_alloc_rbuf(pfvf, pool, &bufptr); ++ if (err) ++ goto err_mem; + pfvf->hw_ops->aura_freeptr(pfvf, pool_id, + bufptr + OTX2_HEAD_ROOM); + } + } +- +- return 0; ++err_mem: ++ put_cpu(); ++ return err ? -ENOMEM : 0; + fail: + otx2_mbox_reset(&pfvf->mbox.mbox, 0); + otx2_aura_pool_free(pfvf); +-- +2.35.1 + diff --git a/queue-5.15/perf-probe-fix-to-get-the-dw_at_decl_file-and-dw_at_.patch b/queue-5.15/perf-probe-fix-to-get-the-dw_at_decl_file-and-dw_at_.patch new file mode 100644 index 00000000000..3df243c1120 --- /dev/null +++ b/queue-5.15/perf-probe-fix-to-get-the-dw_at_decl_file-and-dw_at_.patch @@ -0,0 +1,92 @@ +From 780fe7f307702d3ebb0579f2a0e6110076663f6c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sat, 5 Nov 2022 12:01:14 +0900 +Subject: perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as + unsinged data + +From: Masami Hiramatsu (Google) + +[ Upstream commit a9dfc46c67b52ad43b8e335e28f4cf8002c67793 ] + +DWARF version 5 standard Sec 2.14 says that + + Any debugging information entry representing the declaration of an object, + module, subprogram or type may have DW_AT_decl_file, DW_AT_decl_line and + DW_AT_decl_column attributes, each of whose value is an unsigned integer + constant. + +So it should be an unsigned integer data. Also, even though the standard +doesn't clearly say the DW_AT_call_file is signed or unsigned, the +elfutils (eu-readelf) interprets it as unsigned integer data and it is +natural to handle it as unsigned integer data as same as DW_AT_decl_file. +This changes the DW_AT_call_file as unsigned integer data too. 
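+
+In libdw terms, that means fetching the file index with
+dwarf_formudata() into a Dwarf_Word instead of dwarf_formsdata() into a
+Dwarf_Sword, roughly (variable names simplified from the change below):
+
+	Dwarf_Attribute attr;
+	Dwarf_Word idx;	/* unsigned, per DWARF5 Sec 2.14 */
+
+	if (dwarf_attr_integrate(die, DW_AT_decl_file, &attr) == NULL ||
+	    dwarf_formudata(&attr, &idx) != 0)
+		return -ENOENT;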
+ +Fixes: 3f4460a28fb2f73d ("perf probe: Filter out redundant inline-instances") +Signed-off-by: Masami Hiramatsu +Acked-by: Namhyung Kim +Cc: Alexander Shishkin +Cc: Ingo Molnar +Cc: Jiri Olsa +Cc: Mark Rutland +Cc: Masami Hiramatsu +Cc: Peter Zijlstra +Cc: stable@vger.kernel.org +Cc: Steven Rostedt (VMware) +Link: https://lore.kernel.org/r/166761727445.480106.3738447577082071942.stgit@devnote3 +Signed-off-by: Arnaldo Carvalho de Melo +Signed-off-by: Sasha Levin +--- + tools/perf/util/dwarf-aux.c | 21 ++++----------------- + 1 file changed, 4 insertions(+), 17 deletions(-) + +diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c +index a07efbadb775..623527edeac1 100644 +--- a/tools/perf/util/dwarf-aux.c ++++ b/tools/perf/util/dwarf-aux.c +@@ -315,19 +315,6 @@ static int die_get_attr_udata(Dwarf_Die *tp_die, unsigned int attr_name, + return 0; + } + +-/* Get attribute and translate it as a sdata */ +-static int die_get_attr_sdata(Dwarf_Die *tp_die, unsigned int attr_name, +- Dwarf_Sword *result) +-{ +- Dwarf_Attribute attr; +- +- if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || +- dwarf_formsdata(&attr, result) != 0) +- return -ENOENT; +- +- return 0; +-} +- + /** + * die_is_signed_type - Check whether a type DIE is signed or not + * @tp_die: a DIE of a type +@@ -467,9 +454,9 @@ int die_get_data_member_location(Dwarf_Die *mb_die, Dwarf_Word *offs) + /* Get the call file index number in CU DIE */ + static int die_get_call_fileno(Dwarf_Die *in_die) + { +- Dwarf_Sword idx; ++ Dwarf_Word idx; + +- if (die_get_attr_sdata(in_die, DW_AT_call_file, &idx) == 0) ++ if (die_get_attr_udata(in_die, DW_AT_call_file, &idx) == 0) + return (int)idx; + else + return -ENOENT; +@@ -478,9 +465,9 @@ static int die_get_call_fileno(Dwarf_Die *in_die) + /* Get the declared file index number in CU DIE */ + static int die_get_decl_fileno(Dwarf_Die *pdie) + { +- Dwarf_Sword idx; ++ Dwarf_Word idx; + +- if (die_get_attr_sdata(pdie, DW_AT_decl_file, &idx) == 0) ++ if (die_get_attr_udata(pdie, DW_AT_decl_file, &idx) == 0) + return (int)idx; + else + return -ENOENT; +-- +2.35.1 + diff --git a/queue-5.15/perf-probe-use-dwarf_attr_integrate-as-generic-dwarf.patch b/queue-5.15/perf-probe-use-dwarf_attr_integrate-as-generic-dwarf.patch new file mode 100644 index 00000000000..9eb1e9b9a81 --- /dev/null +++ b/queue-5.15/perf-probe-use-dwarf_attr_integrate-as-generic-dwarf.patch @@ -0,0 +1,54 @@ +From 41027a876e38be15171d75518db76b231b06e074 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 1 Nov 2022 22:48:39 +0900 +Subject: perf probe: Use dwarf_attr_integrate as generic DWARF attr accessor + +From: Masami Hiramatsu (Google) + +[ Upstream commit f828929ab7f0dc3353e4a617f94f297fa8f3dec3 ] + +Use dwarf_attr_integrate() instead of dwarf_attr() for generic attribute +acccessor functions, so that it can find the specified attribute from +abstact origin DIE etc. 
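+
+The difference matters for inlined instances, where attributes live on
+the abstract origin rather than on the concrete DIE, e.g.:
+
+	/* inspects only the given DIE */
+	dwarf_attr(die, DW_AT_decl_file, &attr);
+	/* also follows DW_AT_abstract_origin / DW_AT_specification */
+	dwarf_attr_integrate(die, DW_AT_decl_file, &attr);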
+ +Signed-off-by: Masami Hiramatsu +Acked-by: Namhyung Kim +Cc: Alexander Shishkin +Cc: Ingo Molnar +Cc: Jiri Olsa +Cc: Mark Rutland +Cc: Peter Zijlstra +Cc: Steven Rostedt (VMware) +Link: https://lore.kernel.org/r/166731051988.2100653.13595339994343449770.stgit@devnote3 +Signed-off-by: Arnaldo Carvalho de Melo +Stable-dep-of: a9dfc46c67b5 ("perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged data") +Signed-off-by: Sasha Levin +--- + tools/perf/util/dwarf-aux.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c +index 609ca1671501..a07efbadb775 100644 +--- a/tools/perf/util/dwarf-aux.c ++++ b/tools/perf/util/dwarf-aux.c +@@ -308,7 +308,7 @@ static int die_get_attr_udata(Dwarf_Die *tp_die, unsigned int attr_name, + { + Dwarf_Attribute attr; + +- if (dwarf_attr(tp_die, attr_name, &attr) == NULL || ++ if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || + dwarf_formudata(&attr, result) != 0) + return -ENOENT; + +@@ -321,7 +321,7 @@ static int die_get_attr_sdata(Dwarf_Die *tp_die, unsigned int attr_name, + { + Dwarf_Attribute attr; + +- if (dwarf_attr(tp_die, attr_name, &attr) == NULL || ++ if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || + dwarf_formsdata(&attr, result) != 0) + return -ENOENT; + +-- +2.35.1 + diff --git a/queue-5.15/perf-stat-fix-handling-of-for-each-cgroup-with-bpf-c.patch b/queue-5.15/perf-stat-fix-handling-of-for-each-cgroup-with-bpf-c.patch new file mode 100644 index 00000000000..35fb12ac822 --- /dev/null +++ b/queue-5.15/perf-stat-fix-handling-of-for-each-cgroup-with-bpf-c.patch @@ -0,0 +1,146 @@ +From 6b9bb3f100db7ed53486c9bd8e7fe7d867a423f6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Jan 2023 22:44:02 -0800 +Subject: perf stat: Fix handling of --for-each-cgroup with --bpf-counters to + match non BPF mode + +From: Namhyung Kim + +[ Upstream commit 54b353a20c7e8be98414754f5aff98c8a68fcc1f ] + +The --for-each-cgroup can have the same cgroup multiple times, but this +confuses BPF counters (since they have the same cgroup id), making only +the last cgroup events to be counted. + +Let's check the cgroup name before adding a new entry to the cgroups +list. 
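+
+(The confusion happens at the map level; a sketch of the effect, with
+get_cgroup_id() as a hypothetical helper: both "/" entries resolve to
+the same cgroup id, so they share a single slot in the per-cgroup
+counter map and the second entry simply overwrites the first:
+
+  __u64 id = get_cgroup_id("/");                   /* same id for both */
+  bpf_map_update_elem(map_fd, &id, &val, BPF_ANY); /* entry #1 */
+  bpf_map_update_elem(map_fd, &id, &val, BPF_ANY); /* entry #2, same slot */
+
+The name-based check added below avoids creating the duplicate entry in
+the first place.)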
+
+Before:
+
+ $ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
+
+ Performance counter stats for 'system wide':
+
+ <not counted> msec cpu-clock /
+ <not counted> context-switches /
+ <not counted> cpu-migrations /
+ <not counted> page-faults /
+ <not counted> cycles /
+ <not counted> instructions /
+ <not counted> branches /
+ <not counted> branch-misses /
+ 8,016.04 msec cpu-clock / # 7.998 CPUs utilized
+ 6,152 context-switches / # 767.461 /sec
+ 250 cpu-migrations / # 31.187 /sec
+ 442 page-faults / # 55.139 /sec
+ 613,111,487 cycles / # 0.076 GHz
+ 280,599,604 instructions / # 0.46 insn per cycle
+ 57,692,724 branches / # 7.197 M/sec
+ 3,385,168 branch-misses / # 5.87% of all branches
+
+ 1.002220125 seconds time elapsed
+
+After it becomes similar to the non-BPF mode:
+
+ $ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
+
+ Performance counter stats for 'system wide':
+
+ 8,013.38 msec cpu-clock / # 7.998 CPUs utilized
+ 6,859 context-switches / # 855.944 /sec
+ 334 cpu-migrations / # 41.680 /sec
+ 345 page-faults / # 43.053 /sec
+ 782,326,119 cycles / # 0.098 GHz
+ 471,645,724 instructions / # 0.60 insn per cycle
+ 94,963,430 branches / # 11.851 M/sec
+ 3,685,511 branch-misses / # 3.88% of all branches
+
+ 1.001864539 seconds time elapsed
+
+Committer notes:
+
+As a reminder, to test with BPF counters one has to use BUILD_BPF_SKEL=1
+in the make command line and have clang/llvm installed when building
+perf, otherwise the --bpf-counters option will not be available:
+
+ # perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
+ Error: unknown option `bpf-counters'
+
+ Usage: perf stat [<options>] [<command>]
+
+ -a, --all-cpus system-wide collection from all CPUs
+
+ #
+
+Fixes: bb1c15b60b981d10 ("perf stat: Support regex pattern in --for-each-cgroup")
+Signed-off-by: Namhyung Kim
+Tested-by: Arnaldo Carvalho de Melo
+Cc: Adrian Hunter
+Cc: bpf@vger.kernel.org
+Cc: Ian Rogers
+Cc: Ingo Molnar
+Cc: Jiri Olsa
+Cc: Namhyung Kim
+Cc: Peter Zijlstra
+Cc: Song Liu
+Link: https://lore.kernel.org/r/20230104064402.1551516-5-namhyung@kernel.org
+Signed-off-by: Arnaldo Carvalho de Melo
+Signed-off-by: Sasha Levin
+---
+ tools/perf/util/cgroup.c | 23 ++++++++++++++++++-----
+ 1 file changed, 18 insertions(+), 5 deletions(-)
+
+diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
+index e99b41f9be45..cd978c240e0d 100644
+--- a/tools/perf/util/cgroup.c
++++ b/tools/perf/util/cgroup.c
+@@ -224,6 +224,19 @@ static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unus
+ return 0;
+ }
+
++static int check_and_add_cgroup_name(const char *fpath)
++{
++ struct cgroup_name *cn;
++
++ list_for_each_entry(cn, &cgroup_list, list) {
++ if (!strcmp(cn->name, fpath))
++ return 0;
++ }
++
++ /* pretend if it's added by ftw() */
++ return add_cgroup_name(fpath, NULL, FTW_D, NULL);
++}
++
+ static void release_cgroup_list(void)
+ {
+ struct cgroup_name *cn;
+@@ -242,7 +255,7 @@ static int list_cgroups(const char *str)
+ struct cgroup_name *cn;
+ char *s;
+
+- /* use given name as is - for testing purpose */
++ /* use given name as is when no regex is given */
+ for (;;) {
+ p = strchr(str, ',');
+ e = p ?
p : eos; +@@ -253,13 +266,13 @@ static int list_cgroups(const char *str) + s = strndup(str, e - str); + if (!s) + return -1; +- /* pretend if it's added by ftw() */ +- ret = add_cgroup_name(s, NULL, FTW_D, NULL); ++ ++ ret = check_and_add_cgroup_name(s); + free(s); +- if (ret) ++ if (ret < 0) + return -1; + } else { +- if (add_cgroup_name("", NULL, FTW_D, NULL) < 0) ++ if (check_and_add_cgroup_name("/") < 0) + return -1; + } + +-- +2.35.1 + diff --git a/queue-5.15/perf-tools-fix-resources-leak-in-perf_data__open_dir.patch b/queue-5.15/perf-tools-fix-resources-leak-in-perf_data__open_dir.patch new file mode 100644 index 00000000000..93bc3e57042 --- /dev/null +++ b/queue-5.15/perf-tools-fix-resources-leak-in-perf_data__open_dir.patch @@ -0,0 +1,52 @@ +From f1be56b1ba40ac9fdae4b4f5c4d91d23dbb96538 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 29 Dec 2022 13:09:00 +0400 +Subject: perf tools: Fix resources leak in perf_data__open_dir() + +From: Miaoqian Lin + +[ Upstream commit 0a6564ebd953c4590663c9a3c99a3ea9920ade6f ] + +In perf_data__open_dir(), opendir() opens the directory stream. Add +missing closedir() to release it after use. + +Fixes: eb6176709b235b96 ("perf data: Add perf_data__open_dir_data function") +Reviewed-by: Adrian Hunter +Signed-off-by: Miaoqian Lin +Cc: Alexander Shishkin +Cc: Alexey Bayduraev +Cc: Ingo Molnar +Cc: Jiri Olsa +Cc: Mark Rutland +Cc: Namhyung Kim +Cc: Peter Zijlstra +Link: https://lore.kernel.org/r/20221229090903.1402395-1-linmq006@gmail.com +Signed-off-by: Arnaldo Carvalho de Melo +Signed-off-by: Sasha Levin +--- + tools/perf/util/data.c | 2 ++ + 1 file changed, 2 insertions(+) + +diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c +index 15a4547d608e..090a76be522b 100644 +--- a/tools/perf/util/data.c ++++ b/tools/perf/util/data.c +@@ -127,6 +127,7 @@ int perf_data__open_dir(struct perf_data *data) + file->size = st.st_size; + } + ++ closedir(dir); + if (!files) + return -EINVAL; + +@@ -135,6 +136,7 @@ int perf_data__open_dir(struct perf_data *data) + return 0; + + out_err: ++ closedir(dir); + close_dir(files, nr); + return ret; + } +-- +2.35.1 + diff --git a/queue-5.15/qlcnic-prevent-dcb-use-after-free-on-qlcnic_dcb_enab.patch b/queue-5.15/qlcnic-prevent-dcb-use-after-free-on-qlcnic_dcb_enab.patch new file mode 100644 index 00000000000..82be7557165 --- /dev/null +++ b/queue-5.15/qlcnic-prevent-dcb-use-after-free-on-qlcnic_dcb_enab.patch @@ -0,0 +1,103 @@ +From 4c64ad03c3aa84ffe522507f3dd416f4bc7c607c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 22 Dec 2022 14:52:28 +0300 +Subject: qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure + +From: Daniil Tatianin + +[ Upstream commit 13a7c8964afcd8ca43c0b6001ebb0127baa95362 ] + +adapter->dcb would get silently freed inside qlcnic_dcb_enable() in +case qlcnic_dcb_attach() would return an error, which always happens +under OOM conditions. This would lead to use-after-free because both +of the existing callers invoke qlcnic_dcb_get_info() on the obtained +pointer, which is potentially freed at that point. + +Propagate errors from qlcnic_dcb_enable(), and instead free the dcb +pointer at callsite using qlcnic_dcb_free(). This also removes the now +unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around +kfree() also causing memory leaks for partially initialized dcb. + +Found by Linux Verification Center (linuxtesting.org) with the SVACE +static analysis tool. 
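+
+Schematically, the pre-patch hazard looks like this (simplified
+illustration, not the actual driver code):
+
+  static void enable(struct obj *o)
+  {
+          if (attach(o))          /* fails under OOM */
+                  kfree(o);       /* freed behind the caller's back */
+  }
+
+  enable(adapter->dcb);
+  get_info(adapter->dcb);         /* stale pointer: use-after-free */
+
+Returning the error instead lets the caller free the pointer once and
+stop using it.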
+ +Fixes: 3c44bba1d270 ("qlcnic: Disable DCB operations from SR-IOV VFs") +Reviewed-by: Michal Swiatkowski +Signed-off-by: Daniil Tatianin +Signed-off-by: David S. Miller +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 8 +++++++- + drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h | 10 ++-------- + drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 8 +++++++- + 3 files changed, 16 insertions(+), 10 deletions(-) + +diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c +index 27dffa299ca6..7c3cf9ad4563 100644 +--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c ++++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c +@@ -2505,7 +2505,13 @@ int qlcnic_83xx_init(struct qlcnic_adapter *adapter, int pci_using_dac) + goto disable_mbx_intr; + + qlcnic_83xx_clear_function_resources(adapter); +- qlcnic_dcb_enable(adapter->dcb); ++ ++ err = qlcnic_dcb_enable(adapter->dcb); ++ if (err) { ++ qlcnic_dcb_free(adapter->dcb); ++ goto disable_mbx_intr; ++ } ++ + qlcnic_83xx_initialize_nic(adapter, 1); + qlcnic_dcb_get_info(adapter->dcb); + +diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h +index 7519773eaca6..22afa2be85fd 100644 +--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h ++++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h +@@ -41,11 +41,6 @@ struct qlcnic_dcb { + unsigned long state; + }; + +-static inline void qlcnic_clear_dcb_ops(struct qlcnic_dcb *dcb) +-{ +- kfree(dcb); +-} +- + static inline int qlcnic_dcb_get_hw_capability(struct qlcnic_dcb *dcb) + { + if (dcb && dcb->ops->get_hw_capability) +@@ -112,9 +107,8 @@ static inline void qlcnic_dcb_init_dcbnl_ops(struct qlcnic_dcb *dcb) + dcb->ops->init_dcbnl_ops(dcb); + } + +-static inline void qlcnic_dcb_enable(struct qlcnic_dcb *dcb) ++static inline int qlcnic_dcb_enable(struct qlcnic_dcb *dcb) + { +- if (dcb && qlcnic_dcb_attach(dcb)) +- qlcnic_clear_dcb_ops(dcb); ++ return dcb ? 
qlcnic_dcb_attach(dcb) : 0; + } + #endif +diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +index 75960a29f80e..cec07d5bbe67 100644 +--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c ++++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +@@ -2616,7 +2616,13 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) + "Device does not support MSI interrupts\n"); + + if (qlcnic_82xx_check(adapter)) { +- qlcnic_dcb_enable(adapter->dcb); ++ err = qlcnic_dcb_enable(adapter->dcb); ++ if (err) { ++ qlcnic_dcb_free(adapter->dcb); ++ dev_err(&pdev->dev, "Failed to enable DCB\n"); ++ goto err_out_free_hw; ++ } ++ + qlcnic_dcb_get_info(adapter->dcb); + err = qlcnic_setup_intr(adapter); + +-- +2.35.1 + diff --git a/queue-5.15/ravb-fix-failed-to-switch-device-to-config-mode-mess.patch b/queue-5.15/ravb-fix-failed-to-switch-device-to-config-mode-mess.patch new file mode 100644 index 00000000000..bacf3c36100 --- /dev/null +++ b/queue-5.15/ravb-fix-failed-to-switch-device-to-config-mode-mess.patch @@ -0,0 +1,68 @@ +From 4ac99028b99e64efe7431f8d0eb5dd446fc8081f Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 14 Dec 2022 10:51:18 +0000 +Subject: ravb: Fix "failed to switch device to config mode" message during + unbind + +From: Biju Das + +[ Upstream commit c72a7e42592b2e18d862cf120876070947000d7a ] + +This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch +device to config mode" during unbind. + +We are doing register access after pm_runtime_put_sync(). + +We usually do cleanup in reverse order of init. Currently in +remove(), the "pm_runtime_put_sync" is not in reverse order. + +Probe + reset_control_deassert(rstc); + pm_runtime_enable(&pdev->dev); + pm_runtime_get_sync(&pdev->dev); + +remove + pm_runtime_put_sync(&pdev->dev); + unregister_netdev(ndev); + .. 
+ ravb_mdio_release(priv); + pm_runtime_disable(&pdev->dev); + +Consider the call to unregister_netdev() +unregister_netdev->unregister_netdevice_queue->rollback_registered_many +that calls the below functions which access the registers after +pm_runtime_put_sync() + 1) ravb_get_stats + 2) ravb_close + +Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") +Cc: stable@vger.kernel.org +Signed-off-by: Biju Das +Reviewed-by: Leon Romanovsky +Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.com +Signed-off-by: Paolo Abeni +Signed-off-by: Sasha Levin +--- + drivers/net/ethernet/renesas/ravb_main.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c +index 77a19336abec..c89bcdd15f16 100644 +--- a/drivers/net/ethernet/renesas/ravb_main.c ++++ b/drivers/net/ethernet/renesas/ravb_main.c +@@ -2378,11 +2378,11 @@ static int ravb_remove(struct platform_device *pdev) + priv->desc_bat_dma); + /* Set reset mode */ + ravb_write(ndev, CCC_OPC_RESET, CCC); +- pm_runtime_put_sync(&pdev->dev); + unregister_netdev(ndev); + netif_napi_del(&priv->napi[RAVB_NC]); + netif_napi_del(&priv->napi[RAVB_BE]); + ravb_mdio_release(priv); ++ pm_runtime_put_sync(&pdev->dev); + pm_runtime_disable(&pdev->dev); + reset_control_assert(priv->rstc); + free_netdev(ndev); +-- +2.35.1 + diff --git a/queue-5.15/rdma-mlx5-fix-mlx5_ib_get_hw_stats-when-used-for-dev.patch b/queue-5.15/rdma-mlx5-fix-mlx5_ib_get_hw_stats-when-used-for-dev.patch new file mode 100644 index 00000000000..9c01d0d2c6b --- /dev/null +++ b/queue-5.15/rdma-mlx5-fix-mlx5_ib_get_hw_stats-when-used-for-dev.patch @@ -0,0 +1,112 @@ +From 04cbe8c6d9c9cb69aaedda49a8a8cc8869452154 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 28 Dec 2022 14:56:09 +0200 +Subject: RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device + +From: Shay Drory + +[ Upstream commit 38b50aa44495d5eb4218f0b82fc2da76505cec53 ] + +Currently, when mlx5_ib_get_hw_stats() is used for device (port_num = 0), +there is a special handling in order to use the correct counters, but, +port_num is being passed down the stack without any change. Also, some +functions assume that port_num >=1. As a result, the following oops can +occur. + + BUG: unable to handle page fault for address: ffff89510294f1a8 + #PF: supervisor write access in kernel mode + #PF: error_code(0x0002) - not-present page + PGD 0 P4D 0 + Oops: 0002 [#1] SMP + CPU: 8 PID: 1382 Comm: devlink Tainted: G W 6.1.0-rc4_for_upstream_base_2022_11_10_16_12 #1 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 + RIP: 0010:_raw_spin_lock+0xc/0x20 + Call Trace: + + mlx5_ib_get_native_port_mdev+0x73/0xe0 [mlx5_ib] + do_get_hw_stats.constprop.0+0x109/0x160 [mlx5_ib] + mlx5_ib_get_hw_stats+0xad/0x180 [mlx5_ib] + ib_setup_device_attrs+0xf0/0x290 [ib_core] + ib_register_device+0x3bb/0x510 [ib_core] + ? atomic_notifier_chain_register+0x67/0x80 + __mlx5_ib_add+0x2b/0x80 [mlx5_ib] + mlx5r_probe+0xb8/0x150 [mlx5_ib] + ? auxiliary_match_id+0x6a/0x90 + auxiliary_bus_probe+0x3c/0x70 + ? driver_sysfs_add+0x6b/0x90 + really_probe+0xcd/0x380 + __driver_probe_device+0x80/0x170 + driver_probe_device+0x1e/0x90 + __device_attach_driver+0x7d/0x100 + ? driver_allows_async_probing+0x60/0x60 + ? driver_allows_async_probing+0x60/0x60 + bus_for_each_drv+0x7b/0xc0 + __device_attach+0xbc/0x200 + bus_probe_device+0x87/0xa0 + device_add+0x404/0x940 + ? 
dev_set_name+0x53/0x70
+ __auxiliary_device_add+0x43/0x60
+ add_adev+0x99/0xe0 [mlx5_core]
+ mlx5_attach_device+0xc8/0x120 [mlx5_core]
+ mlx5_load_one_devl_locked+0xb2/0xe0 [mlx5_core]
+ devlink_reload+0x133/0x250
+ devlink_nl_cmd_reload+0x480/0x570
+ ? devlink_nl_pre_doit+0x44/0x2b0
+ genl_family_rcv_msg_doit.isra.0+0xc2/0x110
+ genl_rcv_msg+0x180/0x2b0
+ ? devlink_nl_cmd_region_read_dumpit+0x540/0x540
+ ? devlink_reload+0x250/0x250
+ ? devlink_put+0x50/0x50
+ ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
+ netlink_rcv_skb+0x54/0x100
+ genl_rcv+0x24/0x40
+ netlink_unicast+0x1f6/0x2c0
+ netlink_sendmsg+0x237/0x490
+ sock_sendmsg+0x33/0x40
+ __sys_sendto+0x103/0x160
+ ? handle_mm_fault+0x10e/0x290
+ ? do_user_addr_fault+0x1c0/0x5f0
+ __x64_sys_sendto+0x25/0x30
+ do_syscall_64+0x3d/0x90
+ entry_SYSCALL_64_after_hwframe+0x46/0xb0
+
+Fix it by setting port_num to 1 in order to get device status, and
+remove the now unused variable.
+
+Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE")
+Link: https://lore.kernel.org/r/98b82994c3cd3fa593b8a75ed3f3901e208beb0f.1672231736.git.leonro@nvidia.com
+Signed-off-by: Shay Drory
+Reviewed-by: Patrisious Haddad
+Signed-off-by: Leon Romanovsky
+Signed-off-by: Sasha Levin
+---
+ drivers/infiniband/hw/mlx5/counters.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
+index 224ba36f2946..1a0ecf439c09 100644
+--- a/drivers/infiniband/hw/mlx5/counters.c
++++ b/drivers/infiniband/hw/mlx5/counters.c
+@@ -249,7 +249,6 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
+ const struct mlx5_ib_counters *cnts = get_counters(dev, port_num - 1);
+ struct mlx5_core_dev *mdev;
+ int ret, num_counters;
+- u32 mdev_port_num;
+
+ if (!stats)
+ return -EINVAL;
+@@ -270,8 +269,9 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev,
+ }
+
+ if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) {
+- mdev = mlx5_ib_get_native_port_mdev(dev, port_num,
+- &mdev_port_num);
++ if (!port_num)
++ port_num = 1;
++ mdev = mlx5_ib_get_native_port_mdev(dev, port_num, NULL);
+ if (!mdev) {
+ /* If port is not affiliated yet, its in down state
+ * which doesn't have any counters yet, so it would be
+--
+2.35.1
+
diff --git a/queue-5.15/rdma-mlx5-fix-validation-of-max_rd_atomic-caps-for-d.patch b/queue-5.15/rdma-mlx5-fix-validation-of-max_rd_atomic-caps-for-d.patch
new file mode 100644
index 00000000000..a0155db748f
--- /dev/null
+++ b/queue-5.15/rdma-mlx5-fix-validation-of-max_rd_atomic-caps-for-d.patch
@@ -0,0 +1,95 @@
+From 3c95e94eb9f48431d8040f207902f074ddce693e Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Wed, 28 Dec 2022 14:56:10 +0200
+Subject: RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
+
+From: Maor Gottlieb
+
+[ Upstream commit 8de8482fe5732fbef4f5af82bc0c0362c804cd1f ]
+
+Currently, when modifying DC, we validate the max_rd_atomic user
+attribute against the RC cap instead of the DC cap. RC and DC QP types
+have different device limitations.
+
+This can cause userspace created DC QPs to malfunction.
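+
+As a worked example (cap values picked for illustration only): if the
+device reported log_max_ra_res_qp = 8 but log_max_ra_res_dc = 4, a DC QP
+requesting max_rd_atomic = 100 would pass the old RC-based check
+(100 <= 1 << 8 = 256) even though the DC limit is only 1 << 4 = 16.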
+ +Fixes: c32a4f296e1d ("IB/mlx5: Add support for DC Initiator QP") +Link: https://lore.kernel.org/r/0c5aee72cea188c3bb770f4207cce7abc9b6fc74.1672231736.git.leonro@nvidia.com +Signed-off-by: Maor Gottlieb +Signed-off-by: Leon Romanovsky +Signed-off-by: Sasha Levin +--- + drivers/infiniband/hw/mlx5/qp.c | 49 +++++++++++++++++++++++---------- + 1 file changed, 35 insertions(+), 14 deletions(-) + +diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c +index e5abbcfc1d57..55b05a3e31b8 100644 +--- a/drivers/infiniband/hw/mlx5/qp.c ++++ b/drivers/infiniband/hw/mlx5/qp.c +@@ -4499,6 +4499,40 @@ static bool mlx5_ib_modify_qp_allowed(struct mlx5_ib_dev *dev, + return false; + } + ++static int validate_rd_atomic(struct mlx5_ib_dev *dev, struct ib_qp_attr *attr, ++ int attr_mask, enum ib_qp_type qp_type) ++{ ++ int log_max_ra_res; ++ int log_max_ra_req; ++ ++ if (qp_type == MLX5_IB_QPT_DCI) { ++ log_max_ra_res = 1 << MLX5_CAP_GEN(dev->mdev, ++ log_max_ra_res_dc); ++ log_max_ra_req = 1 << MLX5_CAP_GEN(dev->mdev, ++ log_max_ra_req_dc); ++ } else { ++ log_max_ra_res = 1 << MLX5_CAP_GEN(dev->mdev, ++ log_max_ra_res_qp); ++ log_max_ra_req = 1 << MLX5_CAP_GEN(dev->mdev, ++ log_max_ra_req_qp); ++ } ++ ++ if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && ++ attr->max_rd_atomic > log_max_ra_res) { ++ mlx5_ib_dbg(dev, "invalid max_rd_atomic value %d\n", ++ attr->max_rd_atomic); ++ return false; ++ } ++ ++ if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC && ++ attr->max_dest_rd_atomic > log_max_ra_req) { ++ mlx5_ib_dbg(dev, "invalid max_dest_rd_atomic value %d\n", ++ attr->max_dest_rd_atomic); ++ return false; ++ } ++ return true; ++} ++ + int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + int attr_mask, struct ib_udata *udata) + { +@@ -4586,21 +4620,8 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, + goto out; + } + +- if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && +- attr->max_rd_atomic > +- (1 << MLX5_CAP_GEN(dev->mdev, log_max_ra_res_qp))) { +- mlx5_ib_dbg(dev, "invalid max_rd_atomic value %d\n", +- attr->max_rd_atomic); +- goto out; +- } +- +- if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC && +- attr->max_dest_rd_atomic > +- (1 << MLX5_CAP_GEN(dev->mdev, log_max_ra_req_qp))) { +- mlx5_ib_dbg(dev, "invalid max_dest_rd_atomic value %d\n", +- attr->max_dest_rd_atomic); ++ if (!validate_rd_atomic(dev, attr, attr_mask, qp_type)) + goto out; +- } + + if (cur_state == new_state && cur_state == IB_QPS_RESET) { + err = 0; +-- +2.35.1 + diff --git a/queue-5.15/series b/queue-5.15/series index 4cb645cb6b5..e130629312b 100644 --- a/queue-5.15/series +++ b/queue-5.15/series @@ -184,3 +184,81 @@ x86-mce-amd-clear-dfr-errors-found-in-thr-handler.patch media-s5p-mfc-fix-to-handle-reference-queue-during-f.patch media-s5p-mfc-clear-workbit-to-handle-error-conditio.patch media-s5p-mfc-fix-in-register-read-and-write-for-h26.patch +perf-probe-use-dwarf_attr_integrate-as-generic-dwarf.patch +perf-probe-fix-to-get-the-dw_at_decl_file-and-dw_at_.patch +ravb-fix-failed-to-switch-device-to-config-mode-mess.patch +ext4-goto-right-label-failed_mount3a.patch +ext4-correct-inconsistent-error-msg-in-nojournal-mod.patch +mbcache-automatically-delete-entries-from-cache-on-f.patch +ext4-fix-deadlock-due-to-mbcache-entry-corruption.patch +drm-i915-migrate-don-t-check-the-scratch-page.patch +drm-i915-migrate-fix-offset-calculation.patch +drm-i915-migrate-fix-length-calculation.patch +sunrpc-ensure-the-matching-upcall-is-in-flight-upon-.patch +btrfs-fix-an-error-handling-path-in-btrfs_defrag_lea.patch 
+bpf-pull-before-calling-skb_postpull_rcsum.patch +drm-panfrost-fix-gem-handle-creation-ref-counting.patch +netfilter-nf_tables-consolidate-set-description.patch +netfilter-nf_tables-add-function-to-create-set-state.patch +netfilter-nf_tables-perform-type-checking-for-existi.patch +vmxnet3-correctly-report-csum_level-for-encapsulated.patch +netfilter-nf_tables-honor-set-timeout-and-garbage-co.patch +veth-fix-race-with-af_xdp-exposing-old-or-uninitiali.patch +nfsd-shut-down-the-nfsv4-state-objects-before-the-fi.patch +net-hns3-add-interrupts-re-initialization-while-doin.patch +net-hns3-refactor-hns3_nic_reuse_page.patch +net-hns3-extract-macro-to-simplify-ring-stats-update.patch +net-hns3-fix-miss-l3e-checking-for-rx-packet.patch +net-hns3-fix-vf-promisc-mode-not-update-when-mac-tab.patch +net-sched-fix-memory-leak-in-tcindex_set_parms.patch +qlcnic-prevent-dcb-use-after-free-on-qlcnic_dcb_enab.patch +net-dsa-mv88e6xxx-depend-on-ptp-conditionally.patch +nfc-fix-potential-resource-leaks.patch +vdpa_sim-fix-possible-memory-leak-in-vdpasim_net_ini.patch +vhost-vsock-fix-error-handling-in-vhost_vsock_init.patch +vringh-fix-range-used-in-iotlb_translate.patch +vhost-fix-range-used-in-translate_desc.patch +vdpa_sim-fix-vringh-initialization-in-vdpasim_queue_.patch +net-mlx5-e-switch-properly-handle-ingress-tagged-pac.patch +net-mlx5-add-forgotten-cleanup-calls-into-mlx5_init_.patch +net-mlx5-avoid-recovery-in-probe-flows.patch +net-mlx5e-ipoib-don-t-allow-cqe-compression-to-be-tu.patch +net-mlx5e-tc-refactor-mlx5e_tc_add_flow_mod_hdr-to-g.patch +net-mlx5e-always-clear-dest-encap-in-neigh-update-de.patch +net-mlx5e-fix-hw-mtu-initializing-at-xdp-sq-allocati.patch +net-amd-xgbe-add-missed-tasklet_kill.patch +net-ena-fix-toeplitz-initial-hash-value.patch +net-ena-don-t-register-memory-info-on-xdp-exchange.patch +net-ena-account-for-the-number-of-processed-bytes-in.patch +net-ena-use-bitmask-to-indicate-packet-redirection.patch +net-ena-fix-rx_copybreak-value-update.patch +net-ena-set-default-value-for-rx-interrupt-moderatio.patch +net-ena-update-numa-tph-hint-register-upon-numa-node.patch +net-phy-xgmiitorgmii-fix-refcount-leak-in-xgmiitorgm.patch +rdma-mlx5-fix-mlx5_ib_get_hw_stats-when-used-for-dev.patch +rdma-mlx5-fix-validation-of-max_rd_atomic-caps-for-d.patch +drm-meson-reduce-the-fifo-lines-held-when-afbc-is-no.patch +filelock-new-helper-vfs_inode_has_locks.patch +ceph-switch-to-vfs_inode_has_locks-to-fix-file-lock-.patch +gpio-sifive-fix-refcount-leak-in-sifive_gpio_probe.patch +net-sched-atm-dont-intepret-cls-results-when-asked-t.patch +net-sched-cbq-dont-intepret-cls-results-when-asked-t.patch +net-sparx5-fix-reading-of-the-mac-address.patch +netfilter-ipset-fix-hash-net-port-net-hang-with-0-su.patch +netfilter-ipset-rework-long-task-execution-when-addi.patch +perf-tools-fix-resources-leak-in-perf_data__open_dir.patch +drm-imx-ipuv3-plane-fix-overlay-plane-width.patch +fs-ntfs3-don-t-hold-ni_lock-when-calling-truncate_se.patch +drivers-net-bonding-bond_3ad-return-when-there-s-no-.patch +octeontx2-pf-fix-lmtst-id-used-in-aura-free.patch +usb-rndis_host-secure-rndis_query-check-against-int-.patch +perf-stat-fix-handling-of-for-each-cgroup-with-bpf-c.patch +drm-i915-unpin-on-error-in-intel_vgpu_shadow_mm_pin.patch +caif-fix-memory-leak-in-cfctrl_linkup_request.patch +udf-fix-extension-of-the-last-extent-in-the-file.patch +asoc-intel-bytcr_rt5640-add-quirk-for-the-advantech-.patch +nvme-fix-multipath-crash-caused-by-flush-request-whe.patch 
+io_uring-check-for-valid-register-opcode-earlier.patch
+nvmet-use-nvme_cmd_effects_csupp-instead-of-open-cod.patch
+nvme-also-return-i-o-command-effects-from-nvme_comma.patch
+btrfs-check-superblock-to-ensure-the-fs-was-not-modi.patch
diff --git a/queue-5.15/sunrpc-ensure-the-matching-upcall-is-in-flight-upon-.patch b/queue-5.15/sunrpc-ensure-the-matching-upcall-is-in-flight-upon-.patch
new file mode 100644
index 00000000000..6b6390b25cf
--- /dev/null
+++ b/queue-5.15/sunrpc-ensure-the-matching-upcall-is-in-flight-upon-.patch
@@ -0,0 +1,133 @@
+From 9ae19f77716cd8e54b2d73fe2597e5ae8ee6f8b5 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 13 Dec 2022 13:14:31 +0900
+Subject: SUNRPC: ensure the matching upcall is in-flight upon downcall
+
+From: minoura makoto
+
+[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ]
+
+Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid
+but different gss service") introduced an `auth` argument to
+__gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL
+since it (and auth->service) was not (yet) determined.
+
+When multiple upcalls with the same uid and different service are
+ongoing, it could happen that __gss_find_upcall(), which returns the
+first match found in the pipe->in_downcall list, could not find the
+correct gss_msg corresponding to the downcall we are looking for.
+Moreover, it might return a msg which is not sent to rpc.gssd yet.
+
+We could see the mount.nfs process hung in D state when multiple
+mount.nfs commands were executed in parallel. The call trace below is
+of CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64 but we observed the
+same hang w/ elrepo kernel-ml-6.0.7-1.el7.
+
+PID: 71258 TASK: ffff91ebd4be0000 CPU: 36 COMMAND: "mount.nfs"
+ #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
+ #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
+ #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
+ #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc
+[sunrpc]
+ #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
+ #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
+ #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
+ #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
+ #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
+ #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
+
+The scenario is like this. Let's say there are two upcalls for
+services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
+
+When rpc.gssd reads pipe to get the upcall msg corresponding to
+service B from pipe->pipe and then writes the response, in
+gss_pipe_downcall the msg corresponding to service A will be picked
+because only uid is used to find the msg and it is before the one for
+B in pipe->in_downcall. And the process waiting for the msg
+corresponding to service A will be woken up.
+
+Actual scheduling of that process might be after rpc.gssd processes the
+next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A).
+The process is scheduled to see gss_msg->ctx == NULL and
+gss_msg->msg.errno == 0, therefore it cannot break the loop in
+gss_create_upcall and is never woken up after that.
+
+This patch adds a simple check to ensure that a msg which is not
+sent to rpc.gssd yet is not chosen as the matching upcall upon
+receiving a downcall.
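+
+The intent of the check can be read as follows (restated from the
+rpc_msg_is_inflight() helper added below; "in-flight" means rpc.gssd
+has already consumed the message from the pipe):
+
+  /* some bytes were copied out to rpc.gssd and the msg is no
+   * longer queued on the pipe, so a downcall may refer to it */
+  msg->copied != 0 && list_empty(&msg->list)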
+ +Signed-off-by: minoura makoto +Signed-off-by: Hiroshi Shimamoto +Tested-by: Hiroshi Shimamoto +Cc: Trond Myklebust +Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") +Signed-off-by: Trond Myklebust +Signed-off-by: Sasha Levin +--- + include/linux/sunrpc/rpc_pipe_fs.h | 5 +++++ + net/sunrpc/auth_gss/auth_gss.c | 19 +++++++++++++++++-- + 2 files changed, 22 insertions(+), 2 deletions(-) + +diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h +index cd188a527d16..3b35b6f6533a 100644 +--- a/include/linux/sunrpc/rpc_pipe_fs.h ++++ b/include/linux/sunrpc/rpc_pipe_fs.h +@@ -92,6 +92,11 @@ extern ssize_t rpc_pipe_generic_upcall(struct file *, struct rpc_pipe_msg *, + char __user *, size_t); + extern int rpc_queue_upcall(struct rpc_pipe *, struct rpc_pipe_msg *); + ++/* returns true if the msg is in-flight, i.e., already eaten by the peer */ ++static inline bool rpc_msg_is_inflight(const struct rpc_pipe_msg *msg) { ++ return (msg->copied != 0 && list_empty(&msg->list)); ++} ++ + struct rpc_clnt; + extern struct dentry *rpc_create_client_dir(struct dentry *, const char *, struct rpc_clnt *); + extern int rpc_remove_client_dir(struct rpc_clnt *); +diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c +index 5f42aa5fc612..2ff66a6a7e54 100644 +--- a/net/sunrpc/auth_gss/auth_gss.c ++++ b/net/sunrpc/auth_gss/auth_gss.c +@@ -301,7 +301,7 @@ __gss_find_upcall(struct rpc_pipe *pipe, kuid_t uid, const struct gss_auth *auth + list_for_each_entry(pos, &pipe->in_downcall, list) { + if (!uid_eq(pos->uid, uid)) + continue; +- if (auth && pos->auth->service != auth->service) ++ if (pos->auth->service != auth->service) + continue; + refcount_inc(&pos->count); + return pos; +@@ -685,6 +685,21 @@ gss_create_upcall(struct gss_auth *gss_auth, struct gss_cred *gss_cred) + return err; + } + ++static struct gss_upcall_msg * ++gss_find_downcall(struct rpc_pipe *pipe, kuid_t uid) ++{ ++ struct gss_upcall_msg *pos; ++ list_for_each_entry(pos, &pipe->in_downcall, list) { ++ if (!uid_eq(pos->uid, uid)) ++ continue; ++ if (!rpc_msg_is_inflight(&pos->msg)) ++ continue; ++ refcount_inc(&pos->count); ++ return pos; ++ } ++ return NULL; ++} ++ + #define MSG_BUF_MAXSIZE 1024 + + static ssize_t +@@ -731,7 +746,7 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen) + err = -ENOENT; + /* Find a matching upcall */ + spin_lock(&pipe->lock); +- gss_msg = __gss_find_upcall(pipe, uid, NULL); ++ gss_msg = gss_find_downcall(pipe, uid); + if (gss_msg == NULL) { + spin_unlock(&pipe->lock); + goto err_put_ctx; +-- +2.35.1 + diff --git a/queue-5.15/udf-fix-extension-of-the-last-extent-in-the-file.patch b/queue-5.15/udf-fix-extension-of-the-last-extent-in-the-file.patch new file mode 100644 index 00000000000..330daeab3b6 --- /dev/null +++ b/queue-5.15/udf-fix-extension-of-the-last-extent-in-the-file.patch @@ -0,0 +1,37 @@ +From 4eb777f600456b291ba9a946d742b5516b846419 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 21 Dec 2022 17:45:51 +0100 +Subject: udf: Fix extension of the last extent in the file + +From: Jan Kara + +[ Upstream commit 83c7423d1eb6806d13c521d1002cc1a012111719 ] + +When extending the last extent in the file within the last block, we +wrongly computed the length of the last extent. This is mostly a +cosmetical problem since the extent does not contain any data and the +length will be fixed up by following operations but still. 
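+
+A worked example (sizes invented for illustration): if the last extent
+currently covers 512 bytes and the new in-block length new_elen is
+2048, the extension must add 2048 - 512 = 1536 bytes; the old
+expression computed 512 - 2048 = -1536, i.e. the extent length and
+i_lenExtents were adjusted in the wrong direction.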
+
+Fixes: 1f3868f06855 ("udf: Fix extending file within last block")
+Signed-off-by: Jan Kara
+Signed-off-by: Sasha Levin
+---
+ fs/udf/inode.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/fs/udf/inode.c b/fs/udf/inode.c
+index 6a0e8ef664c1..d2488b7e54a5 100644
+--- a/fs/udf/inode.c
++++ b/fs/udf/inode.c
+@@ -599,7 +599,7 @@ static void udf_do_extend_final_block(struct inode *inode,
+ */
+ if (new_elen <= (last_ext->extLength & UDF_EXTENT_LENGTH_MASK))
+ return;
+- added_bytes = (last_ext->extLength & UDF_EXTENT_LENGTH_MASK) - new_elen;
++ added_bytes = new_elen - (last_ext->extLength & UDF_EXTENT_LENGTH_MASK);
+ last_ext->extLength += added_bytes;
+ UDF_I(inode)->i_lenExtents += added_bytes;
+
+--
+2.35.1
+
diff --git a/queue-5.15/usb-rndis_host-secure-rndis_query-check-against-int-.patch b/queue-5.15/usb-rndis_host-secure-rndis_query-check-against-int-.patch
new file mode 100644
index 00000000000..1f45ca2f880
--- /dev/null
+++ b/queue-5.15/usb-rndis_host-secure-rndis_query-check-against-int-.patch
@@ -0,0 +1,43 @@
+From 16cbd68a3b6a6c0bbb44f3f350ef61e8a8677690 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 3 Jan 2023 10:17:09 +0100
+Subject: usb: rndis_host: Secure rndis_query check against int overflow
+
+From: Szymon Heidrich
+
+[ Upstream commit c7dd13805f8b8fc1ce3b6d40f6aff47e66b72ad2 ]
+
+Variables off and len typed as uint32 in the rndis_query function
+are controlled by the incoming RNDIS response message, thus their
+value may be manipulated. Setting off to an unexpectedly large
+value will cause the sum with len and 8 to overflow and pass
+the implemented validation step. Consequently the response
+pointer will be referring to a location past the expected
+buffer boundaries, allowing information leakage e.g. via the
+RNDIS_OID_802_3_PERMANENT_ADDRESS OID.
+
+Fixes: ddda08624013 ("USB: rndis_host, various cleanups")
+Signed-off-by: Szymon Heidrich
+Signed-off-by: David S. Miller
+Signed-off-by: Sasha Levin
+---
+ drivers/net/usb/rndis_host.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c
+index bedd36ab5cf0..e5f6614da5ac 100644
+--- a/drivers/net/usb/rndis_host.c
++++ b/drivers/net/usb/rndis_host.c
+@@ -255,7 +255,8 @@ static int rndis_query(struct usbnet *dev, struct usb_interface *intf,
+
+ off = le32_to_cpu(u.get_c->offset);
+ len = le32_to_cpu(u.get_c->len);
+- if (unlikely((8 + off + len) > CONTROL_BUFFER_SIZE))
++ if (unlikely((off > CONTROL_BUFFER_SIZE - 8) ||
++ (len > CONTROL_BUFFER_SIZE - 8 - off)))
+ goto response_error;
+
+ if (*reply_len != -1 && len != *reply_len)
+--
+2.35.1
+
diff --git a/queue-5.15/vdpa_sim-fix-possible-memory-leak-in-vdpasim_net_ini.patch b/queue-5.15/vdpa_sim-fix-possible-memory-leak-in-vdpasim_net_ini.patch
new file mode 100644
index 00000000000..0519d5a3176
--- /dev/null
+++ b/queue-5.15/vdpa_sim-fix-possible-memory-leak-in-vdpasim_net_ini.patch
@@ -0,0 +1,103 @@
+From ebe8de0e9769a51c6fecbc9cc31ba5c59180edcd Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Thu, 10 Nov 2022 16:23:48 +0800
+Subject: vdpa_sim: fix possible memory leak in vdpasim_net_init() and
+ vdpasim_blk_init()
+
+From: ruanjinjie
+
+[ Upstream commit aeca7ff254843d49a8739f07f7dab1341450111d ]
+
+When injecting a fault while probing the module, if device_register()
+fails in vdpasim_net_init() or vdpasim_blk_init(), the refcount of the
+kobject is not decreased to 0 and the name allocated in dev_set_name()
+is leaked.
+Fix this by calling put_device(), so that name can be freed in +callback function kobject_cleanup(). + +(vdpa_sim_net) +unreferenced object 0xffff88807eebc370 (size 16): + comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s) + hex dump (first 16 bytes): + 76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5 vdpasim_net.kkk. + backtrace: + [] __kmalloc_node_track_caller+0x4e/0x150 + [] kstrdup+0x33/0x60 + [] kobject_set_name_vargs+0x41/0x110 + [] dev_set_name+0xab/0xe0 + [] device_add+0xe3/0x1a80 + [] 0xffffffffa0270013 + [] do_one_initcall+0x87/0x2e0 + [] do_init_module+0x1ab/0x640 + [] load_module+0x5d00/0x77f0 + [] __do_sys_finit_module+0x110/0x1b0 + [] do_syscall_64+0x35/0x80 + [] entry_SYSCALL_64_after_hwframe+0x46/0xb0 + +(vdpa_sim_blk) +unreferenced object 0xffff8881070c1250 (size 16): + comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s) + hex dump (first 16 bytes): + 76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5 vdpasim_blk.kkk. + backtrace: + [] __kmalloc_node_track_caller+0x4e/0x150 + [] kstrdup+0x33/0x60 + [] kobject_set_name_vargs+0x41/0x110 + [] dev_set_name+0xab/0xe0 + [] device_add+0xe3/0x1a80 + [] 0xffffffffa0220013 + [] do_one_initcall+0x87/0x2e0 + [] do_init_module+0x1ab/0x640 + [] load_module+0x5d00/0x77f0 + [] __do_sys_finit_module+0x110/0x1b0 + [] do_syscall_64+0x35/0x80 + [] entry_SYSCALL_64_after_hwframe+0x46/0xb0 + +Fixes: 899c4d187f6a ("vdpa_sim_blk: add support for vdpa management tool") +Fixes: a3c06ae158dd ("vdpa_sim_net: Add support for user supported devices") + +Signed-off-by: ruanjinjie +Reviewed-by: Stefano Garzarella +Message-Id: <20221110082348.4105476-1-ruanjinjie@huawei.com> +Signed-off-by: Michael S. Tsirkin +Acked-by: Jason Wang +Signed-off-by: Sasha Levin +--- + drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 +++- + drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 +++- + 2 files changed, 6 insertions(+), 2 deletions(-) + +diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c +index a790903f243e..22b812c32bee 100644 +--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c ++++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c +@@ -308,8 +308,10 @@ static int __init vdpasim_blk_init(void) + int ret; + + ret = device_register(&vdpasim_blk_mgmtdev); +- if (ret) ++ if (ret) { ++ put_device(&vdpasim_blk_mgmtdev); + return ret; ++ } + + ret = vdpa_mgmtdev_register(&mgmt_dev); + if (ret) +diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c +index a1ab6163f7d1..f1c420c5e26e 100644 +--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c ++++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c +@@ -194,8 +194,10 @@ static int __init vdpasim_net_init(void) + } + + ret = device_register(&vdpasim_net_mgmtdev); +- if (ret) ++ if (ret) { ++ put_device(&vdpasim_net_mgmtdev); + return ret; ++ } + + ret = vdpa_mgmtdev_register(&mgmt_dev); + if (ret) +-- +2.35.1 + diff --git a/queue-5.15/vdpa_sim-fix-vringh-initialization-in-vdpasim_queue_.patch b/queue-5.15/vdpa_sim-fix-vringh-initialization-in-vdpasim_queue_.patch new file mode 100644 index 00000000000..666e8413ba2 --- /dev/null +++ b/queue-5.15/vdpa_sim-fix-vringh-initialization-in-vdpasim_queue_.patch @@ -0,0 +1,52 @@ +From 92cb518cc49f256f673f8d6135639391e3f601c2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 10 Nov 2022 15:13:35 +0100 +Subject: vdpa_sim: fix vringh initialization in vdpasim_queue_ready() +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Stefano Garzarella + +[ Upstream commit 794ec498c9fa79e6bfd71b931410d5897a9c00d4 
]
+
+When we initialize vringh, we should pass the features and the
+number of elements in the virtqueue negotiated with the driver,
+otherwise operations with vringh may fail.
+
+This was discovered in a case where the driver sets a number of
+elements in the virtqueue different from the value returned by
+.get_vq_num_max().
+
+In vdpasim_vq_reset() it is safe to initialize the vringh with
+default values, since the virtqueue will not be used until
+vdpasim_queue_ready() is called again.
+
+Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator")
+Signed-off-by: Stefano Garzarella
+Message-Id: <20221110141335.62171-1-sgarzare@redhat.com>
+Signed-off-by: Michael S. Tsirkin
+Acked-by: Jason Wang
+Acked-by: Eugenio Pérez
+Signed-off-by: Sasha Levin
+---
+ drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
+index 2faf3bd1c3ba..4d9e3fdae5f6 100644
+--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
++++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
+@@ -66,8 +66,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
+ {
+ struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+- vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features,
+- VDPASIM_QUEUE_MAX, false,
++ vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, false,
+ (struct vring_desc *)(uintptr_t)vq->desc_addr,
+ (struct vring_avail *)
+ (uintptr_t)vq->driver_addr,
+--
+2.35.1
+
diff --git a/queue-5.15/veth-fix-race-with-af_xdp-exposing-old-or-uninitiali.patch b/queue-5.15/veth-fix-race-with-af_xdp-exposing-old-or-uninitiali.patch
new file mode 100644
index 00000000000..ace6e9ba044
--- /dev/null
+++ b/queue-5.15/veth-fix-race-with-af_xdp-exposing-old-or-uninitiali.patch
@@ -0,0 +1,88 @@
+From 06ddca3a8428c8f146ff01854fd0eb881bea2cd3 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 20 Dec 2022 12:59:03 -0600
+Subject: veth: Fix race with AF_XDP exposing old or uninitialized descriptors
+
+From: Shawn Bohrer
+
+[ Upstream commit fa349e396e4886d742fd6501c599ec627ef1353b ]
+
+When AF_XDP is used on a veth interface the RX ring is updated in two
+steps. veth_xdp_rcv() removes packet descriptors from the FILL ring,
+fills them, and places them in the RX ring, updating the cached_prod
+pointer. Later xdp_do_flush() syncs the RX ring prod pointer with the
+cached_prod pointer, allowing user-space to see the recently filled in
+descriptors. The rings are intended to be SPSC, however the existing
+order in veth_poll allows the xdp_do_flush() to run concurrently with
+another CPU, creating a race condition that allows user-space to see old
+or uninitialized descriptors in the RX ring. This bug has been observed
+in production systems.
+
+To summarize, we are expecting this ordering:
+
+CPU 0 __xsk_rcv_zc()
+CPU 0 __xsk_map_flush()
+CPU 2 __xsk_rcv_zc()
+CPU 2 __xsk_map_flush()
+
+But we are seeing this order:
+
+CPU 0 __xsk_rcv_zc()
+CPU 2 __xsk_rcv_zc()
+CPU 0 __xsk_map_flush()
+CPU 2 __xsk_map_flush()
+
+This occurs because we rely on NAPI to ensure that only one napi_poll
+handler is running at a time for the given veth receive queue.
+napi_schedule_prep() will prevent multiple instances from getting
+scheduled. However calling napi_complete_done() signals that this
+napi_poll is complete and allows subsequent calls to
+napi_schedule_prep() and __napi_schedule() to succeed in scheduling a
+concurrent napi_poll before the xdp_do_flush() has been called.
For the +veth driver a concurrent call to napi_schedule_prep() and +__napi_schedule() can occur on a different CPU because the veth xmit +path can additionally schedule a napi_poll creating the race. + +The fix as suggested by Magnus Karlsson, is to simply move the +xdp_do_flush() call before napi_complete_done(). This syncs the +producer ring pointers before another instance of napi_poll can be +scheduled on another CPU. It will also slightly improve performance by +moving the flush closer to when the descriptors were placed in the +RX ring. + +Fixes: d1396004dd86 ("veth: Add XDP TX and REDIRECT") +Suggested-by: Magnus Karlsson +Signed-off-by: Shawn Bohrer +Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com +Signed-off-by: Paolo Abeni +Signed-off-by: Sasha Levin +--- + drivers/net/veth.c | 5 +++-- + 1 file changed, 3 insertions(+), 2 deletions(-) + +diff --git a/drivers/net/veth.c b/drivers/net/veth.c +index 64fa8e9c0a22..41cb9179e8b7 100644 +--- a/drivers/net/veth.c ++++ b/drivers/net/veth.c +@@ -916,6 +916,9 @@ static int veth_poll(struct napi_struct *napi, int budget) + xdp_set_return_frame_no_direct(); + done = veth_xdp_rcv(rq, budget, &bq, &stats); + ++ if (stats.xdp_redirect > 0) ++ xdp_do_flush(); ++ + if (done < budget && napi_complete_done(napi, done)) { + /* Write rx_notify_masked before reading ptr_ring */ + smp_store_mb(rq->rx_notify_masked, false); +@@ -929,8 +932,6 @@ static int veth_poll(struct napi_struct *napi, int budget) + + if (stats.xdp_tx > 0) + veth_xdp_flush(rq, &bq); +- if (stats.xdp_redirect > 0) +- xdp_do_flush(); + xdp_clear_return_frame_no_direct(); + + return done; +-- +2.35.1 + diff --git a/queue-5.15/vhost-fix-range-used-in-translate_desc.patch b/queue-5.15/vhost-fix-range-used-in-translate_desc.patch new file mode 100644 index 00000000000..d7ee0212f8f --- /dev/null +++ b/queue-5.15/vhost-fix-range-used-in-translate_desc.patch @@ -0,0 +1,55 @@ +From 412239dc2f79a95965f593ea300016d59fe1ecca Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 9 Nov 2022 11:25:03 +0100 +Subject: vhost: fix range used in translate_desc() + +From: Stefano Garzarella + +[ Upstream commit 98047313cdb46828093894d0ac8b1183b8b317f9 ] + +vhost_iotlb_itree_first() requires `start` and `last` parameters +to search for a mapping that overlaps the range. + +In translate_desc() we cyclically call vhost_iotlb_itree_first(), +incrementing `addr` by the amount already translated, so rightly +we move the `start` parameter passed to vhost_iotlb_itree_first(), +but we should hold the `last` parameter constant. + +Let's fix it by saving the `last` parameter value before incrementing +`addr` in the loop. + +Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree") +Acked-by: Jason Wang +Signed-off-by: Stefano Garzarella +Message-Id: <20221109102503.18816-3-sgarzare@redhat.com> +Signed-off-by: Michael S. Tsirkin +Signed-off-by: Sasha Levin +--- + drivers/vhost/vhost.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c +index 6942472cffb0..0a9746bc9228 100644 +--- a/drivers/vhost/vhost.c ++++ b/drivers/vhost/vhost.c +@@ -2048,7 +2048,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, + struct vhost_dev *dev = vq->dev; + struct vhost_iotlb *umem = dev->iotlb ? 
dev->iotlb : dev->umem;
+ struct iovec *_iov;
+- u64 s = 0;
++ u64 s = 0, last = addr + len - 1;
+ int ret = 0;
+
+ while ((u64)len > s) {
+@@ -2058,7 +2058,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
+ break;
+ }
+
+- map = vhost_iotlb_itree_first(umem, addr, addr + len - 1);
++ map = vhost_iotlb_itree_first(umem, addr, last);
+ if (map == NULL || map->start > addr) {
+ if (umem != dev->iotlb) {
+ ret = -EFAULT;
+--
+2.35.1
+
diff --git a/queue-5.15/vhost-vsock-fix-error-handling-in-vhost_vsock_init.patch b/queue-5.15/vhost-vsock-fix-error-handling-in-vhost_vsock_init.patch
new file mode 100644
index 00000000000..bd396f4aab2
--- /dev/null
+++ b/queue-5.15/vhost-vsock-fix-error-handling-in-vhost_vsock_init.patch
@@ -0,0 +1,64 @@
+From ea969df3350260841a6e8c67fc6f5e0daadb94fc Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 8 Nov 2022 10:17:05 +0000
+Subject: vhost/vsock: Fix error handling in vhost_vsock_init()
+
+From: Yuan Can
+
+[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ]
+
+A failure to modprobe vhost_vsock is triggered, with the following
+log given:
+
+modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy
+
+The reason is that vhost_vsock_init() returns misc_register() directly
+without checking its return value; if misc_register() fails, it returns
+without calling vsock_core_unregister() on vhost_transport, with the
+result that vhost_vsock can never be loaded later.
+A simple call graph is shown as below:
+
+ vhost_vsock_init()
+ vsock_core_register() # register vhost_transport
+ misc_register()
+ device_create_with_groups()
+ device_create_groups_vargs()
+ dev = kzalloc(...) # OOM happened
+ # return without unregister vhost_transport
+
+Fix by calling vsock_core_unregister() when misc_register() returns an
+error.
+
+Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
+Signed-off-by: Yuan Can
+Message-Id: <20221108101705.45981-1-yuancan@huawei.com>
+Signed-off-by: Michael S. Tsirkin
+Reviewed-by: Stefano Garzarella
+Acked-by: Jason Wang
+Signed-off-by: Sasha Levin
+---
+ drivers/vhost/vsock.c | 9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
+index 97bfe499222b..74ac0c28fe43 100644
+--- a/drivers/vhost/vsock.c
++++ b/drivers/vhost/vsock.c
+@@ -968,7 +968,14 @@ static int __init vhost_vsock_init(void)
+ VSOCK_TRANSPORT_F_H2G);
+ if (ret < 0)
+ return ret;
+- return misc_register(&vhost_vsock_misc);
++
++ ret = misc_register(&vhost_vsock_misc);
++ if (ret) {
++ vsock_core_unregister(&vhost_transport.transport);
++ return ret;
++ }
++
++ return 0;
+ };
+
+ static void __exit vhost_vsock_exit(void)
+--
+2.35.1
+
diff --git a/queue-5.15/vmxnet3-correctly-report-csum_level-for-encapsulated.patch b/queue-5.15/vmxnet3-correctly-report-csum_level-for-encapsulated.patch
new file mode 100644
index 00000000000..0f408c9e3e3
--- /dev/null
+++ b/queue-5.15/vmxnet3-correctly-report-csum_level-for-encapsulated.patch
@@ -0,0 +1,55 @@
+From 8eebf2537e96cf35cb4cfc620b0030e5118f6374 Mon Sep 17 00:00:00 2001
+From: Sasha Levin
+Date: Tue, 20 Dec 2022 12:25:55 -0800
+Subject: vmxnet3: correctly report csum_level for encapsulated packet
+
+From: Ronak Doshi
+
+[ Upstream commit 3d8f2c4269d08f8793e946279dbdf5e972cc4911 ]
+
+Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload
+support") added support for encapsulation offload. However, the
+patch did not correctly report the csum_level for encapsulated packets.
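+
+For context before the fix below (general kernel checksum semantics,
+not vmxnet3-specific): with CHECKSUM_UNNECESSARY, skb->csum_level tells
+the stack how many additional consecutive checksums beyond the first
+were verified by the device, so an encapsulated packet whose inner
+checksum was also validated should advertise:
+
+  skb->ip_summed = CHECKSUM_UNNECESSARY;
+  skb->csum_level = 1;    /* outer and inner checksums both verified */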
+ +This patch fixes this issue by reporting correct csum level for the +encapsulated packet. + +Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") +Signed-off-by: Ronak Doshi +Acked-by: Peng Li +Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + drivers/net/vmxnet3/vmxnet3_drv.c | 8 ++++++++ + 1 file changed, 8 insertions(+) + +diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c +index 21896e221300..b88092a6bc85 100644 +--- a/drivers/net/vmxnet3/vmxnet3_drv.c ++++ b/drivers/net/vmxnet3/vmxnet3_drv.c +@@ -1242,6 +1242,10 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter, + (le32_to_cpu(gdesc->dword[3]) & + VMXNET3_RCD_CSUM_OK) == VMXNET3_RCD_CSUM_OK) { + skb->ip_summed = CHECKSUM_UNNECESSARY; ++ if ((le32_to_cpu(gdesc->dword[0]) & ++ (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))) { ++ skb->csum_level = 1; ++ } + WARN_ON_ONCE(!(gdesc->rcd.tcp || gdesc->rcd.udp) && + !(le32_to_cpu(gdesc->dword[0]) & + (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))); +@@ -1251,6 +1255,10 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter, + } else if (gdesc->rcd.v6 && (le32_to_cpu(gdesc->dword[3]) & + (1 << VMXNET3_RCD_TUC_SHIFT))) { + skb->ip_summed = CHECKSUM_UNNECESSARY; ++ if ((le32_to_cpu(gdesc->dword[0]) & ++ (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))) { ++ skb->csum_level = 1; ++ } + WARN_ON_ONCE(!(gdesc->rcd.tcp || gdesc->rcd.udp) && + !(le32_to_cpu(gdesc->dword[0]) & + (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))); +-- +2.35.1 + diff --git a/queue-5.15/vringh-fix-range-used-in-iotlb_translate.patch b/queue-5.15/vringh-fix-range-used-in-iotlb_translate.patch new file mode 100644 index 00000000000..3c9c586961b --- /dev/null +++ b/queue-5.15/vringh-fix-range-used-in-iotlb_translate.patch @@ -0,0 +1,56 @@ +From a4f4166b4249cadc80f3390ce17c2acb17fe013d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 9 Nov 2022 11:25:02 +0100 +Subject: vringh: fix range used in iotlb_translate() + +From: Stefano Garzarella + +[ Upstream commit f85efa9b0f5381874f727bd98f56787840313f0b ] + +vhost_iotlb_itree_first() requires `start` and `last` parameters +to search for a mapping that overlaps the range. + +In iotlb_translate() we cyclically call vhost_iotlb_itree_first(), +incrementing `addr` by the amount already translated, so rightly +we move the `start` parameter passed to vhost_iotlb_itree_first(), +but we should hold the `last` parameter constant. + +Let's fix it by saving the `last` parameter value before incrementing +`addr` in the loop. + +Fixes: 9ad9c49cfe97 ("vringh: IOTLB support") +Acked-by: Jason Wang +Signed-off-by: Stefano Garzarella +Message-Id: <20221109102503.18816-2-sgarzare@redhat.com> +Signed-off-by: Michael S. 
Tsirkin +Signed-off-by: Sasha Levin +--- + drivers/vhost/vringh.c | 5 ++--- + 1 file changed, 2 insertions(+), 3 deletions(-) + +diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c +index eab55accf381..786876af0a73 100644 +--- a/drivers/vhost/vringh.c ++++ b/drivers/vhost/vringh.c +@@ -1101,7 +1101,7 @@ static int iotlb_translate(const struct vringh *vrh, + struct vhost_iotlb_map *map; + struct vhost_iotlb *iotlb = vrh->iotlb; + int ret = 0; +- u64 s = 0; ++ u64 s = 0, last = addr + len - 1; + + spin_lock(vrh->iotlb_lock); + +@@ -1113,8 +1113,7 @@ static int iotlb_translate(const struct vringh *vrh, + break; + } + +- map = vhost_iotlb_itree_first(iotlb, addr, +- addr + len - 1); ++ map = vhost_iotlb_itree_first(iotlb, addr, last); + if (!map || map->start > addr) { + ret = -EINVAL; + break; +-- +2.35.1 +