Fixes for 5.4

author Sasha Levin <sashal@kernel.org>
Fri, 2 Sep 2022 04:23:51 +0000 (00:23 -0400)
committer Sasha Levin <sashal@kernel.org>
Fri, 2 Sep 2022 04:23:51 +0000 (00:23 -0400)
diff --git a/queue-5.4/btrfs-do-not-pin-logs-too-early-during-renames.patch b/queue-5.4/btrfs-do-not-pin-logs-too-early-during-renames.patch
new file mode 100644
index 0000000..ce97032
--- /dev/null
+++ b/queue-5.4/btrfs-do-not-pin-logs-too-early-during-renames.patch
@@ -0,0 +1,244 @@
+From 17f3de13e77397c37936193ccec5d940f9a15985 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 27 Jul 2021 11:24:45 +0100
+Subject: btrfs: do not pin logs too early during renames
+
+From: Filipe Manana <fdmanana@suse.com>
+
+[ Upstream commit bd54f381a12ac695593271a663d36d14220215b2 ]
+
+During renames we pin the logs of the roots a bit too early, before the
+calls to btrfs_insert_inode_ref(). We can pin the logs after those calls,
+since those will not change anything in a log tree.
+
+In a scenario where we have multiple and diverse filesystem operations
+running in parallel, those calls can take a significant amount of time,
+due to lock contention on extent buffers, and delay log commits from other
+tasks for longer than necessary.
+
+So just pin logs after calls to btrfs_insert_inode_ref() and right before
+the first operation that can update a log tree.
+
+The following script that uses dbench was used for testing:
+
+  $ cat dbench-test.sh
+  #!/bin/bash
+
+  DEV=/dev/nvme0n1
+  MNT=/mnt/nvme0n1
+  MOUNT_OPTIONS="-o ssd"
+  MKFS_OPTIONS="-m single -d single"
+
+  echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
+
+  umount $DEV &> /dev/null
+  mkfs.btrfs -f $MKFS_OPTIONS $DEV
+  mount $MOUNT_OPTIONS $DEV $MNT
+
+  dbench -D $MNT -t 120 16
+
+  umount $MNT
+
+The tests were run on a machine with 12 cores, 64G of RAM, an NVMe device
+and using a non-debug kernel config (Debian's default config).
+
+The results compare a branch without this patch and without the previous
+patch in the series, that has the subject:
+
+ "btrfs: eliminate some false positives when checking if inode was logged"
+
+Versus the same branch with these two patches applied.
+
+dbench with 8 clients, results before:
+
+ Operation      Count    AvgLat    MaxLat
+ ----------------------------------------
+ NTCreateX    4391359     0.009   249.745
+ Close        3225882     0.001     3.243
+ Rename        185953     0.065   240.643
+ Unlink        886669     0.049   249.906
+ Deltree          112     2.455   217.433
+ Mkdir             56     0.002     0.004
+ Qpathinfo    3980281     0.004     3.109
+ Qfileinfo     697579     0.001     0.187
+ Qfsinfo       729780     0.002     2.424
+ Sfileinfo     357764     0.004     1.415
+ Find         1538861     0.016     4.863
+ WriteX       2189666     0.010     3.327
+ ReadX        6883443     0.002     0.729
+ LockX          14298     0.002     0.073
+ UnlockX        14298     0.001     0.042
+ Flush         307777     2.447   303.663
+
+Throughput 1149.6 MB/sec  8 clients  8 procs  max_latency=303.666 ms
+
+dbench with 8 clients, results after:
+
+ Operation      Count    AvgLat    MaxLat
+ ----------------------------------------
+ NTCreateX    4269920     0.009   213.532
+ Close        3136653     0.001     0.690
+ Rename        180805     0.082   213.858
+ Unlink        862189     0.050   172.893
+ Deltree          112     2.998   218.328
+ Mkdir             56     0.002     0.003
+ Qpathinfo    3870158     0.004     5.072
+ Qfileinfo     678375     0.001     0.194
+ Qfsinfo       709604     0.002     0.485
+ Sfileinfo     347850     0.004     1.304
+ Find         1496310     0.017     5.504
+ WriteX       2129613     0.010     2.882
+ ReadX        6693066     0.002     1.517
+ LockX          13902     0.002     0.075
+ UnlockX        13902     0.001     0.055
+ Flush         299276     2.511   220.189
+
+Throughput 1187.33 MB/sec  8 clients  8 procs  max_latency=220.194 ms
+
++3.2% throughput, -31.8% max latency
+
+dbench with 16 clients, results before:
+
+ Operation      Count    AvgLat    MaxLat
+ ----------------------------------------
+ NTCreateX    5978334     0.028   156.507
+ Close        4391598     0.001     1.345
+ Rename        253136     0.241   155.057
+ Unlink       1207220     0.182   257.344
+ Deltree          160     6.123    36.277
+ Mkdir             80     0.003     0.005
+ Qpathinfo    5418817     0.012     6.867
+ Qfileinfo     949929     0.001     0.941
+ Qfsinfo       993560     0.002     1.386
+ Sfileinfo     486904     0.004     2.829
+ Find         2095088     0.059     8.164
+ WriteX       2982319     0.017     9.029
+ ReadX        9371484     0.002     4.052
+ LockX          19470     0.002     0.461
+ UnlockX        19470     0.001     0.990
+ Flush         418936     2.740   347.902
+
+Throughput 1495.31 MB/sec  16 clients  16 procs  max_latency=347.909 ms
+
+dbench with 16 clients, results after:
+
+ Operation      Count    AvgLat    MaxLat
+ ----------------------------------------
+ NTCreateX    5711833     0.029   131.240
+ Close        4195897     0.001     1.732
+ Rename        241849     0.204   147.831
+ Unlink       1153341     0.184   231.322
+ Deltree          160     6.086    30.198
+ Mkdir             80     0.003     0.021
+ Qpathinfo    5177011     0.012     7.150
+ Qfileinfo     907768     0.001     0.793
+ Qfsinfo       949205     0.002     1.431
+ Sfileinfo     465317     0.004     2.454
+ Find         2001541     0.058     7.819
+ WriteX       2850661     0.017     9.110
+ ReadX        8952289     0.002     3.991
+ LockX          18596     0.002     0.655
+ UnlockX        18596     0.001     0.179
+ Flush         400342     2.879   293.607
+
+Throughput 1565.73 MB/sec  16 clients  16 procs  max_latency=293.611 ms
+
++4.6% throughput, -16.9% max latency
+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/inode.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
+ 1 file changed, 42 insertions(+), 6 deletions(-)
+
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index 7755a0362a3ad..20c5db8ef8427 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -9751,8 +9751,6 @@ static int btrfs_rename_exchange(struct inode *old_dir,
+               /* force full log commit if subvolume involved. */
+               btrfs_set_log_full_commit(trans);
+       } else {
+-              btrfs_pin_log_trans(root);
+-              root_log_pinned = true;
+               ret = btrfs_insert_inode_ref(trans, dest,
+                                            new_dentry->d_name.name,
+                                            new_dentry->d_name.len,
+@@ -9768,8 +9766,6 @@ static int btrfs_rename_exchange(struct inode *old_dir,
+               /* force full log commit if subvolume involved. */
+               btrfs_set_log_full_commit(trans);
+       } else {
+-              btrfs_pin_log_trans(dest);
+-              dest_log_pinned = true;
+               ret = btrfs_insert_inode_ref(trans, root,
+                                            old_dentry->d_name.name,
+                                            old_dentry->d_name.len,
+@@ -9797,6 +9793,29 @@ static int btrfs_rename_exchange(struct inode *old_dir,
+                               BTRFS_I(new_inode), 1);
+       }
+ 
++      /*
++       * Now pin the logs of the roots. We do it to ensure that no other task
++       * can sync the logs while we are in progress with the rename, because
++       * that could result in an inconsistency in case any of the inodes that
++       * are part of this rename operation were logged before.
++       *
++       * We pin the logs even if at this precise moment none of the inodes was
++       * logged before. This is because right after we checked for that, some
++       * other task fsyncing some other inode not involved with this rename
++       * operation could log that one of our inodes exists.
++       *
++       * We don't need to pin the logs before the above calls to
++       * btrfs_insert_inode_ref(), since those don't ever need to change a log.
++       */
++      if (old_ino != BTRFS_FIRST_FREE_OBJECTID) {
++              btrfs_pin_log_trans(root);
++              root_log_pinned = true;
++      }
++      if (new_ino != BTRFS_FIRST_FREE_OBJECTID) {
++              btrfs_pin_log_trans(dest);
++              dest_log_pinned = true;
++      }
++
+       /* src is a subvolume */
+       if (old_ino == BTRFS_FIRST_FREE_OBJECTID) {
+               ret = btrfs_unlink_subvol(trans, old_dir, old_dentry);
+@@ -10046,8 +10065,6 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+               /* force full log commit if subvolume involved. */
+               btrfs_set_log_full_commit(trans);
+       } else {
+-              btrfs_pin_log_trans(root);
+-              log_pinned = true;
+               ret = btrfs_insert_inode_ref(trans, dest,
+                                            new_dentry->d_name.name,
+                                            new_dentry->d_name.len,
+@@ -10071,6 +10088,25 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+       if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) {
+               ret = btrfs_unlink_subvol(trans, old_dir, old_dentry);
+       } else {
++              /*
++               * Now pin the log. We do it to ensure that no other task can
++               * sync the log while we are in progress with the rename, as
++               * that could result in an inconsistency in case any of the
++               * inodes that are part of this rename operation were logged
++               * before.
++               *
++               * We pin the log even if at this precise moment none of the
++               * inodes was logged before. This is because right after we
++               * checked for that, some other task fsyncing some other inode
++               * not involved with this rename operation could log that one of
++               * our inodes exists.
++               *
++               * We don't need to pin the logs before the above call to
++               * btrfs_insert_inode_ref(), since that does not need to change
++               * a log.
++               */
++              btrfs_pin_log_trans(root);
++              log_pinned = true;
+               ret = __btrfs_unlink_inode(trans, root, BTRFS_I(old_dir),
+                                       BTRFS_I(d_inode(old_dentry)),
+                                       old_dentry->d_name.name,
+-- 
+2.35.1
+
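The reordering in this patch can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `toy_log`, `toy_pin_log()`, `toy_unpin_log()`, `toy_can_sync_log()` and `toy_rename()` are hypothetical names, and the model assumes (as we understand the kernel) that pinning works by raising a per-root writer count, `root->log_writers`, which `btrfs_pin_log_trans()` increments, `btrfs_end_log_trans()` drops, and a log sync waits on. It shows that a sync is no longer blocked while the inode-ref insertion runs, but still is while the log may be updated.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy, single-threaded model of a log tree's writer count.
 * Hypothetical names; loosely modeled on root->log_writers.
 */
struct toy_log {
	int writers; /* tasks that currently have the log pinned */
	int entries; /* stand-in for items stored in the log tree */
};

static void toy_pin_log(struct toy_log *log)   { log->writers++; }
static void toy_unpin_log(struct toy_log *log) { log->writers--; }

/* A log sync may only proceed while nobody has the log pinned. */
static bool toy_can_sync_log(const struct toy_log *log)
{
	return log->writers == 0;
}

/*
 * Rename ordering after the patch: the inode-ref insertion, which never
 * touches the log, runs unpinned; the log is pinned only around the
 * steps that may update it. The out-parameters record whether a sync
 * could have run at each stage.
 */
static void toy_rename(struct toy_log *log,
		       bool *sync_ok_during_ref_insert,
		       bool *sync_ok_during_log_update)
{
	/* Stand-in for btrfs_insert_inode_ref(): subvolume tree only. */
	*sync_ok_during_ref_insert = toy_can_sync_log(log);

	toy_pin_log(log);
	/* Stand-in for the unlink/link steps that may update the log. */
	log->entries++;
	*sync_ok_during_log_update = toy_can_sync_log(log);
	toy_unpin_log(log);
}
```

Before the patch the pin happened ahead of the inode-ref insertion, so in this model both flags would be false; moving it shortens the window in which other tasks' log syncs have to wait.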
diff --git a/queue-5.4/btrfs-introduce-btrfs_lookup_match_dir.patch b/queue-5.4/btrfs-introduce-btrfs_lookup_match_dir.patch
new file mode 100644
index 0000000..9048b42
--- /dev/null
+++ b/queue-5.4/btrfs-introduce-btrfs_lookup_match_dir.patch
@@ -0,0 +1,175 @@
+From 59980c3df86522964cd7efbcf4342d0ba7d23625 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 26 Jul 2021 16:19:09 -0300
+Subject: btrfs: introduce btrfs_lookup_match_dir
+
+From: Marcos Paulo de Souza <mpdesouza@suse.com>
+
+[ Upstream commit a7d1c5dc8632e9b370ad26478c468d4e4e29f263 ]
+
+btrfs_search_slot is called in multiple places in dir-item.c to search
+for a dir entry, followed by a call to btrfs_match_dir_item_name to return
+a btrfs_dir_item.
+
+In order to reduce the number of callers of btrfs_search_slot, create a
+common function that looks for the dir key, and if found call
+btrfs_match_dir_item_name.
+
+Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
+Reviewed-by: David Sterba <dsterba@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/dir-item.c | 76 +++++++++++++++++++++++----------------------
+ 1 file changed, 39 insertions(+), 37 deletions(-)
+
+diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
+index 863367c2c6205..1c0a7cd6b9b0a 100644
+--- a/fs/btrfs/dir-item.c
++++ b/fs/btrfs/dir-item.c
+@@ -171,6 +171,25 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, const char *name,
+       return 0;
+ }
+ 
++static struct btrfs_dir_item *btrfs_lookup_match_dir(
++                      struct btrfs_trans_handle *trans,
++                      struct btrfs_root *root, struct btrfs_path *path,
++                      struct btrfs_key *key, const char *name,
++                      int name_len, int mod)
++{
++      const int ins_len = (mod < 0 ? -1 : 0);
++      const int cow = (mod != 0);
++      int ret;
++
++      ret = btrfs_search_slot(trans, root, key, path, ins_len, cow);
++      if (ret < 0)
++              return ERR_PTR(ret);
++      if (ret > 0)
++              return ERR_PTR(-ENOENT);
++
++      return btrfs_match_dir_item_name(root->fs_info, path, name, name_len);
++}
++
+ /*
+  * lookup a directory item based on name.  'dir' is the objectid
+  * we're searching in, and 'mod' tells us if you plan on deleting the
+@@ -182,23 +201,18 @@ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
+                                            const char *name, int name_len,
+                                            int mod)
+ {
+-      int ret;
+       struct btrfs_key key;
+-      int ins_len = mod < 0 ? -1 : 0;
+-      int cow = mod != 0;
++      struct btrfs_dir_item *di;
+ 
+       key.objectid = dir;
+       key.type = BTRFS_DIR_ITEM_KEY;
+-
+       key.offset = btrfs_name_hash(name, name_len);
+ 
+-      ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
+-      if (ret < 0)
+-              return ERR_PTR(ret);
+-      if (ret > 0)
++      di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod);
++      if (IS_ERR(di) && PTR_ERR(di) == -ENOENT)
+               return NULL;
+ 
+-      return btrfs_match_dir_item_name(root->fs_info, path, name, name_len);
++      return di;
+ }
+ 
+ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
+@@ -212,7 +226,6 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
+       int slot;
+       struct btrfs_path *path;
+ 
+-
+       path = btrfs_alloc_path();
+       if (!path)
+               return -ENOMEM;
+@@ -221,20 +234,20 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
+       key.type = BTRFS_DIR_ITEM_KEY;
+       key.offset = btrfs_name_hash(name, name_len);
+ 
+-      ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+-
+-      /* return back any errors */
+-      if (ret < 0)
+-              goto out;
++      di = btrfs_lookup_match_dir(NULL, root, path, &key, name, name_len, 0);
++      if (IS_ERR(di)) {
++              ret = PTR_ERR(di);
++              /* Nothing found, we're safe */
++              if (ret == -ENOENT) {
++                      ret = 0;
++                      goto out;
++              }
+ 
+-      /* nothing found, we're safe */
+-      if (ret > 0) {
+-              ret = 0;
+-              goto out;
++              if (ret < 0)
++                      goto out;
+       }
+ 
+       /* we found an item, look for our name in the item */
+-      di = btrfs_match_dir_item_name(root->fs_info, path, name, name_len);
+       if (di) {
+               /* our exact name was found */
+               ret = -EEXIST;
+@@ -275,21 +288,13 @@ btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans,
+                           u64 objectid, const char *name, int name_len,
+                           int mod)
+ {
+-      int ret;
+       struct btrfs_key key;
+-      int ins_len = mod < 0 ? -1 : 0;
+-      int cow = mod != 0;
+ 
+       key.objectid = dir;
+       key.type = BTRFS_DIR_INDEX_KEY;
+       key.offset = objectid;
+ 
+-      ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
+-      if (ret < 0)
+-              return ERR_PTR(ret);
+-      if (ret > 0)
+-              return ERR_PTR(-ENOENT);
+-      return btrfs_match_dir_item_name(root->fs_info, path, name, name_len);
++      return btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod);
+ }
+ 
+ struct btrfs_dir_item *
+@@ -346,21 +351,18 @@ struct btrfs_dir_item *btrfs_lookup_xattr(struct btrfs_trans_handle *trans,
+                                         const char *name, u16 name_len,
+                                         int mod)
+ {
+-      int ret;
+       struct btrfs_key key;
+-      int ins_len = mod < 0 ? -1 : 0;
+-      int cow = mod != 0;
++      struct btrfs_dir_item *di;
+ 
+       key.objectid = dir;
+       key.type = BTRFS_XATTR_ITEM_KEY;
+       key.offset = btrfs_name_hash(name, name_len);
+-      ret = btrfs_search_slot(trans, root, &key, path, ins_len, cow);
+-      if (ret < 0)
+-              return ERR_PTR(ret);
+-      if (ret > 0)
++
++      di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod);
++      if (IS_ERR(di) && PTR_ERR(di) == -ENOENT)
+               return NULL;
+ 
+-      return btrfs_match_dir_item_name(root->fs_info, path, name, name_len);
++      return di;
+ }
+ 
+ /*
+-- 
+2.35.1
+
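The control flow of the new helper can be modeled outside the kernel. A minimal sketch, assuming simplified stand-ins: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are re-implemented along the lines of the kernel's `include/linux/err.h`, the "tree" is a flat array, and the hypothetical `toy_search_slot()` only mimics the return convention of `btrfs_search_slot()` (negative on error, 0 when the key is found, positive when it is not). The real `btrfs_match_dir_item_name()` also walks several entries packed under one key after a name-hash collision; the toy compares a single name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_item {
	unsigned long long offset; /* key offset, e.g. a name hash */
	const char *name;
};

/* Mimics btrfs_search_slot(): 0 = key found, 1 = not found, <0 = error. */
static int toy_search_slot(const struct toy_item *items, size_t n,
			   unsigned long long offset, size_t *slot)
{
	if (items == NULL)
		return -EIO; /* simulated I/O error */
	for (size_t i = 0; i < n; i++) {
		if (items[i].offset == offset) {
			*slot = i;
			return 0;
		}
	}
	return 1;
}

/* Shape of btrfs_lookup_match_dir(): search, then match the name. */
static const struct toy_item *toy_lookup_match_dir(const struct toy_item *items,
						   size_t n,
						   unsigned long long offset,
						   const char *name)
{
	size_t slot;
	int ret = toy_search_slot(items, n, offset, &slot);

	if (ret < 0)
		return ERR_PTR(ret);
	if (ret > 0)
		return ERR_PTR(-ENOENT);
	/* Key exists; NULL means no entry with this exact name. */
	return strcmp(items[slot].name, name) == 0 ? &items[slot] : NULL;
}

/* Shape of btrfs_lookup_dir_item() after the patch: -ENOENT becomes NULL. */
static const struct toy_item *toy_lookup_dir_item(const struct toy_item *items,
						  size_t n,
						  unsigned long long offset,
						  const char *name)
{
	const struct toy_item *di = toy_lookup_match_dir(items, n, offset, name);

	if (IS_ERR(di) && PTR_ERR(di) == -ENOENT)
		return NULL;
	return di;
}
```

In this model, as in the patch, `btrfs_lookup_dir_index_item()` still returns the helper's result unchanged, so a missing key surfaces there as `ERR_PTR(-ENOENT)` rather than NULL.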
diff --git a/queue-5.4/btrfs-tree-checker-check-for-overlapping-extent-item.patch b/queue-5.4/btrfs-tree-checker-check-for-overlapping-extent-item.patch
new file mode 100644
index 0000000..1e729e8
--- /dev/null
+++ b/queue-5.4/btrfs-tree-checker-check-for-overlapping-extent-item.patch
@@ -0,0 +1,77 @@
+From 7f78204081a27c0f91973a4b4aeedec23782c850 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 3 Aug 2022 14:28:47 -0400
+Subject: btrfs: tree-checker: check for overlapping extent items
+
+From: Josef Bacik <josef@toxicpanda.com>
+
+[ Upstream commit 899b7f69f244e539ea5df1b4d756046337de44a5 ]
+
+We're seeing a weird problem in production where we have overlapping
+extent items in the extent tree.  It's unclear where these are coming
+from, and in debugging we realized there's no check in the tree checker
+for this sort of problem.  Add a check to the tree-checker to make sure
+that the extents do not overlap each other.
+
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Josef Bacik <josef@toxicpanda.com>
+Reviewed-by: David Sterba <dsterba@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/tree-checker.c | 25 +++++++++++++++++++++++--
+ 1 file changed, 23 insertions(+), 2 deletions(-)
+
+diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
+index 368c43c6cbd08..d15de5abb562d 100644
+--- a/fs/btrfs/tree-checker.c
++++ b/fs/btrfs/tree-checker.c
+@@ -1019,7 +1019,8 @@ static void extent_err(const struct extent_buffer *eb, int slot,
+ }
+ 
+ static int check_extent_item(struct extent_buffer *leaf,
+-                           struct btrfs_key *key, int slot)
++                           struct btrfs_key *key, int slot,
++                           struct btrfs_key *prev_key)
+ {
+       struct btrfs_fs_info *fs_info = leaf->fs_info;
+       struct btrfs_extent_item *ei;
+@@ -1230,6 +1231,26 @@ static int check_extent_item(struct extent_buffer *leaf,
+                          total_refs, inline_refs);
+               return -EUCLEAN;
+       }
++
++      if ((prev_key->type == BTRFS_EXTENT_ITEM_KEY) ||
++          (prev_key->type == BTRFS_METADATA_ITEM_KEY)) {
++              u64 prev_end = prev_key->objectid;
++
++              if (prev_key->type == BTRFS_METADATA_ITEM_KEY)
++                      prev_end += fs_info->nodesize;
++              else
++                      prev_end += prev_key->offset;
++
++              if (unlikely(prev_end > key->objectid)) {
++                      extent_err(leaf, slot,
++      "previous extent [%llu %u %llu] overlaps current extent [%llu %u %llu]",
++                                 prev_key->objectid, prev_key->type,
++                                 prev_key->offset, key->objectid, key->type,
++                                 key->offset);
++                      return -EUCLEAN;
++              }
++      }
++
+       return 0;
+ }
+ 
+@@ -1343,7 +1364,7 @@ static int check_leaf_item(struct extent_buffer *leaf,
+               break;
+       case BTRFS_EXTENT_ITEM_KEY:
+       case BTRFS_METADATA_ITEM_KEY:
+-              ret = check_extent_item(leaf, key, slot);
++              ret = check_extent_item(leaf, key, slot, prev_key);
+               break;
+       case BTRFS_TREE_BLOCK_REF_KEY:
+       case BTRFS_SHARED_DATA_REF_KEY:
+-- 
+2.35.1
+
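The overlap rule added to `check_extent_item()` reduces to a small computation on two adjacent keys. The sketch below restates it in standalone C; `toy_key` and `extent_overlaps_prev()` are hypothetical names, and the values 168 and 169 are, to our understanding, those of `BTRFS_EXTENT_ITEM_KEY` and `BTRFS_METADATA_ITEM_KEY` in the btrfs headers (treat them as an assumption). For an extent item the key offset is the extent length, while for a skinny metadata item it is the tree level, which is why the node size is used as the length instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed key type values, mirroring the btrfs on-disk format. */
#define TOY_EXTENT_ITEM_KEY   168
#define TOY_METADATA_ITEM_KEY 169

struct toy_key {
	uint64_t objectid; /* extent start (bytenr) */
	uint8_t type;
	uint64_t offset;   /* length for EXTENT_ITEM, level for METADATA_ITEM */
};

/*
 * Returns true if the extent described by @prev runs past the start of
 * @key, i.e. the condition the tree-checker now reports as -EUCLEAN.
 */
static bool extent_overlaps_prev(const struct toy_key *prev,
				 const struct toy_key *key, uint32_t nodesize)
{
	uint64_t prev_end;

	/* Only extent and metadata items describe a byte range. */
	if (prev->type != TOY_EXTENT_ITEM_KEY &&
	    prev->type != TOY_METADATA_ITEM_KEY)
		return false;

	prev_end = prev->objectid;
	if (prev->type == TOY_METADATA_ITEM_KEY)
		prev_end += nodesize;
	else
		prev_end += prev->offset;

	return prev_end > key->objectid;
}
```

With a 16 KiB node size, a metadata item at 1048576 covers bytes up to 1064960, so a following extent starting at exactly 1064960 passes while one starting at 1052672 is flagged.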
diff --git a/queue-5.4/btrfs-unify-lookup-return-value-when-dir-entry-is-mi.patch b/queue-5.4/btrfs-unify-lookup-return-value-when-dir-entry-is-mi.patch
new file mode 100644
index 0000000..3935bb1
--- /dev/null
+++ b/queue-5.4/btrfs-unify-lookup-return-value-when-dir-entry-is-mi.patch
@@ -0,0 +1,187 @@
+From 3cc2f7864ff2e3a9e6bea48b77f043d9e9ecdc2e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 1 Oct 2021 13:52:33 +0100
+Subject: btrfs: unify lookup return value when dir entry is missing
+
+From: Filipe Manana <fdmanana@suse.com>
+
+[ Upstream commit 8dcbc26194eb872cc3430550fb70bb461424d267 ]
+
+btrfs_lookup_dir_index_item() and btrfs_lookup_dir_item() look up dir
+entries and both are used during log replay or when updating a log tree
+during an unlink.
+
+However, when the dir item does not exist, btrfs_lookup_dir_item() returns
+NULL while btrfs_lookup_dir_index_item() returns ERR_PTR(-ENOENT), and if
+the dir item exists but there is no matching entry for a given name or
+index, both return NULL. This makes the call sites during log replay
+more verbose than necessary, and it makes it easy to miss this slight
+difference. Since we don't need to distinguish between those two cases,
+make btrfs_lookup_dir_index_item() always return NULL when there is no
+matching directory entry - either because there isn't any dir entry or
+because there is one but it does not match the given name and index.
+
+Also rename the argument 'objectid' of btrfs_lookup_dir_index_item() to
+'index' since it is supposed to match an index number, and the name
+'objectid' is not very good because it can easily be confused with an
+inode number (like the inode number a dir entry points to).
+
+CC: stable@vger.kernel.org # 4.14+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Reviewed-by: David Sterba <dsterba@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/ctree.h    |  2 +-
+ fs/btrfs/dir-item.c | 48 ++++++++++++++++++++++++++++++++++-----------
+ fs/btrfs/tree-log.c | 14 ++++---------
+ 3 files changed, 42 insertions(+), 22 deletions(-)
+
+diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
+index cd77c0621a555..c2e5fe972f566 100644
+--- a/fs/btrfs/ctree.h
++++ b/fs/btrfs/ctree.h
+@@ -2727,7 +2727,7 @@ struct btrfs_dir_item *
+ btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans,
+                           struct btrfs_root *root,
+                           struct btrfs_path *path, u64 dir,
+-                          u64 objectid, const char *name, int name_len,
++                          u64 index, const char *name, int name_len,
+                           int mod);
+ struct btrfs_dir_item *
+ btrfs_search_dir_index_item(struct btrfs_root *root,
+diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
+index 1c0a7cd6b9b0a..98c6faa8ce15b 100644
+--- a/fs/btrfs/dir-item.c
++++ b/fs/btrfs/dir-item.c
+@@ -191,9 +191,20 @@ static struct btrfs_dir_item *btrfs_lookup_match_dir(
+ }
+ 
+ /*
+- * lookup a directory item based on name.  'dir' is the objectid
+- * we're searching in, and 'mod' tells us if you plan on deleting the
+- * item (use mod < 0) or changing the options (use mod > 0)
++ * Lookup for a directory item by name.
++ *
++ * @trans:    The transaction handle to use. Can be NULL if @mod is 0.
++ * @root:     The root of the target tree.
++ * @path:     Path to use for the search.
++ * @dir:      The inode number (objectid) of the directory.
++ * @name:     The name associated to the directory entry we are looking for.
++ * @name_len: The length of the name.
++ * @mod:      Used to indicate if the tree search is meant for a read only
++ *            lookup, for a modification lookup or for a deletion lookup, so
++ *            its value should be 0, 1 or -1, respectively.
++ *
++ * Returns: NULL if the dir item does not exists, an error pointer if an error
++ * happened, or a pointer to a dir item if a dir item exists for the given name.
+  */
+ struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
+                                            struct btrfs_root *root,
+@@ -274,27 +285,42 @@ int btrfs_check_dir_item_collision(struct btrfs_root *root, u64 dir,
+ }
+ 
+ /*
+- * lookup a directory item based on index.  'dir' is the objectid
+- * we're searching in, and 'mod' tells us if you plan on deleting the
+- * item (use mod < 0) or changing the options (use mod > 0)
++ * Lookup for a directory index item by name and index number.
+  *
+- * The name is used to make sure the index really points to the name you were
+- * looking for.
++ * @trans:    The transaction handle to use. Can be NULL if @mod is 0.
++ * @root:     The root of the target tree.
++ * @path:     Path to use for the search.
++ * @dir:      The inode number (objectid) of the directory.
++ * @index:    The index number.
++ * @name:     The name associated to the directory entry we are looking for.
++ * @name_len: The length of the name.
++ * @mod:      Used to indicate if the tree search is meant for a read only
++ *            lookup, for a modification lookup or for a deletion lookup, so
++ *            its value should be 0, 1 or -1, respectively.
++ *
++ * Returns: NULL if the dir index item does not exists, an error pointer if an
++ * error happened, or a pointer to a dir item if the dir index item exists and
++ * matches the criteria (name and index number).
+  */
+ struct btrfs_dir_item *
+ btrfs_lookup_dir_index_item(struct btrfs_trans_handle *trans,
+                           struct btrfs_root *root,
+                           struct btrfs_path *path, u64 dir,
+-                          u64 objectid, const char *name, int name_len,
++                          u64 index, const char *name, int name_len,
+                           int mod)
+ {
++      struct btrfs_dir_item *di;
+       struct btrfs_key key;
+ 
+       key.objectid = dir;
+       key.type = BTRFS_DIR_INDEX_KEY;
+-      key.offset = objectid;
++      key.offset = index;
+ 
+-      return btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod);
++      di = btrfs_lookup_match_dir(trans, root, path, &key, name, name_len, mod);
++      if (di == ERR_PTR(-ENOENT))
++              return NULL;
++
++      return di;
+ }
+ 
+ struct btrfs_dir_item *
+diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
+index bebd74267bed6..926b1d34e55cc 100644
+--- a/fs/btrfs/tree-log.c
++++ b/fs/btrfs/tree-log.c
+@@ -918,8 +918,7 @@ static noinline int inode_in_dir(struct btrfs_root *root,
+       di = btrfs_lookup_dir_index_item(NULL, root, path, dirid,
+                                        index, name, name_len, 0);
+       if (IS_ERR(di)) {
+-              if (PTR_ERR(di) != -ENOENT)
+-                      ret = PTR_ERR(di);
++              ret = PTR_ERR(di);
+               goto out;
+       } else if (di) {
+               btrfs_dir_item_key_to_cpu(path->nodes[0], di, &location);
+@@ -1171,8 +1170,7 @@ static inline int __add_inode_ref(struct btrfs_trans_handle *trans,
+       di = btrfs_lookup_dir_index_item(trans, root, path, btrfs_ino(dir),
+                                        ref_index, name, namelen, 0);
+       if (IS_ERR(di)) {
+-              if (PTR_ERR(di) != -ENOENT)
+-                      return PTR_ERR(di);
++              return PTR_ERR(di);
+       } else if (di) {
+               ret = drop_one_dir_item(trans, root, path, dir, di);
+               if (ret)
+@@ -2022,9 +2020,6 @@ static noinline int replay_one_name(struct btrfs_trans_handle *trans,
+               goto out;
+       }
+ 
+-      if (dst_di == ERR_PTR(-ENOENT))
+-              dst_di = NULL;
+-
+       if (IS_ERR(dst_di)) {
+               ret = PTR_ERR(dst_di);
+               goto out;
+@@ -2309,7 +2304,7 @@ static noinline int check_item_in_log(struct btrfs_trans_handle *trans,
+                                                    dir_key->offset,
+                                                    name, name_len, 0);
+               }
+-              if (!log_di || log_di == ERR_PTR(-ENOENT)) {
++              if (!log_di) {
+                       btrfs_dir_item_key_to_cpu(eb, di, &location);
+                       btrfs_release_path(path);
+                       btrfs_release_path(log_path);
+@@ -3522,8 +3517,7 @@ int btrfs_del_dir_entries_in_log(struct btrfs_trans_handle *trans,
+       if (err == -ENOSPC) {
+               btrfs_set_log_full_commit(trans);
+               err = 0;
+-      } else if (err < 0 && err != -ENOENT) {
+-              /* ENOENT can be returned if the entry hasn't been fsynced yet */
++      } else if (err < 0) {
+               btrfs_abort_transaction(trans, err);
+       }
+ 
+-- 
+2.35.1
+
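The effect on callers can be sketched as follows. This is a simplified model, not the tree-log code: `toy_raw_lookup()`, `toy_lookup_dir_index_item()` and `toy_inode_in_dir()` are hypothetical, the caller is only loosely modeled on `inode_in_dir()` (which also checks the dir item, not just the index item), and `fail_errno` is an invented knob to simulate a lookup error. After the change a caller handles two non-match cases: an error pointer, which is always a real error, and NULL, which always means "no matching entry".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct toy_dir_index {
	uint64_t index;      /* dir index number (the key offset) */
	uint64_t target_ino; /* inode the entry points to */
};

/* Old-style raw lookup: ERR_PTR(-ENOENT) when the index is absent. */
static struct toy_dir_index *toy_raw_lookup(struct toy_dir_index *items,
					    size_t n, uint64_t index,
					    int fail_errno)
{
	if (fail_errno)
		return ERR_PTR(-fail_errno); /* simulated search failure */
	for (size_t i = 0; i < n; i++)
		if (items[i].index == index)
			return &items[i];
	return ERR_PTR(-ENOENT);
}

/* New contract: "not found" is NULL, error pointers are real errors. */
static struct toy_dir_index *toy_lookup_dir_index_item(
		struct toy_dir_index *items, size_t n, uint64_t index,
		int fail_errno)
{
	struct toy_dir_index *di = toy_raw_lookup(items, n, index, fail_errno);

	if (di == ERR_PTR(-ENOENT))
		return NULL;
	return di;
}

/* 1 if entry @index points to @ino, 0 if not or missing, <0 on error. */
static int toy_inode_in_dir(struct toy_dir_index *items, size_t n,
			    uint64_t index, uint64_t ino, int fail_errno)
{
	struct toy_dir_index *di =
		toy_lookup_dir_index_item(items, n, index, fail_errno);

	if (IS_ERR(di))
		return (int)PTR_ERR(di); /* no special case for -ENOENT */
	if (di == NULL)
		return 0;
	return di->target_ino == ino;
}
```

The caller mirrors the simplification in the `inode_in_dir()` hunk above: the `!= -ENOENT` filter disappears because that value can no longer reach it.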
diff --git a/queue-5.4/drm-amd-display-avoid-mpc-infinite-loop.patch b/queue-5.4/drm-amd-display-avoid-mpc-infinite-loop.patch
new file mode 100644
index 0000000..ac74696
--- /dev/null
+++ b/queue-5.4/drm-amd-display-avoid-mpc-infinite-loop.patch
@@ -0,0 +1,66 @@
+From d4ba5a491eaff693e3e90fc7ce71f5e59e87dcc1 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 21 Jul 2022 15:33:00 -0400
+Subject: drm/amd/display: Avoid MPC infinite loop
+
+From: Josip Pavic <Josip.Pavic@amd.com>
+
+[ Upstream commit 8de297dc046c180651c0500f8611663ae1c3828a ]
+
+[why]
+In some cases the MPC tree's bottom pipe ends up pointing to itself. This
+causes iteration from top to bottom to hang the system in an infinite loop.
+
+[how]
+When advancing to the next MPC bottom pipe, check that the pointer is not
+the same as the current one, to avoid an infinite loop.
+
+Reviewed-by: Josip Pavic <Josip.Pavic@amd.com>
+Reviewed-by: Jun Lei <Jun.Lei@amd.com>
+Acked-by: Alex Hung <alex.hung@amd.com>
+Signed-off-by: Aric Cyr <aric.cyr@amd.com>
+Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c | 6 ++++++
+ drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c | 6 ++++++
+ 2 files changed, 12 insertions(+)
+
+diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
+index 8b2f29f6dabd2..068e79fa3490d 100644
+--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
++++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c
+@@ -118,6 +118,12 @@ struct mpcc *mpc1_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id)
+       while (tmp_mpcc != NULL) {
+               if (tmp_mpcc->dpp_id == dpp_id)
+                       return tmp_mpcc;
++
++              /* avoid circular linked list */
++              ASSERT(tmp_mpcc != tmp_mpcc->mpcc_bot);
++              if (tmp_mpcc == tmp_mpcc->mpcc_bot)
++                      break;
++
+               tmp_mpcc = tmp_mpcc->mpcc_bot;
+       }
+       return NULL;
+diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+index 5a188b2bc033c..0a00bd8e00abc 100644
+--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
++++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_mpc.c
+@@ -488,6 +488,12 @@ struct mpcc *mpc2_get_mpcc_for_dpp(struct mpc_tree *tree, int dpp_id)
+       while (tmp_mpcc != NULL) {
+               if (tmp_mpcc->dpp_id == 0xf || tmp_mpcc->dpp_id == dpp_id)
+                       return tmp_mpcc;
++
++              /* avoid circular linked list */
++              ASSERT(tmp_mpcc != tmp_mpcc->mpcc_bot);
++              if (tmp_mpcc == tmp_mpcc->mpcc_bot)
++                      break;
++
+               tmp_mpcc = tmp_mpcc->mpcc_bot;
+       }
+       return NULL;
+-- 
+2.35.1
+
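The guard added to mpc1_get_mpcc_for_dpp() and mpc2_get_mpcc_for_dpp() is the usual defensive check for a singly linked list whose next pointer may have been corrupted into a self-loop. A minimal user-space sketch follows; `struct mpcc_node` and `find_mpcc_for_dpp()` are hypothetical stand-ins for `struct mpcc` and the driver functions, and an `fprintf()` warning takes the place of the driver's non-fatal `ASSERT()`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct mpcc: only the fields the walk needs. */
struct mpcc_node {
	int dpp_id;
	struct mpcc_node *mpcc_bot; /* next node toward the bottom */
};

/*
 * Walk from the top toward the bottom looking for dpp_id. A node whose
 * mpcc_bot points back at itself would make a naive walk spin forever,
 * so warn and stop instead.
 */
static struct mpcc_node *find_mpcc_for_dpp(struct mpcc_node *top, int dpp_id)
{
	struct mpcc_node *tmp = top;

	while (tmp != NULL) {
		if (tmp->dpp_id == dpp_id)
			return tmp;

		/* avoid a self-referencing bottom pointer */
		if (tmp == tmp->mpcc_bot) {
			fprintf(stderr, "warning: mpcc %d points to itself\n",
				tmp->dpp_id);
			break;
		}

		tmp = tmp->mpcc_bot;
	}
	return NULL;
}
```

Like the patch, this only catches a node that points directly to itself; a longer cycle (A to B to A) would still loop. Detecting that in general needs something like Floyd's tortoise-and-hare walk, which the patch does not attempt.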
diff --git a/queue-5.4/drm-amd-display-clear-optc-underflow-before-turn-off.patch b/queue-5.4/drm-amd-display-clear-optc-underflow-before-turn-off.patch
new file mode 100644
index 0000000..980781b
--- /dev/null
+++ b/queue-5.4/drm-amd-display-clear-optc-underflow-before-turn-off.patch
@@ -0,0 +1,45 @@
+From 9c601b73f56f994330b4ec91edf4fa955adfd5c2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 27 Jul 2022 12:01:29 +0800
+Subject: drm/amd/display: clear optc underflow before turn off odm clock
+
+From: Fudong Wang <Fudong.Wang@amd.com>
+
+[ Upstream commit b2a93490201300a749ad261b5c5d05cb50179c44 ]
+
+[Why]
+After the ODM clock is turned off, the OPTC underflow bit stays set and
+clearing it no longer works. We need to clear it before turning the clock off.
+
+[How]
+If an underflow has occurred, clear it before turning the clock off.
+
+Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
+Acked-by: Tom Chung <chiahsuan.chung@amd.com>
+Signed-off-by: Fudong Wang <Fudong.Wang@amd.com>
+Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
+index e74a07d03fde9..4b0200e96eb77 100644
+--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
++++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c
+@@ -425,6 +425,11 @@ void optc1_enable_optc_clock(struct timing_generator *optc, bool enable)
+                               OTG_CLOCK_ON, 1,
+                               1, 1000);
+       } else  {
++
++              //last chance to clear underflow, otherwise, it will always there due to clock is off.
++              if (optc->funcs->is_optc_underflow_occurred(optc) == true)
++                      optc->funcs->clear_optc_underflow(optc);
++
+               REG_UPDATE_2(OTG_CLOCK_CONTROL,
+                               OTG_CLOCK_GATE_DIS, 0,
+                               OTG_CLOCK_EN, 0);
+-- 
+2.35.1
+
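The ordering problem behind this fix can be modeled with a sticky status bit whose clear request only takes effect while the clock is running. In this sketch, `struct fake_optc` and the `optc_*` helpers are hypothetical; they only mimic the call order in optc1_enable_optc_clock() (check `is_optc_underflow_occurred`, call `clear_optc_underflow`, then gate the clock), not the real register programming.

```c
#include <stdbool.h>

/*
 * Hypothetical model of an OTG with a sticky underflow status bit.
 * Writes to the status bit only take effect while the clock is on.
 */
struct fake_optc {
	bool clock_on;
	bool underflow; /* sticky: stays set until explicitly cleared */
};

static bool optc_underflow_occurred(const struct fake_optc *optc)
{
	return optc->underflow;
}

static void optc_clear_underflow(struct fake_optc *optc)
{
	/* With the clock gated, the clear request is silently lost. */
	if (optc->clock_on)
		optc->underflow = false;
}

static void optc_enable_clock(struct fake_optc *optc, bool enable)
{
	if (enable) {
		optc->clock_on = true;
	} else {
		/* Last chance: clear the sticky bit while writes still land. */
		if (optc_underflow_occurred(optc))
			optc_clear_underflow(optc);
		optc->clock_on = false;
	}
}
```

Swapping the two steps in the `else` branch, so that the clock is gated first, would leave `underflow` set permanently, which is the behavior the commit message describes.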
diff --git a/queue-5.4/drm-amd-display-fix-pixel-clock-programming.patch b/queue-5.4/drm-amd-display-fix-pixel-clock-programming.patch
new file mode 100644
index 0000000..6aac3e8
--- /dev/null
+++ b/queue-5.4/drm-amd-display-fix-pixel-clock-programming.patch
@@ -0,0 +1,50 @@
+From 262513aa5386b304dedb17c265197c770e9b9b32 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 26 Jul 2022 16:19:38 -0400
+Subject: drm/amd/display: Fix pixel clock programming
+
+From: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
+
+[ Upstream commit 04fb918bf421b299feaee1006e82921d7d381f18 ]
+
+[Why]
+Some pixel clock values could cause HDMI TMDS SSCPs to be misaligned
+between different HDMI lanes when using YCbCr420 10-bit pixel format.
+
+BIOS functions for transmitter/encoder control take the pixel clock in kHz
+increments, whereas the function for setting the pixel clock uses 100Hz
+increments. Setting the pixel clock to a value that is not on a kHz boundary
+will cause the issue.
+
+[How]
+Round the pixel clock down to the nearest kHz in the 10/12-bpc cases.
+
+Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
+Acked-by: Brian Chang <Brian.Chang@amd.com>
+Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
+Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
+index eca67d5d5b10d..721be82ccebec 100644
+--- a/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
++++ b/drivers/gpu/drm/amd/display/dc/dce/dce_clock_source.c
+@@ -546,9 +546,11 @@ static void dce112_get_pix_clk_dividers_helper (
+               switch (pix_clk_params->color_depth) {
+               case COLOR_DEPTH_101010:
+                       actual_pixel_clock_100hz = (actual_pixel_clock_100hz * 5) >> 2;
++                      actual_pixel_clock_100hz -= actual_pixel_clock_100hz % 10;
+                       break;
+               case COLOR_DEPTH_121212:
+                       actual_pixel_clock_100hz = (actual_pixel_clock_100hz * 6) >> 2;
++                      actual_pixel_clock_100hz -= actual_pixel_clock_100hz % 10;
+                       break;
+               case COLOR_DEPTH_161616:
+                       actual_pixel_clock_100hz = actual_pixel_clock_100hz * 2;
+-- 
+2.35.1
+
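A worked example of the rounding, in C. `deep_color_pix_clk_100hz()` is a hypothetical helper that repeats only the 10- and 12-bpc arithmetic from dce112_get_pix_clk_dividers_helper(); values are in 100 Hz units, so a whole kHz is a multiple of 10.

```c
#include <stdint.h>

/*
 * Scale a pixel clock (in 100 Hz units) for deep color, then round it
 * down to a whole kHz, mirroring the arithmetic in the patch.
 * Only the 10- and 12-bpc cases are modeled; others pass through.
 */
static uint32_t deep_color_pix_clk_100hz(uint32_t pix_clk_100hz, int bpc)
{
	uint32_t clk = pix_clk_100hz;

	switch (bpc) {
	case 10:
		clk = (clk * 5) >> 2; /* x1.25 */
		clk -= clk % 10;      /* drop the sub-kHz remainder */
		break;
	case 12:
		clk = (clk * 6) >> 2; /* x1.5 */
		clk -= clk % 10;
		break;
	default:
		break;                /* e.g. 8 bpc: unchanged */
	}
	return clk;
}
```

For example, a 100.0001 MHz clock (1000001) at 10 bpc scales to 1250001 and is rounded to 1250000, i.e. exactly 125000 kHz, so the 100 Hz and the kHz interfaces see the same frequency.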
diff --git a/queue-5.4/lib-vdso-let-do_coarse-return-0-to-simplify-the-call.patch b/queue-5.4/lib-vdso-let-do_coarse-return-0-to-simplify-the-call.patch
new file mode 100644
index 0000000..8d2ca4e
--- /dev/null
+++ b/queue-5.4/lib-vdso-let-do_coarse-return-0-to-simplify-the-call.patch
@@ -0,0 +1,67 @@
+From 21f3e1be7748c79b97f9be1468bd5fb83f9cd44e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 23 Dec 2019 14:31:07 +0000
+Subject: lib/vdso: Let do_coarse() return 0 to simplify the callsite
+
+From: Christophe Leroy <christophe.leroy@c-s.fr>
+
+[ Upstream commit 8463cf80529d0fd80b84cd5ab8b9b952b01c7eb9 ]
+
+do_coarse() is similar to do_hres() except that it never fails.
+
+Change its type to int instead of void and let it always return success (0)
+to simplify the call site.
+
+Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ lib/vdso/gettimeofday.c | 15 ++++++++-------
+ 1 file changed, 8 insertions(+), 7 deletions(-)
+
+diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
+index 45f57fd2db649..c549e72758aa0 100644
+--- a/lib/vdso/gettimeofday.c
++++ b/lib/vdso/gettimeofday.c
+@@ -68,7 +68,7 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,
+       return 0;
+ }
+ 
+-static void do_coarse(const struct vdso_data *vd, clockid_t clk,
++static int do_coarse(const struct vdso_data *vd, clockid_t clk,
+                     struct __kernel_timespec *ts)
+ {
+       const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
+@@ -79,6 +79,8 @@ static void do_coarse(const struct vdso_data *vd, clockid_t clk,
+               ts->tv_sec = vdso_ts->sec;
+               ts->tv_nsec = vdso_ts->nsec;
+       } while (unlikely(vdso_read_retry(vd, seq)));
++
++      return 0;
+ }
+ 
+ static __maybe_unused int
+@@ -96,14 +98,13 @@ __cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
+        * clocks are handled in the VDSO directly.
+        */
+       msk = 1U << clock;
+-      if (likely(msk & VDSO_HRES)) {
++      if (likely(msk & VDSO_HRES))
+               return do_hres(&vd[CS_HRES_COARSE], clock, ts);
+-      } else if (msk & VDSO_COARSE) {
+-              do_coarse(&vd[CS_HRES_COARSE], clock, ts);
+-              return 0;
+-      } else if (msk & VDSO_RAW) {
++      else if (msk & VDSO_COARSE)
++              return do_coarse(&vd[CS_HRES_COARSE], clock, ts);
++      else if (msk & VDSO_RAW)
+               return do_hres(&vd[CS_RAW], clock, ts);
+-      }
++
+       return -1;
+ }
+ 
+-- 
+2.35.1
+
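The call-site simplification can be illustrated outside the vDSO. In the sketch below, the clock ids, masks, `struct ts`, and the `read_hres()`/`read_coarse()` helpers are hypothetical. The point is that once the coarse reader also returns `int`, every supported branch becomes a single `return`, and `-1` remains the signal that the clock is not handled in this fast path (in the vDSO, the caller then uses the syscall fallback).

```c
#include <stdint.h>

/* Hypothetical clock ids; CLK_OTHER is deliberately in no mask. */
enum { CLK_REALTIME, CLK_MONOTONIC, CLK_RAW, CLK_COARSE, CLK_OTHER, CLK_MAX };

#define MASK_HRES   ((1U << CLK_REALTIME) | (1U << CLK_MONOTONIC))
#define MASK_COARSE (1U << CLK_COARSE)
#define MASK_RAW    (1U << CLK_RAW)

struct ts {
	int64_t sec;
	long nsec;
};

/* A real high-resolution read can fail (e.g. unusable clocksource). */
static int read_hres(int clk, struct ts *t)
{
	t->sec = clk;
	t->nsec = 1;
	return 0;
}

/* Cannot fail, but returns 0 so the caller can tail-return it. */
static int read_coarse(int clk, struct ts *t)
{
	t->sec = clk;
	t->nsec = 0;
	return 0;
}

static int clock_gettime_common(int clk, struct ts *t)
{
	uint32_t msk;

	if ((unsigned int)clk >= CLK_MAX)
		return -1;

	msk = 1U << clk;
	if (msk & MASK_HRES)
		return read_hres(clk, t);
	else if (msk & MASK_COARSE)
		return read_coarse(clk, t);
	else if (msk & MASK_RAW)
		return read_hres(clk, t);

	return -1; /* not handled in the fast path */
}
```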
diff --git a/queue-5.4/lib-vdso-mark-do_hres-and-do_coarse-as-__always_inli.patch b/queue-5.4/lib-vdso-mark-do_hres-and-do_coarse-as-__always_inli.patch
new file mode 100644
index 0000000..c8b4baf
--- /dev/null
+++ b/queue-5.4/lib-vdso-mark-do_hres-and-do_coarse-as-__always_inli.patch
@@ -0,0 +1,85 @@
+From 4d361f500d538ab7c4b3f23c6d2dae9cca81e1a7 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 12 Nov 2019 01:26:51 +0000
+Subject: lib/vdso: Mark do_hres() and do_coarse() as __always_inline
+
+From: Andrei Vagin <avagin@gmail.com>
+
+[ Upstream commit c966533f8c6c45f93c52599f8460e7695f0b7eaa ]
+
+Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
+(more clock_gettime() cycles is better):
+
+clock            | before     | after      | diff
+----------------------------------------------------------
+monotonic        |  153222105 |  166775025 | 8.8%
+monotonic-coarse |  671557054 |  691513017 | 3.0%
+monotonic-raw    |  147116067 |  161057395 | 9.5%
+boottime         |  153446224 |  166962668 | 9.1%
+
+The improvement for arm64 for monotonic and boottime is around 3.5%.
+
+clock            | before     | after      | diff
+----------------------------------------------------------
+monotonic        |   17326692 |   17951770 | 3.6%
+monotonic-coarse |   43624027 |   44215292 | 1.3%
+monotonic-raw    |   17541809 |   17554932 | 0.1%
+boottime         |   17334982 |   17954361 | 3.5%
+
+[ tglx: Avoid the goto ]
+
+Signed-off-by: Andrei Vagin <avagin@gmail.com>
+Signed-off-by: Dmitry Safonov <dima@arista.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ lib/vdso/gettimeofday.c | 14 ++++++++------
+ 1 file changed, 8 insertions(+), 6 deletions(-)
+
+diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
+index c549e72758aa0..5667fb746a1fe 100644
+--- a/lib/vdso/gettimeofday.c
++++ b/lib/vdso/gettimeofday.c
+@@ -38,7 +38,7 @@ u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
+ }
+ #endif
+ 
+-static int do_hres(const struct vdso_data *vd, clockid_t clk,
++static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
+                  struct __kernel_timespec *ts)
+ {
+       const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
+@@ -68,8 +68,8 @@ static int do_hres(const struct vdso_data *vd, clockid_t clk,
+       return 0;
+ }
+ 
+-static int do_coarse(const struct vdso_data *vd, clockid_t clk,
+-                    struct __kernel_timespec *ts)
++static __always_inline int do_coarse(const struct vdso_data *vd, clockid_t clk,
++                                   struct __kernel_timespec *ts)
+ {
+       const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
+       u32 seq;
+@@ -99,13 +99,15 @@ __cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
+        */
+       msk = 1U << clock;
+       if (likely(msk & VDSO_HRES))
+-              return do_hres(&vd[CS_HRES_COARSE], clock, ts);
++              vd = &vd[CS_HRES_COARSE];
+       else if (msk & VDSO_COARSE)
+               return do_coarse(&vd[CS_HRES_COARSE], clock, ts);
+       else if (msk & VDSO_RAW)
+-              return do_hres(&vd[CS_RAW], clock, ts);
++              vd = &vd[CS_RAW];
++      else
++              return -1;
+ 
+-      return -1;
++      return do_hres(vd, clock, ts);
+ }
+ 
+ static __maybe_unused int
+-- 
+2.35.1
+
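Two ideas from this patch are sketched below: a GCC/Clang `always_inline` attribute behind an `__always_inline` macro (the kernel defines its own; the `#ifndef` guard avoids clashing with the one glibc's `<sys/cdefs.h>` may already provide), and choosing the data block first so that the forced-inline high-resolution reader has only one call site. `struct clk_data`, `enum clk_kind`, and the reader functions are hypothetical.

```c
#include <stdint.h>

/* GCC/Clang forced inlining; the kernel ships its own definition. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical data layout: slot 0 serves hres and coarse, slot 1 raw. */
enum { DATA_HRES_COARSE, DATA_RAW, DATA_COUNT };

enum clk_kind { KIND_HRES, KIND_COARSE, KIND_RAW, KIND_UNSUPPORTED };

struct clk_data {
	int64_t base_ns;
};

static __always_inline int read_hres(const struct clk_data *vd, int64_t *ns)
{
	*ns = vd->base_ns + 1; /* stand-in for base + scaled counter delta */
	return 0;
}

static __always_inline int read_coarse(const struct clk_data *vd, int64_t *ns)
{
	*ns = vd->base_ns; /* coarse clocks just copy the stored value */
	return 0;
}

/* Pick the data slot first so read_hres() is expanded at one call site. */
static int gettime(const struct clk_data *vd, enum clk_kind kind, int64_t *ns)
{
	if (kind == KIND_HRES)
		vd = &vd[DATA_HRES_COARSE];
	else if (kind == KIND_COARSE)
		return read_coarse(&vd[DATA_HRES_COARSE], ns);
	else if (kind == KIND_RAW)
		vd = &vd[DATA_RAW];
	else
		return -1;

	return read_hres(vd, ns);
}
```

Funneling both high-resolution branches into one call keeps forced inlining from expanding that body twice; per the `[ tglx: Avoid the goto ]` note, the patch reaches this with an if/else chain rather than a goto.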
diff --git a/queue-5.4/neigh-fix-possible-dos-due-to-net-iface-start-stop-l.patch b/queue-5.4/neigh-fix-possible-dos-due-to-net-iface-start-stop-l.patch
new file mode 100644
index 0000000..509ee1c
--- /dev/null
+++ b/queue-5.4/neigh-fix-possible-dos-due-to-net-iface-start-stop-l.patch
@@ -0,0 +1,129 @@
+From 25142e57fcab4996c4f185ecd707eedf4b50725b Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 11 Aug 2022 18:20:11 +0300
+Subject: neigh: fix possible DoS due to net iface start/stop loop
+
+From: Denis V. Lunev <den@openvz.org>
+
+[ Upstream commit 66ba215cb51323e4e55e38fd5f250e0fae0cbc94 ]
+
+Normal processing of an ARP request (usually an Ethernet broadcast
+packet) arriving at the host looks like this:
+* the packet reaches arp_process() and is passed through the routing
+  procedure
+* the request is put into the queue using pneigh_enqueue() if the
+  corresponding ARP record is not local (the common case for container
+  records on the host)
+* the request is processed by a timer (within 80 jiffies by default) and
+  the ARP reply is sent from the same arp_process() under the
+  NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (the flag is set
+  inside pneigh_enqueue())
+
+And here is the problem: on ANY network interface start/stop event, the
+kernel calls pneigh_queue_purge() via __neigh_ifdown(), which destroys the
+whole queue of ARP requests.
+
+This was not a problem originally, since network interface start/stop was
+accessible only to the host 'root', which could do more destructive things
+anyway. But the world has changed and Linux containers are now available.
+A container 'root' has access to this API and should be considered an
+untrusted user from the point of view of the hosting environment.
+
+This creates an attack vector against other containers on the node: a
+container's root can endlessly start/stop interfaces. We have observed a
+similar situation on a real production node, where a docker container was
+doing this and the other containers on the node became inaccessible.
+
+The proposed patch does a very simple thing: pneigh_queue_purge() now
+drops only the packets belonging to the namespace in which the network
+interface state change was detected. This is enough to prevent the problem
+for the whole node while preserving the original semantics of the code.
+
+v2:
+       - do del_timer_sync() if queue is empty after pneigh_queue_purge()
+v3:
+       - rebase to net tree
+
+Cc: "David S. Miller" <davem@davemloft.net>
+Cc: Eric Dumazet <edumazet@google.com>
+Cc: Jakub Kicinski <kuba@kernel.org>
+Cc: Paolo Abeni <pabeni@redhat.com>
+Cc: Daniel Borkmann <daniel@iogearbox.net>
+Cc: David Ahern <dsahern@kernel.org>
+Cc: Yajun Deng <yajun.deng@linux.dev>
+Cc: Roopa Prabhu <roopa@nvidia.com>
+Cc: Christian Brauner <brauner@kernel.org>
+Cc: netdev@vger.kernel.org
+Cc: linux-kernel@vger.kernel.org
+Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
+Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
+Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
+Cc: kernel@openvz.org
+Cc: devel@openvz.org
+Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
+Signed-off-by: Denis V. Lunev <den@openvz.org>
+Signed-off-by: David S. Miller <davem@davemloft.net>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/neighbour.c | 25 +++++++++++++++++--------
+ 1 file changed, 17 insertions(+), 8 deletions(-)
+
+diff --git a/net/core/neighbour.c b/net/core/neighbour.c
+index 8b6140e67e7f8..6056b8e545658 100644
+--- a/net/core/neighbour.c
++++ b/net/core/neighbour.c
+@@ -280,14 +280,23 @@ static int neigh_del_timer(struct neighbour *n)
+       return 0;
+ }
+ 
+-static void pneigh_queue_purge(struct sk_buff_head *list)
++static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net)
+ {
++      unsigned long flags;
+       struct sk_buff *skb;
+ 
+-      while ((skb = skb_dequeue(list)) != NULL) {
+-              dev_put(skb->dev);
+-              kfree_skb(skb);
++      spin_lock_irqsave(&list->lock, flags);
++      skb = skb_peek(list);
++      while (skb != NULL) {
++              struct sk_buff *skb_next = skb_peek_next(skb, list);
++              if (net == NULL || net_eq(dev_net(skb->dev), net)) {
++                      __skb_unlink(skb, list);
++                      dev_put(skb->dev);
++                      kfree_skb(skb);
++              }
++              skb = skb_next;
+       }
++      spin_unlock_irqrestore(&list->lock, flags);
+ }
+ 
+ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
+@@ -358,9 +367,9 @@ static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
+       write_lock_bh(&tbl->lock);
+       neigh_flush_dev(tbl, dev, skip_perm);
+       pneigh_ifdown_and_unlock(tbl, dev);
+-
+-      del_timer_sync(&tbl->proxy_timer);
+-      pneigh_queue_purge(&tbl->proxy_queue);
++      pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev));
++      if (skb_queue_empty_lockless(&tbl->proxy_queue))
++              del_timer_sync(&tbl->proxy_timer);
+       return 0;
+ }
+ 
+@@ -1741,7 +1750,7 @@ int neigh_table_clear(int index, struct neigh_table *tbl)
+       /* It is not clean... Fix it to unload IPv6 module safely */
+       cancel_delayed_work_sync(&tbl->gc_work);
+       del_timer_sync(&tbl->proxy_timer);
+-      pneigh_queue_purge(&tbl->proxy_queue);
++      pneigh_queue_purge(&tbl->proxy_queue, NULL);
+       neigh_ifdown(tbl, NULL);
+       if (atomic_read(&tbl->entries))
+               pr_crit("neighbour leakage\n");
+-- 
+2.35.1
+
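The namespace-filtered purge described in this patch can be sketched in user-space C. The types (`struct net` reduced to an opaque tag, `struct pkt`, `struct pkt_queue`) and the helper names are hypothetical stand-ins for `sk_buff_head`, `skb->dev` and `dev_net()`; the kernel version also holds `list->lock` with `spin_lock_irqsave()`, which this single-threaded sketch omits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a namespace is an opaque tag, a packet a node. */
struct net {
	int id;
};

struct pkt {
	const struct net *net; /* namespace of the receiving device */
	struct pkt *next;
};

struct pkt_queue {
	struct pkt *head;
};

static int queue_push(struct pkt_queue *q, const struct net *net)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p == NULL)
		return -1;
	p->net = net;
	p->next = q->head;
	q->head = p;
	return 0;
}

/*
 * Free every queued packet that belongs to @net and keep the rest.
 * A NULL @net purges everything, as on table teardown. Namespaces are
 * compared by pointer, like net_eq(). No locking: single-threaded sketch.
 */
static void queue_purge(struct pkt_queue *q, const struct net *net)
{
	struct pkt **link = &q->head;

	while (*link != NULL) {
		struct pkt *p = *link;

		if (net == NULL || p->net == net) {
			*link = p->next; /* unlink before freeing */
			free(p);
		} else {
			link = &p->next; /* keep it and advance */
		}
	}
}

static bool queue_empty(const struct pkt_queue *q)
{
	return q->head == NULL;
}

static size_t queue_len(const struct pkt_queue *q)
{
	size_t n = 0;

	for (const struct pkt *p = q->head; p != NULL; p = p->next)
		n++;
	return n;
}
```

After purging, the caller would stop the proxy timer only if `queue_empty()` reports that nothing is left, mirroring the patch's `skb_queue_empty_lockless()` check before `del_timer_sync()`; packets from other namespaces keep the timer armed so they are still answered.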
diff --git a/queue-5.4/netfilter-conntrack-nf_conntrack_procfs-should-no-lo.patch b/queue-5.4/netfilter-conntrack-nf_conntrack_procfs-should-no-lo.patch
new file mode 100644
index 0000000..7c71671
--- /dev/null
+++ b/queue-5.4/netfilter-conntrack-nf_conntrack_procfs-should-no-lo.patch
@@ -0,0 +1,36 @@
+From c496b7bdc4dab9bd86d3d7dd67a82bbc4b9c9eb1 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 15 Aug 2022 12:39:20 +0200
+Subject: netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to
+ y
+
+From: Geert Uytterhoeven <geert@linux-m68k.org>
+
+[ Upstream commit aa5762c34213aba7a72dc58e70601370805fa794 ]
+
+NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09
+("netfilter: provide config option to disable ancient procfs parts") in
+v3.3.
+
+Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
+Signed-off-by: Florian Westphal <fw@strlen.de>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/netfilter/Kconfig | 1 -
+ 1 file changed, 1 deletion(-)
+
+diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
+index ef72819d9d315..d569915da003c 100644
+--- a/net/netfilter/Kconfig
++++ b/net/netfilter/Kconfig
+@@ -118,7 +118,6 @@ config NF_CONNTRACK_ZONES
+ 
+ config NF_CONNTRACK_PROCFS
+       bool "Supply CT list in procfs (OBSOLETE)"
+-      default y
+       depends on PROC_FS
+       ---help---
+       This option enables for the list of known conntrack entries
+-- 
+2.35.1
+
diff --git a/queue-5.4/s390-hypfs-avoid-error-message-under-kvm.patch b/queue-5.4/s390-hypfs-avoid-error-message-under-kvm.patch
new file mode 100644
index 0000000..26f5217
--- /dev/null
+++ b/queue-5.4/s390-hypfs-avoid-error-message-under-kvm.patch
@@ -0,0 +1,60 @@
+From 8d6992c21e5e6bb40181fe00c1a6070183a350d3 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 20 Jun 2022 11:45:34 +0200
+Subject: s390/hypfs: avoid error message under KVM
+
+From: Juergen Gross <jgross@suse.com>
+
+[ Upstream commit 7b6670b03641ac308aaa6fa2e6f964ac993b5ea3 ]
+
+When booting under KVM, the following error messages are issued:
+
+hypfs.7f5705: The hardware system does not support hypfs
+hypfs.7a79f0: Initialization of hypfs failed with rc=-61
+
+Demote the severity of the first message from "error" to "info" and issue
+the second message only in other error cases.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Heiko Carstens <hca@linux.ibm.com>
+Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
+Link: https://lore.kernel.org/r/20220620094534.18967-1-jgross@suse.com
+[arch/s390/hypfs/hypfs_diag.c changed description]
+Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ arch/s390/hypfs/hypfs_diag.c | 2 +-
+ arch/s390/hypfs/inode.c      | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/arch/s390/hypfs/hypfs_diag.c b/arch/s390/hypfs/hypfs_diag.c
+index f0bc4dc3e9bf0..6511d15ace45e 100644
+--- a/arch/s390/hypfs/hypfs_diag.c
++++ b/arch/s390/hypfs/hypfs_diag.c
+@@ -437,7 +437,7 @@ __init int hypfs_diag_init(void)
+       int rc;
+ 
+       if (diag204_probe()) {
+-              pr_err("The hardware system does not support hypfs\n");
++              pr_info("The hardware system does not support hypfs\n");
+               return -ENODATA;
+       }
+ 
+diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c
+index 70139d0791b61..ca4fc66a361fb 100644
+--- a/arch/s390/hypfs/inode.c
++++ b/arch/s390/hypfs/inode.c
+@@ -501,9 +501,9 @@ static int __init hypfs_init(void)
+       hypfs_vm_exit();
+ fail_hypfs_diag_exit:
+       hypfs_diag_exit();
++      pr_err("Initialization of hypfs failed with rc=%i\n", rc);
+ fail_dbfs_exit:
+       hypfs_dbfs_exit();
+-      pr_err("Initialization of hypfs failed with rc=%i\n", rc);
+       return rc;
+ }
+ device_initcall(hypfs_init)
+-- 
+2.35.1
+
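The inode.c hunk relies on goto-based error unwinding: moving `pr_err()` above the `fail_dbfs_exit:` label means only failures that unwind through `fail_hypfs_diag_exit:` print it. The sketch below assumes, as the commit message implies, that a failed probe (the `-ENODATA`, i.e. `rc=-61`, case from the log) jumps straight to the last label. All function names and the `fail_step`/`errors_logged` knobs are hypothetical, added only so each path can be exercised.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical knobs so the sketch can simulate each failure path. */
static int fail_step;     /* 0: all succeed, 1: probe fails, 2: later step fails */
static int errors_logged; /* how many error messages were printed */

static void common_init(void) { }
static void common_exit(void) { }
static int probe_init(void) { return fail_step == 1 ? -ENODATA : 0; }
static void probe_exit(void) { }
static int later_init(void) { return fail_step == 2 ? -ENOMEM : 0; }

static int demo_init(void)
{
	int rc;

	common_init();

	rc = probe_init();
	if (rc)
		goto fail_common_exit; /* expected on unsupported systems: quiet */

	rc = later_init();
	if (rc)
		goto fail_probe_exit;

	return 0;

fail_probe_exit:
	probe_exit();
	/* Only failures that unwind through this label are reported. */
	fprintf(stderr, "Initialization failed with rc=%i\n", rc);
	errors_logged++;
fail_common_exit:
	common_exit();
	return rc;
}
```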
diff --git a/queue-5.4/series b/queue-5.4/series
index e571311c1314c08ac429e1e76e6321153f08d5f1..68e3e503da32b476b1ba304e3ec126f427d70744 100644
--- a/queue-5.4/series
+++ b/queue-5.4/series
@@ -59,3 +59,15 @@ fbdev-fb_pm2fb-avoid-potential-divide-by-zero-error.patch
  ftrace-fix-null-pointer-dereference-in-is_ftrace_trampoline-when-ftrace-is-dead.patch
  bpf-don-t-redirect-packets-with-invalid-pkt_len.patch
  mm-rmap-fix-anon_vma-degree-ambiguity-leading-to-double-reuse.patch
+btrfs-introduce-btrfs_lookup_match_dir.patch
+btrfs-do-not-pin-logs-too-early-during-renames.patch
+btrfs-unify-lookup-return-value-when-dir-entry-is-mi.patch
+drm-amd-display-avoid-mpc-infinite-loop.patch
+drm-amd-display-clear-optc-underflow-before-turn-off.patch
+neigh-fix-possible-dos-due-to-net-iface-start-stop-l.patch
+s390-hypfs-avoid-error-message-under-kvm.patch
+drm-amd-display-fix-pixel-clock-programming.patch
+netfilter-conntrack-nf_conntrack_procfs-should-no-lo.patch
+btrfs-tree-checker-check-for-overlapping-extent-item.patch
+lib-vdso-let-do_coarse-return-0-to-simplify-the-call.patch
+lib-vdso-mark-do_hres-and-do_coarse-as-__always_inli.patch
queue-5.4/btrfs-do-not-pin-logs-too-early-during-renames.patch	[new file with mode: 0644]
queue-5.4/btrfs-introduce-btrfs_lookup_match_dir.patch	[new file with mode: 0644]
queue-5.4/btrfs-tree-checker-check-for-overlapping-extent-item.patch	[new file with mode: 0644]
queue-5.4/btrfs-unify-lookup-return-value-when-dir-entry-is-mi.patch	[new file with mode: 0644]
queue-5.4/drm-amd-display-avoid-mpc-infinite-loop.patch	[new file with mode: 0644]
queue-5.4/drm-amd-display-clear-optc-underflow-before-turn-off.patch	[new file with mode: 0644]
queue-5.4/drm-amd-display-fix-pixel-clock-programming.patch	[new file with mode: 0644]
queue-5.4/lib-vdso-let-do_coarse-return-0-to-simplify-the-call.patch	[new file with mode: 0644]
queue-5.4/lib-vdso-mark-do_hres-and-do_coarse-as-__always_inli.patch	[new file with mode: 0644]
queue-5.4/neigh-fix-possible-dos-due-to-net-iface-start-stop-l.patch	[new file with mode: 0644]
queue-5.4/netfilter-conntrack-nf_conntrack_procfs-should-no-lo.patch	[new file with mode: 0644]
queue-5.4/s390-hypfs-avoid-error-message-under-kvm.patch	[new file with mode: 0644]
queue-5.4/series