--- /dev/null
+From 016ece15c59d5f3a99685e52040b5887115f182b Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:36 +0000
+Subject: arp: use RCU protection in arp_xmit()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit a42b69f692165ec39db42d595f4f65a4c8f42e44 ]
+
+arp_xmit() can be called without RTNL or RCU protection.
+
+Use RCU protection to avoid potential UAF.
+
+Fixes: 29a26a568038 ("netfilter: Pass struct net into the netfilter hooks")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-5-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/arp.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
+index 11c1519b36993..59ffaa89d7b05 100644
+--- a/net/ipv4/arp.c
++++ b/net/ipv4/arp.c
+@@ -659,10 +659,12 @@ static int arp_xmit_finish(struct net *net, struct sock *sk, struct sk_buff *skb
+ */
+ void arp_xmit(struct sk_buff *skb)
+ {
++ rcu_read_lock();
+ /* Send it off, maybe filter it using firewalling first. */
+ NF_HOOK(NFPROTO_ARP, NF_ARP_OUT,
+- dev_net(skb->dev), NULL, skb, NULL, skb->dev,
++ dev_net_rcu(skb->dev), NULL, skb, NULL, skb->dev,
+ arp_xmit_finish);
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(arp_xmit);
+
+--
+2.39.5
+
--- /dev/null
+From 856009bc6db21a7eff4281b0dfdf4b33375298b9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 4 Feb 2025 11:02:32 +0000
+Subject: btrfs: fix stale page cache after race between readahead and direct
+ IO write
+
+From: Filipe Manana <fdmanana@suse.com>
+
+[ Upstream commit acc18e1c1d8c0d59d793cf87790ccfcafb1bf5f0 ]
+
+After commit ac325fc2aad5 ("btrfs: do not hold the extent lock for entire
+read") we can now trigger a race between a task doing a direct IO write
+and readahead. When this race is triggered it results in tasks getting
+stale data when they attempt to do a buffered read (including the task that
+did the direct IO write).
+
+This race can be sporadically triggered with test case generic/418, failing
+like this:
+
+ $ ./check generic/418
+ FSTYP -- btrfs
+ PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025
+ MKFS_OPTIONS -- /dev/sdc
+ MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
+
+ generic/418 14s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad)
+ --- tests/generic/418.out 2020-06-10 19:29:03.850519863 +0100
+ +++ /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad 2025-02-03 15:42:36.974609476 +0000
+ @@ -1,2 +1,5 @@
+ QA output created by 418
+ +cmpbuf: offset 0: Expected: 0x1, got 0x0
+ +[6:0] FAIL - comparison failed, offset 24576
+ +diotest -wp -b 4096 -n 8 -i 4 failed at loop 3
+ Silence is golden
+ ...
+ (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/418.out /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad' to see the entire diff)
+ Ran: generic/418
+ Failures: generic/418
+ Failed 1 of 1 tests
+
+The race happens like this:
+
+1) A file has a prealloc extent for the range [16K, 28K);
+
+2) Task A starts a direct IO write against file range [24K, 28K).
+ At the start of the direct IO write it invalidates the page cache at
+ __iomap_dio_rw() with kiocb_invalidate_pages() for the 4K page at file
+ offset 24K;
+
+3) Task A enters btrfs_dio_iomap_begin() and locks the extent range
+ [24K, 28K);
+
+4) Task B starts a readahead for file range [16K, 28K), entering
+ btrfs_readahead().
+
+ First it attempts to read the page at offset 16K by entering
+ btrfs_do_readpage(), where it calls get_extent_map(), locks the range
+ [16K, 20K) and gets the extent map for the range [16K, 28K), caching
+ it into the 'em_cached' variable declared in the local stack of
+ btrfs_readahead(), and then unlocks the range [16K, 20K).
+
+ Since the extent map has the prealloc flag, at btrfs_do_readpage() we
+ zero out the page's content and don't submit any bio to read the page
+ from the extent.
+
+ Then it attempts to read the page at offset 20K entering
+ btrfs_do_readpage() where we reuse the previously cached extent map
+ (decided by get_extent_map()) since it spans the page's range and
+ it's still in the inode's extent map tree.
+
+ Just like for the previous page, we zero out the page's content since
+ the extent map has the prealloc flag set.
+
+ Then it attempts to read the page at offset 24K entering
+ btrfs_do_readpage() where we reuse the previously cached extent map
+ (decided by get_extent_map()) since it spans the page's range and
+ it's still in the inode's extent map tree.
+
+ Just like for the previous pages, we zero out the page's content since
+ the extent map has the prealloc flag set. Note that we didn't lock the
+ extent range [24K, 28K), so we didn't synchronize with the ongoing
+ direct IO write being performed by task A;
+
+5) Task A enters btrfs_create_dio_extent() and creates an ordered extent
+ for the range [24K, 28K), with the flags BTRFS_ORDERED_DIRECT and
+ BTRFS_ORDERED_PREALLOC set;
+
+6) Task A unlocks the range [24K, 28K) at btrfs_dio_iomap_begin();
+
+7) The ordered extent enters btrfs_finish_one_ordered() and locks the
+ range [24K, 28K);
+
+8) Task A enters fs/iomap/direct-io.c:iomap_dio_complete() and it tries
+ to invalidate the page at offset 24K by calling
+ kiocb_invalidate_post_direct_write(), resulting in a call chain that
+ ends up at btrfs_release_folio().
+
+ The btrfs_release_folio() call ends up returning false because the range
+ for the page at file offset 24K is currently locked by the task doing
+ the ordered extent completion in the previous step (7), so we have:
+
+ btrfs_release_folio() ->
+ __btrfs_release_folio() ->
+ try_release_extent_mapping() ->
+ try_release_extent_state()
+
+ This last function checking that the range is locked and returning false
+ and propagating it up to btrfs_release_folio().
+
+ So this results in a failure to invalidate the page and
+ kiocb_invalidate_post_direct_write() triggers this message logged in
+ dmesg:
+
+ Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
+
+ After this we leave the page cache with stale data for the file range
+ [24K, 28K), filled with zeroes instead of the data written by direct IO
+ write (all bytes with a 0x01 value), so any task attempting to read with
+ buffered IO, including the task that did the direct IO write, will get
+ all bytes in the range with a 0x00 value instead of the written data.
+
+Fix this by locking the range, with btrfs_lock_and_flush_ordered_range(),
+at the two callers of btrfs_do_readpage() instead of doing it at
+get_extent_map(), just like we did before commit ac325fc2aad5 ("btrfs: do
+not hold the extent lock for entire read"), and unlocking the range after
+all the calls to btrfs_do_readpage(). This way we never reuse a cached
+extent map without flushing any pending ordered extents from a concurrent
+direct IO write.
+
+Fixes: ac325fc2aad5 ("btrfs: do not hold the extent lock for entire read")
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/extent_io.c | 18 +++++++++++++++---
+ 1 file changed, 15 insertions(+), 3 deletions(-)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index e6e6c4dc53c48..fe08c983d5bb4 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -906,7 +906,6 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
+ u64 len, struct extent_map **em_cached)
+ {
+ struct extent_map *em;
+- struct extent_state *cached_state = NULL;
+
+ ASSERT(em_cached);
+
+@@ -922,14 +921,12 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
+ *em_cached = NULL;
+ }
+
+- btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state);
+ em = btrfs_get_extent(inode, folio, start, len);
+ if (!IS_ERR(em)) {
+ BUG_ON(*em_cached);
+ refcount_inc(&em->refs);
+ *em_cached = em;
+ }
+- unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state);
+
+ return em;
+ }
+@@ -1086,11 +1083,18 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
+
+ int btrfs_read_folio(struct file *file, struct folio *folio)
+ {
++ struct btrfs_inode *inode = folio_to_inode(folio);
++ const u64 start = folio_pos(folio);
++ const u64 end = start + folio_size(folio) - 1;
++ struct extent_state *cached_state = NULL;
+ struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ };
+ struct extent_map *em_cached = NULL;
+ int ret;
+
++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
+ ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
++ unlock_extent(&inode->io_tree, start, end, &cached_state);
++
+ free_extent_map(em_cached);
+
+ /*
+@@ -2267,12 +2271,20 @@ void btrfs_readahead(struct readahead_control *rac)
+ {
+ struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD };
+ struct folio *folio;
++ struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
++ const u64 start = readahead_pos(rac);
++ const u64 end = start + readahead_length(rac) - 1;
++ struct extent_state *cached_state = NULL;
+ struct extent_map *em_cached = NULL;
+ u64 prev_em_start = (u64)-1;
+
++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
++
+ while ((folio = readahead_folio(rac)) != NULL)
+ btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
+
++ unlock_extent(&inode->io_tree, start, end, &cached_state);
++
+ if (em_cached)
+ free_extent_map(em_cached);
+ submit_one_bio(&bio_ctrl);
+--
+2.39.5
+
--- /dev/null
+From 693ad002410794be6b07348c6324c687becc4ec4 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 9 Jan 2025 11:24:15 +0100
+Subject: btrfs: rename __get_extent_map() and pass btrfs_inode
+
+From: David Sterba <dsterba@suse.com>
+
+[ Upstream commit 06de96faf795b5c276a3be612da6b08c6112e747 ]
+
+The double underscore naming scheme does not apply here, there's only
+one get_extent_map(). As the definition is changed, also pass the struct
+btrfs_inode.
+
+Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
+Reviewed-by: Anand Jain <anand.jain@oracle.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Stable-dep-of: acc18e1c1d8c ("btrfs: fix stale page cache after race between readahead and direct IO write")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/extent_io.c | 15 +++++++--------
+ 1 file changed, 7 insertions(+), 8 deletions(-)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index 42c9899d9241c..e6e6c4dc53c48 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -901,9 +901,9 @@ void clear_folio_extent_mapped(struct folio *folio)
+ folio_detach_private(folio);
+ }
+
+-static struct extent_map *__get_extent_map(struct inode *inode,
+- struct folio *folio, u64 start,
+- u64 len, struct extent_map **em_cached)
++static struct extent_map *get_extent_map(struct btrfs_inode *inode,
++ struct folio *folio, u64 start,
++ u64 len, struct extent_map **em_cached)
+ {
+ struct extent_map *em;
+ struct extent_state *cached_state = NULL;
+@@ -922,14 +922,14 @@ static struct extent_map *__get_extent_map(struct inode *inode,
+ *em_cached = NULL;
+ }
+
+- btrfs_lock_and_flush_ordered_range(BTRFS_I(inode), start, start + len - 1, &cached_state);
+- em = btrfs_get_extent(BTRFS_I(inode), folio, start, len);
++ btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state);
++ em = btrfs_get_extent(inode, folio, start, len);
+ if (!IS_ERR(em)) {
+ BUG_ON(*em_cached);
+ refcount_inc(&em->refs);
+ *em_cached = em;
+ }
+- unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len - 1, &cached_state);
++ unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state);
+
+ return em;
+ }
+@@ -985,8 +985,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
+ end_folio_read(folio, true, cur, iosize);
+ break;
+ }
+- em = __get_extent_map(inode, folio, cur, end - cur + 1,
+- em_cached);
++ em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
+ if (IS_ERR(em)) {
+ end_folio_read(folio, false, cur, end + 1 - cur);
+ return PTR_ERR(em);
+--
+2.39.5
+
--- /dev/null
+From 260b25a3327f0749a6dde43fe624cd790fde8b01 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 31 Jan 2025 12:33:23 -0500
+Subject: clocksource: Use migrate_disable() to avoid calling get_random_u32()
+ in atomic context
+
+From: Waiman Long <longman@redhat.com>
+
+[ Upstream commit 6bb05a33337b2c842373857b63de5c9bf1ae2a09 ]
+
+The following bug report happened with a PREEMPT_RT kernel:
+
+ BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
+ in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
+ preempt_count: 1, expected: 0
+ RCU nest depth: 0, expected: 0
+ get_random_u32+0x4f/0x110
+ clocksource_verify_choose_cpus+0xab/0x1a0
+ clocksource_verify_percpu.part.0+0x6b/0x330
+ clocksource_watchdog_kthread+0x193/0x1a0
+
+It is due to the fact that clocksource_verify_choose_cpus() is invoked with
+preemption disabled. This function invokes get_random_u32() to obtain
+random numbers for choosing CPUs. The batched_entropy_32 local lock and/or
+the base_crng.lock spinlock in driver/char/random.c will be acquired during
+the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot
+be acquired in atomic context.
+
+Fix this problem by using migrate_disable() to allow smp_processor_id() to
+be reliably used without introducing atomic context. preempt_disable() is
+then called after clocksource_verify_choose_cpus() but before the
+clocksource measurement is run to avoid introducing unexpected
+latency.
+
+Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable")
+Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Waiman Long <longman@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
+Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Link: https://lore.kernel.org/all/20250131173323.891943-2-longman@redhat.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/time/clocksource.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index c4e6b5e6af88c..58fb7280cabbe 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -365,10 +365,10 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ cpumask_clear(&cpus_ahead);
+ cpumask_clear(&cpus_behind);
+ cpus_read_lock();
+- preempt_disable();
++ migrate_disable();
+ clocksource_verify_choose_cpus();
+ if (cpumask_empty(&cpus_chosen)) {
+- preempt_enable();
++ migrate_enable();
+ cpus_read_unlock();
+ pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
+ return;
+@@ -376,6 +376,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ testcpu = smp_processor_id();
+ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",
+ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
++ preempt_disable();
+ for_each_cpu(cpu, &cpus_chosen) {
+ if (cpu == testcpu)
+ continue;
+@@ -395,6 +396,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ cs_nsec_min = cs_nsec;
+ }
+ preempt_enable();
++ migrate_enable();
+ cpus_read_unlock();
+ if (!cpumask_empty(&cpus_ahead))
+ pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+--
+2.39.5
+
--- /dev/null
+From db18d29bc84d9ca827c9acfd807524ca58beb127 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 24 Jan 2025 20:54:41 -0500
+Subject: clocksource: Use pr_info() for "Checking clocksource synchronization"
+ message
+
+From: Waiman Long <longman@redhat.com>
+
+[ Upstream commit 1f566840a82982141f94086061927a90e79440e5 ]
+
+The "Checking clocksource synchronization" message is normally printed
+when clocksource_verify_percpu() is called for a given clocksource if
+both the CLOCK_SOURCE_UNSTABLE and CLOCK_SOURCE_VERIFY_PERCPU flags
+are set.
+
+It is an informational message and so pr_info() is the correct choice.
+
+Signed-off-by: Waiman Long <longman@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
+Acked-by: John Stultz <jstultz@google.com>
+Link: https://lore.kernel.org/all/20250125015442.3740588-1-longman@redhat.com
+Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/time/clocksource.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index 8a40a616288b8..c4e6b5e6af88c 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -374,7 +374,8 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ return;
+ }
+ testcpu = smp_processor_id();
+- pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
++ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",
++ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
+ for_each_cpu(cpu, &cpus_chosen) {
+ if (cpu == testcpu)
+ continue;
+--
+2.39.5
+
--- /dev/null
+From 3b7e555c9f6f68ea4d1a91938ffe89fd9b9f37d2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 23 Oct 2024 10:21:12 +0000
+Subject: cpufreq/amd-pstate: Align offline flow of shared memory and MSR based
+ systems
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit a6960e6b1b0e2cb268f427a99040c408a8d10665 ]
+
+Set min_perf to lowest_perf for shared memory systems, similar to the MSR
+based systems.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241023102108.5980-5-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 161334937090c..895d108428b40 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1636,6 +1636,7 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.desired_perf = 0;
++ perf_ctrls.min_perf = min_perf;
+ perf_ctrls.max_perf = min_perf;
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
+ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+--
+2.39.5
+
--- /dev/null
+From 4de4224d57ce3204800eb03d8d2036e2e3c9ccf0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 23 Oct 2024 10:21:10 +0000
+Subject: cpufreq/amd-pstate: Call cppc_set_epp_perf in the reenable function
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit 796ff50e127af8362035f87ba29b6b84e2dd9742 ]
+
+The EPP value being set in perf_ctrls.energy_perf is not being propagated
+to the shared memory, fix that.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Perry Yuan <perry.yuan@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241023102108.5980-4-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 91d3c3b1c2d3b..161334937090c 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1594,8 +1594,9 @@ static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.max_perf = max_perf;
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+ }
+ }
+
+@@ -1636,8 +1637,9 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ } else {
+ perf_ctrls.desired_perf = 0;
+ perf_ctrls.max_perf = min_perf;
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+ }
+ mutex_unlock(&amd_pstate_limits_lock);
+ }
+--
+2.39.5
+
--- /dev/null
+From ea16a7ceb11c3ba3d1179b32d4abce18d3272a18 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 9 Dec 2024 12:52:37 -0600
+Subject: cpufreq/amd-pstate: convert mutex use to guard()
+
+From: Mario Limonciello <mario.limonciello@amd.com>
+
+[ Upstream commit 6c093d5a5b73ec1caf1e706510ae6031af2f9d43 ]
+
+Using scoped guard declaration will unlock mutexes automatically.
+
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241209185248.16301-5-mario.limonciello@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 32 ++++++++++++--------------------
+ 1 file changed, 12 insertions(+), 20 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 145a48fc49034..33777f5ab7d16 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -696,12 +696,12 @@ static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
+ pr_err("Boost mode is not supported by this processor or SBIOS\n");
+ return -EOPNOTSUPP;
+ }
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
++
+ ret = amd_pstate_cpu_boost_update(policy, state);
+ WRITE_ONCE(cpudata->boost_state, !ret ? state : false);
+ policy->boost_enabled = !ret ? state : false;
+ refresh_frequency_limits(policy);
+- mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret;
+ }
+@@ -792,7 +792,8 @@ static void amd_pstate_update_limits(unsigned int cpu)
+ if (!amd_pstate_prefcore)
+ return;
+
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
++
+ ret = amd_get_highest_perf(cpu, &cur_high);
+ if (ret)
+ goto free_cpufreq_put;
+@@ -812,7 +813,6 @@ static void amd_pstate_update_limits(unsigned int cpu)
+ if (!highest_perf_changed)
+ cpufreq_update_policy(cpu);
+
+- mutex_unlock(&amd_pstate_driver_lock);
+ }
+
+ /*
+@@ -1145,11 +1145,11 @@ static ssize_t store_energy_performance_preference(
+ if (ret < 0)
+ return -EINVAL;
+
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
++
+ ret = amd_pstate_set_energy_pref_index(cpudata, ret);
+- mutex_unlock(&amd_pstate_limits_lock);
+
+- return ret ?: count;
++ return ret ? ret : count;
+ }
+
+ static ssize_t show_energy_performance_preference(
+@@ -1297,13 +1297,10 @@ EXPORT_SYMBOL_GPL(amd_pstate_update_status);
+ static ssize_t status_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+ {
+- ssize_t ret;
+
+- mutex_lock(&amd_pstate_driver_lock);
+- ret = amd_pstate_show_status(buf);
+- mutex_unlock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
+
+- return ret;
++ return amd_pstate_show_status(buf);
+ }
+
+ static ssize_t status_store(struct device *a, struct device_attribute *b,
+@@ -1312,9 +1309,8 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
+ char *p = memchr(buf, '\n', count);
+ int ret;
+
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
+ ret = amd_pstate_update_status(buf, p ? p - buf : count);
+- mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret < 0 ? ret : count;
+ }
+@@ -1614,13 +1610,11 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
+
+ amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false);
+ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
+
+- mutex_unlock(&amd_pstate_limits_lock);
+-
+ return 0;
+ }
+
+@@ -1656,13 +1650,11 @@ static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ if (cpudata->suspended) {
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
+
+ /* enable amd pstate from suspend state*/
+ amd_pstate_epp_reenable(cpudata);
+
+- mutex_unlock(&amd_pstate_limits_lock);
+-
+ cpudata->suspended = false;
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 71d244e6199e6d1092534fb1e982ffc84590ce97 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 11:25:20 +0000
+Subject: cpufreq/amd-pstate: Fix cpufreq_policy ref counting
+
+From: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
+
+[ Upstream commit 3ace20038e19f23fe73259513f1f08d4bf1a3c83 ]
+
+amd_pstate_update_limits() takes a cpufreq_policy reference but doesn't
+decrement the refcount in one of the exit paths, fix that.
+
+Fixes: 45722e777fd9 ("cpufreq: amd-pstate: Optimize amd_pstate_update_limits()")
+Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Link: https://lore.kernel.org/r/20250205112523.201101-10-dhananjay.ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 9 +++++----
+ 1 file changed, 5 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 33777f5ab7d16..bdfd8ffe04398 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -778,20 +778,21 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
+
+ static void amd_pstate_update_limits(unsigned int cpu)
+ {
+- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
++ struct cpufreq_policy *policy = NULL;
+ struct amd_cpudata *cpudata;
+ u32 prev_high = 0, cur_high = 0;
+ int ret;
+ bool highest_perf_changed = false;
+
++ if (!amd_pstate_prefcore)
++ return;
++
++ policy = cpufreq_cpu_get(cpu);
+ if (!policy)
+ return;
+
+ cpudata = policy->driver_data;
+
+- if (!amd_pstate_prefcore)
+- return;
+-
+ guard(mutex)(&amd_pstate_driver_lock);
+
+ ret = amd_get_highest_perf(cpu, &cur_high);
+--
+2.39.5
+
--- /dev/null
+From 286c4edf998d184d941cc1f0b2cc45da6f805126 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:42 +0000
+Subject: cpufreq/amd-pstate: Merge amd_pstate_epp_cpu_offline() and
+ amd_pstate_epp_offline()
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit 53ec2101dfede8fecdd240662281a12e537c3411 ]
+
+amd_pstate_epp_offline() is only called from within
+amd_pstate_epp_cpu_offline(), so it doesn't make much sense to have it at all.
+Hence, remove it.
+
+Also remove the unnecessary debug print in the offline path while at it.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-6-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 17 ++++-------------
+ 1 file changed, 4 insertions(+), 13 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 4dfe5bdcb2932..145a48fc49034 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1604,11 +1604,14 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+ return 0;
+ }
+
+-static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
++static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+ {
+ struct amd_cpudata *cpudata = policy->driver_data;
+ int min_perf;
+
++ if (cpudata->suspended)
++ return 0;
++
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+
+ mutex_lock(&amd_pstate_limits_lock);
+@@ -1617,18 +1620,6 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
+
+ mutex_unlock(&amd_pstate_limits_lock);
+-}
+-
+-static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+-{
+- struct amd_cpudata *cpudata = policy->driver_data;
+-
+- pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu);
+-
+- if (cpudata->suspended)
+- return 0;
+-
+- amd_pstate_epp_offline(policy);
+
+ return 0;
+ }
+--
+2.39.5
+
--- /dev/null
+From 915d2fe2dd57d8d19bda8de19196eadd04f0eef6 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:40 +0000
+Subject: cpufreq/amd-pstate: Refactor amd_pstate_epp_reenable() and
+ amd_pstate_epp_offline()
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit b1089e0c8817fda93d474eaa82ad86386887aefe ]
+
+Replace similar code chunks with amd_pstate_update_perf() and
+amd_pstate_set_epp() function calls.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-4-Dhananjay.Ugwekar@amd.com
+[ML: Fix LKP reported error about unused variable]
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 38 +++++++-----------------------------
+ 1 file changed, 7 insertions(+), 31 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 895d108428b40..19906141ef7fe 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1579,25 +1579,17 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
+
+ static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
+ {
+- struct cppc_perf_ctrls perf_ctrls;
+- u64 value, max_perf;
++ u64 max_perf;
+ int ret;
+
+ ret = amd_pstate_enable(true);
+ if (ret)
+ pr_err("failed to enable amd pstate during resume, return %d\n", ret);
+
+- value = READ_ONCE(cpudata->cppc_req_cached);
+ max_perf = READ_ONCE(cpudata->highest_perf);
+
+- if (cpu_feature_enabled(X86_FEATURE_CPPC)) {
+- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+- } else {
+- perf_ctrls.max_perf = max_perf;
+- cppc_set_perf(cpudata->cpu, &perf_ctrls);
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
+- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+- }
++ amd_pstate_update_perf(cpudata, 0, 0, max_perf, false);
++ amd_pstate_set_epp(cpudata, cpudata->epp_cached);
+ }
+
+ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+@@ -1617,31 +1609,15 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ {
+ struct amd_cpudata *cpudata = policy->driver_data;
+- struct cppc_perf_ctrls perf_ctrls;
+ int min_perf;
+- u64 value;
+
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+- value = READ_ONCE(cpudata->cppc_req_cached);
+
+ mutex_lock(&amd_pstate_limits_lock);
+- if (cpu_feature_enabled(X86_FEATURE_CPPC)) {
+- cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN;
+-
+- /* Set max perf same as min perf */
+- value &= ~AMD_CPPC_MAX_PERF(~0L);
+- value |= AMD_CPPC_MAX_PERF(min_perf);
+- value &= ~AMD_CPPC_MIN_PERF(~0L);
+- value |= AMD_CPPC_MIN_PERF(min_perf);
+- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+- } else {
+- perf_ctrls.desired_perf = 0;
+- perf_ctrls.min_perf = min_perf;
+- perf_ctrls.max_perf = min_perf;
+- cppc_set_perf(cpudata->cpu, &perf_ctrls);
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+- }
++
++ amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false);
++ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
++
+ mutex_unlock(&amd_pstate_limits_lock);
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 07686cd20c1fa7f6b54a47e099382aae311067c0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:41 +0000
+Subject: cpufreq/amd-pstate: Remove the cppc_state check in offline/online
+ functions
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit b78f8c87ec3e7499bb049986838636d3afbc7ece ]
+
+Only amd_pstate_epp driver (i.e. cppc_state = ACTIVE) enters the
+amd_pstate_epp_offline() and amd_pstate_epp_cpu_online() functions,
+so remove the unnecessary if condition checking if cppc_state is
+equal to AMD_PSTATE_ACTIVE.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-5-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 9 +++------
+ 1 file changed, 3 insertions(+), 6 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 19906141ef7fe..4dfe5bdcb2932 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1598,10 +1598,8 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+
+ pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
+
+- if (cppc_state == AMD_PSTATE_ACTIVE) {
+- amd_pstate_epp_reenable(cpudata);
+- cpudata->suspended = false;
+- }
++ amd_pstate_epp_reenable(cpudata);
++ cpudata->suspended = false;
+
+ return 0;
+ }
+@@ -1630,8 +1628,7 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+ if (cpudata->suspended)
+ return 0;
+
+- if (cppc_state == AMD_PSTATE_ACTIVE)
+- amd_pstate_epp_offline(policy);
++ amd_pstate_epp_offline(policy);
+
+ return 0;
+ }
+--
+2.39.5
+
--- /dev/null
+From 081a12d3c0b2ff44f3d44ba992b88a9e99d725cf Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:17 +0000
+Subject: flow_dissector: use RCU protection to fetch dev_net()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit afec62cd0a4191cde6dd3a75382be4d51a38ce9b ]
+
+__skb_flow_dissect() can be called from arbitrary contexts.
+
+It must extend its RCU protection section to include
+the call to dev_net(), which can become dev_net_rcu().
+
+This makes sure the net structure can not disappear under us.
+
+Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-10-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/flow_dissector.c | 21 +++++++++++----------
+ 1 file changed, 11 insertions(+), 10 deletions(-)
+
+diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
+index 0e638a37aa096..5db41bf2ed93e 100644
+--- a/net/core/flow_dissector.c
++++ b/net/core/flow_dissector.c
+@@ -1108,10 +1108,12 @@ bool __skb_flow_dissect(const struct net *net,
+ FLOW_DISSECTOR_KEY_BASIC,
+ target_container);
+
++ rcu_read_lock();
++
+ if (skb) {
+ if (!net) {
+ if (skb->dev)
+- net = dev_net(skb->dev);
++ net = dev_net_rcu(skb->dev);
+ else if (skb->sk)
+ net = sock_net(skb->sk);
+ }
+@@ -1122,7 +1124,6 @@ bool __skb_flow_dissect(const struct net *net,
+ enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+ struct bpf_prog_array *run_array;
+
+- rcu_read_lock();
+ run_array = rcu_dereference(init_net.bpf.run_array[type]);
+ if (!run_array)
+ run_array = rcu_dereference(net->bpf.run_array[type]);
+@@ -1150,17 +1151,17 @@ bool __skb_flow_dissect(const struct net *net,
+ prog = READ_ONCE(run_array->items[0].prog);
+ result = bpf_flow_dissect(prog, &ctx, n_proto, nhoff,
+ hlen, flags);
+- if (result == BPF_FLOW_DISSECTOR_CONTINUE)
+- goto dissect_continue;
+- __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
+- target_container);
+- rcu_read_unlock();
+- return result == BPF_OK;
++ if (result != BPF_FLOW_DISSECTOR_CONTINUE) {
++ __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
++ target_container);
++ rcu_read_unlock();
++ return result == BPF_OK;
++ }
+ }
+-dissect_continue:
+- rcu_read_unlock();
+ }
+
++ rcu_read_unlock();
++
+ if (dissector_uses_key(flow_dissector,
+ FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct ethhdr *eth = eth_hdr(skb);
+--
+2.39.5
+
--- /dev/null
+From 185b5f815034c6e426816260490fa8b26c340ad5 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 25 Dec 2024 18:34:24 -0800
+Subject: HID: hid-steam: Make sure rumble work is canceled on removal
+
+From: Vicki Pfau <vi@endrift.com>
+
+[ Upstream commit cc4f952427aaa44ecfd92542e10a65cce67bd6f4 ]
+
+When a force feedback command is sent from userspace, work is scheduled to pass
+this data to the controller without blocking userspace itself. However, in
+theory, this work might not be properly canceled if the controller is removed
+at the exact right time. This patch ensures the work is properly canceled when
+the device is removed.
+
+Signed-off-by: Vicki Pfau <vi@endrift.com>
+Signed-off-by: Jiri Kosina <jkosina@suse.com>
+Stable-dep-of: 79504249d7e2 ("HID: hid-steam: Move hidraw input (un)registering to work")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/hid/hid-steam.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c
+index 9b6aec0733ae6..daca250e51c8b 100644
+--- a/drivers/hid/hid-steam.c
++++ b/drivers/hid/hid-steam.c
+@@ -1306,6 +1306,7 @@ static void steam_remove(struct hid_device *hdev)
+
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->work_connect);
++ cancel_work_sync(&steam->rumble_work);
+ hid_destroy_device(steam->client_hdev);
+ steam->client_hdev = NULL;
+ steam->client_opened = 0;
+--
+2.39.5
+
--- /dev/null
+From 5bc8accc48c65d965f70c8918cd314ddb1ac3ece Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 4 Feb 2025 19:55:27 -0800
+Subject: HID: hid-steam: Move hidraw input (un)registering to work
+
+From: Vicki Pfau <vi@endrift.com>
+
+[ Upstream commit 79504249d7e27cad4a3eeb9afc6386e418728ce0 ]
+
+Due to an interplay between locking in the input and hid transport subsystems,
+attempting to register or deregister the relevant input devices during the
+hidraw open/close events can lead to a lock ordering issue. Though this
+shouldn't cause a deadlock, this commit moves the input device manipulation to
+deferred work to sidestep the issue.
+
+Fixes: 385a4886778f6 ("HID: steam: remove input device when a hid client is running.")
+Signed-off-by: Vicki Pfau <vi@endrift.com>
+Signed-off-by: Jiri Kosina <jkosina@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/hid/hid-steam.c | 38 +++++++++++++++++++++++++++++++-------
+ 1 file changed, 31 insertions(+), 7 deletions(-)
+
+diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c
+index daca250e51c8b..7b35966898785 100644
+--- a/drivers/hid/hid-steam.c
++++ b/drivers/hid/hid-steam.c
+@@ -313,6 +313,7 @@ struct steam_device {
+ u16 rumble_left;
+ u16 rumble_right;
+ unsigned int sensor_timestamp_us;
++ struct work_struct unregister_work;
+ };
+
+ static int steam_recv_report(struct steam_device *steam,
+@@ -1072,6 +1073,31 @@ static void steam_mode_switch_cb(struct work_struct *work)
+ }
+ }
+
++static void steam_work_unregister_cb(struct work_struct *work)
++{
++ struct steam_device *steam = container_of(work, struct steam_device,
++ unregister_work);
++ unsigned long flags;
++ bool connected;
++ bool opened;
++
++ spin_lock_irqsave(&steam->lock, flags);
++ opened = steam->client_opened;
++ connected = steam->connected;
++ spin_unlock_irqrestore(&steam->lock, flags);
++
++ if (connected) {
++ if (opened) {
++ steam_sensors_unregister(steam);
++ steam_input_unregister(steam);
++ } else {
++ steam_set_lizard_mode(steam, lizard_mode);
++ steam_input_register(steam);
++ steam_sensors_register(steam);
++ }
++ }
++}
++
+ static bool steam_is_valve_interface(struct hid_device *hdev)
+ {
+ struct hid_report_enum *rep_enum;
+@@ -1117,8 +1143,7 @@ static int steam_client_ll_open(struct hid_device *hdev)
+ steam->client_opened++;
+ spin_unlock_irqrestore(&steam->lock, flags);
+
+- steam_sensors_unregister(steam);
+- steam_input_unregister(steam);
++ schedule_work(&steam->unregister_work);
+
+ return 0;
+ }
+@@ -1135,11 +1160,7 @@ static void steam_client_ll_close(struct hid_device *hdev)
+ connected = steam->connected && !steam->client_opened;
+ spin_unlock_irqrestore(&steam->lock, flags);
+
+- if (connected) {
+- steam_set_lizard_mode(steam, lizard_mode);
+- steam_input_register(steam);
+- steam_sensors_register(steam);
+- }
++ schedule_work(&steam->unregister_work);
+ }
+
+ static int steam_client_ll_raw_request(struct hid_device *hdev,
+@@ -1231,6 +1252,7 @@ static int steam_probe(struct hid_device *hdev,
+ INIT_LIST_HEAD(&steam->list);
+ INIT_WORK(&steam->rumble_work, steam_haptic_rumble_cb);
+ steam->sensor_timestamp_us = 0;
++ INIT_WORK(&steam->unregister_work, steam_work_unregister_cb);
+
+ /*
+ * With the real steam controller interface, do not connect hidraw.
+@@ -1291,6 +1313,7 @@ static int steam_probe(struct hid_device *hdev,
+ cancel_work_sync(&steam->work_connect);
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->rumble_work);
++ cancel_work_sync(&steam->unregister_work);
+
+ return ret;
+ }
+@@ -1307,6 +1330,7 @@ static void steam_remove(struct hid_device *hdev)
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->work_connect);
+ cancel_work_sync(&steam->rumble_work);
++ cancel_work_sync(&steam->unregister_work);
+ hid_destroy_device(steam->client_hdev);
+ steam->client_hdev = NULL;
+ steam->client_opened = 0;
+--
+2.39.5
+
--- /dev/null
+From 94442da5bb6720d911cd4dab449edd2d200b347c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:42 +0100
+Subject: include: net: add static inline dst_dev_overhead() to dst.h
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 0600cf40e9b36fe17f9c9f04d4f9cef249eaa5e7 ]
+
+Add static inline dst_dev_overhead() function to include/net/dst.h. This
+helper function is used by ioam6_iptunnel, rpl_iptunnel and
+seg6_iptunnel to get the dev's overhead based on a cache entry
+(dst_entry). If the cache is empty, the default and generic value
+skb->mac_len is returned. Otherwise, LL_RESERVED_SPACE() over dst's dev
+is returned.
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
+Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/dst.h | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+diff --git a/include/net/dst.h b/include/net/dst.h
+index 0f303cc602520..08647c99d79c9 100644
+--- a/include/net/dst.h
++++ b/include/net/dst.h
+@@ -440,6 +440,15 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout)
+ dst->expires = expires;
+ }
+
++static inline unsigned int dst_dev_overhead(struct dst_entry *dst,
++ struct sk_buff *skb)
++{
++ if (likely(dst))
++ return LL_RESERVED_SPACE(dst->dev);
++
++ return skb->mac_len;
++}
++
+ INDIRECT_CALLABLE_DECLARE(int ip6_output(struct net *, struct sock *,
+ struct sk_buff *));
+ INDIRECT_CALLABLE_DECLARE(int ip_output(struct net *, struct sock *,
+--
+2.39.5
+
--- /dev/null
+From cefa529ae75091cdf02550357f5616f1debc3840 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:10 +0000
+Subject: ipv4: add RCU protection to ip4_dst_hoplimit()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 469308552ca4560176cfc100e7ca84add1bebd7c ]
+
+ip4_dst_hoplimit() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-3-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/route.h | 9 +++++++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/include/net/route.h b/include/net/route.h
+index 1789f1e6640b4..da34b6fa9862d 100644
+--- a/include/net/route.h
++++ b/include/net/route.h
+@@ -363,10 +363,15 @@ static inline int inet_iif(const struct sk_buff *skb)
+ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
+ {
+ int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
+- struct net *net = dev_net(dst->dev);
+
+- if (hoplimit == 0)
++ if (hoplimit == 0) {
++ const struct net *net;
++
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
+ hoplimit = READ_ONCE(net->ipv4.sysctl_ip_default_ttl);
++ rcu_read_unlock();
++ }
+ return hoplimit;
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 989271e21e0151d868a02c4abfffcae6521ba34e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:16 +0000
+Subject: ipv4: icmp: convert to dev_net_rcu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 4b8474a0951e605d2a27a2c483da4eb4b8c63760 ]
+
+__icmp_send() must ensure rcu_read_lock() is held, as spotted
+by Jakub.
+
+Other ICMP uses of dev_net() seem safe, change them to dev_net_rcu()
+to get LOCKDEP support.
+
+Fixes: dde1bc0e6f86 ("[NETNS]: Add namespace for ICMP replying code.")
+Closes: https://lore.kernel.org/netdev/20250203153633.46ce0337@kernel.org/
+Reported-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-9-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/icmp.c | 31 +++++++++++++++++--------------
+ 1 file changed, 17 insertions(+), 14 deletions(-)
+
+diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
+index 932bd775fc268..f45bc187a92a7 100644
+--- a/net/ipv4/icmp.c
++++ b/net/ipv4/icmp.c
+@@ -399,10 +399,10 @@ static void icmp_push_reply(struct sock *sk,
+
+ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
+ {
+- struct ipcm_cookie ipc;
+ struct rtable *rt = skb_rtable(skb);
+- struct net *net = dev_net(rt->dst.dev);
++ struct net *net = dev_net_rcu(rt->dst.dev);
+ bool apply_ratelimit = false;
++ struct ipcm_cookie ipc;
+ struct flowi4 fl4;
+ struct sock *sk;
+ struct inet_sock *inet;
+@@ -610,12 +610,14 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+ struct sock *sk;
+
+ if (!rt)
+- goto out;
++ return;
++
++ rcu_read_lock();
+
+ if (rt->dst.dev)
+- net = dev_net(rt->dst.dev);
++ net = dev_net_rcu(rt->dst.dev);
+ else if (skb_in->dev)
+- net = dev_net(skb_in->dev);
++ net = dev_net_rcu(skb_in->dev);
+ else
+ goto out;
+
+@@ -786,7 +788,8 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+ icmp_xmit_unlock(sk);
+ out_bh_enable:
+ local_bh_enable();
+-out:;
++out:
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(__icmp_send);
+
+@@ -835,7 +838,7 @@ static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
+ * avoid additional coding at protocol handlers.
+ */
+ if (!pskb_may_pull(skb, iph->ihl * 4 + 8)) {
+- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
+ return;
+ }
+
+@@ -869,7 +872,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
+ struct net *net;
+ u32 info = 0;
+
+- net = dev_net(skb_dst(skb)->dev);
++ net = dev_net_rcu(skb_dst(skb)->dev);
+
+ /*
+ * Incomplete header ?
+@@ -980,7 +983,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
+ static enum skb_drop_reason icmp_redirect(struct sk_buff *skb)
+ {
+ if (skb->len < sizeof(struct iphdr)) {
+- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
+ return SKB_DROP_REASON_PKT_TOO_SMALL;
+ }
+
+@@ -1012,7 +1015,7 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
+ struct icmp_bxm icmp_param;
+ struct net *net;
+
+- net = dev_net(skb_dst(skb)->dev);
++ net = dev_net_rcu(skb_dst(skb)->dev);
+ /* should there be an ICMP stat for ignored echos? */
+ if (READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_all))
+ return SKB_NOT_DROPPED_YET;
+@@ -1041,9 +1044,9 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
+
+ bool icmp_build_probe(struct sk_buff *skb, struct icmphdr *icmphdr)
+ {
++ struct net *net = dev_net_rcu(skb->dev);
+ struct icmp_ext_hdr *ext_hdr, _ext_hdr;
+ struct icmp_ext_echo_iio *iio, _iio;
+- struct net *net = dev_net(skb->dev);
+ struct inet6_dev *in6_dev;
+ struct in_device *in_dev;
+ struct net_device *dev;
+@@ -1182,7 +1185,7 @@ static enum skb_drop_reason icmp_timestamp(struct sk_buff *skb)
+ return SKB_NOT_DROPPED_YET;
+
+ out_err:
+- __ICMP_INC_STATS(dev_net(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
+ return SKB_DROP_REASON_PKT_TOO_SMALL;
+ }
+
+@@ -1199,7 +1202,7 @@ int icmp_rcv(struct sk_buff *skb)
+ {
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ struct rtable *rt = skb_rtable(skb);
+- struct net *net = dev_net(rt->dst.dev);
++ struct net *net = dev_net_rcu(rt->dst.dev);
+ struct icmphdr *icmph;
+
+ if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+@@ -1372,9 +1375,9 @@ int icmp_err(struct sk_buff *skb, u32 info)
+ struct iphdr *iph = (struct iphdr *)skb->data;
+ int offset = iph->ihl<<2;
+ struct icmphdr *icmph = (struct icmphdr *)(skb->data + offset);
++ struct net *net = dev_net_rcu(skb->dev);
+ int type = icmp_hdr(skb)->type;
+ int code = icmp_hdr(skb)->code;
+- struct net *net = dev_net(skb->dev);
+
+ /*
+ * Use ping_err to handle all icmp errors except those
+--
+2.39.5
+
--- /dev/null
+From 6df497a23d547d01649610b3005e4624bb5fd212 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:15 +0000
+Subject: ipv4: use RCU protection in __ip_rt_update_pmtu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 139512191bd06f1b496117c76372b2ce372c9a41 ]
+
+__ip_rt_update_pmtu() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device")
+Fixes: 1de6b15a434c ("Namespaceify min_pmtu sysctl")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-8-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 11 ++++++-----
+ 1 file changed, 6 insertions(+), 5 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index f707cdb26ff20..41b320f0c20eb 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1008,9 +1008,9 @@ out: kfree_skb_reason(skb, reason);
+ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ {
+ struct dst_entry *dst = &rt->dst;
+- struct net *net = dev_net(dst->dev);
+ struct fib_result res;
+ bool lock = false;
++ struct net *net;
+ u32 old_mtu;
+
+ if (ip_mtu_locked(dst))
+@@ -1020,6 +1020,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ if (old_mtu < mtu)
+ return;
+
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
+ if (mtu < net->ipv4.ip_rt_min_pmtu) {
+ lock = true;
+ mtu = min(old_mtu, net->ipv4.ip_rt_min_pmtu);
+@@ -1027,9 +1029,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+
+ if (rt->rt_pmtu == mtu && !lock &&
+ time_before(jiffies, dst->expires - net->ipv4.ip_rt_mtu_expires / 2))
+- return;
++ goto out;
+
+- rcu_read_lock();
+ if (fib_lookup(net, fl4, &res, 0) == 0) {
+ struct fib_nh_common *nhc;
+
+@@ -1043,14 +1044,14 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+- rcu_read_unlock();
+- return;
++ goto out;
+ }
+ #endif /* CONFIG_IP_ROUTE_MULTIPATH */
+ nhc = FIB_RES_NHC(res);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
++out:
+ rcu_read_unlock();
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 2ec2ebabf622d94b9b1318952c34696e2bef9e99 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:14 +0000
+Subject: ipv4: use RCU protection in inet_select_addr()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 719817cd293e4fa389e1f69c396f3f816ed5aa41 ]
+
+inet_select_addr() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-7-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/devinet.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
+index 7cf5f7d0d0de2..a55e95046984d 100644
+--- a/net/ipv4/devinet.c
++++ b/net/ipv4/devinet.c
+@@ -1351,10 +1351,11 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
+ __be32 addr = 0;
+ unsigned char localnet_scope = RT_SCOPE_HOST;
+ struct in_device *in_dev;
+- struct net *net = dev_net(dev);
++ struct net *net;
+ int master_idx;
+
+ rcu_read_lock();
++ net = dev_net_rcu(dev);
+ in_dev = __in_dev_get_rcu(dev);
+ if (!in_dev)
+ goto no_in_dev;
+--
+2.39.5
+
--- /dev/null
+From c95dbd419be896fa93fc9f5f6aca15b4bc2855f3 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:11 +0000
+Subject: ipv4: use RCU protection in ip_dst_mtu_maybe_forward()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 071d8012869b6af352acca346ade13e7be90a49f ]
+
+ip_dst_mtu_maybe_forward() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: f87c10a8aa1e8 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-4-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/ip.h | 13 ++++++++++---
+ 1 file changed, 10 insertions(+), 3 deletions(-)
+
+diff --git a/include/net/ip.h b/include/net/ip.h
+index d92d3bc3ec0e2..fe4f854381143 100644
+--- a/include/net/ip.h
++++ b/include/net/ip.h
+@@ -465,9 +465,12 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
+ bool forwarding)
+ {
+ const struct rtable *rt = dst_rtable(dst);
+- struct net *net = dev_net(dst->dev);
+- unsigned int mtu;
++ unsigned int mtu, res;
++ struct net *net;
++
++ rcu_read_lock();
+
++ net = dev_net_rcu(dst->dev);
+ if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
+ ip_mtu_locked(dst) ||
+ !forwarding) {
+@@ -491,7 +494,11 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
+ out:
+ mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
+
+- return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
++ res = mtu - lwtunnel_headroom(dst->lwtstate, mtu);
++
++ rcu_read_unlock();
++
++ return res;
+ }
+
+ static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
+--
+2.39.5
+
--- /dev/null
+From 0d06ebdce5e1322d2c3790fdac2d3f10f8a4ae88 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:12 +0000
+Subject: ipv4: use RCU protection in ipv4_default_advmss()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 71b8471c93fa0bcab911fcb65da1eb6c4f5f735f ]
+
+ipv4_default_advmss() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-5-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index 2a27913588d05..9709ec3e2dce6 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1294,10 +1294,15 @@ static void set_class_tag(struct rtable *rt, u32 tag)
+
+ static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
+ {
+- struct net *net = dev_net(dst->dev);
+ unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr);
+- unsigned int advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
+- net->ipv4.ip_rt_min_advmss);
++ unsigned int advmss;
++ struct net *net;
++
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
++ advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
++ net->ipv4.ip_rt_min_advmss);
++ rcu_read_unlock();
+
+ return min(advmss, IPV4_MAX_PMTU - header_size);
+ }
+--
+2.39.5
+
--- /dev/null
+From 4a621a026b1c90ff11300cc33a620d2c94dc6535 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:13 +0000
+Subject: ipv4: use RCU protection in rt_is_expired()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit dd205fcc33d92d54eee4d7f21bb073af9bd5ce2b ]
+
+rt_is_expired() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: e84f84f27647 ("netns: place rt_genid into struct net")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-6-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index 9709ec3e2dce6..e31aa5a74ace4 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -390,7 +390,13 @@ static inline int ip_rt_proc_init(void)
+
+ static inline bool rt_is_expired(const struct rtable *rth)
+ {
+- return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
++ bool res;
++
++ rcu_read_lock();
++ res = rth->rt_genid != rt_genid_ipv4(dev_net_rcu(rth->dst.dev));
++ rcu_read_unlock();
++
++ return res;
+ }
+
+ void rt_cache_flush(struct net *net)
+--
+2.39.5
+
--- /dev/null
+From af9d8e5913252d2a98a619e3254dcc5f0f976ca9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:19 +0000
+Subject: ipv6: icmp: convert to dev_net_rcu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 34aef2b0ce3aa4eb4ef2e1f5cad3738d527032f5 ]
+
+icmp6_send() must acquire rcu_read_lock() sooner to ensure
+that the dev_net() call is done from a safe context.
+
+Other ICMPv6 uses of dev_net() seem safe, change them to
+dev_net_rcu() to get LOCKDEP support to catch bugs.
+
+Fixes: 9a43b709a230 ("[NETNS][IPV6] icmp6 - make icmpv6_socket per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-12-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/icmp.c | 42 +++++++++++++++++++++++-------------------
+ 1 file changed, 23 insertions(+), 19 deletions(-)
+
+diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
+index a6984a29fdb9d..4d14ab7f7e99f 100644
+--- a/net/ipv6/icmp.c
++++ b/net/ipv6/icmp.c
+@@ -76,7 +76,7 @@ static int icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ {
+ /* icmpv6_notify checks 8 bytes can be pulled, icmp6hdr is 8 bytes */
+ struct icmp6hdr *icmp6 = (struct icmp6hdr *) (skb->data + offset);
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+
+ if (type == ICMPV6_PKT_TOOBIG)
+ ip6_update_pmtu(skb, net, info, skb->dev->ifindex, 0, sock_net_uid(net, NULL));
+@@ -473,7 +473,10 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+
+ if (!skb->dev)
+ return;
+- net = dev_net(skb->dev);
++
++ rcu_read_lock();
++
++ net = dev_net_rcu(skb->dev);
+ mark = IP6_REPLY_MARK(net, skb->mark);
+ /*
+ * Make sure we respect the rules
+@@ -496,7 +499,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ !(type == ICMPV6_PARAMPROB &&
+ code == ICMPV6_UNK_OPTION &&
+ (opt_unrec(skb, info))))
+- return;
++ goto out;
+
+ saddr = NULL;
+ }
+@@ -526,7 +529,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
+ net_dbg_ratelimited("icmp6_send: addr_any/mcast source [%pI6c > %pI6c]\n",
+ &hdr->saddr, &hdr->daddr);
+- return;
++ goto out;
+ }
+
+ /*
+@@ -535,7 +538,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ if (is_ineligible(skb)) {
+ net_dbg_ratelimited("icmp6_send: no reply to icmp error [%pI6c > %pI6c]\n",
+ &hdr->saddr, &hdr->daddr);
+- return;
++ goto out;
+ }
+
+ /* Needed by both icmpv6_global_allow and icmpv6_xmit_lock */
+@@ -582,7 +585,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ np = inet6_sk(sk);
+
+ if (!icmpv6_xrlim_allow(sk, type, &fl6, apply_ratelimit))
+- goto out;
++ goto out_unlock;
+
+ tmp_hdr.icmp6_type = type;
+ tmp_hdr.icmp6_code = code;
+@@ -600,7 +603,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+
+ dst = icmpv6_route_lookup(net, skb, sk, &fl6);
+ if (IS_ERR(dst))
+- goto out;
++ goto out_unlock;
+
+ ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
+
+@@ -616,7 +619,6 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ goto out_dst_release;
+ }
+
+- rcu_read_lock();
+ idev = __in6_dev_get(skb->dev);
+
+ if (ip6_append_data(sk, icmpv6_getfrag, &msg,
+@@ -630,13 +632,15 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ icmpv6_push_pending_frames(sk, &fl6, &tmp_hdr,
+ len + sizeof(struct icmp6hdr));
+ }
+- rcu_read_unlock();
++
+ out_dst_release:
+ dst_release(dst);
+-out:
++out_unlock:
+ icmpv6_xmit_unlock(sk);
+ out_bh_enable:
+ local_bh_enable();
++out:
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(icmp6_send);
+
+@@ -679,8 +683,8 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type,
+ skb_pull(skb2, nhs);
+ skb_reset_network_header(skb2);
+
+- rt = rt6_lookup(dev_net(skb->dev), &ipv6_hdr(skb2)->saddr, NULL, 0,
+- skb, 0);
++ rt = rt6_lookup(dev_net_rcu(skb->dev), &ipv6_hdr(skb2)->saddr,
++ NULL, 0, skb, 0);
+
+ if (rt && rt->dst.dev)
+ skb2->dev = rt->dst.dev;
+@@ -717,7 +721,7 @@ EXPORT_SYMBOL(ip6_err_gen_icmpv6_unreach);
+
+ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
+ {
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ struct sock *sk;
+ struct inet6_dev *idev;
+ struct ipv6_pinfo *np;
+@@ -832,7 +836,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
+ u8 code, __be32 info)
+ {
+ struct inet6_skb_parm *opt = IP6CB(skb);
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ const struct inet6_protocol *ipprot;
+ enum skb_drop_reason reason;
+ int inner_offset;
+@@ -889,7 +893,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
+ static int icmpv6_rcv(struct sk_buff *skb)
+ {
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ struct net_device *dev = icmp6_dev(skb);
+ struct inet6_dev *idev = __in6_dev_get(dev);
+ const struct in6_addr *saddr, *daddr;
+@@ -921,7 +925,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
+ skb_set_network_header(skb, nh);
+ }
+
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INMSGS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS);
+
+ saddr = &ipv6_hdr(skb)->saddr;
+ daddr = &ipv6_hdr(skb)->daddr;
+@@ -939,7 +943,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
+
+ type = hdr->icmp6_type;
+
+- ICMP6MSGIN_INC_STATS(dev_net(dev), idev, type);
++ ICMP6MSGIN_INC_STATS(dev_net_rcu(dev), idev, type);
+
+ switch (type) {
+ case ICMPV6_ECHO_REQUEST:
+@@ -1034,9 +1038,9 @@ static int icmpv6_rcv(struct sk_buff *skb)
+
+ csum_error:
+ reason = SKB_DROP_REASON_ICMP_CSUM;
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_CSUMERRORS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_CSUMERRORS);
+ discard_it:
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INERRORS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INERRORS);
+ drop_no_count:
+ kfree_skb_reason(skb, reason);
+ return 0;
+--
+2.39.5
+
--- /dev/null
+From b5dfcef3b1bfee0811dc7d0a650074f8c6a261da Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 12 Feb 2025 14:10:21 +0000
+Subject: ipv6: mcast: add RCU protection to mld_newpack()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit a527750d877fd334de87eef81f1cb5f0f0ca3373 ]
+
+mld_newpack() can be called without RTNL or RCU being held.
+
+Note that we no longer can use sock_alloc_send_skb() because
+ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.
+
+Instead use alloc_skb() and charge the net->ipv6.igmp_sk
+socket under RCU protection.
+
+Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Link: https://patch.msgid.link/20250212141021.1663666-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/mcast.c | 14 ++++++++++----
+ 1 file changed, 10 insertions(+), 4 deletions(-)
+
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index 6551648512585..b7b62e5a562e5 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -1730,21 +1730,19 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+ struct net_device *dev = idev->dev;
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- struct net *net = dev_net(dev);
+ const struct in6_addr *saddr;
+ struct in6_addr addr_buf;
+ struct mld2_report *pmr;
+ struct sk_buff *skb;
+ unsigned int size;
+ struct sock *sk;
+- int err;
++ struct net *net;
+
+- sk = net->ipv6.igmp_sk;
+ /* we assume size > sizeof(ra) here
+ * Also try to not allocate high-order pages for big MTU
+ */
+ size = min_t(int, mtu, PAGE_SIZE / 2) + hlen + tlen;
+- skb = sock_alloc_send_skb(sk, size, 1, &err);
++ skb = alloc_skb(size, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+@@ -1752,6 +1750,12 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+ skb_reserve(skb, hlen);
+ skb_tailroom_reserve(skb, mtu, tlen);
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(dev);
++ sk = net->ipv6.igmp_sk;
++ skb_set_owner_w(skb, sk);
++
+ if (ipv6_get_lladdr(dev, &addr_buf, IFA_F_TENTATIVE)) {
+ /* <draft-ietf-magma-mld-source-05.txt>:
+ * use unspecified address as the source address
+@@ -1763,6 +1767,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+
+ ip6_mc_hdr(sk, skb, dev, saddr, &mld2_all_mcr, NEXTHDR_HOP, 0);
+
++ rcu_read_unlock();
++
+ skb_put_data(skb, ra, sizeof(ra));
+
+ skb_set_transport_header(skb, skb_tail_pointer(skb) - skb->data);
+--
+2.39.5
+
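The reason sock_alloc_send_skb() had to go is worth spelling out: it can
perform a sleeping GFP_KERNEL allocation on behalf of ipv6.igmp_sk, and
sleeping is not allowed inside an RCU read-side section. The patch
therefore allocates first and only charges the skb to the per-netns
socket once the netns pointer is held under RCU. A condensed sketch of
that ordering, taken from the hunks above:

    skb = alloc_skb(size, GFP_KERNEL);       /* may sleep: keep outside RCU */
    if (!skb)
            return NULL;

    rcu_read_lock();
    net = dev_net_rcu(dev);
    skb_set_owner_w(skb, net->ipv6.igmp_sk); /* charge wmem to the socket */
    /* build the MLD header while the netns pointer is still protected */
    rcu_read_unlock();
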
--- /dev/null
+From 657a1960fb3a36ea9c1566b4d869c8e03f5dc527 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:40 +0000
+Subject: ipv6: mcast: extend RCU protection in igmp6_send()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 087c1faa594fa07a66933d750c0b2610aa1a2946 ]
+
+igmp6_send() can be called without RTNL or RCU being held.
+
+Extend RCU protection so that we can safely fetch the net pointer
+and avoid a potential UAF.
+
+Note that we no longer can use sock_alloc_send_skb() because
+ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.
+
+Instead use alloc_skb() and charge the net->ipv6.igmp_sk
+socket under RCU protection.
+
+Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-9-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/mcast.c | 31 +++++++++++++++----------------
+ 1 file changed, 15 insertions(+), 16 deletions(-)
+
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index b244dbf61d5f3..6551648512585 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -2122,21 +2122,21 @@ static void mld_send_cr(struct inet6_dev *idev)
+
+ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ {
+- struct net *net = dev_net(dev);
+- struct sock *sk = net->ipv6.igmp_sk;
++ const struct in6_addr *snd_addr, *saddr;
++ int err, len, payload_len, full_len;
++ struct in6_addr addr_buf;
+ struct inet6_dev *idev;
+ struct sk_buff *skb;
+ struct mld_msg *hdr;
+- const struct in6_addr *snd_addr, *saddr;
+- struct in6_addr addr_buf;
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- int err, len, payload_len, full_len;
+ u8 ra[8] = { IPPROTO_ICMPV6, 0,
+ IPV6_TLV_ROUTERALERT, 2, 0, 0,
+ IPV6_TLV_PADN, 0 };
+- struct flowi6 fl6;
+ struct dst_entry *dst;
++ struct flowi6 fl6;
++ struct net *net;
++ struct sock *sk;
+
+ if (type == ICMPV6_MGM_REDUCTION)
+ snd_addr = &in6addr_linklocal_allrouters;
+@@ -2147,19 +2147,21 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ payload_len = len + sizeof(ra);
+ full_len = sizeof(struct ipv6hdr) + payload_len;
+
+- rcu_read_lock();
+- IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_OUTREQUESTS);
+- rcu_read_unlock();
++ skb = alloc_skb(hlen + tlen + full_len, GFP_KERNEL);
+
+- skb = sock_alloc_send_skb(sk, hlen + tlen + full_len, 1, &err);
++ rcu_read_lock();
+
++ net = dev_net_rcu(dev);
++ idev = __in6_dev_get(dev);
++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS);
+ if (!skb) {
+- rcu_read_lock();
+- IP6_INC_STATS(net, __in6_dev_get(dev),
+- IPSTATS_MIB_OUTDISCARDS);
++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+ rcu_read_unlock();
+ return;
+ }
++ sk = net->ipv6.igmp_sk;
++ skb_set_owner_w(skb, sk);
++
+ skb->priority = TC_PRIO_CONTROL;
+ skb_reserve(skb, hlen);
+
+@@ -2184,9 +2186,6 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ IPPROTO_ICMPV6,
+ csum_partial(hdr, len, 0));
+
+- rcu_read_lock();
+- idev = __in6_dev_get(skb->dev);
+-
+ icmpv6_flow_init(sk, &fl6, type,
+ &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr,
+ skb->dev->ifindex);
+--
+2.39.5
+
--- /dev/null
+From 696b09f558018b55ac46120507790417aebd9c6a Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:18 +0000
+Subject: ipv6: use RCU protection in ip6_default_advmss()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 3c8ffcd248da34fc41e52a46e51505900115fc2a ]
+
+ip6_default_advmss() needs RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 5578689a4e3c ("[NETNS][IPV6] route6 - make route6 per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-11-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/route.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index 8ebfed5d63232..2736dea77575b 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -3196,13 +3196,18 @@ static unsigned int ip6_default_advmss(const struct dst_entry *dst)
+ {
+ struct net_device *dev = dst->dev;
+ unsigned int mtu = dst_mtu(dst);
+- struct net *net = dev_net(dev);
++ struct net *net;
+
+ mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(dev);
+ if (mtu < net->ipv6.sysctl.ip6_rt_min_advmss)
+ mtu = net->ipv6.sysctl.ip6_rt_min_advmss;
+
++ rcu_read_unlock();
++
+ /*
+ * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and
+ * corresponding MSS is IPV6_MAXPLEN - tcp_header_size.
+--
+2.39.5
+
--- /dev/null
+From 93046772ca90a811f5ad97d1db80f62b1b7cc2af Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:39 +0000
+Subject: ndisc: extend RCU protection in ndisc_send_skb()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit ed6ae1f325d3c43966ec1b62ac1459e2b8e45640 ]
+
+ndisc_send_skb() can be called without RTNL or RCU held.
+
+Acquire rcu_read_lock() earlier, so that we can use dev_net_rcu()
+and avoid a potential UAF.
+
+Fixes: 1762f7e88eb3 ("[NETNS][IPV6] ndisc - make socket control per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-8-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ndisc.c | 12 ++++++++----
+ 1 file changed, 8 insertions(+), 4 deletions(-)
+
+diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
+index 90f8aa2d7af2e..8699d1a188dc4 100644
+--- a/net/ipv6/ndisc.c
++++ b/net/ipv6/ndisc.c
+@@ -471,16 +471,20 @@ static void ip6_nd_hdr(struct sk_buff *skb,
+ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+ const struct in6_addr *saddr)
+ {
++ struct icmp6hdr *icmp6h = icmp6_hdr(skb);
+ struct dst_entry *dst = skb_dst(skb);
+- struct net *net = dev_net(skb->dev);
+- struct sock *sk = net->ipv6.ndisc_sk;
+ struct inet6_dev *idev;
++ struct net *net;
++ struct sock *sk;
+ int err;
+- struct icmp6hdr *icmp6h = icmp6_hdr(skb);
+ u8 type;
+
+ type = icmp6h->icmp6_type;
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(skb->dev);
++ sk = net->ipv6.ndisc_sk;
+ if (!dst) {
+ struct flowi6 fl6;
+ int oif = skb->dev->ifindex;
+@@ -488,6 +492,7 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+ icmpv6_flow_init(sk, &fl6, type, saddr, daddr, oif);
+ dst = icmp6_dst_alloc(skb->dev, &fl6);
+ if (IS_ERR(dst)) {
++ rcu_read_unlock();
+ kfree_skb(skb);
+ return;
+ }
+@@ -502,7 +507,6 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+
+ ip6_nd_hdr(skb, saddr, daddr, READ_ONCE(inet6_sk(sk)->hop_limit), skb->len);
+
+- rcu_read_lock();
+ idev = __in6_dev_get(dst->dev);
+ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS);
+
+--
+2.39.5
+
--- /dev/null
+From a591ecc09cc3f5cb8580aff4cf53e13e3067139e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:34 +0000
+Subject: ndisc: use RCU protection in ndisc_alloc_skb()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 628e6d18930bbd21f2d4562228afe27694f66da9 ]
+
+ndisc_alloc_skb() can be called without RTNL or RCU being held.
+
+Add RCU protection to avoid possible UAF.
+
+Fixes: de09334b9326 ("ndisc: Introduce ndisc_alloc_skb() helper.")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-3-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ndisc.c | 10 ++++------
+ 1 file changed, 4 insertions(+), 6 deletions(-)
+
+diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
+index 264b10a947577..90f8aa2d7af2e 100644
+--- a/net/ipv6/ndisc.c
++++ b/net/ipv6/ndisc.c
+@@ -418,15 +418,11 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev,
+ {
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- struct sock *sk = dev_net(dev)->ipv6.ndisc_sk;
+ struct sk_buff *skb;
+
+ skb = alloc_skb(hlen + sizeof(struct ipv6hdr) + len + tlen, GFP_ATOMIC);
+- if (!skb) {
+- ND_PRINTK(0, err, "ndisc: %s failed to allocate an skb\n",
+- __func__);
++ if (!skb)
+ return NULL;
+- }
+
+ skb->protocol = htons(ETH_P_IPV6);
+ skb->dev = dev;
+@@ -437,7 +433,9 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev,
+ /* Manually assign socket ownership as we avoid calling
+ * sock_alloc_send_pskb() to bypass wmem buffer limits
+ */
+- skb_set_owner_w(skb, sk);
++ rcu_read_lock();
++ skb_set_owner_w(skb, dev_net_rcu(dev)->ipv6.ndisc_sk);
++ rcu_read_unlock();
+
+ return skb;
+ }
+--
+2.39.5
+
--- /dev/null
+From b666d3ec0cc14264d2b1d2a5c2195418702bac2d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:35 +0000
+Subject: neighbour: use RCU protection in __neigh_notify()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit becbd5850c03ed33b232083dd66c6e38c0c0e569 ]
+
+__neigh_notify() can be called without RTNL or RCU protection.
+
+Use RCU protection to avoid potential UAF.
+
+Fixes: 426b5303eb43 ("[NETNS]: Modify the neighbour table code so it handles multiple network namespaces")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-4-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/neighbour.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/net/core/neighbour.c b/net/core/neighbour.c
+index cc58315a40a79..c7f7ea61b524a 100644
+--- a/net/core/neighbour.c
++++ b/net/core/neighbour.c
+@@ -3513,10 +3513,12 @@ static const struct seq_operations neigh_stat_seq_ops = {
+ static void __neigh_notify(struct neighbour *n, int type, int flags,
+ u32 pid)
+ {
+- struct net *net = dev_net(n->dev);
+ struct sk_buff *skb;
+ int err = -ENOBUFS;
++ struct net *net;
+
++ rcu_read_lock();
++ net = dev_net_rcu(n->dev);
+ skb = nlmsg_new(neigh_nlmsg_size(), GFP_ATOMIC);
+ if (skb == NULL)
+ goto errout;
+@@ -3529,9 +3531,11 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
+ goto errout;
+ }
+ rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+- return;
++ goto out;
+ errout:
+ rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
++out:
++ rcu_read_unlock();
+ }
+
+ void neigh_app_ns(struct neighbour *n)
+--
+2.39.5
+
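When the function being converted has early returns, the series
restructures them as gotos so that rcu_read_unlock() runs on every exit
path, as in __neigh_notify() above. A schematic sketch with illustrative
names only:

    static void example_notify(struct net_device *dev, bool skip)
    {
            struct net *net;

            rcu_read_lock();
            net = dev_net_rcu(dev);

            if (skip)
                    goto out;  /* was a bare 'return' before the conversion */

            pr_info("%s: netns %p\n", dev->name, net);
    out:
            rcu_read_unlock();
    }
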
--- /dev/null
+From 04a4d5e517db5ce3250defe23f9d333b36ffab8d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:09 +0000
+Subject: net: add dev_net_rcu() helper
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 482ad2a4ace2740ca0ff1cbc8f3c7f862f3ab507 ]
+
+dev->nd_net can change; readers should either
+use rcu_read_lock() or RTNL.
+
+We currently use a generic helper, dev_net(), with
+no debugging support. We probably have many hidden bugs.
+
+Add dev_net_rcu() helper for callers using rcu_read_lock()
+protection.
+
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-2-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Stable-dep-of: 71b8471c93fa ("ipv4: use RCU protection in ipv4_default_advmss()")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/linux/netdevice.h | 6 ++++++
+ include/net/net_namespace.h | 2 +-
+ 2 files changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
+index 02d3bafebbe77..4f17b786828af 100644
+--- a/include/linux/netdevice.h
++++ b/include/linux/netdevice.h
+@@ -2577,6 +2577,12 @@ struct net *dev_net(const struct net_device *dev)
+ return read_pnet(&dev->nd_net);
+ }
+
++static inline
++struct net *dev_net_rcu(const struct net_device *dev)
++{
++ return read_pnet_rcu(&dev->nd_net);
++}
++
+ static inline
+ void dev_net_set(struct net_device *dev, struct net *net)
+ {
+diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
+index 9398c8f499536..da93873df4dbd 100644
+--- a/include/net/net_namespace.h
++++ b/include/net/net_namespace.h
+@@ -387,7 +387,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
+ #endif
+ }
+
+-static inline struct net *read_pnet_rcu(possible_net_t *pnet)
++static inline struct net *read_pnet_rcu(const possible_net_t *pnet)
+ {
+ #ifdef CONFIG_NET_NS
+ return rcu_dereference(pnet->net);
+--
+2.39.5
+
--- /dev/null
+From b97fe83fff886ccd0049b9a3e014c55251140ddd Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 8 Nov 2024 09:34:24 +0000
+Subject: net: ipv4: Cache pmtu for all packet paths if multipath enabled
+
+From: Vladimir Vdovin <deliran@verdict.gg>
+
+[ Upstream commit 7d3f3b4367f315a61fc615e3138f3d320da8c466 ]
+
+Check the number of paths with fib_info_num_path() and call
+update_or_create_fnhe() for every path.
+The problem is that the PMTU is cached only for the oif that
+received the ICMP "need to frag" message; other oifs will still
+try to use the "default" interface MTU.
+
+An example topology showing the problem:
+
+ | host1
+ +---------+
+ | dummy0 | 10.179.20.18/32 mtu9000
+ +---------+
+ +-----------+----------------+
+ +---------+ +---------+
+ | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+ +---------+ +---------+
+ | (all here have mtu 9000) |
+ +------+ +------+
+ | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+ +------+ +------+
+ | |
+---------+------------+-------------------+------
+ |
+ +-----+
+ | ro3 | 10.10.10.10 mtu1500
+ +-----+
+ |
+ ========================================
+ some networks
+ ========================================
+ |
+ +-----+
+ | eth0| 10.10.30.30 mtu9000
+ +-----+
+ | host2
+
+host1 has multipath enabled and
+sysctl net.ipv4.fib_multipath_hash_policy = 1:
+
+default proto static src 10.179.20.18
+ nexthop via 10.179.2.12 dev ens17f1 weight 1
+ nexthop via 10.179.2.140 dev ens17f0 weight 1
+
+When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
+host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
+ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
+
+The problem is that it is cached only for the iface that received the ICMP,
+and ro3 will never send an ICMP message to host1 via the other path.
+
+Host1 now has these routes to host2:
+
+ip r g 10.10.30.30 sport 30000 dport 443
+10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
+ cache expires 521sec mtu 1500
+
+ip r g 10.10.30.30 sport 30033 dport 443
+10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
+ cache
+
+So when host1 tries again to reach host2 with mtu > 1500, if the packet
+flow happens to be hashed to oif=ens17f1 it is fine, but if it is hashed
+to oif=ens17f0 it blackholes, while host1 keeps getting ICMP messages
+from ro3 on ens17f1, until the lucky day when ro3 sends one through a
+flow that hashes to ens17f0.
+
+Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
+Reviewed-by: Ido Schimmel <idosch@nvidia.com>
+Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Stable-dep-of: 139512191bd0 ("ipv4: use RCU protection in __ip_rt_update_pmtu()")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 13 ++++
+ tools/testing/selftests/net/pmtu.sh | 112 +++++++++++++++++++++++-----
+ 2 files changed, 108 insertions(+), 17 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index e31aa5a74ace4..f707cdb26ff20 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1034,6 +1034,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ struct fib_nh_common *nhc;
+
+ fib_select_path(net, &res, fl4, NULL);
++#ifdef CONFIG_IP_ROUTE_MULTIPATH
++ if (fib_info_num_path(res.fi) > 1) {
++ int nhsel;
++
++ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
++ nhc = fib_info_nhc(res.fi, nhsel);
++ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
++ jiffies + net->ipv4.ip_rt_mtu_expires);
++ }
++ rcu_read_unlock();
++ return;
++ }
++#endif /* CONFIG_IP_ROUTE_MULTIPATH */
+ nhc = FIB_RES_NHC(res);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
+index 6c651c880fe83..66be7699c72c9 100755
+--- a/tools/testing/selftests/net/pmtu.sh
++++ b/tools/testing/selftests/net/pmtu.sh
+@@ -197,6 +197,12 @@
+ #
+ # - pmtu_ipv6_route_change
+ # Same as above but with IPv6
++#
++# - pmtu_ipv4_mp_exceptions
++# Use the same topology as in pmtu_ipv4, but add routeable addresses
++# on host A and B on lo reachable via both routers. Host A and B
++# addresses have multipath routes to each other, b_r1 mtu = 1500.
++# Check that PMTU exceptions are created for both paths.
+
+ source lib.sh
+ source net_helper.sh
+@@ -266,7 +272,8 @@ tests="
+ list_flush_ipv4_exception ipv4: list and flush cached exceptions 1
+ list_flush_ipv6_exception ipv6: list and flush cached exceptions 1
+ pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1
+- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
++ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1
++ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 1"
+
+ # Addressing and routing for tests with routers: four network segments, with
+ # index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
+@@ -343,6 +350,9 @@ tunnel6_a_addr="fd00:2::a"
+ tunnel6_b_addr="fd00:2::b"
+ tunnel6_mask="64"
+
++host4_a_addr="192.168.99.99"
++host4_b_addr="192.168.88.88"
++
+ dummy6_0_prefix="fc00:1000::"
+ dummy6_1_prefix="fc00:1001::"
+ dummy6_mask="64"
+@@ -984,6 +994,52 @@ setup_ovs_bridge() {
+ run_cmd ip route add ${prefix6}:${b_r1}::1 via ${prefix6}:${a_r1}::2
+ }
+
++setup_multipath_new() {
++ # Set up host A with multipath routes to host B host4_b_addr
++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo
++ run_cmd ${ns_a} ip nexthop add id 401 via ${prefix4}.${a_r1}.2 dev veth_A-R1
++ run_cmd ${ns_a} ip nexthop add id 402 via ${prefix4}.${a_r2}.2 dev veth_A-R2
++ run_cmd ${ns_a} ip nexthop add id 403 group 401/402
++ run_cmd ${ns_a} ip route add ${host4_b_addr} src ${host4_a_addr} nhid 403
++
++ # Set up host B with multipath routes to host A host4_a_addr
++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo
++ run_cmd ${ns_b} ip nexthop add id 401 via ${prefix4}.${b_r1}.2 dev veth_B-R1
++ run_cmd ${ns_b} ip nexthop add id 402 via ${prefix4}.${b_r2}.2 dev veth_B-R2
++ run_cmd ${ns_b} ip nexthop add id 403 group 401/402
++ run_cmd ${ns_b} ip route add ${host4_a_addr} src ${host4_b_addr} nhid 403
++}
++
++setup_multipath_old() {
++ # Set up host A with multipath routes to host B host4_b_addr
++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo
++ run_cmd ${ns_a} ip route add ${host4_b_addr} \
++ src ${host4_a_addr} \
++ nexthop via ${prefix4}.${a_r1}.2 weight 1 \
++ nexthop via ${prefix4}.${a_r2}.2 weight 1
++
++ # Set up host B with multipath routes to host A host4_a_addr
++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo
++ run_cmd ${ns_b} ip route add ${host4_a_addr} \
++ src ${host4_b_addr} \
++ nexthop via ${prefix4}.${b_r1}.2 weight 1 \
++ nexthop via ${prefix4}.${b_r2}.2 weight 1
++}
++
++setup_multipath() {
++ if [ "$USE_NH" = "yes" ]; then
++ setup_multipath_new
++ else
++ setup_multipath_old
++ fi
++
++ # Set up routers with routes to dummies
++ run_cmd ${ns_r1} ip route add ${host4_a_addr} via ${prefix4}.${a_r1}.1
++ run_cmd ${ns_r2} ip route add ${host4_a_addr} via ${prefix4}.${a_r2}.1
++ run_cmd ${ns_r1} ip route add ${host4_b_addr} via ${prefix4}.${b_r1}.1
++ run_cmd ${ns_r2} ip route add ${host4_b_addr} via ${prefix4}.${b_r2}.1
++}
++
+ setup() {
+ [ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
+
+@@ -1076,23 +1132,15 @@ link_get_mtu() {
+ }
+
+ route_get_dst_exception() {
+- ns_cmd="${1}"
+- dst="${2}"
+- dsfield="${3}"
++ ns_cmd="${1}"; shift
+
+- if [ -z "${dsfield}" ]; then
+- dsfield=0
+- fi
+-
+- ${ns_cmd} ip route get "${dst}" dsfield "${dsfield}"
++ ${ns_cmd} ip route get "$@"
+ }
+
+ route_get_dst_pmtu_from_exception() {
+- ns_cmd="${1}"
+- dst="${2}"
+- dsfield="${3}"
++ ns_cmd="${1}"; shift
+
+- mtu_parse "$(route_get_dst_exception "${ns_cmd}" "${dst}" "${dsfield}")"
++ mtu_parse "$(route_get_dst_exception "${ns_cmd}" "$@")"
+ }
+
+ check_pmtu_value() {
+@@ -1235,10 +1283,10 @@ test_pmtu_ipv4_dscp_icmp_exception() {
+ run_cmd "${ns_a}" ping -q -M want -Q "${dsfield}" -c 1 -w 1 -s "${len}" "${dst2}"
+
+ # Check that exceptions have been created with the correct PMTU
+- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")"
++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")"
+ check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1
+
+- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")"
++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")"
+ check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1
+ }
+
+@@ -1285,9 +1333,9 @@ test_pmtu_ipv4_dscp_udp_exception() {
+ UDP:"${dst2}":50000,tos="${dsfield}"
+
+ # Check that exceptions have been created with the correct PMTU
+- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")"
++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")"
+ check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1
+- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")"
++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")"
+ check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1
+ }
+
+@@ -2329,6 +2377,36 @@ test_pmtu_ipv6_route_change() {
+ test_pmtu_ipvX_route_change 6
+ }
+
++test_pmtu_ipv4_mp_exceptions() {
++ setup namespaces routing multipath || return $ksft_skip
++
++ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
++ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \
++ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \
++ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2
++
++ # Set up initial MTU values
++ mtu "${ns_a}" veth_A-R1 2000
++ mtu "${ns_r1}" veth_R1-A 2000
++ mtu "${ns_r1}" veth_R1-B 1500
++ mtu "${ns_b}" veth_B-R1 1500
++
++ mtu "${ns_a}" veth_A-R2 2000
++ mtu "${ns_r2}" veth_R2-A 2000
++ mtu "${ns_r2}" veth_R2-B 1500
++ mtu "${ns_b}" veth_B-R2 1500
++
++ # Ping and expect two nexthop exceptions for two routes
++ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 1 -s 1800 "${host4_b_addr}"
++
++ # Check that exceptions have been created with the correct PMTU
++ pmtu_a_R1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R1)"
++ pmtu_a_R2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R2)"
++
++ check_pmtu_value "1500" "${pmtu_a_R1}" "exceeding MTU (veth_A-R1)" || return 1
++ check_pmtu_value "1500" "${pmtu_a_R2}" "exceeding MTU (veth_A-R2)" || return 1
++}
++
+ usage() {
+ echo
+ echo "$0 [OPTIONS] [TEST]..."
+--
+2.39.5
+
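Reduced to its core, the fix records the learned PMTU on every nexthop of
a multipath route rather than only on the one the ICMP arrived through.
A sketch of the added branch, with the surrounding context of
__ip_rt_update_pmtu() assumed (the fib_lookup() result is only valid
under rcu_read_lock(), which the function already holds at this point):

    if (fib_info_num_path(res.fi) > 1) {
            int nhsel;

            /* One exception per path, so later flows hashed to any oif
             * see the lowered PMTU instead of blackholing.
             */
            for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
                    nhc = fib_info_nhc(res.fi, nhsel);
                    update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
                                          jiffies + net->ipv4.ip_rt_mtu_expires);
            }
    }
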
--- /dev/null
+From 01157585e1e4aefbcfed1d8919812874a7afed8c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 29 Jan 2025 19:15:19 -0800
+Subject: net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels
+
+From: Jakub Kicinski <kuba@kernel.org>
+
+[ Upstream commit 92191dd1073088753821b862b791dcc83e558e07 ]
+
+Some lwtunnels have a dst cache for post-transformation dst.
+If the packet destination did not change we may end up recording
+a reference to the lwtunnel in its own cache, and the lwtunnel
+state will never be freed.
+
+This was discovered by the ioam6.sh test; kmemleak was recently fixed
+to catch per-cpu memory leaks. I'm not sure if rpl and seg6
+can actually hit this, but in principle I don't see why not.
+
+Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
+Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
+Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
+Reviewed-by: Simon Horman <horms@kernel.org>
+Link: https://patch.msgid.link/20250130031519.2716843-2-kuba@kernel.org
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ioam6_iptunnel.c | 9 ++++++---
+ net/ipv6/rpl_iptunnel.c | 9 ++++++---
+ net/ipv6/seg6_iptunnel.c | 9 ++++++---
+ 3 files changed, 18 insertions(+), 9 deletions(-)
+
+diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
+index e81b45b1f6555..fb6cb540cd1bc 100644
+--- a/net/ipv6/ioam6_iptunnel.c
++++ b/net/ipv6/ioam6_iptunnel.c
+@@ -413,9 +413,12 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (dst->lwtstate != cache_dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
+ if (unlikely(err))
+diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
+index 7ba22d2f2bfef..be084089ec783 100644
+--- a/net/ipv6/rpl_iptunnel.c
++++ b/net/ipv6/rpl_iptunnel.c
+@@ -236,9 +236,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (orig_dst->lwtstate != dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
+index 4bf937bfc2633..316dbc2694f2a 100644
+--- a/net/ipv6/seg6_iptunnel.c
++++ b/net/ipv6/seg6_iptunnel.c
+@@ -575,9 +575,12 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (orig_dst->lwtstate != dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+--
+2.39.5
+
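The guard added in all three tunnels is small enough to restate: the dst
holds a reference on its lwtstate, and the lwtstate owns the cache, so
storing a dst whose lwtstate is the tunnel's own state creates a cycle
that is never freed. A sketch of the check, using the seg6 naming from
the hunk above:

    /* cache only if we don't create a dst reference loop */
    if (orig_dst->lwtstate != dst->lwtstate) {
            local_bh_disable();
            dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
            local_bh_enable();
    }
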
--- /dev/null
+From f753a5f7a35a7fd6411ab9a30c1a28ff123f9a37 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:43 +0100
+Subject: net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit dce525185bc92864e5a318040285ee070563fe34 ]
+
+This patch mitigates the two-reallocations issue with ioam6_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration may still trigger
+two reallocations (i.e., empty cache), while next iterations would only
+trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+the improvement:
+- inline mode:
+ - before: https://ibb.co/LhQ8V63
+ - after: https://ibb.co/x5YT2bS
+- encap mode:
+ - before: https://ibb.co/3Cjm5m0
+ - after: https://ibb.co/TwpsxTC
+- encap mode with tunsrc:
+ - before: https://ibb.co/Gpy9QPg
+ - after: https://ibb.co/PW1bZFT
+
+This patch also fixes an incorrect behavior: after the insertion, the
+second call to skb_cow_head() makes sure that the device has enough
+headroom in the skb for layer 2 and the like. In that case, the "old"
+dst_entry was being used, which is now fixed. After discussing with Paolo, it
+appears that both patches can be merged into a single one -this one-
+(for the sake of readability) and target net-next.
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ioam6_iptunnel.c | 73 ++++++++++++++++++++-------------------
+ 1 file changed, 37 insertions(+), 36 deletions(-)
+
+diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
+index beb6b4cfc551c..e81b45b1f6555 100644
+--- a/net/ipv6/ioam6_iptunnel.c
++++ b/net/ipv6/ioam6_iptunnel.c
+@@ -255,14 +255,15 @@ static int ioam6_do_fill(struct net *net, struct sk_buff *skb)
+ }
+
+ static int ioam6_do_inline(struct net *net, struct sk_buff *skb,
+- struct ioam6_lwt_encap *tuninfo)
++ struct ioam6_lwt_encap *tuninfo,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6hdr *oldhdr, *hdr;
+ int hdrlen, err;
+
+ hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -293,7 +294,8 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+ struct ioam6_lwt_encap *tuninfo,
+ bool has_tunsrc,
+ struct in6_addr *tunsrc,
+- struct in6_addr *tundst)
++ struct in6_addr *tundst,
++ struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct ipv6hdr *hdr, *inner_hdr;
+@@ -302,7 +304,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+ hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
+ len = sizeof(*hdr) + hdrlen;
+
+- err = skb_cow_head(skb, len + skb->mac_len);
++ err = skb_cow_head(skb, len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -336,7 +338,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+
+ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ {
+- struct dst_entry *dst = skb_dst(skb);
++ struct dst_entry *dst = skb_dst(skb), *cache_dst;
+ struct in6_addr orig_daddr;
+ struct ioam6_lwt *ilwt;
+ int err = -EINVAL;
+@@ -354,6 +356,10 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+
+ orig_daddr = ipv6_hdr(skb)->daddr;
+
++ local_bh_disable();
++ cache_dst = dst_cache_get(&ilwt->cache);
++ local_bh_enable();
++
+ switch (ilwt->mode) {
+ case IOAM6_IPTUNNEL_MODE_INLINE:
+ do_inline:
+@@ -361,7 +367,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP)
+ goto out;
+
+- err = ioam6_do_inline(net, skb, &ilwt->tuninfo);
++ err = ioam6_do_inline(net, skb, &ilwt->tuninfo, cache_dst);
+ if (unlikely(err))
+ goto drop;
+
+@@ -371,7 +377,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ /* Encapsulation (ip6ip6) */
+ err = ioam6_do_encap(net, skb, &ilwt->tuninfo,
+ ilwt->has_tunsrc, &ilwt->tunsrc,
+- &ilwt->tundst);
++ &ilwt->tundst, cache_dst);
+ if (unlikely(err))
+ goto drop;
+
+@@ -389,41 +395,36 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
++ if (unlikely(!cache_dst)) {
++ struct ipv6hdr *hdr = ipv6_hdr(skb);
++ struct flowi6 fl6;
++
++ memset(&fl6, 0, sizeof(fl6));
++ fl6.daddr = hdr->daddr;
++ fl6.saddr = hdr->saddr;
++ fl6.flowlabel = ip6_flowinfo(hdr);
++ fl6.flowi6_mark = skb->mark;
++ fl6.flowi6_proto = hdr->nexthdr;
++
++ cache_dst = ip6_route_output(net, NULL, &fl6);
++ if (cache_dst->error) {
++ err = cache_dst->error;
++ dst_release(cache_dst);
++ goto drop;
++ }
+
+- if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
+ local_bh_disable();
+- dst = dst_cache_get(&ilwt->cache);
++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+ local_bh_enable();
+
+- if (unlikely(!dst)) {
+- struct ipv6hdr *hdr = ipv6_hdr(skb);
+- struct flowi6 fl6;
+-
+- memset(&fl6, 0, sizeof(fl6));
+- fl6.daddr = hdr->daddr;
+- fl6.saddr = hdr->saddr;
+- fl6.flowlabel = ip6_flowinfo(hdr);
+- fl6.flowi6_mark = skb->mark;
+- fl6.flowi6_proto = hdr->nexthdr;
+-
+- dst = ip6_route_output(net, NULL, &fl6);
+- if (dst->error) {
+- err = dst->error;
+- dst_release(dst);
+- goto drop;
+- }
+-
+- local_bh_disable();
+- dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
+- }
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
++ if (unlikely(err))
++ goto drop;
++ }
+
++ if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
+ skb_dst_drop(skb);
+- skb_dst_set(skb, dst);
+-
++ skb_dst_set(skb, cache_dst);
+ return dst_output(net, sk, skb);
+ }
+ out:
+--
+2.39.5
+
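The single-reallocation behaviour comes from the headroom estimate handed
to the first skb_cow_head(): with a warm dst cache the real output device
is already known, so its link-layer headroom can be reserved up front.
dst_dev_overhead() is not part of this diff; a sketch of what it is
assumed to do:

    /* Assumed helper behaviour (not shown in this patch): prefer the
     * cached dst's link-layer headroom, fall back to skb->mac_len when
     * the cache is cold.
     */
    static inline unsigned int dst_dev_overhead(struct dst_entry *dst,
                                                struct sk_buff *skb)
    {
            if (dst)
                    return LL_RESERVED_SPACE(dst->dev);

            return skb->mac_len;
    }
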
--- /dev/null
+From 4d332b0718bdb32b3cae5af26c04ace8e1cc93ea Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:45 +0100
+Subject: net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 985ec6f5e6235242191370628acb73d7a9f0c0ea ]
+
+This patch mitigates the two-reallocations issue with rpl_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration would still
+trigger two reallocations (i.e., empty cache), while next iterations
+would only trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+there is no impact (they even show an improvement):
+- before: https://ibb.co/nQJhqwc
+- after: https://ibb.co/4ZvW6wV
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: Alexander Aring <aahringo@redhat.com>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/rpl_iptunnel.c | 46 ++++++++++++++++++++++-------------------
+ 1 file changed, 25 insertions(+), 21 deletions(-)
+
+diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
+index db3c19a42e1ca..7ba22d2f2bfef 100644
+--- a/net/ipv6/rpl_iptunnel.c
++++ b/net/ipv6/rpl_iptunnel.c
+@@ -125,7 +125,8 @@ static void rpl_destroy_state(struct lwtunnel_state *lwt)
+ }
+
+ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+- const struct ipv6_rpl_sr_hdr *srh)
++ const struct ipv6_rpl_sr_hdr *srh,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6_rpl_sr_hdr *isrh, *csrh;
+ const struct ipv6hdr *oldhdr;
+@@ -153,7 +154,7 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+
+ hdrlen = ((csrh->hdrlen + 1) << 3);
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err)) {
+ kfree(buf);
+ return err;
+@@ -186,7 +187,8 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+ return 0;
+ }
+
+-static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
++static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt,
++ struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct rpl_iptunnel_encap *tinfo;
+@@ -196,7 +198,7 @@ static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
+
+ tinfo = rpl_encap_lwtunnel(dst->lwtstate);
+
+- return rpl_do_srh_inline(skb, rlwt, tinfo->srh);
++ return rpl_do_srh_inline(skb, rlwt, tinfo->srh, cache_dst);
+ }
+
+ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+@@ -208,14 +210,14 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+
+ rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
+
+- err = rpl_do_srh(skb, rlwt);
+- if (unlikely(err))
+- goto drop;
+-
+ local_bh_disable();
+ dst = dst_cache_get(&rlwt->cache);
+ local_bh_enable();
+
++ err = rpl_do_srh(skb, rlwt, dst);
++ if (unlikely(err))
++ goto drop;
++
+ if (unlikely(!dst)) {
+ struct ipv6hdr *hdr = ipv6_hdr(skb);
+ struct flowi6 fl6;
+@@ -237,15 +239,15 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ local_bh_disable();
+ dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+ local_bh_enable();
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ }
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+-
+ return dst_output(net, sk, skb);
+
+ drop:
+@@ -262,29 +264,31 @@ static int rpl_input(struct sk_buff *skb)
+
+ rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
+
+- err = rpl_do_srh(skb, rlwt);
+- if (unlikely(err))
+- goto drop;
+-
+ local_bh_disable();
+ dst = dst_cache_get(&rlwt->cache);
++ local_bh_enable();
++
++ err = rpl_do_srh(skb, rlwt, dst);
++ if (unlikely(err))
++ goto drop;
+
+ if (!dst) {
+ ip6_route_input(skb);
+ dst = skb_dst(skb);
+ if (!dst->error) {
++ local_bh_disable();
+ dst_cache_set_ip6(&rlwt->cache, dst,
+ &ipv6_hdr(skb)->saddr);
++ local_bh_enable();
+ }
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ } else {
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ }
+- local_bh_enable();
+-
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+
+ return dst_input(skb);
+
+--
+2.39.5
+
--- /dev/null
+From 49b5da2d604b153b07f0b042c934c38cf1c379c8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:44 +0100
+Subject: net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 40475b63761abb6f8fdef960d03228a08662c9c4 ]
+
+This patch mitigates the two-reallocations issue with seg6_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration would still
+trigger two reallocations (i.e., empty cache), while next iterations
+would only trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+the improvement:
+- before: https://ibb.co/3Cg4sNH
+- after: https://ibb.co/8rQ350r
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: David Lebrun <dlebrun@google.com>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/seg6_iptunnel.c | 85 ++++++++++++++++++++++++----------------
+ 1 file changed, 52 insertions(+), 33 deletions(-)
+
+diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
+index 098632adc9b5a..4bf937bfc2633 100644
+--- a/net/ipv6/seg6_iptunnel.c
++++ b/net/ipv6/seg6_iptunnel.c
+@@ -124,8 +124,8 @@ static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
+ return flowlabel;
+ }
+
+-/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
+-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
++static int __seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
++ int proto, struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct net *net = dev_net(dst->dev);
+@@ -137,7 +137,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+ hdrlen = (osrh->hdrlen + 1) << 3;
+ tot_len = hdrlen + sizeof(*hdr);
+
+- err = skb_cow_head(skb, tot_len + skb->mac_len);
++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -197,11 +197,18 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+
+ return 0;
+ }
++
++/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
++int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
++{
++ return __seg6_do_srh_encap(skb, osrh, proto, NULL);
++}
+ EXPORT_SYMBOL_GPL(seg6_do_srh_encap);
+
+ /* encapsulate an IPv6 packet within an outer IPv6 header with reduced SRH */
+ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+- struct ipv6_sr_hdr *osrh, int proto)
++ struct ipv6_sr_hdr *osrh, int proto,
++ struct dst_entry *cache_dst)
+ {
+ __u8 first_seg = osrh->first_segment;
+ struct dst_entry *dst = skb_dst(skb);
+@@ -230,7 +237,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+
+ tot_len = red_hdrlen + sizeof(struct ipv6hdr);
+
+- err = skb_cow_head(skb, tot_len + skb->mac_len);
++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -317,8 +324,8 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+ return 0;
+ }
+
+-/* insert an SRH within an IPv6 packet, just after the IPv6 header */
+-int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
++static int __seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6hdr *hdr, *oldhdr;
+ struct ipv6_sr_hdr *isrh;
+@@ -326,7 +333,7 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+
+ hdrlen = (osrh->hdrlen + 1) << 3;
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -369,9 +376,8 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+
+ return 0;
+ }
+-EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
+
+-static int seg6_do_srh(struct sk_buff *skb)
++static int seg6_do_srh(struct sk_buff *skb, struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct seg6_iptunnel_encap *tinfo;
+@@ -384,7 +390,7 @@ static int seg6_do_srh(struct sk_buff *skb)
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return -EINVAL;
+
+- err = seg6_do_srh_inline(skb, tinfo->srh);
++ err = __seg6_do_srh_inline(skb, tinfo->srh, cache_dst);
+ if (err)
+ return err;
+ break;
+@@ -402,9 +408,11 @@ static int seg6_do_srh(struct sk_buff *skb)
+ return -EINVAL;
+
+ if (tinfo->mode == SEG6_IPTUN_MODE_ENCAP)
+- err = seg6_do_srh_encap(skb, tinfo->srh, proto);
++ err = __seg6_do_srh_encap(skb, tinfo->srh,
++ proto, cache_dst);
+ else
+- err = seg6_do_srh_encap_red(skb, tinfo->srh, proto);
++ err = seg6_do_srh_encap_red(skb, tinfo->srh,
++ proto, cache_dst);
+
+ if (err)
+ return err;
+@@ -425,11 +433,13 @@ static int seg6_do_srh(struct sk_buff *skb)
+ skb_push(skb, skb->mac_len);
+
+ if (tinfo->mode == SEG6_IPTUN_MODE_L2ENCAP)
+- err = seg6_do_srh_encap(skb, tinfo->srh,
+- IPPROTO_ETHERNET);
++ err = __seg6_do_srh_encap(skb, tinfo->srh,
++ IPPROTO_ETHERNET,
++ cache_dst);
+ else
+ err = seg6_do_srh_encap_red(skb, tinfo->srh,
+- IPPROTO_ETHERNET);
++ IPPROTO_ETHERNET,
++ cache_dst);
+
+ if (err)
+ return err;
+@@ -444,6 +454,13 @@ static int seg6_do_srh(struct sk_buff *skb)
+ return 0;
+ }
+
++/* insert an SRH within an IPv6 packet, just after the IPv6 header */
++int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
++{
++ return __seg6_do_srh_inline(skb, osrh, NULL);
++}
++EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
++
+ static int seg6_input_finish(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+ {
+@@ -458,31 +475,33 @@ static int seg6_input_core(struct net *net, struct sock *sk,
+ struct seg6_lwt *slwt;
+ int err;
+
+- err = seg6_do_srh(skb);
+- if (unlikely(err))
+- goto drop;
+-
+ slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+
+ local_bh_disable();
+ dst = dst_cache_get(&slwt->cache);
++ local_bh_enable();
++
++ err = seg6_do_srh(skb, dst);
++ if (unlikely(err))
++ goto drop;
+
+ if (!dst) {
+ ip6_route_input(skb);
+ dst = skb_dst(skb);
+ if (!dst->error) {
++ local_bh_disable();
+ dst_cache_set_ip6(&slwt->cache, dst,
+ &ipv6_hdr(skb)->saddr);
++ local_bh_enable();
+ }
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ } else {
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ }
+- local_bh_enable();
+-
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+
+ if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
+ return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
+@@ -528,16 +547,16 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ struct seg6_lwt *slwt;
+ int err;
+
+- err = seg6_do_srh(skb);
+- if (unlikely(err))
+- goto drop;
+-
+ slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+
+ local_bh_disable();
+ dst = dst_cache_get(&slwt->cache);
+ local_bh_enable();
+
++ err = seg6_do_srh(skb, dst);
++ if (unlikely(err))
++ goto drop;
++
+ if (unlikely(!dst)) {
+ struct ipv6hdr *hdr = ipv6_hdr(skb);
+ struct flowi6 fl6;
+@@ -559,15 +578,15 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ local_bh_disable();
+ dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+ local_bh_enable();
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ }
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+-
+ if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
+ return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb,
+ NULL, skb_dst(skb)->dev, dst_output);
+--
+2.39.5
+
--- /dev/null
+From 38f927398ade60eb5687c7a62e805ff497f55810 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:37 +0000
+Subject: openvswitch: use RCU protection in ovs_vport_cmd_fill_info()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 90b2f49a502fa71090d9f4fe29a2f51fe5dff76d ]
+
+ovs_vport_cmd_fill_info() can be called without RTNL or RCU.
+
+Use RCU protection and dev_net_rcu() to avoid potential UAF.
+
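+A minimal sketch of the pattern (identifiers are the ones used in the hunk
+ below; the surrounding control flow is simplified):
+
+  rcu_read_lock();
+  net_vport = dev_net_rcu(vport->dev);   /* netns pointer is only stable under RCU */
+  if (!net_eq(net, net_vport)) {
+          /* GFP_ATOMIC: no sleeping allocation inside an RCU read-side section */
+          int id = peernet2id_alloc(net, net_vport, GFP_ATOMIC);
+
+          if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
+                  goto nla_put_failure_unlock;   /* must drop the RCU lock on error */
+  }
+  rcu_read_unlock();
+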
+Fixes: 9354d4520342 ("openvswitch: reliable interface indentification in port dumps")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-6-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/openvswitch/datapath.c | 12 +++++++++---
+ 1 file changed, 9 insertions(+), 3 deletions(-)
+
+diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
+index 78d9961fcd446..8d3c01f0e2aa1 100644
+--- a/net/openvswitch/datapath.c
++++ b/net/openvswitch/datapath.c
+@@ -2102,6 +2102,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ {
+ struct ovs_header *ovs_header;
+ struct ovs_vport_stats vport_stats;
++ struct net *net_vport;
+ int err;
+
+ ovs_header = genlmsg_put(skb, portid, seq, &dp_vport_genl_family,
+@@ -2118,12 +2119,15 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ nla_put_u32(skb, OVS_VPORT_ATTR_IFINDEX, vport->dev->ifindex))
+ goto nla_put_failure;
+
+- if (!net_eq(net, dev_net(vport->dev))) {
+- int id = peernet2id_alloc(net, dev_net(vport->dev), gfp);
++ rcu_read_lock();
++ net_vport = dev_net_rcu(vport->dev);
++ if (!net_eq(net, net_vport)) {
++ int id = peernet2id_alloc(net, net_vport, GFP_ATOMIC);
+
+ if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
+- goto nla_put_failure;
++ goto nla_put_failure_unlock;
+ }
++ rcu_read_unlock();
+
+ ovs_vport_get_stats(vport, &vport_stats);
+ if (nla_put_64bit(skb, OVS_VPORT_ATTR_STATS,
+@@ -2144,6 +2148,8 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ genlmsg_end(skb, ovs_header);
+ return 0;
+
++nla_put_failure_unlock:
++ rcu_read_unlock();
+ nla_put_failure:
+ err = -EMSGSIZE;
+ error:
+--
+2.39.5
+
--- /dev/null
+From 5d7442285575e56ad8f868e8cac77b5a9a0f800d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 29 Jan 2025 14:50:02 -0700
+Subject: rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflags
+
+From: Justin M. Forbes <jforbes@fedoraproject.org>
+
+[ Upstream commit a9c621a217128eb3fb7522cf763992d9437fd5ba ]
+
+This seems to break the build when building with gcc15:
+
+ Unable to generate bindings: ClangDiagnostic("error: unknown
+ argument: '-fzero-init-padding-bits=all'\n")
+
+Thus skip that flag.
+
+Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
+Fixes: dce4aab8441d ("kbuild: Use -fzero-init-padding-bits=all")
+Reviewed-by: Kees Cook <kees@kernel.org>
+Link: https://lore.kernel.org/r/20250129215003.1736127-1-jforbes@fedoraproject.org
+[ Slightly reworded commit. - Miguel ]
+Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ rust/Makefile | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/rust/Makefile b/rust/Makefile
+index 9f59baacaf773..45779a064fa4f 100644
+--- a/rust/Makefile
++++ b/rust/Makefile
+@@ -229,6 +229,7 @@ bindgen_skip_c_flags := -mno-fp-ret-in-387 -mpreferred-stack-boundary=% \
+ -fzero-call-used-regs=% -fno-stack-clash-protection \
+ -fno-inline-functions-called-once -fsanitize=bounds-strict \
+ -fstrict-flex-arrays=% -fmin-function-alignment=% \
++ -fzero-init-padding-bits=% \
+ --param=% --param asan-%
+
+ # Derived from `scripts/Makefile.clang`.
+--
+2.39.5
+
--- /dev/null
+From 479bee46067a69f86ef8b10ee99dbc2a7c765a73 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:07 +0200
+Subject: scsi: ufs: core: Introduce a new clock_gating lock
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 209f4e43b8068c24cde227f464111030430153fa ]
+
+Introduce a new clock gating lock to serialize access to some of the clock
+gating members instead of the host_lock.
+
+While at it, simplify the code with the guard() macro and co for automatic
+ cleanup of the new lock. A few explicit
+ spin_lock_irqsave()/spin_unlock_irqrestore() sequences that snake across
+ code paths are left behind because I couldn't make heads or tails of them.
+
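+A minimal sketch of the two guard forms as they are used below (fragments
+ only; the surrounding logic is omitted):
+
+  /* Released automatically at the end of the enclosing scope. */
+  guard(spinlock_irqsave)(&hba->clk_gating.lock);
+  hba->clk_gating.delay_ms = value;
+
+  /* Held only for the braced scope. */
+  scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
+          if (hba->clk_gating.state == CLKS_ON)
+                  return;
+  }
+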
+Additionally, move the trace_ufshcd_clk_gating() call out of the
+ region protected by the lock, as it doesn't need protection.
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-4-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 109 ++++++++++++++++++--------------------
+ include/ufs/ufshcd.h | 9 +++-
+ 2 files changed, 59 insertions(+), 59 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 217619d64940e..5682fdcbf2da5 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -1840,19 +1840,16 @@ static void ufshcd_exit_clk_scaling(struct ufs_hba *hba)
+ static void ufshcd_ungate_work(struct work_struct *work)
+ {
+ int ret;
+- unsigned long flags;
+ struct ufs_hba *hba = container_of(work, struct ufs_hba,
+ clk_gating.ungate_work);
+
+ cancel_delayed_work_sync(&hba->clk_gating.gate_work);
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- if (hba->clk_gating.state == CLKS_ON) {
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+- return;
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
++ if (hba->clk_gating.state == CLKS_ON)
++ return;
+ }
+
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ ufshcd_hba_vreg_set_hpm(hba);
+ ufshcd_setup_clocks(hba, true);
+
+@@ -1887,7 +1884,7 @@ void ufshcd_hold(struct ufs_hba *hba)
+ if (!ufshcd_is_clkgating_allowed(hba) ||
+ !hba->clk_gating.is_initialized)
+ return;
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ hba->clk_gating.active_reqs++;
+
+ start:
+@@ -1903,11 +1900,11 @@ void ufshcd_hold(struct ufs_hba *hba)
+ */
+ if (ufshcd_can_hibern8_during_gating(hba) &&
+ ufshcd_is_link_hibern8(hba)) {
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ flush_result = flush_work(&hba->clk_gating.ungate_work);
+ if (hba->clk_gating.is_suspended && !flush_result)
+ return;
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ goto start;
+ }
+ break;
+@@ -1936,17 +1933,17 @@ void ufshcd_hold(struct ufs_hba *hba)
+ */
+ fallthrough;
+ case REQ_CLKS_ON:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ flush_work(&hba->clk_gating.ungate_work);
+ /* Make sure state is CLKS_ON before returning */
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ goto start;
+ default:
+ dev_err(hba->dev, "%s: clk gating is in invalid state %d\n",
+ __func__, hba->clk_gating.state);
+ break;
+ }
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_hold);
+
+@@ -1954,30 +1951,32 @@ static void ufshcd_gate_work(struct work_struct *work)
+ {
+ struct ufs_hba *hba = container_of(work, struct ufs_hba,
+ clk_gating.gate_work.work);
+- unsigned long flags;
+ int ret;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- /*
+- * In case you are here to cancel this work the gating state
+- * would be marked as REQ_CLKS_ON. In this case save time by
+- * skipping the gating work and exit after changing the clock
+- * state to CLKS_ON.
+- */
+- if (hba->clk_gating.is_suspended ||
+- (hba->clk_gating.state != REQ_CLKS_OFF)) {
+- hba->clk_gating.state = CLKS_ON;
+- trace_ufshcd_clk_gating(dev_name(hba->dev),
+- hba->clk_gating.state);
+- goto rel_lock;
+- }
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
++ /*
++ * In case you are here to cancel this work the gating state
++ * would be marked as REQ_CLKS_ON. In this case save time by
++ * skipping the gating work and exit after changing the clock
++ * state to CLKS_ON.
++ */
++ if (hba->clk_gating.is_suspended ||
++ hba->clk_gating.state != REQ_CLKS_OFF) {
++ hba->clk_gating.state = CLKS_ON;
++ trace_ufshcd_clk_gating(dev_name(hba->dev),
++ hba->clk_gating.state);
++ return;
++ }
+
+- if (ufshcd_is_ufs_dev_busy(hba) ||
+- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- hba->clk_gating.active_reqs)
+- goto rel_lock;
++ if (hba->clk_gating.active_reqs)
++ return;
++ }
+
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ scoped_guard(spinlock_irqsave, hba->host->host_lock) {
++ if (ufshcd_is_ufs_dev_busy(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ return;
++ }
+
+ /* put the link into hibern8 mode before turning off clocks */
+ if (ufshcd_can_hibern8_during_gating(hba)) {
+@@ -1988,7 +1987,7 @@ static void ufshcd_gate_work(struct work_struct *work)
+ __func__, ret);
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+- goto out;
++ return;
+ }
+ ufshcd_set_link_hibern8(hba);
+ }
+@@ -2008,32 +2007,34 @@ static void ufshcd_gate_work(struct work_struct *work)
+ * prevent from doing cancel work multiple times when there are
+ * new requests arriving before the current cancel work is done.
+ */
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ if (hba->clk_gating.state == REQ_CLKS_OFF) {
+ hba->clk_gating.state = CLKS_OFF;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+ }
+-rel_lock:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+-out:
+- return;
+ }
+
+-/* host lock must be held before calling this variant */
+ static void __ufshcd_release(struct ufs_hba *hba)
+ {
++ lockdep_assert_held(&hba->clk_gating.lock);
++
+ if (!ufshcd_is_clkgating_allowed(hba))
+ return;
+
+ hba->clk_gating.active_reqs--;
+
+ if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
+- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized ||
++ !hba->clk_gating.is_initialized ||
+ hba->clk_gating.state == CLKS_OFF)
+ return;
+
++ scoped_guard(spinlock_irqsave, hba->host->host_lock) {
++ if (ufshcd_has_pending_tasks(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ return;
++ }
++
+ hba->clk_gating.state = REQ_CLKS_OFF;
+ trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state);
+ queue_delayed_work(hba->clk_gating.clk_gating_workq,
+@@ -2043,11 +2044,8 @@ static void __ufshcd_release(struct ufs_hba *hba)
+
+ void ufshcd_release(struct ufs_hba *hba)
+ {
+- unsigned long flags;
+-
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ __ufshcd_release(hba);
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_release);
+
+@@ -2062,11 +2060,9 @@ static ssize_t ufshcd_clkgate_delay_show(struct device *dev,
+ void ufshcd_clkgate_delay_set(struct device *dev, unsigned long value)
+ {
+ struct ufs_hba *hba = dev_get_drvdata(dev);
+- unsigned long flags;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ hba->clk_gating.delay_ms = value;
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_clkgate_delay_set);
+
+@@ -2094,7 +2090,6 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+ {
+ struct ufs_hba *hba = dev_get_drvdata(dev);
+- unsigned long flags;
+ u32 value;
+
+ if (kstrtou32(buf, 0, &value))
+@@ -2102,9 +2097,10 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+
+ value = !!value;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
++
+ if (value == hba->clk_gating.is_enabled)
+- goto out;
++ return count;
+
+ if (value)
+ __ufshcd_release(hba);
+@@ -2112,8 +2108,7 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+ hba->clk_gating.active_reqs++;
+
+ hba->clk_gating.is_enabled = value;
+-out:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++
+ return count;
+ }
+
+@@ -2155,6 +2150,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba)
+ INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work);
+ INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work);
+
++ spin_lock_init(&hba->clk_gating.lock);
++
+ hba->clk_gating.clk_gating_workq = alloc_ordered_workqueue(
+ "ufs_clk_gating_%d", WQ_MEM_RECLAIM | WQ_HIGHPRI,
+ hba->host->host_no);
+@@ -9194,7 +9191,6 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ int ret = 0;
+ struct ufs_clk_info *clki;
+ struct list_head *head = &hba->clk_list_head;
+- unsigned long flags;
+ ktime_t start = ktime_get();
+ bool clk_state_changed = false;
+
+@@ -9245,11 +9241,10 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ clk_disable_unprepare(clki->clk);
+ }
+ } else if (!ret && on) {
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- hba->clk_gating.state = CLKS_ON;
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock)
++ hba->clk_gating.state = CLKS_ON;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+
+ if (clk_state_changed)
+diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
+index d5e43a1dcff22..47cba116f87b8 100644
+--- a/include/ufs/ufshcd.h
++++ b/include/ufs/ufshcd.h
+@@ -402,6 +402,9 @@ enum clk_gating_state {
+ * delay_ms
+ * @ungate_work: worker to turn on clocks that will be used in case of
+ * interrupt context
++ * @clk_gating_workq: workqueue for clock gating work.
++ * @lock: serialize access to some struct ufs_clk_gating members. An outer lock
++ * relative to the host lock
+ * @state: the current clocks state
+ * @delay_ms: gating delay in ms
+ * @is_suspended: clk gating is suspended when set to 1 which can be used
+@@ -412,11 +415,14 @@ enum clk_gating_state {
+ * @is_initialized: Indicates whether clock gating is initialized or not
+ * @active_reqs: number of requests that are pending and should be waited for
+ * completion before gating clocks.
+- * @clk_gating_workq: workqueue for clock gating work.
+ */
+ struct ufs_clk_gating {
+ struct delayed_work gate_work;
+ struct work_struct ungate_work;
++ struct workqueue_struct *clk_gating_workq;
++
++ spinlock_t lock;
++
+ enum clk_gating_state state;
+ unsigned long delay_ms;
+ bool is_suspended;
+@@ -425,7 +431,6 @@ struct ufs_clk_gating {
+ bool is_enabled;
+ bool is_initialized;
+ int active_reqs;
+- struct workqueue_struct *clk_gating_workq;
+ };
+
+ /**
+--
+2.39.5
+
--- /dev/null
+From 12b6e46a31da130d4f8de87ff66b6441ef65db02 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:05 +0200
+Subject: scsi: ufs: core: Introduce ufshcd_has_pending_tasks()
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit e738ba458e7539be1757dcdf85835a5c7b11fad4 ]
+
+Prepare to remove the hba->clk_gating.active_reqs check from
+ ufshcd_is_ufs_dev_busy().
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-2-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index b786cba9a270f..94d7992457a3b 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -258,10 +258,16 @@ ufs_get_desired_pm_lvl_for_dev_link_state(enum ufs_dev_pwr_mode dev_state,
+ return UFS_PM_LVL_0;
+ }
+
++static bool ufshcd_has_pending_tasks(struct ufs_hba *hba)
++{
++ return hba->outstanding_tasks || hba->active_uic_cmd ||
++ hba->uic_async_done;
++}
++
+ static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba)
+ {
+- return (hba->clk_gating.active_reqs || hba->outstanding_reqs || hba->outstanding_tasks ||
+- hba->active_uic_cmd || hba->uic_async_done);
++ return hba->clk_gating.active_reqs || hba->outstanding_reqs ||
++ ufshcd_has_pending_tasks(hba);
+ }
+
+ static const struct ufs_dev_quirk ufs_fixups[] = {
+@@ -2023,8 +2029,7 @@ static void __ufshcd_release(struct ufs_hba *hba)
+
+ if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
+ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- hba->outstanding_tasks || !hba->clk_gating.is_initialized ||
+- hba->active_uic_cmd || hba->uic_async_done ||
++ ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized ||
+ hba->clk_gating.state == CLKS_OFF)
+ return;
+
+--
+2.39.5
+
--- /dev/null
+From ac2299a3755b2c55968ad57ae2d0676a5d10ade6 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:06 +0200
+Subject: scsi: ufs: core: Prepare to introduce a new clock_gating lock
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 7869c6521f5715688b3d1f1c897374a68544eef0 ]
+
+Remove the hba->clk_gating.active_reqs check from the ufshcd_is_ufs_dev_busy()
+ function to separate clock gating logic from general device busy checks.
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-3-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 11 +++++++----
+ 1 file changed, 7 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 94d7992457a3b..217619d64940e 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -266,8 +266,7 @@ static bool ufshcd_has_pending_tasks(struct ufs_hba *hba)
+
+ static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba)
+ {
+- return hba->clk_gating.active_reqs || hba->outstanding_reqs ||
+- ufshcd_has_pending_tasks(hba);
++ return hba->outstanding_reqs || ufshcd_has_pending_tasks(hba);
+ }
+
+ static const struct ufs_dev_quirk ufs_fixups[] = {
+@@ -1973,7 +1972,9 @@ static void ufshcd_gate_work(struct work_struct *work)
+ goto rel_lock;
+ }
+
+- if (ufshcd_is_ufs_dev_busy(hba) || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ if (ufshcd_is_ufs_dev_busy(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
++ hba->clk_gating.active_reqs)
+ goto rel_lock;
+
+ spin_unlock_irqrestore(hba->host->host_lock, flags);
+@@ -8272,7 +8273,9 @@ static void ufshcd_rtc_work(struct work_struct *work)
+ hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
+
+ /* Update RTC only when there are no requests in progress and UFSHCI is operational */
+- if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL)
++ if (!ufshcd_is_ufs_dev_busy(hba) &&
++ hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
++ !hba->clk_gating.active_reqs)
+ ufshcd_update_rtc(hba);
+
+ if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
+--
+2.39.5
+
--- /dev/null
+From 0eb9778926430a8663ccd7169436f98b363a6bf2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 28 Jan 2025 09:12:07 +0200
+Subject: scsi: ufs: Fix toggling of clk_gating.state when clock gating is not
+ allowed
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 839a74b5649c9f41d939a05059b5ca6b17156d03 ]
+
+This commit addresses an issue where clk_gating.state is being toggled in
+ufshcd_setup_clocks() even if clock gating is not allowed.
+
+The fix is to add a check for hba->clk_gating.is_initialized before toggling
+clk_gating.state in ufshcd_setup_clocks().
+
+Since clk_gating.lock is now initialized unconditionally, the new check no
+ longer guards against the spinlock being used before it is properly
+ initialized; instead it serves mostly as documentation.
+
+Fixes: 1ab27c9cf8b6 ("ufs: Add support for clock gating")
+Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
+Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20250128071207.75494-3-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 5682fdcbf2da5..a73fffd6c3de4 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -9240,7 +9240,7 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ if (!IS_ERR_OR_NULL(clki->clk) && clki->enabled)
+ clk_disable_unprepare(clki->clk);
+ }
+- } else if (!ret && on) {
++ } else if (!ret && on && hba->clk_gating.is_initialized) {
+ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock)
+ hba->clk_gating.state = CLKS_ON;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+--
+2.39.5
+
kbuild-suppress-stdout-from-merge_config-for-silent-.patch
asoc-intel-bytcr_rt5640-add-dmi-quirk-for-vexia-edu-.patch
kbuild-use-fzero-init-padding-bits-all.patch
+include-net-add-static-inline-dst_dev_overhead-to-ds.patch
+net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch
+clocksource-use-pr_info-for-checking-clocksource-syn.patch
+clocksource-use-migrate_disable-to-avoid-calling-get.patch
+scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch
+scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch
+scsi-ufs-core-introduce-a-new-clock_gating-lock.patch
+scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch
+rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch
+cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch
+cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch
+cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch
+cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch
+cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch
+cpufreq-amd-pstate-convert-mutex-use-to-guard.patch
+cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch
+ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch
+ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch
+net-add-dev_net_rcu-helper.patch
+ipv4-use-rcu-protection-in-ipv4_default_advmss.patch
+ipv4-use-rcu-protection-in-rt_is_expired.patch
+ipv4-use-rcu-protection-in-inet_select_addr.patch
+net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch
+ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch
+ipv4-icmp-convert-to-dev_net_rcu.patch
+flow_dissector-use-rcu-protection-to-fetch-dev_net.patch
+ipv6-use-rcu-protection-in-ip6_default_advmss.patch
+ipv6-icmp-convert-to-dev_net_rcu.patch
+hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch
+hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch
+ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch
+neighbour-use-rcu-protection-in-__neigh_notify.patch
+arp-use-rcu-protection-in-arp_xmit.patch
+openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch
+ndisc-extend-rcu-protection-in-ndisc_send_skb.patch
+ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch
+btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch
+btrfs-fix-stale-page-cache-after-race-between-readah.patch
+ipv6-mcast-add-rcu-protection-to-mld_newpack.patch