--- /dev/null
+From 016ece15c59d5f3a99685e52040b5887115f182b Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:36 +0000
+Subject: arp: use RCU protection in arp_xmit()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit a42b69f692165ec39db42d595f4f65a4c8f42e44 ]
+
+arp_xmit() can be called without RTNL or RCU protection.
+
+Use RCU protection to avoid potential UAF.
+
+Fixes: 29a26a568038 ("netfilter: Pass struct net into the netfilter hooks")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-5-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/arp.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
+index 11c1519b36993..59ffaa89d7b05 100644
+--- a/net/ipv4/arp.c
++++ b/net/ipv4/arp.c
+@@ -659,10 +659,12 @@ static int arp_xmit_finish(struct net *net, struct sock *sk, struct sk_buff *skb
+ */
+ void arp_xmit(struct sk_buff *skb)
+ {
++ rcu_read_lock();
+ /* Send it off, maybe filter it using firewalling first. */
+ NF_HOOK(NFPROTO_ARP, NF_ARP_OUT,
+- dev_net(skb->dev), NULL, skb, NULL, skb->dev,
++ dev_net_rcu(skb->dev), NULL, skb, NULL, skb->dev,
+ arp_xmit_finish);
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(arp_xmit);
+
+--
+2.39.5
+
--- /dev/null
+From 856009bc6db21a7eff4281b0dfdf4b33375298b9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 4 Feb 2025 11:02:32 +0000
+Subject: btrfs: fix stale page cache after race between readahead and direct
+ IO write
+
+From: Filipe Manana <fdmanana@suse.com>
+
+[ Upstream commit acc18e1c1d8c0d59d793cf87790ccfcafb1bf5f0 ]
+
+After commit ac325fc2aad5 ("btrfs: do not hold the extent lock for entire
+read") we can now trigger a race between a task doing a direct IO write
+and readahead. When this race is triggered it results in tasks getting
+stale data when they attempt to do a buffered read (including the task that
+did the direct IO write).
+
+This race can be sporadically triggered with test case generic/418, failing
+like this:
+
+ $ ./check generic/418
+ FSTYP -- btrfs
+ PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025
+ MKFS_OPTIONS -- /dev/sdc
+ MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
+
+ generic/418 14s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad)
+ --- tests/generic/418.out 2020-06-10 19:29:03.850519863 +0100
+ +++ /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad 2025-02-03 15:42:36.974609476 +0000
+ @@ -1,2 +1,5 @@
+ QA output created by 418
+ +cmpbuf: offset 0: Expected: 0x1, got 0x0
+ +[6:0] FAIL - comparison failed, offset 24576
+ +diotest -wp -b 4096 -n 8 -i 4 failed at loop 3
+ Silence is golden
+ ...
+ (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/418.out /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad' to see the entire diff)
+ Ran: generic/418
+ Failures: generic/418
+ Failed 1 of 1 tests
+
+The race happens like this:
+
+1) A file has a prealloc extent for the range [16K, 28K);
+
+2) Task A starts a direct IO write against file range [24K, 28K).
+ At the start of the direct IO write it invalidates the page cache at
+ __iomap_dio_rw() with kiocb_invalidate_pages() for the 4K page at file
+ offset 24K;
+
+3) Task A enters btrfs_dio_iomap_begin() and locks the extent range
+ [24K, 28K);
+
+4) Task B starts a readahead for file range [16K, 28K), entering
+ btrfs_readahead().
+
+ First it attempts to read the page at offset 16K by entering
+ btrfs_do_readpage(), where it calls get_extent_map(), locks the range
+ [16K, 20K) and gets the extent map for the range [16K, 28K), caching
+ it into the 'em_cached' variable declared in the local stack of
+ btrfs_readahead(), and then unlocks the range [16K, 20K).
+
+ Since the extent map has the prealloc flag, at btrfs_do_readpage() we
+ zero out the page's content and don't submit any bio to read the page
+ from the extent.
+
+ Then it attempts to read the page at offset 20K entering
+ btrfs_do_readpage() where we reuse the previously cached extent map
+ (decided by get_extent_map()) since it spans the page's range and
+ it's still in the inode's extent map tree.
+
+ Just like for the previous page, we zero out the page's content since
+ the extent map has the prealloc flag set.
+
+ Then it attempts to read the page at offset 24K entering
+ btrfs_do_readpage() where we reuse the previously cached extent map
+ (decided by get_extent_map()) since it spans the page's range and
+ it's still in the inode's extent map tree.
+
+ Just like for the previous pages, we zero out the page's content since
+ the extent map has the prealloc flag set. Note that we didn't lock the
+ extent range [24K, 28K), so we didn't synchronize with the ongoing
+ direct IO write being performed by task A;
+
+5) Task A enters btrfs_create_dio_extent() and creates an ordered extent
+ for the range [24K, 28K), with the flags BTRFS_ORDERED_DIRECT and
+ BTRFS_ORDERED_PREALLOC set;
+
+6) Task A unlocks the range [24K, 28K) at btrfs_dio_iomap_begin();
+
+7) The ordered extent enters btrfs_finish_one_ordered() and locks the
+ range [24K, 28K);
+
+8) Task A enters fs/iomap/direct-io.c:iomap_dio_complete() and it tries
+ to invalidate the page at offset 24K by calling
+ kiocb_invalidate_post_direct_write(), resulting in a call chain that
+ ends up at btrfs_release_folio().
+
+ The btrfs_release_folio() call ends up returning false because the range
+ for the page at file offset 24K is currently locked by the task doing
+ the ordered extent completion in the previous step (7), so we have:
+
+ btrfs_release_folio() ->
+ __btrfs_release_folio() ->
+ try_release_extent_mapping() ->
+ try_release_extent_state()
+
+ This last function checking that the range is locked and returning false
+ and propagating it up to btrfs_release_folio().
+
+ So this results in a failure to invalidate the page and
+ kiocb_invalidate_post_direct_write() triggers this message logged in
+ dmesg:
+
+ Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O!
+
+ After this we leave the page cache with stale data for the file range
+ [24K, 28K), filled with zeroes instead of the data written by direct IO
+ write (all bytes with a 0x01 value), so any task attempting to read with
+ buffered IO, including the task that did the direct IO write, will get
+ all bytes in the range with a 0x00 value instead of the written data.
+
+Fix this by locking the range, with btrfs_lock_and_flush_ordered_range(),
+at the two callers of btrfs_do_readpage() instead of doing it at
+get_extent_map(), just like we did before commit ac325fc2aad5 ("btrfs: do
+not hold the extent lock for entire read"), and unlocking the range after
+all the calls to btrfs_do_readpage(). This way we never reuse a cached
+extent map without flushing any pending ordered extents from a concurrent
+direct IO write.
+
+Fixes: ac325fc2aad5 ("btrfs: do not hold the extent lock for entire read")
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/extent_io.c | 18 +++++++++++++++---
+ 1 file changed, 15 insertions(+), 3 deletions(-)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index e6e6c4dc53c48..fe08c983d5bb4 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -906,7 +906,6 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
+ u64 len, struct extent_map **em_cached)
+ {
+ struct extent_map *em;
+- struct extent_state *cached_state = NULL;
+
+ ASSERT(em_cached);
+
+@@ -922,14 +921,12 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
+ *em_cached = NULL;
+ }
+
+- btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state);
+ em = btrfs_get_extent(inode, folio, start, len);
+ if (!IS_ERR(em)) {
+ BUG_ON(*em_cached);
+ refcount_inc(&em->refs);
+ *em_cached = em;
+ }
+- unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state);
+
+ return em;
+ }
+@@ -1086,11 +1083,18 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
+
+ int btrfs_read_folio(struct file *file, struct folio *folio)
+ {
++ struct btrfs_inode *inode = folio_to_inode(folio);
++ const u64 start = folio_pos(folio);
++ const u64 end = start + folio_size(folio) - 1;
++ struct extent_state *cached_state = NULL;
+ struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ };
+ struct extent_map *em_cached = NULL;
+ int ret;
+
++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
+ ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
++ unlock_extent(&inode->io_tree, start, end, &cached_state);
++
+ free_extent_map(em_cached);
+
+ /*
+@@ -2267,12 +2271,20 @@ void btrfs_readahead(struct readahead_control *rac)
+ {
+ struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD };
+ struct folio *folio;
++ struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
++ const u64 start = readahead_pos(rac);
++ const u64 end = start + readahead_length(rac) - 1;
++ struct extent_state *cached_state = NULL;
+ struct extent_map *em_cached = NULL;
+ u64 prev_em_start = (u64)-1;
+
++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
++
+ while ((folio = readahead_folio(rac)) != NULL)
+ btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
+
++ unlock_extent(&inode->io_tree, start, end, &cached_state);
++
+ if (em_cached)
+ free_extent_map(em_cached);
+ submit_one_bio(&bio_ctrl);
+--
+2.39.5
+
--- /dev/null
+From 693ad002410794be6b07348c6324c687becc4ec4 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Thu, 9 Jan 2025 11:24:15 +0100
+Subject: btrfs: rename __get_extent_map() and pass btrfs_inode
+
+From: David Sterba <dsterba@suse.com>
+
+[ Upstream commit 06de96faf795b5c276a3be612da6b08c6112e747 ]
+
+The double underscore naming scheme does not apply here, there's only
+one get_extent_map(). As the definition is changed, also pass the struct
+btrfs_inode.
+
+Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
+Reviewed-by: Anand Jain <anand.jain@oracle.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Stable-dep-of: acc18e1c1d8c ("btrfs: fix stale page cache after race between readahead and direct IO write")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ fs/btrfs/extent_io.c | 15 +++++++--------
+ 1 file changed, 7 insertions(+), 8 deletions(-)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index 42c9899d9241c..e6e6c4dc53c48 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -901,9 +901,9 @@ void clear_folio_extent_mapped(struct folio *folio)
+ folio_detach_private(folio);
+ }
+
+-static struct extent_map *__get_extent_map(struct inode *inode,
+- struct folio *folio, u64 start,
+- u64 len, struct extent_map **em_cached)
++static struct extent_map *get_extent_map(struct btrfs_inode *inode,
++ struct folio *folio, u64 start,
++ u64 len, struct extent_map **em_cached)
+ {
+ struct extent_map *em;
+ struct extent_state *cached_state = NULL;
+@@ -922,14 +922,14 @@ static struct extent_map *__get_extent_map(struct inode *inode,
+ *em_cached = NULL;
+ }
+
+- btrfs_lock_and_flush_ordered_range(BTRFS_I(inode), start, start + len - 1, &cached_state);
+- em = btrfs_get_extent(BTRFS_I(inode), folio, start, len);
++ btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state);
++ em = btrfs_get_extent(inode, folio, start, len);
+ if (!IS_ERR(em)) {
+ BUG_ON(*em_cached);
+ refcount_inc(&em->refs);
+ *em_cached = em;
+ }
+- unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len - 1, &cached_state);
++ unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state);
+
+ return em;
+ }
+@@ -985,8 +985,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
+ end_folio_read(folio, true, cur, iosize);
+ break;
+ }
+- em = __get_extent_map(inode, folio, cur, end - cur + 1,
+- em_cached);
++ em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached);
+ if (IS_ERR(em)) {
+ end_folio_read(folio, false, cur, end + 1 - cur);
+ return PTR_ERR(em);
+--
+2.39.5
+
--- /dev/null
+From 260b25a3327f0749a6dde43fe624cd790fde8b01 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 31 Jan 2025 12:33:23 -0500
+Subject: clocksource: Use migrate_disable() to avoid calling get_random_u32()
+ in atomic context
+
+From: Waiman Long <longman@redhat.com>
+
+[ Upstream commit 6bb05a33337b2c842373857b63de5c9bf1ae2a09 ]
+
+The following bug report happened with a PREEMPT_RT kernel:
+
+ BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
+ in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
+ preempt_count: 1, expected: 0
+ RCU nest depth: 0, expected: 0
+ get_random_u32+0x4f/0x110
+ clocksource_verify_choose_cpus+0xab/0x1a0
+ clocksource_verify_percpu.part.0+0x6b/0x330
+ clocksource_watchdog_kthread+0x193/0x1a0
+
+It is due to the fact that clocksource_verify_choose_cpus() is invoked with
+preemption disabled. This function invokes get_random_u32() to obtain
+random numbers for choosing CPUs. The batched_entropy_32 local lock and/or
+the base_crng.lock spinlock in driver/char/random.c will be acquired during
+the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot
+be acquired in atomic context.
+
+Fix this problem by using migrate_disable() to allow smp_processor_id() to
+be reliably used without introducing atomic context. preempt_disable() is
+then called after clocksource_verify_choose_cpus() but before the
+clocksource measurement is run to avoid introducing unexpected
+latency.
+
+Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable")
+Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Signed-off-by: Waiman Long <longman@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
+Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
+Link: https://lore.kernel.org/all/20250131173323.891943-2-longman@redhat.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/time/clocksource.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index c4e6b5e6af88c..58fb7280cabbe 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -365,10 +365,10 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ cpumask_clear(&cpus_ahead);
+ cpumask_clear(&cpus_behind);
+ cpus_read_lock();
+- preempt_disable();
++ migrate_disable();
+ clocksource_verify_choose_cpus();
+ if (cpumask_empty(&cpus_chosen)) {
+- preempt_enable();
++ migrate_enable();
+ cpus_read_unlock();
+ pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name);
+ return;
+@@ -376,6 +376,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ testcpu = smp_processor_id();
+ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",
+ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
++ preempt_disable();
+ for_each_cpu(cpu, &cpus_chosen) {
+ if (cpu == testcpu)
+ continue;
+@@ -395,6 +396,7 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ cs_nsec_min = cs_nsec;
+ }
+ preempt_enable();
++ migrate_enable();
+ cpus_read_unlock();
+ if (!cpumask_empty(&cpus_ahead))
+ pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n",
+--
+2.39.5
+
--- /dev/null
+From db18d29bc84d9ca827c9acfd807524ca58beb127 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 24 Jan 2025 20:54:41 -0500
+Subject: clocksource: Use pr_info() for "Checking clocksource synchronization"
+ message
+
+From: Waiman Long <longman@redhat.com>
+
+[ Upstream commit 1f566840a82982141f94086061927a90e79440e5 ]
+
+The "Checking clocksource synchronization" message is normally printed
+when clocksource_verify_percpu() is called for a given clocksource if
+both the CLOCK_SOURCE_UNSTABLE and CLOCK_SOURCE_VERIFY_PERCPU flags
+are set.
+
+It is an informational message and so pr_info() is the correct choice.
+
+Signed-off-by: Waiman Long <longman@redhat.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
+Acked-by: John Stultz <jstultz@google.com>
+Link: https://lore.kernel.org/all/20250125015442.3740588-1-longman@redhat.com
+Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/time/clocksource.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index 8a40a616288b8..c4e6b5e6af88c 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -374,7 +374,8 @@ void clocksource_verify_percpu(struct clocksource *cs)
+ return;
+ }
+ testcpu = smp_processor_id();
+- pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
++ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n",
++ cs->name, testcpu, cpumask_pr_args(&cpus_chosen));
+ for_each_cpu(cpu, &cpus_chosen) {
+ if (cpu == testcpu)
+ continue;
+--
+2.39.5
+
--- /dev/null
+From 3b7e555c9f6f68ea4d1a91938ffe89fd9b9f37d2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 23 Oct 2024 10:21:12 +0000
+Subject: cpufreq/amd-pstate: Align offline flow of shared memory and MSR based
+ systems
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit a6960e6b1b0e2cb268f427a99040c408a8d10665 ]
+
+Set min_perf to lowest_perf for shared memory systems, similar to the MSR
+based systems.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241023102108.5980-5-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 161334937090c..895d108428b40 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1636,6 +1636,7 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.desired_perf = 0;
++ perf_ctrls.min_perf = min_perf;
+ perf_ctrls.max_perf = min_perf;
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
+ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+--
+2.39.5
+
--- /dev/null
+From 4de4224d57ce3204800eb03d8d2036e2e3c9ccf0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 23 Oct 2024 10:21:10 +0000
+Subject: cpufreq/amd-pstate: Call cppc_set_epp_perf in the reenable function
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit 796ff50e127af8362035f87ba29b6b84e2dd9742 ]
+
+The EPP value being set in perf_ctrls.energy_perf is not being propagated
+to the shared memory, fix that.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Perry Yuan <perry.yuan@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241023102108.5980-4-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 91d3c3b1c2d3b..161334937090c 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1594,8 +1594,9 @@ static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
+ wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+ } else {
+ perf_ctrls.max_perf = max_perf;
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+ }
+ }
+
+@@ -1636,8 +1637,9 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ } else {
+ perf_ctrls.desired_perf = 0;
+ perf_ctrls.max_perf = min_perf;
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+ cppc_set_perf(cpudata->cpu, &perf_ctrls);
++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+ }
+ mutex_unlock(&amd_pstate_limits_lock);
+ }
+--
+2.39.5
+
--- /dev/null
+From ea16a7ceb11c3ba3d1179b32d4abce18d3272a18 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Mon, 9 Dec 2024 12:52:37 -0600
+Subject: cpufreq/amd-pstate: convert mutex use to guard()
+
+From: Mario Limonciello <mario.limonciello@amd.com>
+
+[ Upstream commit 6c093d5a5b73ec1caf1e706510ae6031af2f9d43 ]
+
+Using scoped guard declaration will unlock mutexes automatically.
+
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241209185248.16301-5-mario.limonciello@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 32 ++++++++++++--------------------
+ 1 file changed, 12 insertions(+), 20 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 145a48fc49034..33777f5ab7d16 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -696,12 +696,12 @@ static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state)
+ pr_err("Boost mode is not supported by this processor or SBIOS\n");
+ return -EOPNOTSUPP;
+ }
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
++
+ ret = amd_pstate_cpu_boost_update(policy, state);
+ WRITE_ONCE(cpudata->boost_state, !ret ? state : false);
+ policy->boost_enabled = !ret ? state : false;
+ refresh_frequency_limits(policy);
+- mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret;
+ }
+@@ -792,7 +792,8 @@ static void amd_pstate_update_limits(unsigned int cpu)
+ if (!amd_pstate_prefcore)
+ return;
+
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
++
+ ret = amd_get_highest_perf(cpu, &cur_high);
+ if (ret)
+ goto free_cpufreq_put;
+@@ -812,7 +813,6 @@ static void amd_pstate_update_limits(unsigned int cpu)
+ if (!highest_perf_changed)
+ cpufreq_update_policy(cpu);
+
+- mutex_unlock(&amd_pstate_driver_lock);
+ }
+
+ /*
+@@ -1145,11 +1145,11 @@ static ssize_t store_energy_performance_preference(
+ if (ret < 0)
+ return -EINVAL;
+
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
++
+ ret = amd_pstate_set_energy_pref_index(cpudata, ret);
+- mutex_unlock(&amd_pstate_limits_lock);
+
+- return ret ?: count;
++ return ret ? ret : count;
+ }
+
+ static ssize_t show_energy_performance_preference(
+@@ -1297,13 +1297,10 @@ EXPORT_SYMBOL_GPL(amd_pstate_update_status);
+ static ssize_t status_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+ {
+- ssize_t ret;
+
+- mutex_lock(&amd_pstate_driver_lock);
+- ret = amd_pstate_show_status(buf);
+- mutex_unlock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
+
+- return ret;
++ return amd_pstate_show_status(buf);
+ }
+
+ static ssize_t status_store(struct device *a, struct device_attribute *b,
+@@ -1312,9 +1309,8 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
+ char *p = memchr(buf, '\n', count);
+ int ret;
+
+- mutex_lock(&amd_pstate_driver_lock);
++ guard(mutex)(&amd_pstate_driver_lock);
+ ret = amd_pstate_update_status(buf, p ? p - buf : count);
+- mutex_unlock(&amd_pstate_driver_lock);
+
+ return ret < 0 ? ret : count;
+ }
+@@ -1614,13 +1610,11 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
+
+ amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false);
+ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
+
+- mutex_unlock(&amd_pstate_limits_lock);
+-
+ return 0;
+ }
+
+@@ -1656,13 +1650,11 @@ static int amd_pstate_epp_resume(struct cpufreq_policy *policy)
+ struct amd_cpudata *cpudata = policy->driver_data;
+
+ if (cpudata->suspended) {
+- mutex_lock(&amd_pstate_limits_lock);
++ guard(mutex)(&amd_pstate_limits_lock);
+
+ /* enable amd pstate from suspend state*/
+ amd_pstate_epp_reenable(cpudata);
+
+- mutex_unlock(&amd_pstate_limits_lock);
+-
+ cpudata->suspended = false;
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 71d244e6199e6d1092534fb1e982ffc84590ce97 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 11:25:20 +0000
+Subject: cpufreq/amd-pstate: Fix cpufreq_policy ref counting
+
+From: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
+
+[ Upstream commit 3ace20038e19f23fe73259513f1f08d4bf1a3c83 ]
+
+amd_pstate_update_limits() takes a cpufreq_policy reference but doesn't
+decrement the refcount in one of the exit paths, fix that.
+
+Fixes: 45722e777fd9 ("cpufreq: amd-pstate: Optimize amd_pstate_update_limits()")
+Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Link: https://lore.kernel.org/r/20250205112523.201101-10-dhananjay.ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 9 +++++----
+ 1 file changed, 5 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 33777f5ab7d16..bdfd8ffe04398 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -778,20 +778,21 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
+
+ static void amd_pstate_update_limits(unsigned int cpu)
+ {
+- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
++ struct cpufreq_policy *policy = NULL;
+ struct amd_cpudata *cpudata;
+ u32 prev_high = 0, cur_high = 0;
+ int ret;
+ bool highest_perf_changed = false;
+
++ if (!amd_pstate_prefcore)
++ return;
++
++ policy = cpufreq_cpu_get(cpu);
+ if (!policy)
+ return;
+
+ cpudata = policy->driver_data;
+
+- if (!amd_pstate_prefcore)
+- return;
+-
+ guard(mutex)(&amd_pstate_driver_lock);
+
+ ret = amd_get_highest_perf(cpu, &cur_high);
+--
+2.39.5
+
--- /dev/null
+From 286c4edf998d184d941cc1f0b2cc45da6f805126 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:42 +0000
+Subject: cpufreq/amd-pstate: Merge amd_pstate_epp_cpu_offline() and
+ amd_pstate_epp_offline()
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit 53ec2101dfede8fecdd240662281a12e537c3411 ]
+
+amd_pstate_epp_offline() is only called from within
+amd_pstate_epp_cpu_offline(), so it doesn't make much sense to have it at all.
+Hence, remove it.
+
+Also remove the unnecessary debug print in the offline path while at it.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-6-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 17 ++++-------------
+ 1 file changed, 4 insertions(+), 13 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 4dfe5bdcb2932..145a48fc49034 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1604,11 +1604,14 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+ return 0;
+ }
+
+-static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
++static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+ {
+ struct amd_cpudata *cpudata = policy->driver_data;
+ int min_perf;
+
++ if (cpudata->suspended)
++ return 0;
++
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+
+ mutex_lock(&amd_pstate_limits_lock);
+@@ -1617,18 +1620,6 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
+
+ mutex_unlock(&amd_pstate_limits_lock);
+-}
+-
+-static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+-{
+- struct amd_cpudata *cpudata = policy->driver_data;
+-
+- pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu);
+-
+- if (cpudata->suspended)
+- return 0;
+-
+- amd_pstate_epp_offline(policy);
+
+ return 0;
+ }
+--
+2.39.5
+
--- /dev/null
+From 915d2fe2dd57d8d19bda8de19196eadd04f0eef6 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:40 +0000
+Subject: cpufreq/amd-pstate: Refactor amd_pstate_epp_reenable() and
+ amd_pstate_epp_offline()
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit b1089e0c8817fda93d474eaa82ad86386887aefe ]
+
+Replace similar code chunks with amd_pstate_update_perf() and
+amd_pstate_set_epp() function calls.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-4-Dhananjay.Ugwekar@amd.com
+[ML: Fix LKP reported error about unused variable]
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 38 +++++++-----------------------------
+ 1 file changed, 7 insertions(+), 31 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 895d108428b40..19906141ef7fe 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1579,25 +1579,17 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy)
+
+ static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata)
+ {
+- struct cppc_perf_ctrls perf_ctrls;
+- u64 value, max_perf;
++ u64 max_perf;
+ int ret;
+
+ ret = amd_pstate_enable(true);
+ if (ret)
+ pr_err("failed to enable amd pstate during resume, return %d\n", ret);
+
+- value = READ_ONCE(cpudata->cppc_req_cached);
+ max_perf = READ_ONCE(cpudata->highest_perf);
+
+- if (cpu_feature_enabled(X86_FEATURE_CPPC)) {
+- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+- } else {
+- perf_ctrls.max_perf = max_perf;
+- cppc_set_perf(cpudata->cpu, &perf_ctrls);
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached);
+- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+- }
++ amd_pstate_update_perf(cpudata, 0, 0, max_perf, false);
++ amd_pstate_set_epp(cpudata, cpudata->epp_cached);
+ }
+
+ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+@@ -1617,31 +1609,15 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+ static void amd_pstate_epp_offline(struct cpufreq_policy *policy)
+ {
+ struct amd_cpudata *cpudata = policy->driver_data;
+- struct cppc_perf_ctrls perf_ctrls;
+ int min_perf;
+- u64 value;
+
+ min_perf = READ_ONCE(cpudata->lowest_perf);
+- value = READ_ONCE(cpudata->cppc_req_cached);
+
+ mutex_lock(&amd_pstate_limits_lock);
+- if (cpu_feature_enabled(X86_FEATURE_CPPC)) {
+- cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN;
+-
+- /* Set max perf same as min perf */
+- value &= ~AMD_CPPC_MAX_PERF(~0L);
+- value |= AMD_CPPC_MAX_PERF(min_perf);
+- value &= ~AMD_CPPC_MIN_PERF(~0L);
+- value |= AMD_CPPC_MIN_PERF(min_perf);
+- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value);
+- } else {
+- perf_ctrls.desired_perf = 0;
+- perf_ctrls.min_perf = min_perf;
+- perf_ctrls.max_perf = min_perf;
+- cppc_set_perf(cpudata->cpu, &perf_ctrls);
+- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE);
+- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1);
+- }
++
++ amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false);
++ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE);
++
+ mutex_unlock(&amd_pstate_limits_lock);
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 07686cd20c1fa7f6b54a47e099382aae311067c0 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 4 Dec 2024 14:48:41 +0000
+Subject: cpufreq/amd-pstate: Remove the cppc_state check in offline/online
+ functions
+
+From: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+
+[ Upstream commit b78f8c87ec3e7499bb049986838636d3afbc7ece ]
+
+Only amd_pstate_epp driver (i.e. cppc_state = ACTIVE) enters the
+amd_pstate_epp_offline() and amd_pstate_epp_cpu_online() functions,
+so remove the unnecessary if condition checking if cppc_state is
+equal to AMD_PSTATE_ACTIVE.
+
+Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
+Link: https://lore.kernel.org/r/20241204144842.164178-5-Dhananjay.Ugwekar@amd.com
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/cpufreq/amd-pstate.c | 9 +++------
+ 1 file changed, 3 insertions(+), 6 deletions(-)
+
+diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
+index 19906141ef7fe..4dfe5bdcb2932 100644
+--- a/drivers/cpufreq/amd-pstate.c
++++ b/drivers/cpufreq/amd-pstate.c
+@@ -1598,10 +1598,8 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy)
+
+ pr_debug("AMD CPU Core %d going online\n", cpudata->cpu);
+
+- if (cppc_state == AMD_PSTATE_ACTIVE) {
+- amd_pstate_epp_reenable(cpudata);
+- cpudata->suspended = false;
+- }
++ amd_pstate_epp_reenable(cpudata);
++ cpudata->suspended = false;
+
+ return 0;
+ }
+@@ -1630,8 +1628,7 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy)
+ if (cpudata->suspended)
+ return 0;
+
+- if (cppc_state == AMD_PSTATE_ACTIVE)
+- amd_pstate_epp_offline(policy);
++ amd_pstate_epp_offline(policy);
+
+ return 0;
+ }
+--
+2.39.5
+
--- /dev/null
+From 081a12d3c0b2ff44f3d44ba992b88a9e99d725cf Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:17 +0000
+Subject: flow_dissector: use RCU protection to fetch dev_net()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit afec62cd0a4191cde6dd3a75382be4d51a38ce9b ]
+
+__skb_flow_dissect() can be called from arbitrary contexts.
+
+It must extend its RCU protection section to include
+the call to dev_net(), which can become dev_net_rcu().
+
+This makes sure the net structure can not disappear under us.
+
+Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-10-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/flow_dissector.c | 21 +++++++++++----------
+ 1 file changed, 11 insertions(+), 10 deletions(-)
+
+diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
+index 0e638a37aa096..5db41bf2ed93e 100644
+--- a/net/core/flow_dissector.c
++++ b/net/core/flow_dissector.c
+@@ -1108,10 +1108,12 @@ bool __skb_flow_dissect(const struct net *net,
+ FLOW_DISSECTOR_KEY_BASIC,
+ target_container);
+
++ rcu_read_lock();
++
+ if (skb) {
+ if (!net) {
+ if (skb->dev)
+- net = dev_net(skb->dev);
++ net = dev_net_rcu(skb->dev);
+ else if (skb->sk)
+ net = sock_net(skb->sk);
+ }
+@@ -1122,7 +1124,6 @@ bool __skb_flow_dissect(const struct net *net,
+ enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR;
+ struct bpf_prog_array *run_array;
+
+- rcu_read_lock();
+ run_array = rcu_dereference(init_net.bpf.run_array[type]);
+ if (!run_array)
+ run_array = rcu_dereference(net->bpf.run_array[type]);
+@@ -1150,17 +1151,17 @@ bool __skb_flow_dissect(const struct net *net,
+ prog = READ_ONCE(run_array->items[0].prog);
+ result = bpf_flow_dissect(prog, &ctx, n_proto, nhoff,
+ hlen, flags);
+- if (result == BPF_FLOW_DISSECTOR_CONTINUE)
+- goto dissect_continue;
+- __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
+- target_container);
+- rcu_read_unlock();
+- return result == BPF_OK;
++ if (result != BPF_FLOW_DISSECTOR_CONTINUE) {
++ __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
++ target_container);
++ rcu_read_unlock();
++ return result == BPF_OK;
++ }
+ }
+-dissect_continue:
+- rcu_read_unlock();
+ }
+
++ rcu_read_unlock();
++
+ if (dissector_uses_key(flow_dissector,
+ FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct ethhdr *eth = eth_hdr(skb);
+--
+2.39.5
+
--- /dev/null
+From 185b5f815034c6e426816260490fa8b26c340ad5 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 25 Dec 2024 18:34:24 -0800
+Subject: HID: hid-steam: Make sure rumble work is canceled on removal
+
+From: Vicki Pfau <vi@endrift.com>
+
+[ Upstream commit cc4f952427aaa44ecfd92542e10a65cce67bd6f4 ]
+
+When a force feedback command is sent from userspace, work is scheduled to pass
+this data to the controller without blocking userspace itself. However, in
+theory, this work might not be properly canceled if the controller is removed
+at the exact right time. This patch ensures the work is properly canceled when
+the device is removed.
+
+Signed-off-by: Vicki Pfau <vi@endrift.com>
+Signed-off-by: Jiri Kosina <jkosina@suse.com>
+Stable-dep-of: 79504249d7e2 ("HID: hid-steam: Move hidraw input (un)registering to work")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/hid/hid-steam.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c
+index 9b6aec0733ae6..daca250e51c8b 100644
+--- a/drivers/hid/hid-steam.c
++++ b/drivers/hid/hid-steam.c
+@@ -1306,6 +1306,7 @@ static void steam_remove(struct hid_device *hdev)
+
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->work_connect);
++ cancel_work_sync(&steam->rumble_work);
+ hid_destroy_device(steam->client_hdev);
+ steam->client_hdev = NULL;
+ steam->client_opened = 0;
+--
+2.39.5
+
--- /dev/null
+From 5bc8accc48c65d965f70c8918cd314ddb1ac3ece Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 4 Feb 2025 19:55:27 -0800
+Subject: HID: hid-steam: Move hidraw input (un)registering to work
+
+From: Vicki Pfau <vi@endrift.com>
+
+[ Upstream commit 79504249d7e27cad4a3eeb9afc6386e418728ce0 ]
+
+Due to an interplay between locking in the input and hid transport subsystems,
+attempting to register or deregister the relevant input devices during the
+hidraw open/close events can lead to a lock ordering issue. Though this
+shouldn't cause a deadlock, this commit moves the input device manipulation to
+deferred work to sidestep the issue.
+
+Fixes: 385a4886778f6 ("HID: steam: remove input device when a hid client is running.")
+Signed-off-by: Vicki Pfau <vi@endrift.com>
+Signed-off-by: Jiri Kosina <jkosina@suse.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/hid/hid-steam.c | 38 +++++++++++++++++++++++++++++++-------
+ 1 file changed, 31 insertions(+), 7 deletions(-)
+
+diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c
+index daca250e51c8b..7b35966898785 100644
+--- a/drivers/hid/hid-steam.c
++++ b/drivers/hid/hid-steam.c
+@@ -313,6 +313,7 @@ struct steam_device {
+ u16 rumble_left;
+ u16 rumble_right;
+ unsigned int sensor_timestamp_us;
++ struct work_struct unregister_work;
+ };
+
+ static int steam_recv_report(struct steam_device *steam,
+@@ -1072,6 +1073,31 @@ static void steam_mode_switch_cb(struct work_struct *work)
+ }
+ }
+
++static void steam_work_unregister_cb(struct work_struct *work)
++{
++ struct steam_device *steam = container_of(work, struct steam_device,
++ unregister_work);
++ unsigned long flags;
++ bool connected;
++ bool opened;
++
++ spin_lock_irqsave(&steam->lock, flags);
++ opened = steam->client_opened;
++ connected = steam->connected;
++ spin_unlock_irqrestore(&steam->lock, flags);
++
++ if (connected) {
++ if (opened) {
++ steam_sensors_unregister(steam);
++ steam_input_unregister(steam);
++ } else {
++ steam_set_lizard_mode(steam, lizard_mode);
++ steam_input_register(steam);
++ steam_sensors_register(steam);
++ }
++ }
++}
++
+ static bool steam_is_valve_interface(struct hid_device *hdev)
+ {
+ struct hid_report_enum *rep_enum;
+@@ -1117,8 +1143,7 @@ static int steam_client_ll_open(struct hid_device *hdev)
+ steam->client_opened++;
+ spin_unlock_irqrestore(&steam->lock, flags);
+
+- steam_sensors_unregister(steam);
+- steam_input_unregister(steam);
++ schedule_work(&steam->unregister_work);
+
+ return 0;
+ }
+@@ -1135,11 +1160,7 @@ static void steam_client_ll_close(struct hid_device *hdev)
+ connected = steam->connected && !steam->client_opened;
+ spin_unlock_irqrestore(&steam->lock, flags);
+
+- if (connected) {
+- steam_set_lizard_mode(steam, lizard_mode);
+- steam_input_register(steam);
+- steam_sensors_register(steam);
+- }
++ schedule_work(&steam->unregister_work);
+ }
+
+ static int steam_client_ll_raw_request(struct hid_device *hdev,
+@@ -1231,6 +1252,7 @@ static int steam_probe(struct hid_device *hdev,
+ INIT_LIST_HEAD(&steam->list);
+ INIT_WORK(&steam->rumble_work, steam_haptic_rumble_cb);
+ steam->sensor_timestamp_us = 0;
++ INIT_WORK(&steam->unregister_work, steam_work_unregister_cb);
+
+ /*
+ * With the real steam controller interface, do not connect hidraw.
+@@ -1291,6 +1313,7 @@ static int steam_probe(struct hid_device *hdev,
+ cancel_work_sync(&steam->work_connect);
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->rumble_work);
++ cancel_work_sync(&steam->unregister_work);
+
+ return ret;
+ }
+@@ -1307,6 +1330,7 @@ static void steam_remove(struct hid_device *hdev)
+ cancel_delayed_work_sync(&steam->mode_switch);
+ cancel_work_sync(&steam->work_connect);
+ cancel_work_sync(&steam->rumble_work);
++ cancel_work_sync(&steam->unregister_work);
+ hid_destroy_device(steam->client_hdev);
+ steam->client_hdev = NULL;
+ steam->client_opened = 0;
+--
+2.39.5
+
--- /dev/null
+From 94442da5bb6720d911cd4dab449edd2d200b347c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:42 +0100
+Subject: include: net: add static inline dst_dev_overhead() to dst.h
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 0600cf40e9b36fe17f9c9f04d4f9cef249eaa5e7 ]
+
+Add static inline dst_dev_overhead() function to include/net/dst.h. This
+helper function is used by ioam6_iptunnel, rpl_iptunnel and
+seg6_iptunnel to get the dev's overhead based on a cache entry
+(dst_entry). If the cache is empty, the default and generic value
+skb->mac_len is returned. Otherwise, LL_RESERVED_SPACE() over dst's dev
+is returned.
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
+Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/dst.h | 9 +++++++++
+ 1 file changed, 9 insertions(+)
+
+diff --git a/include/net/dst.h b/include/net/dst.h
+index 0f303cc602520..08647c99d79c9 100644
+--- a/include/net/dst.h
++++ b/include/net/dst.h
+@@ -440,6 +440,15 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout)
+ dst->expires = expires;
+ }
+
++static inline unsigned int dst_dev_overhead(struct dst_entry *dst,
++ struct sk_buff *skb)
++{
++ if (likely(dst))
++ return LL_RESERVED_SPACE(dst->dev);
++
++ return skb->mac_len;
++}
++
+ INDIRECT_CALLABLE_DECLARE(int ip6_output(struct net *, struct sock *,
+ struct sk_buff *));
+ INDIRECT_CALLABLE_DECLARE(int ip_output(struct net *, struct sock *,
+--
+2.39.5
+
--- /dev/null
+From cefa529ae75091cdf02550357f5616f1debc3840 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:10 +0000
+Subject: ipv4: add RCU protection to ip4_dst_hoplimit()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 469308552ca4560176cfc100e7ca84add1bebd7c ]
+
+ip4_dst_hoplimit() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-3-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/route.h | 9 +++++++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/include/net/route.h b/include/net/route.h
+index 1789f1e6640b4..da34b6fa9862d 100644
+--- a/include/net/route.h
++++ b/include/net/route.h
+@@ -363,10 +363,15 @@ static inline int inet_iif(const struct sk_buff *skb)
+ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
+ {
+ int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
+- struct net *net = dev_net(dst->dev);
+
+- if (hoplimit == 0)
++ if (hoplimit == 0) {
++ const struct net *net;
++
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
+ hoplimit = READ_ONCE(net->ipv4.sysctl_ip_default_ttl);
++ rcu_read_unlock();
++ }
+ return hoplimit;
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 989271e21e0151d868a02c4abfffcae6521ba34e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:16 +0000
+Subject: ipv4: icmp: convert to dev_net_rcu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 4b8474a0951e605d2a27a2c483da4eb4b8c63760 ]
+
+__icmp_send() must ensure rcu_read_lock() is held, as spotted
+by Jakub.
+
+Other ICMP uses of dev_net() seem safe, change them to dev_net_rcu()
+to get LOCKDEP support.
+
+Fixes: dde1bc0e6f86 ("[NETNS]: Add namespace for ICMP replying code.")
+Closes: https://lore.kernel.org/netdev/20250203153633.46ce0337@kernel.org/
+Reported-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-9-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/icmp.c | 31 +++++++++++++++++--------------
+ 1 file changed, 17 insertions(+), 14 deletions(-)
+
+diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
+index 932bd775fc268..f45bc187a92a7 100644
+--- a/net/ipv4/icmp.c
++++ b/net/ipv4/icmp.c
+@@ -399,10 +399,10 @@ static void icmp_push_reply(struct sock *sk,
+
+ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
+ {
+- struct ipcm_cookie ipc;
+ struct rtable *rt = skb_rtable(skb);
+- struct net *net = dev_net(rt->dst.dev);
++ struct net *net = dev_net_rcu(rt->dst.dev);
+ bool apply_ratelimit = false;
++ struct ipcm_cookie ipc;
+ struct flowi4 fl4;
+ struct sock *sk;
+ struct inet_sock *inet;
+@@ -610,12 +610,14 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+ struct sock *sk;
+
+ if (!rt)
+- goto out;
++ return;
++
++ rcu_read_lock();
+
+ if (rt->dst.dev)
+- net = dev_net(rt->dst.dev);
++ net = dev_net_rcu(rt->dst.dev);
+ else if (skb_in->dev)
+- net = dev_net(skb_in->dev);
++ net = dev_net_rcu(skb_in->dev);
+ else
+ goto out;
+
+@@ -786,7 +788,8 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
+ icmp_xmit_unlock(sk);
+ out_bh_enable:
+ local_bh_enable();
+-out:;
++out:
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(__icmp_send);
+
+@@ -835,7 +838,7 @@ static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
+ * avoid additional coding at protocol handlers.
+ */
+ if (!pskb_may_pull(skb, iph->ihl * 4 + 8)) {
+- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
+ return;
+ }
+
+@@ -869,7 +872,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
+ struct net *net;
+ u32 info = 0;
+
+- net = dev_net(skb_dst(skb)->dev);
++ net = dev_net_rcu(skb_dst(skb)->dev);
+
+ /*
+ * Incomplete header ?
+@@ -980,7 +983,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb)
+ static enum skb_drop_reason icmp_redirect(struct sk_buff *skb)
+ {
+ if (skb->len < sizeof(struct iphdr)) {
+- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS);
+ return SKB_DROP_REASON_PKT_TOO_SMALL;
+ }
+
+@@ -1012,7 +1015,7 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
+ struct icmp_bxm icmp_param;
+ struct net *net;
+
+- net = dev_net(skb_dst(skb)->dev);
++ net = dev_net_rcu(skb_dst(skb)->dev);
+ /* should there be an ICMP stat for ignored echos? */
+ if (READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_all))
+ return SKB_NOT_DROPPED_YET;
+@@ -1041,9 +1044,9 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb)
+
+ bool icmp_build_probe(struct sk_buff *skb, struct icmphdr *icmphdr)
+ {
++ struct net *net = dev_net_rcu(skb->dev);
+ struct icmp_ext_hdr *ext_hdr, _ext_hdr;
+ struct icmp_ext_echo_iio *iio, _iio;
+- struct net *net = dev_net(skb->dev);
+ struct inet6_dev *in6_dev;
+ struct in_device *in_dev;
+ struct net_device *dev;
+@@ -1182,7 +1185,7 @@ static enum skb_drop_reason icmp_timestamp(struct sk_buff *skb)
+ return SKB_NOT_DROPPED_YET;
+
+ out_err:
+- __ICMP_INC_STATS(dev_net(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
++ __ICMP_INC_STATS(dev_net_rcu(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
+ return SKB_DROP_REASON_PKT_TOO_SMALL;
+ }
+
+@@ -1199,7 +1202,7 @@ int icmp_rcv(struct sk_buff *skb)
+ {
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+ struct rtable *rt = skb_rtable(skb);
+- struct net *net = dev_net(rt->dst.dev);
++ struct net *net = dev_net_rcu(rt->dst.dev);
+ struct icmphdr *icmph;
+
+ if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
+@@ -1372,9 +1375,9 @@ int icmp_err(struct sk_buff *skb, u32 info)
+ struct iphdr *iph = (struct iphdr *)skb->data;
+ int offset = iph->ihl<<2;
+ struct icmphdr *icmph = (struct icmphdr *)(skb->data + offset);
++ struct net *net = dev_net_rcu(skb->dev);
+ int type = icmp_hdr(skb)->type;
+ int code = icmp_hdr(skb)->code;
+- struct net *net = dev_net(skb->dev);
+
+ /*
+ * Use ping_err to handle all icmp errors except those
+--
+2.39.5
+
--- /dev/null
+From 6df497a23d547d01649610b3005e4624bb5fd212 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:15 +0000
+Subject: ipv4: use RCU protection in __ip_rt_update_pmtu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 139512191bd06f1b496117c76372b2ce372c9a41 ]
+
+__ip_rt_update_pmtu() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device")
+Fixes: 1de6b15a434c ("Namespaceify min_pmtu sysctl")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-8-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 11 ++++++-----
+ 1 file changed, 6 insertions(+), 5 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index f707cdb26ff20..41b320f0c20eb 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1008,9 +1008,9 @@ out: kfree_skb_reason(skb, reason);
+ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ {
+ struct dst_entry *dst = &rt->dst;
+- struct net *net = dev_net(dst->dev);
+ struct fib_result res;
+ bool lock = false;
++ struct net *net;
+ u32 old_mtu;
+
+ if (ip_mtu_locked(dst))
+@@ -1020,6 +1020,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ if (old_mtu < mtu)
+ return;
+
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
+ if (mtu < net->ipv4.ip_rt_min_pmtu) {
+ lock = true;
+ mtu = min(old_mtu, net->ipv4.ip_rt_min_pmtu);
+@@ -1027,9 +1029,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+
+ if (rt->rt_pmtu == mtu && !lock &&
+ time_before(jiffies, dst->expires - net->ipv4.ip_rt_mtu_expires / 2))
+- return;
++ goto out;
+
+- rcu_read_lock();
+ if (fib_lookup(net, fl4, &res, 0) == 0) {
+ struct fib_nh_common *nhc;
+
+@@ -1043,14 +1044,14 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
+- rcu_read_unlock();
+- return;
++ goto out;
+ }
+ #endif /* CONFIG_IP_ROUTE_MULTIPATH */
+ nhc = FIB_RES_NHC(res);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+ }
++out:
+ rcu_read_unlock();
+ }
+
+--
+2.39.5
+
--- /dev/null
+From 2ec2ebabf622d94b9b1318952c34696e2bef9e99 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:14 +0000
+Subject: ipv4: use RCU protection in inet_select_addr()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 719817cd293e4fa389e1f69c396f3f816ed5aa41 ]
+
+inet_select_addr() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-7-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/devinet.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
+index 7cf5f7d0d0de2..a55e95046984d 100644
+--- a/net/ipv4/devinet.c
++++ b/net/ipv4/devinet.c
+@@ -1351,10 +1351,11 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
+ __be32 addr = 0;
+ unsigned char localnet_scope = RT_SCOPE_HOST;
+ struct in_device *in_dev;
+- struct net *net = dev_net(dev);
++ struct net *net;
+ int master_idx;
+
+ rcu_read_lock();
++ net = dev_net_rcu(dev);
+ in_dev = __in_dev_get_rcu(dev);
+ if (!in_dev)
+ goto no_in_dev;
+--
+2.39.5
+
--- /dev/null
+From c95dbd419be896fa93fc9f5f6aca15b4bc2855f3 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:11 +0000
+Subject: ipv4: use RCU protection in ip_dst_mtu_maybe_forward()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 071d8012869b6af352acca346ade13e7be90a49f ]
+
+ip_dst_mtu_maybe_forward() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: f87c10a8aa1e8 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-4-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/net/ip.h | 13 ++++++++++---
+ 1 file changed, 10 insertions(+), 3 deletions(-)
+
+diff --git a/include/net/ip.h b/include/net/ip.h
+index d92d3bc3ec0e2..fe4f854381143 100644
+--- a/include/net/ip.h
++++ b/include/net/ip.h
+@@ -465,9 +465,12 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
+ bool forwarding)
+ {
+ const struct rtable *rt = dst_rtable(dst);
+- struct net *net = dev_net(dst->dev);
+- unsigned int mtu;
++ unsigned int mtu, res;
++ struct net *net;
++
++ rcu_read_lock();
+
++ net = dev_net_rcu(dst->dev);
+ if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
+ ip_mtu_locked(dst) ||
+ !forwarding) {
+@@ -491,7 +494,11 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
+ out:
+ mtu = min_t(unsigned int, mtu, IP_MAX_MTU);
+
+- return mtu - lwtunnel_headroom(dst->lwtstate, mtu);
++ res = mtu - lwtunnel_headroom(dst->lwtstate, mtu);
++
++ rcu_read_unlock();
++
++ return res;
+ }
+
+ static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
+--
+2.39.5
+
--- /dev/null
+From 0d06ebdce5e1322d2c3790fdac2d3f10f8a4ae88 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:12 +0000
+Subject: ipv4: use RCU protection in ipv4_default_advmss()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 71b8471c93fa0bcab911fcb65da1eb6c4f5f735f ]
+
+ipv4_default_advmss() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-5-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index 2a27913588d05..9709ec3e2dce6 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1294,10 +1294,15 @@ static void set_class_tag(struct rtable *rt, u32 tag)
+
+ static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
+ {
+- struct net *net = dev_net(dst->dev);
+ unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr);
+- unsigned int advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
+- net->ipv4.ip_rt_min_advmss);
++ unsigned int advmss;
++ struct net *net;
++
++ rcu_read_lock();
++ net = dev_net_rcu(dst->dev);
++ advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size,
++ net->ipv4.ip_rt_min_advmss);
++ rcu_read_unlock();
+
+ return min(advmss, IPV4_MAX_PMTU - header_size);
+ }
+--
+2.39.5
+
--- /dev/null
+From 4a621a026b1c90ff11300cc33a620d2c94dc6535 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:13 +0000
+Subject: ipv4: use RCU protection in rt_is_expired()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit dd205fcc33d92d54eee4d7f21bb073af9bd5ce2b ]
+
+rt_is_expired() must use RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: e84f84f27647 ("netns: place rt_genid into struct net")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-6-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index 9709ec3e2dce6..e31aa5a74ace4 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -390,7 +390,13 @@ static inline int ip_rt_proc_init(void)
+
+ static inline bool rt_is_expired(const struct rtable *rth)
+ {
+- return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
++ bool res;
++
++ rcu_read_lock();
++ res = rth->rt_genid != rt_genid_ipv4(dev_net_rcu(rth->dst.dev));
++ rcu_read_unlock();
++
++ return res;
+ }
+
+ void rt_cache_flush(struct net *net)
+--
+2.39.5
+
--- /dev/null
+From af9d8e5913252d2a98a619e3254dcc5f0f976ca9 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:19 +0000
+Subject: ipv6: icmp: convert to dev_net_rcu()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 34aef2b0ce3aa4eb4ef2e1f5cad3738d527032f5 ]
+
+icmp6_send() must acquire rcu_read_lock() sooner to ensure
+that the dev_net() call is done from a safe context.
+
+Other ICMPv6 uses of dev_net() seem safe, change them to
+dev_net_rcu() to get LOCKDEP support to catch bugs.
+
+Fixes: 9a43b709a230 ("[NETNS][IPV6] icmp6 - make icmpv6_socket per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Link: https://patch.msgid.link/20250205155120.1676781-12-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/icmp.c | 42 +++++++++++++++++++++++-------------------
+ 1 file changed, 23 insertions(+), 19 deletions(-)
+
+diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
+index a6984a29fdb9d..4d14ab7f7e99f 100644
+--- a/net/ipv6/icmp.c
++++ b/net/ipv6/icmp.c
+@@ -76,7 +76,7 @@ static int icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ {
+ /* icmpv6_notify checks 8 bytes can be pulled, icmp6hdr is 8 bytes */
+ struct icmp6hdr *icmp6 = (struct icmp6hdr *) (skb->data + offset);
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+
+ if (type == ICMPV6_PKT_TOOBIG)
+ ip6_update_pmtu(skb, net, info, skb->dev->ifindex, 0, sock_net_uid(net, NULL));
+@@ -473,7 +473,10 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+
+ if (!skb->dev)
+ return;
+- net = dev_net(skb->dev);
++
++ rcu_read_lock();
++
++ net = dev_net_rcu(skb->dev);
+ mark = IP6_REPLY_MARK(net, skb->mark);
+ /*
+ * Make sure we respect the rules
+@@ -496,7 +499,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ !(type == ICMPV6_PARAMPROB &&
+ code == ICMPV6_UNK_OPTION &&
+ (opt_unrec(skb, info))))
+- return;
++ goto out;
+
+ saddr = NULL;
+ }
+@@ -526,7 +529,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) {
+ net_dbg_ratelimited("icmp6_send: addr_any/mcast source [%pI6c > %pI6c]\n",
+ &hdr->saddr, &hdr->daddr);
+- return;
++ goto out;
+ }
+
+ /*
+@@ -535,7 +538,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ if (is_ineligible(skb)) {
+ net_dbg_ratelimited("icmp6_send: no reply to icmp error [%pI6c > %pI6c]\n",
+ &hdr->saddr, &hdr->daddr);
+- return;
++ goto out;
+ }
+
+ /* Needed by both icmpv6_global_allow and icmpv6_xmit_lock */
+@@ -582,7 +585,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ np = inet6_sk(sk);
+
+ if (!icmpv6_xrlim_allow(sk, type, &fl6, apply_ratelimit))
+- goto out;
++ goto out_unlock;
+
+ tmp_hdr.icmp6_type = type;
+ tmp_hdr.icmp6_code = code;
+@@ -600,7 +603,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+
+ dst = icmpv6_route_lookup(net, skb, sk, &fl6);
+ if (IS_ERR(dst))
+- goto out;
++ goto out_unlock;
+
+ ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
+
+@@ -616,7 +619,6 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ goto out_dst_release;
+ }
+
+- rcu_read_lock();
+ idev = __in6_dev_get(skb->dev);
+
+ if (ip6_append_data(sk, icmpv6_getfrag, &msg,
+@@ -630,13 +632,15 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
+ icmpv6_push_pending_frames(sk, &fl6, &tmp_hdr,
+ len + sizeof(struct icmp6hdr));
+ }
+- rcu_read_unlock();
++
+ out_dst_release:
+ dst_release(dst);
+-out:
++out_unlock:
+ icmpv6_xmit_unlock(sk);
+ out_bh_enable:
+ local_bh_enable();
++out:
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(icmp6_send);
+
+@@ -679,8 +683,8 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type,
+ skb_pull(skb2, nhs);
+ skb_reset_network_header(skb2);
+
+- rt = rt6_lookup(dev_net(skb->dev), &ipv6_hdr(skb2)->saddr, NULL, 0,
+- skb, 0);
++ rt = rt6_lookup(dev_net_rcu(skb->dev), &ipv6_hdr(skb2)->saddr,
++ NULL, 0, skb, 0);
+
+ if (rt && rt->dst.dev)
+ skb2->dev = rt->dst.dev;
+@@ -717,7 +721,7 @@ EXPORT_SYMBOL(ip6_err_gen_icmpv6_unreach);
+
+ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
+ {
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ struct sock *sk;
+ struct inet6_dev *idev;
+ struct ipv6_pinfo *np;
+@@ -832,7 +836,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
+ u8 code, __be32 info)
+ {
+ struct inet6_skb_parm *opt = IP6CB(skb);
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ const struct inet6_protocol *ipprot;
+ enum skb_drop_reason reason;
+ int inner_offset;
+@@ -889,7 +893,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type,
+ static int icmpv6_rcv(struct sk_buff *skb)
+ {
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
+- struct net *net = dev_net(skb->dev);
++ struct net *net = dev_net_rcu(skb->dev);
+ struct net_device *dev = icmp6_dev(skb);
+ struct inet6_dev *idev = __in6_dev_get(dev);
+ const struct in6_addr *saddr, *daddr;
+@@ -921,7 +925,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
+ skb_set_network_header(skb, nh);
+ }
+
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INMSGS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS);
+
+ saddr = &ipv6_hdr(skb)->saddr;
+ daddr = &ipv6_hdr(skb)->daddr;
+@@ -939,7 +943,7 @@ static int icmpv6_rcv(struct sk_buff *skb)
+
+ type = hdr->icmp6_type;
+
+- ICMP6MSGIN_INC_STATS(dev_net(dev), idev, type);
++ ICMP6MSGIN_INC_STATS(dev_net_rcu(dev), idev, type);
+
+ switch (type) {
+ case ICMPV6_ECHO_REQUEST:
+@@ -1034,9 +1038,9 @@ static int icmpv6_rcv(struct sk_buff *skb)
+
+ csum_error:
+ reason = SKB_DROP_REASON_ICMP_CSUM;
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_CSUMERRORS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_CSUMERRORS);
+ discard_it:
+- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INERRORS);
++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INERRORS);
+ drop_no_count:
+ kfree_skb_reason(skb, reason);
+ return 0;
+--
+2.39.5
+
--- /dev/null
+From b5dfcef3b1bfee0811dc7d0a650074f8c6a261da Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 12 Feb 2025 14:10:21 +0000
+Subject: ipv6: mcast: add RCU protection to mld_newpack()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit a527750d877fd334de87eef81f1cb5f0f0ca3373 ]
+
+mld_newpack() can be called without RTNL or RCU being held.
+
+Note that we no longer can use sock_alloc_send_skb() because
+ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.
+
+Instead use alloc_skb() and charge the net->ipv6.igmp_sk
+socket under RCU protection.
+
+Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Link: https://patch.msgid.link/20250212141021.1663666-1-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/mcast.c | 14 ++++++++++----
+ 1 file changed, 10 insertions(+), 4 deletions(-)
+
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index 6551648512585..b7b62e5a562e5 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -1730,21 +1730,19 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+ struct net_device *dev = idev->dev;
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- struct net *net = dev_net(dev);
+ const struct in6_addr *saddr;
+ struct in6_addr addr_buf;
+ struct mld2_report *pmr;
+ struct sk_buff *skb;
+ unsigned int size;
+ struct sock *sk;
+- int err;
++ struct net *net;
+
+- sk = net->ipv6.igmp_sk;
+ /* we assume size > sizeof(ra) here
+ * Also try to not allocate high-order pages for big MTU
+ */
+ size = min_t(int, mtu, PAGE_SIZE / 2) + hlen + tlen;
+- skb = sock_alloc_send_skb(sk, size, 1, &err);
++ skb = alloc_skb(size, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+@@ -1752,6 +1750,12 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+ skb_reserve(skb, hlen);
+ skb_tailroom_reserve(skb, mtu, tlen);
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(dev);
++ sk = net->ipv6.igmp_sk;
++ skb_set_owner_w(skb, sk);
++
+ if (ipv6_get_lladdr(dev, &addr_buf, IFA_F_TENTATIVE)) {
+ /* <draft-ietf-magma-mld-source-05.txt>:
+ * use unspecified address as the source address
+@@ -1763,6 +1767,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu)
+
+ ip6_mc_hdr(sk, skb, dev, saddr, &mld2_all_mcr, NEXTHDR_HOP, 0);
+
++ rcu_read_unlock();
++
+ skb_put_data(skb, ra, sizeof(ra));
+
+ skb_set_transport_header(skb, skb_tail_pointer(skb) - skb->data);
+--
+2.39.5
+
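The reason sock_alloc_send_skb() had to go is worth spelling out: it can
perform a sleeping GFP_KERNEL allocation on behalf of ipv6.igmp_sk, and
sleeping is not allowed inside an RCU read-side section. The patch
therefore allocates first and only charges the skb to the per-netns
socket once the netns pointer is held under RCU. A condensed sketch of
that ordering, taken from the hunks above:

    skb = alloc_skb(size, GFP_KERNEL);       /* may sleep: keep outside RCU */
    if (!skb)
            return NULL;

    rcu_read_lock();
    net = dev_net_rcu(dev);
    skb_set_owner_w(skb, net->ipv6.igmp_sk); /* charge wmem to the socket */
    /* build the MLD header while the netns pointer is still protected */
    rcu_read_unlock();
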
--- /dev/null
+From 657a1960fb3a36ea9c1566b4d869c8e03f5dc527 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:40 +0000
+Subject: ipv6: mcast: extend RCU protection in igmp6_send()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 087c1faa594fa07a66933d750c0b2610aa1a2946 ]
+
+igmp6_send() can be called without RTNL or RCU being held.
+
+Extend RCU protection so that we can safely fetch the net pointer
+and avoid a potential UAF.
+
+Note that we no longer can use sock_alloc_send_skb() because
+ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.
+
+Instead use alloc_skb() and charge the net->ipv6.igmp_sk
+socket under RCU protection.
+
+Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-9-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/mcast.c | 31 +++++++++++++++----------------
+ 1 file changed, 15 insertions(+), 16 deletions(-)
+
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index b244dbf61d5f3..6551648512585 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -2122,21 +2122,21 @@ static void mld_send_cr(struct inet6_dev *idev)
+
+ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ {
+- struct net *net = dev_net(dev);
+- struct sock *sk = net->ipv6.igmp_sk;
++ const struct in6_addr *snd_addr, *saddr;
++ int err, len, payload_len, full_len;
++ struct in6_addr addr_buf;
+ struct inet6_dev *idev;
+ struct sk_buff *skb;
+ struct mld_msg *hdr;
+- const struct in6_addr *snd_addr, *saddr;
+- struct in6_addr addr_buf;
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- int err, len, payload_len, full_len;
+ u8 ra[8] = { IPPROTO_ICMPV6, 0,
+ IPV6_TLV_ROUTERALERT, 2, 0, 0,
+ IPV6_TLV_PADN, 0 };
+- struct flowi6 fl6;
+ struct dst_entry *dst;
++ struct flowi6 fl6;
++ struct net *net;
++ struct sock *sk;
+
+ if (type == ICMPV6_MGM_REDUCTION)
+ snd_addr = &in6addr_linklocal_allrouters;
+@@ -2147,19 +2147,21 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ payload_len = len + sizeof(ra);
+ full_len = sizeof(struct ipv6hdr) + payload_len;
+
+- rcu_read_lock();
+- IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_OUTREQUESTS);
+- rcu_read_unlock();
++ skb = alloc_skb(hlen + tlen + full_len, GFP_KERNEL);
+
+- skb = sock_alloc_send_skb(sk, hlen + tlen + full_len, 1, &err);
++ rcu_read_lock();
+
++ net = dev_net_rcu(dev);
++ idev = __in6_dev_get(dev);
++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS);
+ if (!skb) {
+- rcu_read_lock();
+- IP6_INC_STATS(net, __in6_dev_get(dev),
+- IPSTATS_MIB_OUTDISCARDS);
++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+ rcu_read_unlock();
+ return;
+ }
++ sk = net->ipv6.igmp_sk;
++ skb_set_owner_w(skb, sk);
++
+ skb->priority = TC_PRIO_CONTROL;
+ skb_reserve(skb, hlen);
+
+@@ -2184,9 +2186,6 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type)
+ IPPROTO_ICMPV6,
+ csum_partial(hdr, len, 0));
+
+- rcu_read_lock();
+- idev = __in6_dev_get(skb->dev);
+-
+ icmpv6_flow_init(sk, &fl6, type,
+ &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr,
+ skb->dev->ifindex);
+--
+2.39.5
+
--- /dev/null
+From 696b09f558018b55ac46120507790417aebd9c6a Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:18 +0000
+Subject: ipv6: use RCU protection in ip6_default_advmss()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 3c8ffcd248da34fc41e52a46e51505900115fc2a ]
+
+ip6_default_advmss() needs RCU protection to make
+sure the net structure it reads does not disappear.
+
+Fixes: 5578689a4e3c ("[NETNS][IPV6] route6 - make route6 per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-11-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/route.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index 8ebfed5d63232..2736dea77575b 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -3196,13 +3196,18 @@ static unsigned int ip6_default_advmss(const struct dst_entry *dst)
+ {
+ struct net_device *dev = dst->dev;
+ unsigned int mtu = dst_mtu(dst);
+- struct net *net = dev_net(dev);
++ struct net *net;
+
+ mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr);
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(dev);
+ if (mtu < net->ipv6.sysctl.ip6_rt_min_advmss)
+ mtu = net->ipv6.sysctl.ip6_rt_min_advmss;
+
++ rcu_read_unlock();
++
+ /*
+ * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and
+ * corresponding MSS is IPV6_MAXPLEN - tcp_header_size.
+--
+2.39.5
+
--- /dev/null
+From 93046772ca90a811f5ad97d1db80f62b1b7cc2af Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:39 +0000
+Subject: ndisc: extend RCU protection in ndisc_send_skb()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit ed6ae1f325d3c43966ec1b62ac1459e2b8e45640 ]
+
+ndisc_send_skb() can be called without RTNL or RCU held.
+
+Acquire rcu_read_lock() earlier, so that we can use dev_net_rcu()
+and avoid a potential UAF.
+
+Fixes: 1762f7e88eb3 ("[NETNS][IPV6] ndisc - make socket control per namespace")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-8-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ndisc.c | 12 ++++++++----
+ 1 file changed, 8 insertions(+), 4 deletions(-)
+
+diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
+index 90f8aa2d7af2e..8699d1a188dc4 100644
+--- a/net/ipv6/ndisc.c
++++ b/net/ipv6/ndisc.c
+@@ -471,16 +471,20 @@ static void ip6_nd_hdr(struct sk_buff *skb,
+ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+ const struct in6_addr *saddr)
+ {
++ struct icmp6hdr *icmp6h = icmp6_hdr(skb);
+ struct dst_entry *dst = skb_dst(skb);
+- struct net *net = dev_net(skb->dev);
+- struct sock *sk = net->ipv6.ndisc_sk;
+ struct inet6_dev *idev;
++ struct net *net;
++ struct sock *sk;
+ int err;
+- struct icmp6hdr *icmp6h = icmp6_hdr(skb);
+ u8 type;
+
+ type = icmp6h->icmp6_type;
+
++ rcu_read_lock();
++
++ net = dev_net_rcu(skb->dev);
++ sk = net->ipv6.ndisc_sk;
+ if (!dst) {
+ struct flowi6 fl6;
+ int oif = skb->dev->ifindex;
+@@ -488,6 +492,7 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+ icmpv6_flow_init(sk, &fl6, type, saddr, daddr, oif);
+ dst = icmp6_dst_alloc(skb->dev, &fl6);
+ if (IS_ERR(dst)) {
++ rcu_read_unlock();
+ kfree_skb(skb);
+ return;
+ }
+@@ -502,7 +507,6 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr,
+
+ ip6_nd_hdr(skb, saddr, daddr, READ_ONCE(inet6_sk(sk)->hop_limit), skb->len);
+
+- rcu_read_lock();
+ idev = __in6_dev_get(dst->dev);
+ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS);
+
+--
+2.39.5
+
--- /dev/null
+From a591ecc09cc3f5cb8580aff4cf53e13e3067139e Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:34 +0000
+Subject: ndisc: use RCU protection in ndisc_alloc_skb()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 628e6d18930bbd21f2d4562228afe27694f66da9 ]
+
+ndisc_alloc_skb() can be called without RTNL or RCU being held.
+
+Add RCU protection to avoid possible UAF.
+
+Fixes: de09334b9326 ("ndisc: Introduce ndisc_alloc_skb() helper.")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-3-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ndisc.c | 10 ++++------
+ 1 file changed, 4 insertions(+), 6 deletions(-)
+
+diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
+index 264b10a947577..90f8aa2d7af2e 100644
+--- a/net/ipv6/ndisc.c
++++ b/net/ipv6/ndisc.c
+@@ -418,15 +418,11 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev,
+ {
+ int hlen = LL_RESERVED_SPACE(dev);
+ int tlen = dev->needed_tailroom;
+- struct sock *sk = dev_net(dev)->ipv6.ndisc_sk;
+ struct sk_buff *skb;
+
+ skb = alloc_skb(hlen + sizeof(struct ipv6hdr) + len + tlen, GFP_ATOMIC);
+- if (!skb) {
+- ND_PRINTK(0, err, "ndisc: %s failed to allocate an skb\n",
+- __func__);
++ if (!skb)
+ return NULL;
+- }
+
+ skb->protocol = htons(ETH_P_IPV6);
+ skb->dev = dev;
+@@ -437,7 +433,9 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev,
+ /* Manually assign socket ownership as we avoid calling
+ * sock_alloc_send_pskb() to bypass wmem buffer limits
+ */
+- skb_set_owner_w(skb, sk);
++ rcu_read_lock();
++ skb_set_owner_w(skb, dev_net_rcu(dev)->ipv6.ndisc_sk);
++ rcu_read_unlock();
+
+ return skb;
+ }
+--
+2.39.5
+
--- /dev/null
+From b666d3ec0cc14264d2b1d2a5c2195418702bac2d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:35 +0000
+Subject: neighbour: use RCU protection in __neigh_notify()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit becbd5850c03ed33b232083dd66c6e38c0c0e569 ]
+
+__neigh_notify() can be called without RTNL or RCU protection.
+
+Use RCU protection to avoid potential UAF.
+
+Fixes: 426b5303eb43 ("[NETNS]: Modify the neighbour table code so it handles multiple network namespaces")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: David Ahern <dsahern@kernel.org>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-4-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/core/neighbour.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/net/core/neighbour.c b/net/core/neighbour.c
+index cc58315a40a79..c7f7ea61b524a 100644
+--- a/net/core/neighbour.c
++++ b/net/core/neighbour.c
+@@ -3513,10 +3513,12 @@ static const struct seq_operations neigh_stat_seq_ops = {
+ static void __neigh_notify(struct neighbour *n, int type, int flags,
+ u32 pid)
+ {
+- struct net *net = dev_net(n->dev);
+ struct sk_buff *skb;
+ int err = -ENOBUFS;
++ struct net *net;
+
++ rcu_read_lock();
++ net = dev_net_rcu(n->dev);
+ skb = nlmsg_new(neigh_nlmsg_size(), GFP_ATOMIC);
+ if (skb == NULL)
+ goto errout;
+@@ -3529,9 +3531,11 @@ static void __neigh_notify(struct neighbour *n, int type, int flags,
+ goto errout;
+ }
+ rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+- return;
++ goto out;
+ errout:
+ rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
++out:
++ rcu_read_unlock();
+ }
+
+ void neigh_app_ns(struct neighbour *n)
+--
+2.39.5
+
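When the function being converted has early returns, the series
restructures them as gotos so that rcu_read_unlock() runs on every exit
path, as in __neigh_notify() above. A schematic sketch with illustrative
names only:

    static void example_notify(struct net_device *dev, bool skip)
    {
            struct net *net;

            rcu_read_lock();
            net = dev_net_rcu(dev);

            if (skip)
                    goto out;  /* was a bare 'return' before the conversion */

            pr_info("%s: netns %p\n", dev->name, net);
    out:
            rcu_read_unlock();
    }
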
--- /dev/null
+From 04a4d5e517db5ce3250defe23f9d333b36ffab8d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 5 Feb 2025 15:51:09 +0000
+Subject: net: add dev_net_rcu() helper
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 482ad2a4ace2740ca0ff1cbc8f3c7f862f3ab507 ]
+
+dev->nd_net can change; readers should either
+use rcu_read_lock() or RTNL.
+
+We currently use a generic helper, dev_net(), with
+no debugging support. We probably have many hidden bugs.
+
+Add dev_net_rcu() helper for callers using rcu_read_lock()
+protection.
+
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250205155120.1676781-2-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Stable-dep-of: 71b8471c93fa ("ipv4: use RCU protection in ipv4_default_advmss()")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ include/linux/netdevice.h | 6 ++++++
+ include/net/net_namespace.h | 2 +-
+ 2 files changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
+index 02d3bafebbe77..4f17b786828af 100644
+--- a/include/linux/netdevice.h
++++ b/include/linux/netdevice.h
+@@ -2577,6 +2577,12 @@ struct net *dev_net(const struct net_device *dev)
+ return read_pnet(&dev->nd_net);
+ }
+
++static inline
++struct net *dev_net_rcu(const struct net_device *dev)
++{
++ return read_pnet_rcu(&dev->nd_net);
++}
++
+ static inline
+ void dev_net_set(struct net_device *dev, struct net *net)
+ {
+diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
+index 9398c8f499536..da93873df4dbd 100644
+--- a/include/net/net_namespace.h
++++ b/include/net/net_namespace.h
+@@ -387,7 +387,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
+ #endif
+ }
+
+-static inline struct net *read_pnet_rcu(possible_net_t *pnet)
++static inline struct net *read_pnet_rcu(const possible_net_t *pnet)
+ {
+ #ifdef CONFIG_NET_NS
+ return rcu_dereference(pnet->net);
+--
+2.39.5
+
--- /dev/null
+From b97fe83fff886ccd0049b9a3e014c55251140ddd Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 8 Nov 2024 09:34:24 +0000
+Subject: net: ipv4: Cache pmtu for all packet paths if multipath enabled
+
+From: Vladimir Vdovin <deliran@verdict.gg>
+
+[ Upstream commit 7d3f3b4367f315a61fc615e3138f3d320da8c466 ]
+
+Check the number of paths with fib_info_num_path() and call
+update_or_create_fnhe() for every path.
+The problem is that the PMTU is cached only for the oif that
+received the ICMP "need to frag" message; other oifs will still
+try to use the "default" interface MTU.
+
+An example topology showing the problem:
+
+ | host1
+ +---------+
+ | dummy0 | 10.179.20.18/32 mtu9000
+ +---------+
+ +-----------+----------------+
+ +---------+ +---------+
+ | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31
+ +---------+ +---------+
+ | (all here have mtu 9000) |
+ +------+ +------+
+ | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31
+ +------+ +------+
+ | |
+---------+------------+-------------------+------
+ |
+ +-----+
+ | ro3 | 10.10.10.10 mtu1500
+ +-----+
+ |
+ ========================================
+ some networks
+ ========================================
+ |
+ +-----+
+ | eth0| 10.10.30.30 mtu9000
+ +-----+
+ | host2
+
+host1 has multipath enabled and
+sysctl net.ipv4.fib_multipath_hash_policy = 1:
+
+default proto static src 10.179.20.18
+ nexthop via 10.179.2.12 dev ens17f1 weight 1
+ nexthop via 10.179.2.140 dev ens17f0 weight 1
+
+When host1 tries to do PMTU discovery from 10.179.20.18/32 to host2,
+host1 receives on the ens17f1 iface an ICMP packet from ro3 saying that
+ro3's MTU is 1500, and host1 caches it in the nexthop exceptions cache.
+
+The problem is that it is cached only for the iface that received the ICMP,
+and ro3 will never send an ICMP message to host1 via the other path.
+
+Host1 now has these routes to host2:
+
+ip r g 10.10.30.30 sport 30000 dport 443
+10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0
+ cache expires 521sec mtu 1500
+
+ip r g 10.10.30.30 sport 30033 dport 443
+10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0
+ cache
+
+So when host1 tries again to reach host2 with mtu > 1500, if the packet
+flow happens to be hashed to oif=ens17f1 it is fine, but if it is hashed
+to oif=ens17f0 it blackholes, while host1 keeps getting ICMP messages
+from ro3 on ens17f1, until the lucky day when ro3 sends one through a
+flow that hashes to ens17f0.
+
+Signed-off-by: Vladimir Vdovin <deliran@verdict.gg>
+Reviewed-by: Ido Schimmel <idosch@nvidia.com>
+Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Stable-dep-of: 139512191bd0 ("ipv4: use RCU protection in __ip_rt_update_pmtu()")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv4/route.c | 13 ++++
+ tools/testing/selftests/net/pmtu.sh | 112 +++++++++++++++++++++++-----
+ 2 files changed, 108 insertions(+), 17 deletions(-)
+
+diff --git a/net/ipv4/route.c b/net/ipv4/route.c
+index e31aa5a74ace4..f707cdb26ff20 100644
+--- a/net/ipv4/route.c
++++ b/net/ipv4/route.c
+@@ -1034,6 +1034,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
+ struct fib_nh_common *nhc;
+
+ fib_select_path(net, &res, fl4, NULL);
++#ifdef CONFIG_IP_ROUTE_MULTIPATH
++ if (fib_info_num_path(res.fi) > 1) {
++ int nhsel;
++
++ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
++ nhc = fib_info_nhc(res.fi, nhsel);
++ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
++ jiffies + net->ipv4.ip_rt_mtu_expires);
++ }
++ rcu_read_unlock();
++ return;
++ }
++#endif /* CONFIG_IP_ROUTE_MULTIPATH */
+ nhc = FIB_RES_NHC(res);
+ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
+ jiffies + net->ipv4.ip_rt_mtu_expires);
+diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
+index 6c651c880fe83..66be7699c72c9 100755
+--- a/tools/testing/selftests/net/pmtu.sh
++++ b/tools/testing/selftests/net/pmtu.sh
+@@ -197,6 +197,12 @@
+ #
+ # - pmtu_ipv6_route_change
+ # Same as above but with IPv6
++#
++# - pmtu_ipv4_mp_exceptions
++# Use the same topology as in pmtu_ipv4, but add routeable addresses
++# on host A and B on lo reachable via both routers. Host A and B
++# addresses have multipath routes to each other, b_r1 mtu = 1500.
++# Check that PMTU exceptions are created for both paths.
+
+ source lib.sh
+ source net_helper.sh
+@@ -266,7 +272,8 @@ tests="
+ list_flush_ipv4_exception ipv4: list and flush cached exceptions 1
+ list_flush_ipv6_exception ipv6: list and flush cached exceptions 1
+ pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1
+- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1"
++ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1
++ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 1"
+
+ # Addressing and routing for tests with routers: four network segments, with
+ # index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an
+@@ -343,6 +350,9 @@ tunnel6_a_addr="fd00:2::a"
+ tunnel6_b_addr="fd00:2::b"
+ tunnel6_mask="64"
+
++host4_a_addr="192.168.99.99"
++host4_b_addr="192.168.88.88"
++
+ dummy6_0_prefix="fc00:1000::"
+ dummy6_1_prefix="fc00:1001::"
+ dummy6_mask="64"
+@@ -984,6 +994,52 @@ setup_ovs_bridge() {
+ run_cmd ip route add ${prefix6}:${b_r1}::1 via ${prefix6}:${a_r1}::2
+ }
+
++setup_multipath_new() {
++ # Set up host A with multipath routes to host B host4_b_addr
++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo
++ run_cmd ${ns_a} ip nexthop add id 401 via ${prefix4}.${a_r1}.2 dev veth_A-R1
++ run_cmd ${ns_a} ip nexthop add id 402 via ${prefix4}.${a_r2}.2 dev veth_A-R2
++ run_cmd ${ns_a} ip nexthop add id 403 group 401/402
++ run_cmd ${ns_a} ip route add ${host4_b_addr} src ${host4_a_addr} nhid 403
++
++ # Set up host B with multipath routes to host A host4_a_addr
++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo
++ run_cmd ${ns_b} ip nexthop add id 401 via ${prefix4}.${b_r1}.2 dev veth_B-R1
++ run_cmd ${ns_b} ip nexthop add id 402 via ${prefix4}.${b_r2}.2 dev veth_B-R2
++ run_cmd ${ns_b} ip nexthop add id 403 group 401/402
++ run_cmd ${ns_b} ip route add ${host4_a_addr} src ${host4_b_addr} nhid 403
++}
++
++setup_multipath_old() {
++ # Set up host A with multipath routes to host B host4_b_addr
++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo
++ run_cmd ${ns_a} ip route add ${host4_b_addr} \
++ src ${host4_a_addr} \
++ nexthop via ${prefix4}.${a_r1}.2 weight 1 \
++ nexthop via ${prefix4}.${a_r2}.2 weight 1
++
++ # Set up host B with multipath routes to host A host4_a_addr
++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo
++ run_cmd ${ns_b} ip route add ${host4_a_addr} \
++ src ${host4_b_addr} \
++ nexthop via ${prefix4}.${b_r1}.2 weight 1 \
++ nexthop via ${prefix4}.${b_r2}.2 weight 1
++}
++
++setup_multipath() {
++ if [ "$USE_NH" = "yes" ]; then
++ setup_multipath_new
++ else
++ setup_multipath_old
++ fi
++
++ # Set up routers with routes to dummies
++ run_cmd ${ns_r1} ip route add ${host4_a_addr} via ${prefix4}.${a_r1}.1
++ run_cmd ${ns_r2} ip route add ${host4_a_addr} via ${prefix4}.${a_r2}.1
++ run_cmd ${ns_r1} ip route add ${host4_b_addr} via ${prefix4}.${b_r1}.1
++ run_cmd ${ns_r2} ip route add ${host4_b_addr} via ${prefix4}.${b_r2}.1
++}
++
+ setup() {
+ [ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip
+
+@@ -1076,23 +1132,15 @@ link_get_mtu() {
+ }
+
+ route_get_dst_exception() {
+- ns_cmd="${1}"
+- dst="${2}"
+- dsfield="${3}"
++ ns_cmd="${1}"; shift
+
+- if [ -z "${dsfield}" ]; then
+- dsfield=0
+- fi
+-
+- ${ns_cmd} ip route get "${dst}" dsfield "${dsfield}"
++ ${ns_cmd} ip route get "$@"
+ }
+
+ route_get_dst_pmtu_from_exception() {
+- ns_cmd="${1}"
+- dst="${2}"
+- dsfield="${3}"
++ ns_cmd="${1}"; shift
+
+- mtu_parse "$(route_get_dst_exception "${ns_cmd}" "${dst}" "${dsfield}")"
++ mtu_parse "$(route_get_dst_exception "${ns_cmd}" "$@")"
+ }
+
+ check_pmtu_value() {
+@@ -1235,10 +1283,10 @@ test_pmtu_ipv4_dscp_icmp_exception() {
+ run_cmd "${ns_a}" ping -q -M want -Q "${dsfield}" -c 1 -w 1 -s "${len}" "${dst2}"
+
+ # Check that exceptions have been created with the correct PMTU
+- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")"
++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")"
+ check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1
+
+- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")"
++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")"
+ check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1
+ }
+
+@@ -1285,9 +1333,9 @@ test_pmtu_ipv4_dscp_udp_exception() {
+ UDP:"${dst2}":50000,tos="${dsfield}"
+
+ # Check that exceptions have been created with the correct PMTU
+- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")"
++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")"
+ check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1
+- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")"
++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")"
+ check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1
+ }
+
+@@ -2329,6 +2377,36 @@ test_pmtu_ipv6_route_change() {
+ test_pmtu_ipvX_route_change 6
+ }
+
++test_pmtu_ipv4_mp_exceptions() {
++ setup namespaces routing multipath || return $ksft_skip
++
++ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \
++ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \
++ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \
++ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2
++
++ # Set up initial MTU values
++ mtu "${ns_a}" veth_A-R1 2000
++ mtu "${ns_r1}" veth_R1-A 2000
++ mtu "${ns_r1}" veth_R1-B 1500
++ mtu "${ns_b}" veth_B-R1 1500
++
++ mtu "${ns_a}" veth_A-R2 2000
++ mtu "${ns_r2}" veth_R2-A 2000
++ mtu "${ns_r2}" veth_R2-B 1500
++ mtu "${ns_b}" veth_B-R2 1500
++
++ # Ping and expect two nexthop exceptions for two routes
++ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 1 -s 1800 "${host4_b_addr}"
++
++ # Check that exceptions have been created with the correct PMTU
++ pmtu_a_R1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R1)"
++ pmtu_a_R2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R2)"
++
++ check_pmtu_value "1500" "${pmtu_a_R1}" "exceeding MTU (veth_A-R1)" || return 1
++ check_pmtu_value "1500" "${pmtu_a_R2}" "exceeding MTU (veth_A-R2)" || return 1
++}
++
+ usage() {
+ echo
+ echo "$0 [OPTIONS] [TEST]..."
+--
+2.39.5
+
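Reduced to its core, the fix records the learned PMTU on every nexthop of
a multipath route rather than only on the one the ICMP arrived through.
A sketch of the added branch, with the surrounding context of
__ip_rt_update_pmtu() assumed (the fib_lookup() result is only valid
under rcu_read_lock(), which the function already holds at this point):

    if (fib_info_num_path(res.fi) > 1) {
            int nhsel;

            /* One exception per path, so later flows hashed to any oif
             * see the lowered PMTU instead of blackholing.
             */
            for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) {
                    nhc = fib_info_nhc(res.fi, nhsel);
                    update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock,
                                          jiffies + net->ipv4.ip_rt_mtu_expires);
            }
    }
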
--- /dev/null
+From 01157585e1e4aefbcfed1d8919812874a7afed8c Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 29 Jan 2025 19:15:19 -0800
+Subject: net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels
+
+From: Jakub Kicinski <kuba@kernel.org>
+
+[ Upstream commit 92191dd1073088753821b862b791dcc83e558e07 ]
+
+Some lwtunnels have a dst cache for post-transformation dst.
+If the packet destination did not change we may end up recording
+a reference to the lwtunnel in its own cache, and the lwtunnel
+state will never be freed.
+
+This was discovered by the ioam6.sh test; kmemleak was recently fixed
+to catch per-cpu memory leaks. I'm not sure if rpl and seg6
+can actually hit this, but in principle I don't see why not.
+
+Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation")
+Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
+Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
+Reviewed-by: Simon Horman <horms@kernel.org>
+Link: https://patch.msgid.link/20250130031519.2716843-2-kuba@kernel.org
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ioam6_iptunnel.c | 9 ++++++---
+ net/ipv6/rpl_iptunnel.c | 9 ++++++---
+ net/ipv6/seg6_iptunnel.c | 9 ++++++---
+ 3 files changed, 18 insertions(+), 9 deletions(-)
+
+diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
+index e81b45b1f6555..fb6cb540cd1bc 100644
+--- a/net/ipv6/ioam6_iptunnel.c
++++ b/net/ipv6/ioam6_iptunnel.c
+@@ -413,9 +413,12 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (dst->lwtstate != cache_dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
+ if (unlikely(err))
+diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
+index 7ba22d2f2bfef..be084089ec783 100644
+--- a/net/ipv6/rpl_iptunnel.c
++++ b/net/ipv6/rpl_iptunnel.c
+@@ -236,9 +236,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (orig_dst->lwtstate != dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
+index 4bf937bfc2633..316dbc2694f2a 100644
+--- a/net/ipv6/seg6_iptunnel.c
++++ b/net/ipv6/seg6_iptunnel.c
+@@ -575,9 +575,12 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ goto drop;
+ }
+
+- local_bh_disable();
+- dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
++ /* cache only if we don't create a dst reference loop */
++ if (orig_dst->lwtstate != dst->lwtstate) {
++ local_bh_disable();
++ dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
++ local_bh_enable();
++ }
+
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+--
+2.39.5
+
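The guard added in all three tunnels is small enough to restate: the dst
holds a reference on its lwtstate, and the lwtstate owns the cache, so
storing a dst whose lwtstate is the tunnel's own state creates a cycle
that is never freed. A sketch of the check, using the seg6 naming from
the hunk above:

    /* cache only if we don't create a dst reference loop */
    if (orig_dst->lwtstate != dst->lwtstate) {
            local_bh_disable();
            dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
            local_bh_enable();
    }
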
--- /dev/null
+From f753a5f7a35a7fd6411ab9a30c1a28ff123f9a37 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:43 +0100
+Subject: net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit dce525185bc92864e5a318040285ee070563fe34 ]
+
+This patch mitigates the two-reallocations issue with ioam6_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration may still trigger
+two reallocations (i.e., empty cache), while next iterations would only
+trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+the improvement:
+- inline mode:
+ - before: https://ibb.co/LhQ8V63
+ - after: https://ibb.co/x5YT2bS
+- encap mode:
+ - before: https://ibb.co/3Cjm5m0
+ - after: https://ibb.co/TwpsxTC
+- encap mode with tunsrc:
+ - before: https://ibb.co/Gpy9QPg
+ - after: https://ibb.co/PW1bZFT
+
+This patch also fixes an incorrect behavior: after the insertion, the
+second call to skb_cow_head() makes sure that the device has enough
+headroom in the skb for layer 2 and the like. In that case, the "old"
+dst_entry was being used, which is now fixed. After discussing with Paolo, it
+appears that both patches can be merged into a single one -this one-
+(for the sake of readability) and target net-next.
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/ioam6_iptunnel.c | 73 ++++++++++++++++++++-------------------
+ 1 file changed, 37 insertions(+), 36 deletions(-)
+
+diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
+index beb6b4cfc551c..e81b45b1f6555 100644
+--- a/net/ipv6/ioam6_iptunnel.c
++++ b/net/ipv6/ioam6_iptunnel.c
+@@ -255,14 +255,15 @@ static int ioam6_do_fill(struct net *net, struct sk_buff *skb)
+ }
+
+ static int ioam6_do_inline(struct net *net, struct sk_buff *skb,
+- struct ioam6_lwt_encap *tuninfo)
++ struct ioam6_lwt_encap *tuninfo,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6hdr *oldhdr, *hdr;
+ int hdrlen, err;
+
+ hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -293,7 +294,8 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+ struct ioam6_lwt_encap *tuninfo,
+ bool has_tunsrc,
+ struct in6_addr *tunsrc,
+- struct in6_addr *tundst)
++ struct in6_addr *tundst,
++ struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct ipv6hdr *hdr, *inner_hdr;
+@@ -302,7 +304,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+ hdrlen = (tuninfo->eh.hdrlen + 1) << 3;
+ len = sizeof(*hdr) + hdrlen;
+
+- err = skb_cow_head(skb, len + skb->mac_len);
++ err = skb_cow_head(skb, len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -336,7 +338,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb,
+
+ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ {
+- struct dst_entry *dst = skb_dst(skb);
++ struct dst_entry *dst = skb_dst(skb), *cache_dst;
+ struct in6_addr orig_daddr;
+ struct ioam6_lwt *ilwt;
+ int err = -EINVAL;
+@@ -354,6 +356,10 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+
+ orig_daddr = ipv6_hdr(skb)->daddr;
+
++ local_bh_disable();
++ cache_dst = dst_cache_get(&ilwt->cache);
++ local_bh_enable();
++
+ switch (ilwt->mode) {
+ case IOAM6_IPTUNNEL_MODE_INLINE:
+ do_inline:
+@@ -361,7 +367,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP)
+ goto out;
+
+- err = ioam6_do_inline(net, skb, &ilwt->tuninfo);
++ err = ioam6_do_inline(net, skb, &ilwt->tuninfo, cache_dst);
+ if (unlikely(err))
+ goto drop;
+
+@@ -371,7 +377,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ /* Encapsulation (ip6ip6) */
+ err = ioam6_do_encap(net, skb, &ilwt->tuninfo,
+ ilwt->has_tunsrc, &ilwt->tunsrc,
+- &ilwt->tundst);
++ &ilwt->tundst, cache_dst);
+ if (unlikely(err))
+ goto drop;
+
+@@ -389,41 +395,36 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ goto drop;
+ }
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
++ if (unlikely(!cache_dst)) {
++ struct ipv6hdr *hdr = ipv6_hdr(skb);
++ struct flowi6 fl6;
++
++ memset(&fl6, 0, sizeof(fl6));
++ fl6.daddr = hdr->daddr;
++ fl6.saddr = hdr->saddr;
++ fl6.flowlabel = ip6_flowinfo(hdr);
++ fl6.flowi6_mark = skb->mark;
++ fl6.flowi6_proto = hdr->nexthdr;
++
++ cache_dst = ip6_route_output(net, NULL, &fl6);
++ if (cache_dst->error) {
++ err = cache_dst->error;
++ dst_release(cache_dst);
++ goto drop;
++ }
+
+- if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
+ local_bh_disable();
+- dst = dst_cache_get(&ilwt->cache);
++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr);
+ local_bh_enable();
+
+- if (unlikely(!dst)) {
+- struct ipv6hdr *hdr = ipv6_hdr(skb);
+- struct flowi6 fl6;
+-
+- memset(&fl6, 0, sizeof(fl6));
+- fl6.daddr = hdr->daddr;
+- fl6.saddr = hdr->saddr;
+- fl6.flowlabel = ip6_flowinfo(hdr);
+- fl6.flowi6_mark = skb->mark;
+- fl6.flowi6_proto = hdr->nexthdr;
+-
+- dst = ip6_route_output(net, NULL, &fl6);
+- if (dst->error) {
+- err = dst->error;
+- dst_release(dst);
+- goto drop;
+- }
+-
+- local_bh_disable();
+- dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
+- local_bh_enable();
+- }
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev));
++ if (unlikely(err))
++ goto drop;
++ }
+
++ if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
+ skb_dst_drop(skb);
+- skb_dst_set(skb, dst);
+-
++ skb_dst_set(skb, cache_dst);
+ return dst_output(net, sk, skb);
+ }
+ out:
+--
+2.39.5
+
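The single-reallocation behaviour comes from the headroom estimate handed
to the first skb_cow_head(): with a warm dst cache the real output device
is already known, so its link-layer headroom can be reserved up front.
dst_dev_overhead() is not part of this diff; a sketch of what it is
assumed to do:

    /* Assumed helper behaviour (not shown in this patch): prefer the
     * cached dst's link-layer headroom, fall back to skb->mac_len when
     * the cache is cold.
     */
    static inline unsigned int dst_dev_overhead(struct dst_entry *dst,
                                                struct sk_buff *skb)
    {
            if (dst)
                    return LL_RESERVED_SPACE(dst->dev);

            return skb->mac_len;
    }
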
--- /dev/null
+From 4d332b0718bdb32b3cae5af26c04ace8e1cc93ea Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:45 +0100
+Subject: net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 985ec6f5e6235242191370628acb73d7a9f0c0ea ]
+
+This patch mitigates the two-reallocations issue with rpl_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration would still
+trigger two reallocations (i.e., empty cache), while next iterations
+would only trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+there is no impact (they even show an improvement):
+- before: https://ibb.co/nQJhqwc
+- after: https://ibb.co/4ZvW6wV
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: Alexander Aring <aahringo@redhat.com>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/rpl_iptunnel.c | 46 ++++++++++++++++++++++-------------------
+ 1 file changed, 25 insertions(+), 21 deletions(-)
+
+diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
+index db3c19a42e1ca..7ba22d2f2bfef 100644
+--- a/net/ipv6/rpl_iptunnel.c
++++ b/net/ipv6/rpl_iptunnel.c
+@@ -125,7 +125,8 @@ static void rpl_destroy_state(struct lwtunnel_state *lwt)
+ }
+
+ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+- const struct ipv6_rpl_sr_hdr *srh)
++ const struct ipv6_rpl_sr_hdr *srh,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6_rpl_sr_hdr *isrh, *csrh;
+ const struct ipv6hdr *oldhdr;
+@@ -153,7 +154,7 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+
+ hdrlen = ((csrh->hdrlen + 1) << 3);
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err)) {
+ kfree(buf);
+ return err;
+@@ -186,7 +187,8 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt,
+ return 0;
+ }
+
+-static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
++static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt,
++ struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct rpl_iptunnel_encap *tinfo;
+@@ -196,7 +198,7 @@ static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt)
+
+ tinfo = rpl_encap_lwtunnel(dst->lwtstate);
+
+- return rpl_do_srh_inline(skb, rlwt, tinfo->srh);
++ return rpl_do_srh_inline(skb, rlwt, tinfo->srh, cache_dst);
+ }
+
+ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+@@ -208,14 +210,14 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+
+ rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
+
+- err = rpl_do_srh(skb, rlwt);
+- if (unlikely(err))
+- goto drop;
+-
+ local_bh_disable();
+ dst = dst_cache_get(&rlwt->cache);
+ local_bh_enable();
+
++ err = rpl_do_srh(skb, rlwt, dst);
++ if (unlikely(err))
++ goto drop;
++
+ if (unlikely(!dst)) {
+ struct ipv6hdr *hdr = ipv6_hdr(skb);
+ struct flowi6 fl6;
+@@ -237,15 +239,15 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb)
+ local_bh_disable();
+ dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+ local_bh_enable();
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ }
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+-
+ return dst_output(net, sk, skb);
+
+ drop:
+@@ -262,29 +264,31 @@ static int rpl_input(struct sk_buff *skb)
+
+ rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate);
+
+- err = rpl_do_srh(skb, rlwt);
+- if (unlikely(err))
+- goto drop;
+-
+ local_bh_disable();
+ dst = dst_cache_get(&rlwt->cache);
++ local_bh_enable();
++
++ err = rpl_do_srh(skb, rlwt, dst);
++ if (unlikely(err))
++ goto drop;
+
+ if (!dst) {
+ ip6_route_input(skb);
+ dst = skb_dst(skb);
+ if (!dst->error) {
++ local_bh_disable();
+ dst_cache_set_ip6(&rlwt->cache, dst,
+ &ipv6_hdr(skb)->saddr);
++ local_bh_enable();
+ }
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ } else {
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ }
+- local_bh_enable();
+-
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+
+ return dst_input(skb);
+
+--
+2.39.5
+
--- /dev/null
+From 49b5da2d604b153b07f0b042c934c38cf1c379c8 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 3 Dec 2024 13:49:44 +0100
+Subject: net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
+
+From: Justin Iurman <justin.iurman@uliege.be>
+
+[ Upstream commit 40475b63761abb6f8fdef960d03228a08662c9c4 ]
+
+This patch mitigates the two-reallocations issue with seg6_iptunnel by
+providing the dst_entry (in the cache) to the first call to
+skb_cow_head(). As a result, the very first iteration would still
+trigger two reallocations (i.e., empty cache), while next iterations
+would only trigger a single reallocation.
+
+Performance tests before/after applying this patch, which clearly show
+the improvement:
+- before: https://ibb.co/3Cg4sNH
+- after: https://ibb.co/8rQ350r
+
+Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
+Cc: David Lebrun <dlebrun@google.com>
+Signed-off-by: Paolo Abeni <pabeni@redhat.com>
+Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/ipv6/seg6_iptunnel.c | 85 ++++++++++++++++++++++++----------------
+ 1 file changed, 52 insertions(+), 33 deletions(-)
+
+diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
+index 098632adc9b5a..4bf937bfc2633 100644
+--- a/net/ipv6/seg6_iptunnel.c
++++ b/net/ipv6/seg6_iptunnel.c
+@@ -124,8 +124,8 @@ static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb,
+ return flowlabel;
+ }
+
+-/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
+-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
++static int __seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
++ int proto, struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct net *net = dev_net(dst->dev);
+@@ -137,7 +137,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+ hdrlen = (osrh->hdrlen + 1) << 3;
+ tot_len = hdrlen + sizeof(*hdr);
+
+- err = skb_cow_head(skb, tot_len + skb->mac_len);
++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -197,11 +197,18 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
+
+ return 0;
+ }
++
++/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */
++int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto)
++{
++ return __seg6_do_srh_encap(skb, osrh, proto, NULL);
++}
+ EXPORT_SYMBOL_GPL(seg6_do_srh_encap);
+
+ /* encapsulate an IPv6 packet within an outer IPv6 header with reduced SRH */
+ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+- struct ipv6_sr_hdr *osrh, int proto)
++ struct ipv6_sr_hdr *osrh, int proto,
++ struct dst_entry *cache_dst)
+ {
+ __u8 first_seg = osrh->first_segment;
+ struct dst_entry *dst = skb_dst(skb);
+@@ -230,7 +237,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+
+ tot_len = red_hdrlen + sizeof(struct ipv6hdr);
+
+- err = skb_cow_head(skb, tot_len + skb->mac_len);
++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -317,8 +324,8 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb,
+ return 0;
+ }
+
+-/* insert an SRH within an IPv6 packet, just after the IPv6 header */
+-int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
++static int __seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh,
++ struct dst_entry *cache_dst)
+ {
+ struct ipv6hdr *hdr, *oldhdr;
+ struct ipv6_sr_hdr *isrh;
+@@ -326,7 +333,7 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+
+ hdrlen = (osrh->hdrlen + 1) << 3;
+
+- err = skb_cow_head(skb, hdrlen + skb->mac_len);
++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb));
+ if (unlikely(err))
+ return err;
+
+@@ -369,9 +376,8 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
+
+ return 0;
+ }
+-EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
+
+-static int seg6_do_srh(struct sk_buff *skb)
++static int seg6_do_srh(struct sk_buff *skb, struct dst_entry *cache_dst)
+ {
+ struct dst_entry *dst = skb_dst(skb);
+ struct seg6_iptunnel_encap *tinfo;
+@@ -384,7 +390,7 @@ static int seg6_do_srh(struct sk_buff *skb)
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return -EINVAL;
+
+- err = seg6_do_srh_inline(skb, tinfo->srh);
++ err = __seg6_do_srh_inline(skb, tinfo->srh, cache_dst);
+ if (err)
+ return err;
+ break;
+@@ -402,9 +408,11 @@ static int seg6_do_srh(struct sk_buff *skb)
+ return -EINVAL;
+
+ if (tinfo->mode == SEG6_IPTUN_MODE_ENCAP)
+- err = seg6_do_srh_encap(skb, tinfo->srh, proto);
++ err = __seg6_do_srh_encap(skb, tinfo->srh,
++ proto, cache_dst);
+ else
+- err = seg6_do_srh_encap_red(skb, tinfo->srh, proto);
++ err = seg6_do_srh_encap_red(skb, tinfo->srh,
++ proto, cache_dst);
+
+ if (err)
+ return err;
+@@ -425,11 +433,13 @@ static int seg6_do_srh(struct sk_buff *skb)
+ skb_push(skb, skb->mac_len);
+
+ if (tinfo->mode == SEG6_IPTUN_MODE_L2ENCAP)
+- err = seg6_do_srh_encap(skb, tinfo->srh,
+- IPPROTO_ETHERNET);
++ err = __seg6_do_srh_encap(skb, tinfo->srh,
++ IPPROTO_ETHERNET,
++ cache_dst);
+ else
+ err = seg6_do_srh_encap_red(skb, tinfo->srh,
+- IPPROTO_ETHERNET);
++ IPPROTO_ETHERNET,
++ cache_dst);
+
+ if (err)
+ return err;
+@@ -444,6 +454,13 @@ static int seg6_do_srh(struct sk_buff *skb)
+ return 0;
+ }
+
++/* insert an SRH within an IPv6 packet, just after the IPv6 header */
++int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh)
++{
++ return __seg6_do_srh_inline(skb, osrh, NULL);
++}
++EXPORT_SYMBOL_GPL(seg6_do_srh_inline);
++
+ static int seg6_input_finish(struct net *net, struct sock *sk,
+ struct sk_buff *skb)
+ {
+@@ -458,31 +475,33 @@ static int seg6_input_core(struct net *net, struct sock *sk,
+ struct seg6_lwt *slwt;
+ int err;
+
+- err = seg6_do_srh(skb);
+- if (unlikely(err))
+- goto drop;
+-
+ slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+
+ local_bh_disable();
+ dst = dst_cache_get(&slwt->cache);
++ local_bh_enable();
++
++ err = seg6_do_srh(skb, dst);
++ if (unlikely(err))
++ goto drop;
+
+ if (!dst) {
+ ip6_route_input(skb);
+ dst = skb_dst(skb);
+ if (!dst->error) {
++ local_bh_disable();
+ dst_cache_set_ip6(&slwt->cache, dst,
+ &ipv6_hdr(skb)->saddr);
++ local_bh_enable();
+ }
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ } else {
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+ }
+- local_bh_enable();
+-
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+
+ if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
+ return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT,
+@@ -528,16 +547,16 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ struct seg6_lwt *slwt;
+ int err;
+
+- err = seg6_do_srh(skb);
+- if (unlikely(err))
+- goto drop;
+-
+ slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
+
+ local_bh_disable();
+ dst = dst_cache_get(&slwt->cache);
+ local_bh_enable();
+
++ err = seg6_do_srh(skb, dst);
++ if (unlikely(err))
++ goto drop;
++
+ if (unlikely(!dst)) {
+ struct ipv6hdr *hdr = ipv6_hdr(skb);
+ struct flowi6 fl6;
+@@ -559,15 +578,15 @@ static int seg6_output_core(struct net *net, struct sock *sk,
+ local_bh_disable();
+ dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr);
+ local_bh_enable();
++
++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
++ if (unlikely(err))
++ goto drop;
+ }
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+- if (unlikely(err))
+- goto drop;
+-
+ if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled))
+ return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb,
+ NULL, skb_dst(skb)->dev, dst_output);
+--
+2.39.5
+
--- /dev/null
+From 38f927398ade60eb5687c7a62e805ff497f55810 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Fri, 7 Feb 2025 13:58:37 +0000
+Subject: openvswitch: use RCU protection in ovs_vport_cmd_fill_info()
+
+From: Eric Dumazet <edumazet@google.com>
+
+[ Upstream commit 90b2f49a502fa71090d9f4fe29a2f51fe5dff76d ]
+
+ovs_vport_cmd_fill_info() can be called without RTNL or RCU.
+
+Use RCU protection and dev_net_rcu() to avoid potential UAF.
+
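+A minimal sketch of the pattern (identifiers are the ones used in the hunk
+ below; the surrounding control flow is simplified):
+
+  rcu_read_lock();
+  net_vport = dev_net_rcu(vport->dev);   /* netns pointer is only stable under RCU */
+  if (!net_eq(net, net_vport)) {
+          /* GFP_ATOMIC: no sleeping allocation inside an RCU read-side section */
+          int id = peernet2id_alloc(net, net_vport, GFP_ATOMIC);
+
+          if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
+                  goto nla_put_failure_unlock;   /* must drop the RCU lock on error */
+  }
+  rcu_read_unlock();
+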
+Fixes: 9354d4520342 ("openvswitch: reliable interface indentification in port dumps")
+Signed-off-by: Eric Dumazet <edumazet@google.com>
+Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
+Link: https://patch.msgid.link/20250207135841.1948589-6-edumazet@google.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ net/openvswitch/datapath.c | 12 +++++++++---
+ 1 file changed, 9 insertions(+), 3 deletions(-)
+
+diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
+index 78d9961fcd446..8d3c01f0e2aa1 100644
+--- a/net/openvswitch/datapath.c
++++ b/net/openvswitch/datapath.c
+@@ -2102,6 +2102,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ {
+ struct ovs_header *ovs_header;
+ struct ovs_vport_stats vport_stats;
++ struct net *net_vport;
+ int err;
+
+ ovs_header = genlmsg_put(skb, portid, seq, &dp_vport_genl_family,
+@@ -2118,12 +2119,15 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ nla_put_u32(skb, OVS_VPORT_ATTR_IFINDEX, vport->dev->ifindex))
+ goto nla_put_failure;
+
+- if (!net_eq(net, dev_net(vport->dev))) {
+- int id = peernet2id_alloc(net, dev_net(vport->dev), gfp);
++ rcu_read_lock();
++ net_vport = dev_net_rcu(vport->dev);
++ if (!net_eq(net, net_vport)) {
++ int id = peernet2id_alloc(net, net_vport, GFP_ATOMIC);
+
+ if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id))
+- goto nla_put_failure;
++ goto nla_put_failure_unlock;
+ }
++ rcu_read_unlock();
+
+ ovs_vport_get_stats(vport, &vport_stats);
+ if (nla_put_64bit(skb, OVS_VPORT_ATTR_STATS,
+@@ -2144,6 +2148,8 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
+ genlmsg_end(skb, ovs_header);
+ return 0;
+
++nla_put_failure_unlock:
++ rcu_read_unlock();
+ nla_put_failure:
+ err = -EMSGSIZE;
+ error:
+--
+2.39.5
+
--- /dev/null
+From 5d7442285575e56ad8f868e8cac77b5a9a0f800d Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Wed, 29 Jan 2025 14:50:02 -0700
+Subject: rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflags
+
+From: Justin M. Forbes <jforbes@fedoraproject.org>
+
+[ Upstream commit a9c621a217128eb3fb7522cf763992d9437fd5ba ]
+
+This seems to break the build when building with gcc15:
+
+ Unable to generate bindings: ClangDiagnostic("error: unknown
+ argument: '-fzero-init-padding-bits=all'\n")
+
+Thus skip that flag.
+
+Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
+Fixes: dce4aab8441d ("kbuild: Use -fzero-init-padding-bits=all")
+Reviewed-by: Kees Cook <kees@kernel.org>
+Link: https://lore.kernel.org/r/20250129215003.1736127-1-jforbes@fedoraproject.org
+[ Slightly reworded commit. - Miguel ]
+Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ rust/Makefile | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/rust/Makefile b/rust/Makefile
+index 9f59baacaf773..45779a064fa4f 100644
+--- a/rust/Makefile
++++ b/rust/Makefile
+@@ -229,6 +229,7 @@ bindgen_skip_c_flags := -mno-fp-ret-in-387 -mpreferred-stack-boundary=% \
+ -fzero-call-used-regs=% -fno-stack-clash-protection \
+ -fno-inline-functions-called-once -fsanitize=bounds-strict \
+ -fstrict-flex-arrays=% -fmin-function-alignment=% \
++ -fzero-init-padding-bits=% \
+ --param=% --param asan-%
+
+ # Derived from `scripts/Makefile.clang`.
+--
+2.39.5
+
--- /dev/null
+From 479bee46067a69f86ef8b10ee99dbc2a7c765a73 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:07 +0200
+Subject: scsi: ufs: core: Introduce a new clock_gating lock
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 209f4e43b8068c24cde227f464111030430153fa ]
+
+Introduce a new clock gating lock to serialize access to some of the clock
+gating members instead of the host_lock.
+
+While at it, simplify the code with the guard() macro and co for automatic
+ cleanup of the new lock. A few explicit
+ spin_lock_irqsave()/spin_unlock_irqrestore() sequences that snake across
+ code paths are left behind because I couldn't make heads or tails of them.
+
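+A minimal sketch of the two guard forms as they are used below (fragments
+ only; the surrounding logic is omitted):
+
+  /* Released automatically at the end of the enclosing scope. */
+  guard(spinlock_irqsave)(&hba->clk_gating.lock);
+  hba->clk_gating.delay_ms = value;
+
+  /* Held only for the braced scope. */
+  scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
+          if (hba->clk_gating.state == CLKS_ON)
+                  return;
+  }
+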
+Additionally, move the trace_ufshcd_clk_gating() call out of the
+ region protected by the lock, as it doesn't need protection.
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-4-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 109 ++++++++++++++++++--------------------
+ include/ufs/ufshcd.h | 9 +++-
+ 2 files changed, 59 insertions(+), 59 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 217619d64940e..5682fdcbf2da5 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -1840,19 +1840,16 @@ static void ufshcd_exit_clk_scaling(struct ufs_hba *hba)
+ static void ufshcd_ungate_work(struct work_struct *work)
+ {
+ int ret;
+- unsigned long flags;
+ struct ufs_hba *hba = container_of(work, struct ufs_hba,
+ clk_gating.ungate_work);
+
+ cancel_delayed_work_sync(&hba->clk_gating.gate_work);
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- if (hba->clk_gating.state == CLKS_ON) {
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+- return;
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
++ if (hba->clk_gating.state == CLKS_ON)
++ return;
+ }
+
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ ufshcd_hba_vreg_set_hpm(hba);
+ ufshcd_setup_clocks(hba, true);
+
+@@ -1887,7 +1884,7 @@ void ufshcd_hold(struct ufs_hba *hba)
+ if (!ufshcd_is_clkgating_allowed(hba) ||
+ !hba->clk_gating.is_initialized)
+ return;
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ hba->clk_gating.active_reqs++;
+
+ start:
+@@ -1903,11 +1900,11 @@ void ufshcd_hold(struct ufs_hba *hba)
+ */
+ if (ufshcd_can_hibern8_during_gating(hba) &&
+ ufshcd_is_link_hibern8(hba)) {
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ flush_result = flush_work(&hba->clk_gating.ungate_work);
+ if (hba->clk_gating.is_suspended && !flush_result)
+ return;
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ goto start;
+ }
+ break;
+@@ -1936,17 +1933,17 @@ void ufshcd_hold(struct ufs_hba *hba)
+ */
+ fallthrough;
+ case REQ_CLKS_ON:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ flush_work(&hba->clk_gating.ungate_work);
+ /* Make sure state is CLKS_ON before returning */
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ spin_lock_irqsave(&hba->clk_gating.lock, flags);
+ goto start;
+ default:
+ dev_err(hba->dev, "%s: clk gating is in invalid state %d\n",
+ __func__, hba->clk_gating.state);
+ break;
+ }
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_hold);
+
+@@ -1954,30 +1951,32 @@ static void ufshcd_gate_work(struct work_struct *work)
+ {
+ struct ufs_hba *hba = container_of(work, struct ufs_hba,
+ clk_gating.gate_work.work);
+- unsigned long flags;
+ int ret;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- /*
+- * In case you are here to cancel this work the gating state
+- * would be marked as REQ_CLKS_ON. In this case save time by
+- * skipping the gating work and exit after changing the clock
+- * state to CLKS_ON.
+- */
+- if (hba->clk_gating.is_suspended ||
+- (hba->clk_gating.state != REQ_CLKS_OFF)) {
+- hba->clk_gating.state = CLKS_ON;
+- trace_ufshcd_clk_gating(dev_name(hba->dev),
+- hba->clk_gating.state);
+- goto rel_lock;
+- }
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) {
++ /*
++ * In case you are here to cancel this work the gating state
++ * would be marked as REQ_CLKS_ON. In this case save time by
++ * skipping the gating work and exit after changing the clock
++ * state to CLKS_ON.
++ */
++ if (hba->clk_gating.is_suspended ||
++ hba->clk_gating.state != REQ_CLKS_OFF) {
++ hba->clk_gating.state = CLKS_ON;
++ trace_ufshcd_clk_gating(dev_name(hba->dev),
++ hba->clk_gating.state);
++ return;
++ }
+
+- if (ufshcd_is_ufs_dev_busy(hba) ||
+- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- hba->clk_gating.active_reqs)
+- goto rel_lock;
++ if (hba->clk_gating.active_reqs)
++ return;
++ }
+
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++ scoped_guard(spinlock_irqsave, hba->host->host_lock) {
++ if (ufshcd_is_ufs_dev_busy(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ return;
++ }
+
+ /* put the link into hibern8 mode before turning off clocks */
+ if (ufshcd_can_hibern8_during_gating(hba)) {
+@@ -1988,7 +1987,7 @@ static void ufshcd_gate_work(struct work_struct *work)
+ __func__, ret);
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+- goto out;
++ return;
+ }
+ ufshcd_set_link_hibern8(hba);
+ }
+@@ -2008,32 +2007,34 @@ static void ufshcd_gate_work(struct work_struct *work)
+ * prevent from doing cancel work multiple times when there are
+ * new requests arriving before the current cancel work is done.
+ */
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ if (hba->clk_gating.state == REQ_CLKS_OFF) {
+ hba->clk_gating.state = CLKS_OFF;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+ }
+-rel_lock:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+-out:
+- return;
+ }
+
+-/* host lock must be held before calling this variant */
+ static void __ufshcd_release(struct ufs_hba *hba)
+ {
++ lockdep_assert_held(&hba->clk_gating.lock);
++
+ if (!ufshcd_is_clkgating_allowed(hba))
+ return;
+
+ hba->clk_gating.active_reqs--;
+
+ if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
+- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized ||
++ !hba->clk_gating.is_initialized ||
+ hba->clk_gating.state == CLKS_OFF)
+ return;
+
++ scoped_guard(spinlock_irqsave, hba->host->host_lock) {
++ if (ufshcd_has_pending_tasks(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ return;
++ }
++
+ hba->clk_gating.state = REQ_CLKS_OFF;
+ trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state);
+ queue_delayed_work(hba->clk_gating.clk_gating_workq,
+@@ -2043,11 +2044,8 @@ static void __ufshcd_release(struct ufs_hba *hba)
+
+ void ufshcd_release(struct ufs_hba *hba)
+ {
+- unsigned long flags;
+-
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ __ufshcd_release(hba);
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_release);
+
+@@ -2062,11 +2060,9 @@ static ssize_t ufshcd_clkgate_delay_show(struct device *dev,
+ void ufshcd_clkgate_delay_set(struct device *dev, unsigned long value)
+ {
+ struct ufs_hba *hba = dev_get_drvdata(dev);
+- unsigned long flags;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
+ hba->clk_gating.delay_ms = value;
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+ EXPORT_SYMBOL_GPL(ufshcd_clkgate_delay_set);
+
+@@ -2094,7 +2090,6 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+ struct device_attribute *attr, const char *buf, size_t count)
+ {
+ struct ufs_hba *hba = dev_get_drvdata(dev);
+- unsigned long flags;
+ u32 value;
+
+ if (kstrtou32(buf, 0, &value))
+@@ -2102,9 +2097,10 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+
+ value = !!value;
+
+- spin_lock_irqsave(hba->host->host_lock, flags);
++ guard(spinlock_irqsave)(&hba->clk_gating.lock);
++
+ if (value == hba->clk_gating.is_enabled)
+- goto out;
++ return count;
+
+ if (value)
+ __ufshcd_release(hba);
+@@ -2112,8 +2108,7 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
+ hba->clk_gating.active_reqs++;
+
+ hba->clk_gating.is_enabled = value;
+-out:
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
++
+ return count;
+ }
+
+@@ -2155,6 +2150,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba)
+ INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work);
+ INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work);
+
++ spin_lock_init(&hba->clk_gating.lock);
++
+ hba->clk_gating.clk_gating_workq = alloc_ordered_workqueue(
+ "ufs_clk_gating_%d", WQ_MEM_RECLAIM | WQ_HIGHPRI,
+ hba->host->host_no);
+@@ -9194,7 +9191,6 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ int ret = 0;
+ struct ufs_clk_info *clki;
+ struct list_head *head = &hba->clk_list_head;
+- unsigned long flags;
+ ktime_t start = ktime_get();
+ bool clk_state_changed = false;
+
+@@ -9245,11 +9241,10 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ clk_disable_unprepare(clki->clk);
+ }
+ } else if (!ret && on) {
+- spin_lock_irqsave(hba->host->host_lock, flags);
+- hba->clk_gating.state = CLKS_ON;
++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock)
++ hba->clk_gating.state = CLKS_ON;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+ hba->clk_gating.state);
+- spin_unlock_irqrestore(hba->host->host_lock, flags);
+ }
+
+ if (clk_state_changed)
+diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
+index d5e43a1dcff22..47cba116f87b8 100644
+--- a/include/ufs/ufshcd.h
++++ b/include/ufs/ufshcd.h
+@@ -402,6 +402,9 @@ enum clk_gating_state {
+ * delay_ms
+ * @ungate_work: worker to turn on clocks that will be used in case of
+ * interrupt context
++ * @clk_gating_workq: workqueue for clock gating work.
++ * @lock: serialize access to some struct ufs_clk_gating members. An outer lock
++ * relative to the host lock
+ * @state: the current clocks state
+ * @delay_ms: gating delay in ms
+ * @is_suspended: clk gating is suspended when set to 1 which can be used
+@@ -412,11 +415,14 @@ enum clk_gating_state {
+ * @is_initialized: Indicates whether clock gating is initialized or not
+ * @active_reqs: number of requests that are pending and should be waited for
+ * completion before gating clocks.
+- * @clk_gating_workq: workqueue for clock gating work.
+ */
+ struct ufs_clk_gating {
+ struct delayed_work gate_work;
+ struct work_struct ungate_work;
++ struct workqueue_struct *clk_gating_workq;
++
++ spinlock_t lock;
++
+ enum clk_gating_state state;
+ unsigned long delay_ms;
+ bool is_suspended;
+@@ -425,7 +431,6 @@ struct ufs_clk_gating {
+ bool is_enabled;
+ bool is_initialized;
+ int active_reqs;
+- struct workqueue_struct *clk_gating_workq;
+ };
+
+ /**
+--
+2.39.5
+
--- /dev/null
+From 12b6e46a31da130d4f8de87ff66b6441ef65db02 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:05 +0200
+Subject: scsi: ufs: core: Introduce ufshcd_has_pending_tasks()
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit e738ba458e7539be1757dcdf85835a5c7b11fad4 ]
+
+Prepare to remove the hba->clk_gating.active_reqs check from
+ ufshcd_is_ufs_dev_busy().
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-2-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index b786cba9a270f..94d7992457a3b 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -258,10 +258,16 @@ ufs_get_desired_pm_lvl_for_dev_link_state(enum ufs_dev_pwr_mode dev_state,
+ return UFS_PM_LVL_0;
+ }
+
++static bool ufshcd_has_pending_tasks(struct ufs_hba *hba)
++{
++ return hba->outstanding_tasks || hba->active_uic_cmd ||
++ hba->uic_async_done;
++}
++
+ static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba)
+ {
+- return (hba->clk_gating.active_reqs || hba->outstanding_reqs || hba->outstanding_tasks ||
+- hba->active_uic_cmd || hba->uic_async_done);
++ return hba->clk_gating.active_reqs || hba->outstanding_reqs ||
++ ufshcd_has_pending_tasks(hba);
+ }
+
+ static const struct ufs_dev_quirk ufs_fixups[] = {
+@@ -2023,8 +2029,7 @@ static void __ufshcd_release(struct ufs_hba *hba)
+
+ if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
+ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
+- hba->outstanding_tasks || !hba->clk_gating.is_initialized ||
+- hba->active_uic_cmd || hba->uic_async_done ||
++ ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized ||
+ hba->clk_gating.state == CLKS_OFF)
+ return;
+
+--
+2.39.5
+
--- /dev/null
+From ac2299a3755b2c55968ad57ae2d0676a5d10ade6 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Sun, 24 Nov 2024 09:08:06 +0200
+Subject: scsi: ufs: core: Prepare to introduce a new clock_gating lock
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 7869c6521f5715688b3d1f1c897374a68544eef0 ]
+
+Remove the hba->clk_gating.active_reqs check from the ufshcd_is_ufs_dev_busy()
+ function to separate clock gating logic from general device busy checks.
+
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20241124070808.194860-3-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed")
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 11 +++++++----
+ 1 file changed, 7 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 94d7992457a3b..217619d64940e 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -266,8 +266,7 @@ static bool ufshcd_has_pending_tasks(struct ufs_hba *hba)
+
+ static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba)
+ {
+- return hba->clk_gating.active_reqs || hba->outstanding_reqs ||
+- ufshcd_has_pending_tasks(hba);
++ return hba->outstanding_reqs || ufshcd_has_pending_tasks(hba);
+ }
+
+ static const struct ufs_dev_quirk ufs_fixups[] = {
+@@ -1973,7 +1972,9 @@ static void ufshcd_gate_work(struct work_struct *work)
+ goto rel_lock;
+ }
+
+- if (ufshcd_is_ufs_dev_busy(hba) || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL)
++ if (ufshcd_is_ufs_dev_busy(hba) ||
++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
++ hba->clk_gating.active_reqs)
+ goto rel_lock;
+
+ spin_unlock_irqrestore(hba->host->host_lock, flags);
+@@ -8272,7 +8273,9 @@ static void ufshcd_rtc_work(struct work_struct *work)
+ hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
+
+ /* Update RTC only when there are no requests in progress and UFSHCI is operational */
+- if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL)
++ if (!ufshcd_is_ufs_dev_busy(hba) &&
++ hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
++ !hba->clk_gating.active_reqs)
+ ufshcd_update_rtc(hba);
+
+ if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
+--
+2.39.5
+
--- /dev/null
+From 0eb9778926430a8663ccd7169436f98b363a6bf2 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sashal@kernel.org>
+Date: Tue, 28 Jan 2025 09:12:07 +0200
+Subject: scsi: ufs: Fix toggling of clk_gating.state when clock gating is not
+ allowed
+
+From: Avri Altman <avri.altman@wdc.com>
+
+[ Upstream commit 839a74b5649c9f41d939a05059b5ca6b17156d03 ]
+
+This commit addresses an issue where clk_gating.state is being toggled in
+ufshcd_setup_clocks() even if clock gating is not allowed.
+
+The fix is to add a check for hba->clk_gating.is_initialized before toggling
+clk_gating.state in ufshcd_setup_clocks().
+
+Since clk_gating.lock is now initialized unconditionally, the new check no
+ longer guards against the spinlock being used before it is properly
+ initialized; instead it serves mostly as documentation.
+
+Fixes: 1ab27c9cf8b6 ("ufs: Add support for clock gating")
+Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
+Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
+Signed-off-by: Avri Altman <avri.altman@wdc.com>
+Link: https://lore.kernel.org/r/20250128071207.75494-3-avri.altman@wdc.com
+Reviewed-by: Bart Van Assche <bvanassche@acm.org>
+Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ drivers/ufs/core/ufshcd.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
+index 5682fdcbf2da5..a73fffd6c3de4 100644
+--- a/drivers/ufs/core/ufshcd.c
++++ b/drivers/ufs/core/ufshcd.c
+@@ -9240,7 +9240,7 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on)
+ if (!IS_ERR_OR_NULL(clki->clk) && clki->enabled)
+ clk_disable_unprepare(clki->clk);
+ }
+- } else if (!ret && on) {
++ } else if (!ret && on && hba->clk_gating.is_initialized) {
+ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock)
+ hba->clk_gating.state = CLKS_ON;
+ trace_ufshcd_clk_gating(dev_name(hba->dev),
+--
+2.39.5
+
kbuild-suppress-stdout-from-merge_config-for-silent-.patch
asoc-intel-bytcr_rt5640-add-dmi-quirk-for-vexia-edu-.patch
kbuild-use-fzero-init-padding-bits-all.patch
+include-net-add-static-inline-dst_dev_overhead-to-ds.patch
+net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch
+net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch
+clocksource-use-pr_info-for-checking-clocksource-syn.patch
+clocksource-use-migrate_disable-to-avoid-calling-get.patch
+scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch
+scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch
+scsi-ufs-core-introduce-a-new-clock_gating-lock.patch
+scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch
+rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch
+cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch
+cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch
+cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch
+cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch
+cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch
+cpufreq-amd-pstate-convert-mutex-use-to-guard.patch
+cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch
+ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch
+ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch
+net-add-dev_net_rcu-helper.patch
+ipv4-use-rcu-protection-in-ipv4_default_advmss.patch
+ipv4-use-rcu-protection-in-rt_is_expired.patch
+ipv4-use-rcu-protection-in-inet_select_addr.patch
+net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch
+ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch
+ipv4-icmp-convert-to-dev_net_rcu.patch
+flow_dissector-use-rcu-protection-to-fetch-dev_net.patch
+ipv6-use-rcu-protection-in-ip6_default_advmss.patch
+ipv6-icmp-convert-to-dev_net_rcu.patch
+hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch
+hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch
+ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch
+neighbour-use-rcu-protection-in-__neigh_notify.patch
+arp-use-rcu-protection-in-arp_xmit.patch
+openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch
+ndisc-extend-rcu-protection-in-ndisc_send_skb.patch
+ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch
+btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch
+btrfs-fix-stale-page-cache-after-race-between-readah.patch
+ipv6-mcast-add-rcu-protection-to-mld_newpack.patch