From: Sasha Levin Date: Tue, 18 Feb 2025 12:30:04 +0000 (-0500) Subject: Fixes for 6.12 X-Git-Tag: v6.1.129~50^2~5 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=3ca55ba0d68e078c439c6c3ad26a839a9f2090df;p=thirdparty%2Fkernel%2Fstable-queue.git Fixes for 6.12 Signed-off-by: Sasha Levin --- diff --git a/queue-6.12/arp-use-rcu-protection-in-arp_xmit.patch b/queue-6.12/arp-use-rcu-protection-in-arp_xmit.patch new file mode 100644 index 0000000000..87da26162b --- /dev/null +++ b/queue-6.12/arp-use-rcu-protection-in-arp_xmit.patch @@ -0,0 +1,45 @@ +From 016ece15c59d5f3a99685e52040b5887115f182b Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:36 +0000 +Subject: arp: use RCU protection in arp_xmit() + +From: Eric Dumazet + +[ Upstream commit a42b69f692165ec39db42d595f4f65a4c8f42e44 ] + +arp_xmit() can be called without RTNL or RCU protection. + +Use RCU protection to avoid potential UAF. + +Fixes: 29a26a568038 ("netfilter: Pass struct net into the netfilter hooks") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-5-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/arp.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c +index 11c1519b36993..59ffaa89d7b05 100644 +--- a/net/ipv4/arp.c ++++ b/net/ipv4/arp.c +@@ -659,10 +659,12 @@ static int arp_xmit_finish(struct net *net, struct sock *sk, struct sk_buff *skb + */ + void arp_xmit(struct sk_buff *skb) + { ++ rcu_read_lock(); + /* Send it off, maybe filter it using firewalling first. */ + NF_HOOK(NFPROTO_ARP, NF_ARP_OUT, +- dev_net(skb->dev), NULL, skb, NULL, skb->dev, ++ dev_net_rcu(skb->dev), NULL, skb, NULL, skb->dev, + arp_xmit_finish); ++ rcu_read_unlock(); + } + EXPORT_SYMBOL(arp_xmit); + +-- +2.39.5 + diff --git a/queue-6.12/btrfs-fix-stale-page-cache-after-race-between-readah.patch b/queue-6.12/btrfs-fix-stale-page-cache-after-race-between-readah.patch new file mode 100644 index 0000000000..284f293dae --- /dev/null +++ b/queue-6.12/btrfs-fix-stale-page-cache-after-race-between-readah.patch @@ -0,0 +1,208 @@ +From 856009bc6db21a7eff4281b0dfdf4b33375298b9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 4 Feb 2025 11:02:32 +0000 +Subject: btrfs: fix stale page cache after race between readahead and direct + IO write + +From: Filipe Manana + +[ Upstream commit acc18e1c1d8c0d59d793cf87790ccfcafb1bf5f0 ] + +After commit ac325fc2aad5 ("btrfs: do not hold the extent lock for entire +read") we can now trigger a race between a task doing a direct IO write +and readahead. When this race is triggered it results in tasks getting +stale data when they attempt do a buffered read (including the task that +did the direct IO write). + +This race can be sporadically triggered with test case generic/418, failing +like this: + + $ ./check generic/418 + FSTYP -- btrfs + PLATFORM -- Linux/x86_64 debian0 6.13.0-rc7-btrfs-next-185+ #17 SMP PREEMPT_DYNAMIC Mon Feb 3 12:28:46 WET 2025 + MKFS_OPTIONS -- /dev/sdc + MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 + + generic/418 14s ... 
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad) + --- tests/generic/418.out 2020-06-10 19:29:03.850519863 +0100 + +++ /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad 2025-02-03 15:42:36.974609476 +0000 + @@ -1,2 +1,5 @@ + QA output created by 418 + +cmpbuf: offset 0: Expected: 0x1, got 0x0 + +[6:0] FAIL - comparison failed, offset 24576 + +diotest -wp -b 4096 -n 8 -i 4 failed at loop 3 + Silence is golden + ... + (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/generic/418.out /home/fdmanana/git/hub/xfstests/results//generic/418.out.bad' to see the entire diff) + Ran: generic/418 + Failures: generic/418 + Failed 1 of 1 tests + +The race happens like this: + +1) A file has a prealloc extent for the range [16K, 28K); + +2) Task A starts a direct IO write against file range [24K, 28K). + At the start of the direct IO write it invalidates the page cache at + __iomap_dio_rw() with kiocb_invalidate_pages() for the 4K page at file + offset 24K; + +3) Task A enters btrfs_dio_iomap_begin() and locks the extent range + [24K, 28K); + +4) Task B starts a readahead for file range [16K, 28K), entering + btrfs_readahead(). + + First it attempts to read the page at offset 16K by entering + btrfs_do_readpage(), where it calls get_extent_map(), locks the range + [16K, 20K) and gets the extent map for the range [16K, 28K), caching + it into the 'em_cached' variable declared in the local stack of + btrfs_readahead(), and then unlocks the range [16K, 20K). + + Since the extent map has the prealloc flag, at btrfs_do_readpage() we + zero out the page's content and don't submit any bio to read the page + from the extent. + + Then it attempts to read the page at offset 20K entering + btrfs_do_readpage() where we reuse the previously cached extent map + (decided by get_extent_map()) since it spans the page's range and + it's still in the inode's extent map tree. + + Just like for the previous page, we zero out the page's content since + the extent map has the prealloc flag set. + + Then it attempts to read the page at offset 24K entering + btrfs_do_readpage() where we reuse the previously cached extent map + (decided by get_extent_map()) since it spans the page's range and + it's still in the inode's extent map tree. + + Just like for the previous pages, we zero out the page's content since + the extent map has the prealloc flag set. Note that we didn't lock the + extent range [24K, 28K), so we didn't synchronize with the ongoing + direct IO write being performed by task A; + +5) Task A enters btrfs_create_dio_extent() and creates an ordered extent + for the range [24K, 28K), with the flags BTRFS_ORDERED_DIRECT and + BTRFS_ORDERED_PREALLOC set; + +6) Task A unlocks the range [24K, 28K) at btrfs_dio_iomap_begin(); + +7) The ordered extent enters btrfs_finish_one_ordered() and locks the + range [24K, 28K); + +8) Task A enters fs/iomap/direct-io.c:iomap_dio_complete() and it tries + to invalidate the page at offset 24K by calling + kiocb_invalidate_post_direct_write(), resulting in a call chain that + ends up at btrfs_release_folio(). 
+ + The btrfs_release_folio() call ends up returning false because the range + for the page at file offset 24K is currently locked by the task doing + the ordered extent completion in the previous step (7), so we have: + + btrfs_release_folio() -> + __btrfs_release_folio() -> + try_release_extent_mapping() -> + try_release_extent_state() + + This last function checking that the range is locked and returning false + and propagating it up to btrfs_release_folio(). + + So this results in a failure to invalidate the page and + kiocb_invalidate_post_direct_write() triggers this message logged in + dmesg: + + Page cache invalidation failure on direct I/O. Possible data corruption due to collision with buffered I/O! + + After this we leave the page cache with stale data for the file range + [24K, 28K), filled with zeroes instead of the data written by direct IO + write (all bytes with a 0x01 value), so any task attempting to read with + buffered IO, including the task that did the direct IO write, will get + all bytes in the range with a 0x00 value instead of the written data. + +Fix this by locking the range, with btrfs_lock_and_flush_ordered_range(), +at the two callers of btrfs_do_readpage() instead of doing it at +get_extent_map(), just like we did before commit ac325fc2aad5 ("btrfs: do +not hold the extent lock for entire read"), and unlocking the range after +all the calls to btrfs_do_readpage(). This way we never reuse a cached +extent map without flushing any pending ordered extents from a concurrent +direct IO write. + +Fixes: ac325fc2aad5 ("btrfs: do not hold the extent lock for entire read") +Reviewed-by: Qu Wenruo +Signed-off-by: Filipe Manana +Signed-off-by: David Sterba +Signed-off-by: Sasha Levin +--- + fs/btrfs/extent_io.c | 18 +++++++++++++++--- + 1 file changed, 15 insertions(+), 3 deletions(-) + +diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c +index e6e6c4dc53c48..fe08c983d5bb4 100644 +--- a/fs/btrfs/extent_io.c ++++ b/fs/btrfs/extent_io.c +@@ -906,7 +906,6 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode, + u64 len, struct extent_map **em_cached) + { + struct extent_map *em; +- struct extent_state *cached_state = NULL; + + ASSERT(em_cached); + +@@ -922,14 +921,12 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode, + *em_cached = NULL; + } + +- btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state); + em = btrfs_get_extent(inode, folio, start, len); + if (!IS_ERR(em)) { + BUG_ON(*em_cached); + refcount_inc(&em->refs); + *em_cached = em; + } +- unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state); + + return em; + } +@@ -1086,11 +1083,18 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, + + int btrfs_read_folio(struct file *file, struct folio *folio) + { ++ struct btrfs_inode *inode = folio_to_inode(folio); ++ const u64 start = folio_pos(folio); ++ const u64 end = start + folio_size(folio) - 1; ++ struct extent_state *cached_state = NULL; + struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ }; + struct extent_map *em_cached = NULL; + int ret; + ++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state); + ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL); ++ unlock_extent(&inode->io_tree, start, end, &cached_state); ++ + free_extent_map(em_cached); + + /* +@@ -2267,12 +2271,20 @@ void btrfs_readahead(struct readahead_control *rac) + { + struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD }; + struct folio *folio; 
++ struct btrfs_inode *inode = BTRFS_I(rac->mapping->host); ++ const u64 start = readahead_pos(rac); ++ const u64 end = start + readahead_length(rac) - 1; ++ struct extent_state *cached_state = NULL; + struct extent_map *em_cached = NULL; + u64 prev_em_start = (u64)-1; + ++ btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state); ++ + while ((folio = readahead_folio(rac)) != NULL) + btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start); + ++ unlock_extent(&inode->io_tree, start, end, &cached_state); ++ + if (em_cached) + free_extent_map(em_cached); + submit_one_bio(&bio_ctrl); +-- +2.39.5 + diff --git a/queue-6.12/btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch b/queue-6.12/btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch new file mode 100644 index 0000000000..817a6f761c --- /dev/null +++ b/queue-6.12/btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch @@ -0,0 +1,70 @@ +From 693ad002410794be6b07348c6324c687becc4ec4 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Thu, 9 Jan 2025 11:24:15 +0100 +Subject: btrfs: rename __get_extent_map() and pass btrfs_inode + +From: David Sterba + +[ Upstream commit 06de96faf795b5c276a3be612da6b08c6112e747 ] + +The double underscore naming scheme does not apply here, there's only +only get_extent_map(). As the definition is changed also pass the struct +btrfs_inode. + +Reviewed-by: Johannes Thumshirn +Reviewed-by: Anand Jain +Signed-off-by: David Sterba +Stable-dep-of: acc18e1c1d8c ("btrfs: fix stale page cache after race between readahead and direct IO write") +Signed-off-by: Sasha Levin +--- + fs/btrfs/extent_io.c | 15 +++++++-------- + 1 file changed, 7 insertions(+), 8 deletions(-) + +diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c +index 42c9899d9241c..e6e6c4dc53c48 100644 +--- a/fs/btrfs/extent_io.c ++++ b/fs/btrfs/extent_io.c +@@ -901,9 +901,9 @@ void clear_folio_extent_mapped(struct folio *folio) + folio_detach_private(folio); + } + +-static struct extent_map *__get_extent_map(struct inode *inode, +- struct folio *folio, u64 start, +- u64 len, struct extent_map **em_cached) ++static struct extent_map *get_extent_map(struct btrfs_inode *inode, ++ struct folio *folio, u64 start, ++ u64 len, struct extent_map **em_cached) + { + struct extent_map *em; + struct extent_state *cached_state = NULL; +@@ -922,14 +922,14 @@ static struct extent_map *__get_extent_map(struct inode *inode, + *em_cached = NULL; + } + +- btrfs_lock_and_flush_ordered_range(BTRFS_I(inode), start, start + len - 1, &cached_state); +- em = btrfs_get_extent(BTRFS_I(inode), folio, start, len); ++ btrfs_lock_and_flush_ordered_range(inode, start, start + len - 1, &cached_state); ++ em = btrfs_get_extent(inode, folio, start, len); + if (!IS_ERR(em)) { + BUG_ON(*em_cached); + refcount_inc(&em->refs); + *em_cached = em; + } +- unlock_extent(&BTRFS_I(inode)->io_tree, start, start + len - 1, &cached_state); ++ unlock_extent(&inode->io_tree, start, start + len - 1, &cached_state); + + return em; + } +@@ -985,8 +985,7 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached, + end_folio_read(folio, true, cur, iosize); + break; + } +- em = __get_extent_map(inode, folio, cur, end - cur + 1, +- em_cached); ++ em = get_extent_map(BTRFS_I(inode), folio, cur, end - cur + 1, em_cached); + if (IS_ERR(em)) { + end_folio_read(folio, false, cur, end + 1 - cur); + return PTR_ERR(em); +-- +2.39.5 + diff --git a/queue-6.12/clocksource-use-migrate_disable-to-avoid-calling-get.patch 
b/queue-6.12/clocksource-use-migrate_disable-to-avoid-calling-get.patch new file mode 100644 index 0000000000..bab768d0f8 --- /dev/null +++ b/queue-6.12/clocksource-use-migrate_disable-to-avoid-calling-get.patch @@ -0,0 +1,82 @@ +From 260b25a3327f0749a6dde43fe624cd790fde8b01 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 31 Jan 2025 12:33:23 -0500 +Subject: clocksource: Use migrate_disable() to avoid calling get_random_u32() + in atomic context + +From: Waiman Long + +[ Upstream commit 6bb05a33337b2c842373857b63de5c9bf1ae2a09 ] + +The following bug report happened with a PREEMPT_RT kernel: + + BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 + in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog + preempt_count: 1, expected: 0 + RCU nest depth: 0, expected: 0 + get_random_u32+0x4f/0x110 + clocksource_verify_choose_cpus+0xab/0x1a0 + clocksource_verify_percpu.part.0+0x6b/0x330 + clocksource_watchdog_kthread+0x193/0x1a0 + +It is due to the fact that clocksource_verify_choose_cpus() is invoked with +preemption disabled. This function invokes get_random_u32() to obtain +random numbers for choosing CPUs. The batched_entropy_32 local lock and/or +the base_crng.lock spinlock in driver/char/random.c will be acquired during +the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot +be acquired in atomic context. + +Fix this problem by using migrate_disable() to allow smp_processor_id() to +be reliably used without introducing atomic context. preempt_disable() is +then called after clocksource_verify_choose_cpus() but before the +clocksource measurement is being run to avoid introducing unexpected +latency. + +Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable") +Suggested-by: Sebastian Andrzej Siewior +Signed-off-by: Waiman Long +Signed-off-by: Thomas Gleixner +Reviewed-by: Paul E. 
McKenney +Reviewed-by: Sebastian Andrzej Siewior +Link: https://lore.kernel.org/all/20250131173323.891943-2-longman@redhat.com +Signed-off-by: Sasha Levin +--- + kernel/time/clocksource.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c +index c4e6b5e6af88c..58fb7280cabbe 100644 +--- a/kernel/time/clocksource.c ++++ b/kernel/time/clocksource.c +@@ -365,10 +365,10 @@ void clocksource_verify_percpu(struct clocksource *cs) + cpumask_clear(&cpus_ahead); + cpumask_clear(&cpus_behind); + cpus_read_lock(); +- preempt_disable(); ++ migrate_disable(); + clocksource_verify_choose_cpus(); + if (cpumask_empty(&cpus_chosen)) { +- preempt_enable(); ++ migrate_enable(); + cpus_read_unlock(); + pr_warn("Not enough CPUs to check clocksource '%s'.\n", cs->name); + return; +@@ -376,6 +376,7 @@ void clocksource_verify_percpu(struct clocksource *cs) + testcpu = smp_processor_id(); + pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", + cs->name, testcpu, cpumask_pr_args(&cpus_chosen)); ++ preempt_disable(); + for_each_cpu(cpu, &cpus_chosen) { + if (cpu == testcpu) + continue; +@@ -395,6 +396,7 @@ void clocksource_verify_percpu(struct clocksource *cs) + cs_nsec_min = cs_nsec; + } + preempt_enable(); ++ migrate_enable(); + cpus_read_unlock(); + if (!cpumask_empty(&cpus_ahead)) + pr_warn(" CPUs %*pbl ahead of CPU %d for clocksource %s.\n", +-- +2.39.5 + diff --git a/queue-6.12/clocksource-use-pr_info-for-checking-clocksource-syn.patch b/queue-6.12/clocksource-use-pr_info-for-checking-clocksource-syn.patch new file mode 100644 index 0000000000..c8e1a2b8d1 --- /dev/null +++ b/queue-6.12/clocksource-use-pr_info-for-checking-clocksource-syn.patch @@ -0,0 +1,45 @@ +From db18d29bc84d9ca827c9acfd807524ca58beb127 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 24 Jan 2025 20:54:41 -0500 +Subject: clocksource: Use pr_info() for "Checking clocksource synchronization" + message + +From: Waiman Long + +[ Upstream commit 1f566840a82982141f94086061927a90e79440e5 ] + +The "Checking clocksource synchronization" message is normally printed +when clocksource_verify_percpu() is called for a given clocksource if +both the CLOCK_SOURCE_UNSTABLE and CLOCK_SOURCE_VERIFY_PERCPU flags +are set. + +It is an informational message and so pr_info() is the correct choice. + +Signed-off-by: Waiman Long +Signed-off-by: Thomas Gleixner +Reviewed-by: Paul E. 
McKenney +Acked-by: John Stultz +Link: https://lore.kernel.org/all/20250125015442.3740588-1-longman@redhat.com +Stable-dep-of: 6bb05a33337b ("clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context") +Signed-off-by: Sasha Levin +--- + kernel/time/clocksource.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c +index 8a40a616288b8..c4e6b5e6af88c 100644 +--- a/kernel/time/clocksource.c ++++ b/kernel/time/clocksource.c +@@ -374,7 +374,8 @@ void clocksource_verify_percpu(struct clocksource *cs) + return; + } + testcpu = smp_processor_id(); +- pr_warn("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", cs->name, testcpu, cpumask_pr_args(&cpus_chosen)); ++ pr_info("Checking clocksource %s synchronization from CPU %d to CPUs %*pbl.\n", ++ cs->name, testcpu, cpumask_pr_args(&cpus_chosen)); + for_each_cpu(cpu, &cpus_chosen) { + if (cpu == testcpu) + continue; +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch b/queue-6.12/cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch new file mode 100644 index 0000000000..14bc28f93e --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch @@ -0,0 +1,39 @@ +From 3b7e555c9f6f68ea4d1a91938ffe89fd9b9f37d2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 23 Oct 2024 10:21:12 +0000 +Subject: cpufreq/amd-pstate: Align offline flow of shared memory and MSR based + systems + +From: Dhananjay Ugwekar + +[ Upstream commit a6960e6b1b0e2cb268f427a99040c408a8d10665 ] + +Set min_perf to lowest_perf for shared memory systems, similar to the MSR +based systems. + +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Mario Limonciello +Reviewed-by: Gautham R. Shenoy +Link: https://lore.kernel.org/r/20241023102108.5980-5-Dhananjay.Ugwekar@amd.com +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 161334937090c..895d108428b40 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -1636,6 +1636,7 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy) + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); + } else { + perf_ctrls.desired_perf = 0; ++ perf_ctrls.min_perf = min_perf; + perf_ctrls.max_perf = min_perf; + cppc_set_perf(cpudata->cpu, &perf_ctrls); + perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE); +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch b/queue-6.12/cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch new file mode 100644 index 0000000000..e0defff421 --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch @@ -0,0 +1,53 @@ +From 4de4224d57ce3204800eb03d8d2036e2e3c9ccf0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 23 Oct 2024 10:21:10 +0000 +Subject: cpufreq/amd-pstate: Call cppc_set_epp_perf in the reenable function + +From: Dhananjay Ugwekar + +[ Upstream commit 796ff50e127af8362035f87ba29b6b84e2dd9742 ] + +The EPP value being set in perf_ctrls.energy_perf is not being propagated +to the shared memory, fix that. + +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Mario Limonciello +Reviewed-by: Perry Yuan +Reviewed-by: Gautham R. 
Shenoy +Link: https://lore.kernel.org/r/20241023102108.5980-4-Dhananjay.Ugwekar@amd.com +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 91d3c3b1c2d3b..161334937090c 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -1594,8 +1594,9 @@ static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata) + wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); + } else { + perf_ctrls.max_perf = max_perf; +- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached); + cppc_set_perf(cpudata->cpu, &perf_ctrls); ++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached); ++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1); + } + } + +@@ -1636,8 +1637,9 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy) + } else { + perf_ctrls.desired_perf = 0; + perf_ctrls.max_perf = min_perf; +- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE); + cppc_set_perf(cpudata->cpu, &perf_ctrls); ++ perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE); ++ cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1); + } + mutex_unlock(&amd_pstate_limits_lock); + } +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-convert-mutex-use-to-guard.patch b/queue-6.12/cpufreq-amd-pstate-convert-mutex-use-to-guard.patch new file mode 100644 index 0000000000..e9413b075e --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-convert-mutex-use-to-guard.patch @@ -0,0 +1,132 @@ +From ea16a7ceb11c3ba3d1179b32d4abce18d3272a18 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Mon, 9 Dec 2024 12:52:37 -0600 +Subject: cpufreq/amd-pstate: convert mutex use to guard() + +From: Mario Limonciello + +[ Upstream commit 6c093d5a5b73ec1caf1e706510ae6031af2f9d43 ] + +Using scoped guard declaration will unlock mutexes automatically. + +Reviewed-by: Gautham R. Shenoy +Link: https://lore.kernel.org/r/20241209185248.16301-5-mario.limonciello@amd.com +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 32 ++++++++++++-------------------- + 1 file changed, 12 insertions(+), 20 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 145a48fc49034..33777f5ab7d16 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -696,12 +696,12 @@ static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state) + pr_err("Boost mode is not supported by this processor or SBIOS\n"); + return -EOPNOTSUPP; + } +- mutex_lock(&amd_pstate_driver_lock); ++ guard(mutex)(&amd_pstate_driver_lock); ++ + ret = amd_pstate_cpu_boost_update(policy, state); + WRITE_ONCE(cpudata->boost_state, !ret ? state : false); + policy->boost_enabled = !ret ? 
state : false; + refresh_frequency_limits(policy); +- mutex_unlock(&amd_pstate_driver_lock); + + return ret; + } +@@ -792,7 +792,8 @@ static void amd_pstate_update_limits(unsigned int cpu) + if (!amd_pstate_prefcore) + return; + +- mutex_lock(&amd_pstate_driver_lock); ++ guard(mutex)(&amd_pstate_driver_lock); ++ + ret = amd_get_highest_perf(cpu, &cur_high); + if (ret) + goto free_cpufreq_put; +@@ -812,7 +813,6 @@ static void amd_pstate_update_limits(unsigned int cpu) + if (!highest_perf_changed) + cpufreq_update_policy(cpu); + +- mutex_unlock(&amd_pstate_driver_lock); + } + + /* +@@ -1145,11 +1145,11 @@ static ssize_t store_energy_performance_preference( + if (ret < 0) + return -EINVAL; + +- mutex_lock(&amd_pstate_limits_lock); ++ guard(mutex)(&amd_pstate_limits_lock); ++ + ret = amd_pstate_set_energy_pref_index(cpudata, ret); +- mutex_unlock(&amd_pstate_limits_lock); + +- return ret ?: count; ++ return ret ? ret : count; + } + + static ssize_t show_energy_performance_preference( +@@ -1297,13 +1297,10 @@ EXPORT_SYMBOL_GPL(amd_pstate_update_status); + static ssize_t status_show(struct device *dev, + struct device_attribute *attr, char *buf) + { +- ssize_t ret; + +- mutex_lock(&amd_pstate_driver_lock); +- ret = amd_pstate_show_status(buf); +- mutex_unlock(&amd_pstate_driver_lock); ++ guard(mutex)(&amd_pstate_driver_lock); + +- return ret; ++ return amd_pstate_show_status(buf); + } + + static ssize_t status_store(struct device *a, struct device_attribute *b, +@@ -1312,9 +1309,8 @@ static ssize_t status_store(struct device *a, struct device_attribute *b, + char *p = memchr(buf, '\n', count); + int ret; + +- mutex_lock(&amd_pstate_driver_lock); ++ guard(mutex)(&amd_pstate_driver_lock); + ret = amd_pstate_update_status(buf, p ? p - buf : count); +- mutex_unlock(&amd_pstate_driver_lock); + + return ret < 0 ? ret : count; + } +@@ -1614,13 +1610,11 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy) + + min_perf = READ_ONCE(cpudata->lowest_perf); + +- mutex_lock(&amd_pstate_limits_lock); ++ guard(mutex)(&amd_pstate_limits_lock); + + amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false); + amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE); + +- mutex_unlock(&amd_pstate_limits_lock); +- + return 0; + } + +@@ -1656,13 +1650,11 @@ static int amd_pstate_epp_resume(struct cpufreq_policy *policy) + struct amd_cpudata *cpudata = policy->driver_data; + + if (cpudata->suspended) { +- mutex_lock(&amd_pstate_limits_lock); ++ guard(mutex)(&amd_pstate_limits_lock); + + /* enable amd pstate from suspend state*/ + amd_pstate_epp_reenable(cpudata); + +- mutex_unlock(&amd_pstate_limits_lock); +- + cpudata->suspended = false; + } + +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch b/queue-6.12/cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch new file mode 100644 index 0000000000..5839160f31 --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch @@ -0,0 +1,55 @@ +From 71d244e6199e6d1092534fb1e982ffc84590ce97 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 11:25:20 +0000 +Subject: cpufreq/amd-pstate: Fix cpufreq_policy ref counting + +From: Dhananjay Ugwekar + +[ Upstream commit 3ace20038e19f23fe73259513f1f08d4bf1a3c83 ] + +amd_pstate_update_limits() takes a cpufreq_policy reference but doesn't +decrement the refcount in one of the exit paths, fix that. 
+ +Fixes: 45722e777fd9 ("cpufreq: amd-pstate: Optimize amd_pstate_update_limits()") +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Mario Limonciello +Link: https://lore.kernel.org/r/20250205112523.201101-10-dhananjay.ugwekar@amd.com +Signed-off-by: Mario Limonciello +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 9 +++++---- + 1 file changed, 5 insertions(+), 4 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 33777f5ab7d16..bdfd8ffe04398 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -778,20 +778,21 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata) + + static void amd_pstate_update_limits(unsigned int cpu) + { +- struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); ++ struct cpufreq_policy *policy = NULL; + struct amd_cpudata *cpudata; + u32 prev_high = 0, cur_high = 0; + int ret; + bool highest_perf_changed = false; + ++ if (!amd_pstate_prefcore) ++ return; ++ ++ policy = cpufreq_cpu_get(cpu); + if (!policy) + return; + + cpudata = policy->driver_data; + +- if (!amd_pstate_prefcore) +- return; +- + guard(mutex)(&amd_pstate_driver_lock); + + ret = amd_get_highest_perf(cpu, &cur_high); +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch b/queue-6.12/cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch new file mode 100644 index 0000000000..52ad9fdac5 --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch @@ -0,0 +1,69 @@ +From 286c4edf998d184d941cc1f0b2cc45da6f805126 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 4 Dec 2024 14:48:42 +0000 +Subject: cpufreq/amd-pstate: Merge amd_pstate_epp_cpu_offline() and + amd_pstate_epp_offline() + +From: Dhananjay Ugwekar + +[ Upstream commit 53ec2101dfede8fecdd240662281a12e537c3411 ] + +amd_pstate_epp_offline() is only called from within +amd_pstate_epp_cpu_offline() and doesn't make much sense to have it at all. +Hence, remove it. + +Also remove the unncessary debug print in the offline path while at it. + +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Gautham R. 
Shenoy +Reviewed-by: Mario Limonciello +Link: https://lore.kernel.org/r/20241204144842.164178-6-Dhananjay.Ugwekar@amd.com +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 17 ++++------------- + 1 file changed, 4 insertions(+), 13 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 4dfe5bdcb2932..145a48fc49034 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -1604,11 +1604,14 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) + return 0; + } + +-static void amd_pstate_epp_offline(struct cpufreq_policy *policy) ++static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy) + { + struct amd_cpudata *cpudata = policy->driver_data; + int min_perf; + ++ if (cpudata->suspended) ++ return 0; ++ + min_perf = READ_ONCE(cpudata->lowest_perf); + + mutex_lock(&amd_pstate_limits_lock); +@@ -1617,18 +1620,6 @@ static void amd_pstate_epp_offline(struct cpufreq_policy *policy) + amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE); + + mutex_unlock(&amd_pstate_limits_lock); +-} +- +-static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy) +-{ +- struct amd_cpudata *cpudata = policy->driver_data; +- +- pr_debug("AMD CPU Core %d going offline\n", cpudata->cpu); +- +- if (cpudata->suspended) +- return 0; +- +- amd_pstate_epp_offline(policy); + + return 0; + } +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch b/queue-6.12/cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch new file mode 100644 index 0000000000..9b31bf0aba --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch @@ -0,0 +1,97 @@ +From 915d2fe2dd57d8d19bda8de19196eadd04f0eef6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 4 Dec 2024 14:48:40 +0000 +Subject: cpufreq/amd-pstate: Refactor amd_pstate_epp_reenable() and + amd_pstate_epp_offline() + +From: Dhananjay Ugwekar + +[ Upstream commit b1089e0c8817fda93d474eaa82ad86386887aefe ] + +Replace similar code chunks with amd_pstate_update_perf() and +amd_pstate_set_epp() function calls. + +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Mario Limonciello +Reviewed-by: Gautham R. 
Shenoy +Link: https://lore.kernel.org/r/20241204144842.164178-4-Dhananjay.Ugwekar@amd.com +[ML: Fix LKP reported error about unused variable] +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 38 +++++++----------------------------- + 1 file changed, 7 insertions(+), 31 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 895d108428b40..19906141ef7fe 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -1579,25 +1579,17 @@ static int amd_pstate_epp_set_policy(struct cpufreq_policy *policy) + + static void amd_pstate_epp_reenable(struct amd_cpudata *cpudata) + { +- struct cppc_perf_ctrls perf_ctrls; +- u64 value, max_perf; ++ u64 max_perf; + int ret; + + ret = amd_pstate_enable(true); + if (ret) + pr_err("failed to enable amd pstate during resume, return %d\n", ret); + +- value = READ_ONCE(cpudata->cppc_req_cached); + max_perf = READ_ONCE(cpudata->highest_perf); + +- if (cpu_feature_enabled(X86_FEATURE_CPPC)) { +- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); +- } else { +- perf_ctrls.max_perf = max_perf; +- cppc_set_perf(cpudata->cpu, &perf_ctrls); +- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(cpudata->epp_cached); +- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1); +- } ++ amd_pstate_update_perf(cpudata, 0, 0, max_perf, false); ++ amd_pstate_set_epp(cpudata, cpudata->epp_cached); + } + + static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) +@@ -1617,31 +1609,15 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) + static void amd_pstate_epp_offline(struct cpufreq_policy *policy) + { + struct amd_cpudata *cpudata = policy->driver_data; +- struct cppc_perf_ctrls perf_ctrls; + int min_perf; +- u64 value; + + min_perf = READ_ONCE(cpudata->lowest_perf); +- value = READ_ONCE(cpudata->cppc_req_cached); + + mutex_lock(&amd_pstate_limits_lock); +- if (cpu_feature_enabled(X86_FEATURE_CPPC)) { +- cpudata->epp_policy = CPUFREQ_POLICY_UNKNOWN; +- +- /* Set max perf same as min perf */ +- value &= ~AMD_CPPC_MAX_PERF(~0L); +- value |= AMD_CPPC_MAX_PERF(min_perf); +- value &= ~AMD_CPPC_MIN_PERF(~0L); +- value |= AMD_CPPC_MIN_PERF(min_perf); +- wrmsrl_on_cpu(cpudata->cpu, MSR_AMD_CPPC_REQ, value); +- } else { +- perf_ctrls.desired_perf = 0; +- perf_ctrls.min_perf = min_perf; +- perf_ctrls.max_perf = min_perf; +- cppc_set_perf(cpudata->cpu, &perf_ctrls); +- perf_ctrls.energy_perf = AMD_CPPC_ENERGY_PERF_PREF(HWP_EPP_BALANCE_POWERSAVE); +- cppc_set_epp_perf(cpudata->cpu, &perf_ctrls, 1); +- } ++ ++ amd_pstate_update_perf(cpudata, min_perf, 0, min_perf, false); ++ amd_pstate_set_epp(cpudata, AMD_CPPC_EPP_BALANCE_POWERSAVE); ++ + mutex_unlock(&amd_pstate_limits_lock); + } + +-- +2.39.5 + diff --git a/queue-6.12/cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch b/queue-6.12/cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch new file mode 100644 index 0000000000..f5e27cc5b8 --- /dev/null +++ b/queue-6.12/cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch @@ -0,0 +1,56 @@ +From 07686cd20c1fa7f6b54a47e099382aae311067c0 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 4 Dec 2024 14:48:41 +0000 +Subject: cpufreq/amd-pstate: Remove the cppc_state check in offline/online + functions + +From: Dhananjay Ugwekar + +[ Upstream commit b78f8c87ec3e7499bb049986838636d3afbc7ece ] + +Only amd_pstate_epp driver (i.e. 
cppc_state = ACTIVE) enters the +amd_pstate_epp_offline() and amd_pstate_epp_cpu_online() functions, +so remove the unnecessary if condition checking if cppc_state is +equal to AMD_PSTATE_ACTIVE. + +Signed-off-by: Dhananjay Ugwekar +Reviewed-by: Mario Limonciello +Reviewed-by: Gautham R. Shenoy +Link: https://lore.kernel.org/r/20241204144842.164178-5-Dhananjay.Ugwekar@amd.com +Signed-off-by: Mario Limonciello +Stable-dep-of: 3ace20038e19 ("cpufreq/amd-pstate: Fix cpufreq_policy ref counting") +Signed-off-by: Sasha Levin +--- + drivers/cpufreq/amd-pstate.c | 9 +++------ + 1 file changed, 3 insertions(+), 6 deletions(-) + +diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c +index 19906141ef7fe..4dfe5bdcb2932 100644 +--- a/drivers/cpufreq/amd-pstate.c ++++ b/drivers/cpufreq/amd-pstate.c +@@ -1598,10 +1598,8 @@ static int amd_pstate_epp_cpu_online(struct cpufreq_policy *policy) + + pr_debug("AMD CPU Core %d going online\n", cpudata->cpu); + +- if (cppc_state == AMD_PSTATE_ACTIVE) { +- amd_pstate_epp_reenable(cpudata); +- cpudata->suspended = false; +- } ++ amd_pstate_epp_reenable(cpudata); ++ cpudata->suspended = false; + + return 0; + } +@@ -1630,8 +1628,7 @@ static int amd_pstate_epp_cpu_offline(struct cpufreq_policy *policy) + if (cpudata->suspended) + return 0; + +- if (cppc_state == AMD_PSTATE_ACTIVE) +- amd_pstate_epp_offline(policy); ++ amd_pstate_epp_offline(policy); + + return 0; + } +-- +2.39.5 + diff --git a/queue-6.12/flow_dissector-use-rcu-protection-to-fetch-dev_net.patch b/queue-6.12/flow_dissector-use-rcu-protection-to-fetch-dev_net.patch new file mode 100644 index 0000000000..bc7a99a493 --- /dev/null +++ b/queue-6.12/flow_dissector-use-rcu-protection-to-fetch-dev_net.patch @@ -0,0 +1,81 @@ +From 081a12d3c0b2ff44f3d44ba992b88a9e99d725cf Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:17 +0000 +Subject: flow_dissector: use RCU protection to fetch dev_net() + +From: Eric Dumazet + +[ Upstream commit afec62cd0a4191cde6dd3a75382be4d51a38ce9b ] + +__skb_flow_dissect() can be called from arbitrary contexts. + +It must extend its RCU protection section to include +the call to dev_net(), which can become dev_net_rcu(). + +This makes sure the net structure can not disappear under us. 
+ +Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-10-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/flow_dissector.c | 21 +++++++++++---------- + 1 file changed, 11 insertions(+), 10 deletions(-) + +diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c +index 0e638a37aa096..5db41bf2ed93e 100644 +--- a/net/core/flow_dissector.c ++++ b/net/core/flow_dissector.c +@@ -1108,10 +1108,12 @@ bool __skb_flow_dissect(const struct net *net, + FLOW_DISSECTOR_KEY_BASIC, + target_container); + ++ rcu_read_lock(); ++ + if (skb) { + if (!net) { + if (skb->dev) +- net = dev_net(skb->dev); ++ net = dev_net_rcu(skb->dev); + else if (skb->sk) + net = sock_net(skb->sk); + } +@@ -1122,7 +1124,6 @@ bool __skb_flow_dissect(const struct net *net, + enum netns_bpf_attach_type type = NETNS_BPF_FLOW_DISSECTOR; + struct bpf_prog_array *run_array; + +- rcu_read_lock(); + run_array = rcu_dereference(init_net.bpf.run_array[type]); + if (!run_array) + run_array = rcu_dereference(net->bpf.run_array[type]); +@@ -1150,17 +1151,17 @@ bool __skb_flow_dissect(const struct net *net, + prog = READ_ONCE(run_array->items[0].prog); + result = bpf_flow_dissect(prog, &ctx, n_proto, nhoff, + hlen, flags); +- if (result == BPF_FLOW_DISSECTOR_CONTINUE) +- goto dissect_continue; +- __skb_flow_bpf_to_target(&flow_keys, flow_dissector, +- target_container); +- rcu_read_unlock(); +- return result == BPF_OK; ++ if (result != BPF_FLOW_DISSECTOR_CONTINUE) { ++ __skb_flow_bpf_to_target(&flow_keys, flow_dissector, ++ target_container); ++ rcu_read_unlock(); ++ return result == BPF_OK; ++ } + } +-dissect_continue: +- rcu_read_unlock(); + } + ++ rcu_read_unlock(); ++ + if (dissector_uses_key(flow_dissector, + FLOW_DISSECTOR_KEY_ETH_ADDRS)) { + struct ethhdr *eth = eth_hdr(skb); +-- +2.39.5 + diff --git a/queue-6.12/hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch b/queue-6.12/hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch new file mode 100644 index 0000000000..fddf8770e8 --- /dev/null +++ b/queue-6.12/hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch @@ -0,0 +1,38 @@ +From 185b5f815034c6e426816260490fa8b26c340ad5 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 25 Dec 2024 18:34:24 -0800 +Subject: HID: hid-steam: Make sure rumble work is canceled on removal + +From: Vicki Pfau + +[ Upstream commit cc4f952427aaa44ecfd92542e10a65cce67bd6f4 ] + +When a force feedback command is sent from userspace, work is scheduled to pass +this data to the controller without blocking userspace itself. However, in +theory, this work might not be properly canceled if the controller is removed +at the exact right time. This patch ensures the work is properly canceled when +the device is removed. 
+ +Signed-off-by: Vicki Pfau +Signed-off-by: Jiri Kosina +Stable-dep-of: 79504249d7e2 ("HID: hid-steam: Move hidraw input (un)registering to work") +Signed-off-by: Sasha Levin +--- + drivers/hid/hid-steam.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c +index 9b6aec0733ae6..daca250e51c8b 100644 +--- a/drivers/hid/hid-steam.c ++++ b/drivers/hid/hid-steam.c +@@ -1306,6 +1306,7 @@ static void steam_remove(struct hid_device *hdev) + + cancel_delayed_work_sync(&steam->mode_switch); + cancel_work_sync(&steam->work_connect); ++ cancel_work_sync(&steam->rumble_work); + hid_destroy_device(steam->client_hdev); + steam->client_hdev = NULL; + steam->client_opened = 0; +-- +2.39.5 + diff --git a/queue-6.12/hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch b/queue-6.12/hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch new file mode 100644 index 0000000000..7afbe3813d --- /dev/null +++ b/queue-6.12/hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch @@ -0,0 +1,117 @@ +From 5bc8accc48c65d965f70c8918cd314ddb1ac3ece Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 4 Feb 2025 19:55:27 -0800 +Subject: HID: hid-steam: Move hidraw input (un)registering to work + +From: Vicki Pfau + +[ Upstream commit 79504249d7e27cad4a3eeb9afc6386e418728ce0 ] + +Due to an interplay between locking in the input and hid transport subsystems, +attempting to register or deregister the relevant input devices during the +hidraw open/close events can lead to a lock ordering issue. Though this +shouldn't cause a deadlock, this commit moves the input device manipulation to +deferred work to sidestep the issue. + +Fixes: 385a4886778f6 ("HID: steam: remove input device when a hid client is running.") +Signed-off-by: Vicki Pfau +Signed-off-by: Jiri Kosina +Signed-off-by: Sasha Levin +--- + drivers/hid/hid-steam.c | 38 +++++++++++++++++++++++++++++++------- + 1 file changed, 31 insertions(+), 7 deletions(-) + +diff --git a/drivers/hid/hid-steam.c b/drivers/hid/hid-steam.c +index daca250e51c8b..7b35966898785 100644 +--- a/drivers/hid/hid-steam.c ++++ b/drivers/hid/hid-steam.c +@@ -313,6 +313,7 @@ struct steam_device { + u16 rumble_left; + u16 rumble_right; + unsigned int sensor_timestamp_us; ++ struct work_struct unregister_work; + }; + + static int steam_recv_report(struct steam_device *steam, +@@ -1072,6 +1073,31 @@ static void steam_mode_switch_cb(struct work_struct *work) + } + } + ++static void steam_work_unregister_cb(struct work_struct *work) ++{ ++ struct steam_device *steam = container_of(work, struct steam_device, ++ unregister_work); ++ unsigned long flags; ++ bool connected; ++ bool opened; ++ ++ spin_lock_irqsave(&steam->lock, flags); ++ opened = steam->client_opened; ++ connected = steam->connected; ++ spin_unlock_irqrestore(&steam->lock, flags); ++ ++ if (connected) { ++ if (opened) { ++ steam_sensors_unregister(steam); ++ steam_input_unregister(steam); ++ } else { ++ steam_set_lizard_mode(steam, lizard_mode); ++ steam_input_register(steam); ++ steam_sensors_register(steam); ++ } ++ } ++} ++ + static bool steam_is_valve_interface(struct hid_device *hdev) + { + struct hid_report_enum *rep_enum; +@@ -1117,8 +1143,7 @@ static int steam_client_ll_open(struct hid_device *hdev) + steam->client_opened++; + spin_unlock_irqrestore(&steam->lock, flags); + +- steam_sensors_unregister(steam); +- steam_input_unregister(steam); ++ schedule_work(&steam->unregister_work); + + return 0; + } +@@ -1135,11 +1160,7 @@ static void steam_client_ll_close(struct 
hid_device *hdev) + connected = steam->connected && !steam->client_opened; + spin_unlock_irqrestore(&steam->lock, flags); + +- if (connected) { +- steam_set_lizard_mode(steam, lizard_mode); +- steam_input_register(steam); +- steam_sensors_register(steam); +- } ++ schedule_work(&steam->unregister_work); + } + + static int steam_client_ll_raw_request(struct hid_device *hdev, +@@ -1231,6 +1252,7 @@ static int steam_probe(struct hid_device *hdev, + INIT_LIST_HEAD(&steam->list); + INIT_WORK(&steam->rumble_work, steam_haptic_rumble_cb); + steam->sensor_timestamp_us = 0; ++ INIT_WORK(&steam->unregister_work, steam_work_unregister_cb); + + /* + * With the real steam controller interface, do not connect hidraw. +@@ -1291,6 +1313,7 @@ static int steam_probe(struct hid_device *hdev, + cancel_work_sync(&steam->work_connect); + cancel_delayed_work_sync(&steam->mode_switch); + cancel_work_sync(&steam->rumble_work); ++ cancel_work_sync(&steam->unregister_work); + + return ret; + } +@@ -1307,6 +1330,7 @@ static void steam_remove(struct hid_device *hdev) + cancel_delayed_work_sync(&steam->mode_switch); + cancel_work_sync(&steam->work_connect); + cancel_work_sync(&steam->rumble_work); ++ cancel_work_sync(&steam->unregister_work); + hid_destroy_device(steam->client_hdev); + steam->client_hdev = NULL; + steam->client_opened = 0; +-- +2.39.5 + diff --git a/queue-6.12/include-net-add-static-inline-dst_dev_overhead-to-ds.patch b/queue-6.12/include-net-add-static-inline-dst_dev_overhead-to-ds.patch new file mode 100644 index 0000000000..866cfbf367 --- /dev/null +++ b/queue-6.12/include-net-add-static-inline-dst_dev_overhead-to-ds.patch @@ -0,0 +1,49 @@ +From 94442da5bb6720d911cd4dab449edd2d200b347c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Dec 2024 13:49:42 +0100 +Subject: include: net: add static inline dst_dev_overhead() to dst.h + +From: Justin Iurman + +[ Upstream commit 0600cf40e9b36fe17f9c9f04d4f9cef249eaa5e7 ] + +Add static inline dst_dev_overhead() function to include/net/dst.h. This +helper function is used by ioam6_iptunnel, rpl_iptunnel and +seg6_iptunnel to get the dev's overhead based on a cache entry +(dst_entry). If the cache is empty, the default and generic value +skb->mac_len is returned. Otherwise, LL_RESERVED_SPACE() over dst's dev +is returned. 
+ +Signed-off-by: Justin Iurman +Cc: Alexander Lobakin +Cc: Vadim Fedorenko +Signed-off-by: Paolo Abeni +Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") +Signed-off-by: Sasha Levin +--- + include/net/dst.h | 9 +++++++++ + 1 file changed, 9 insertions(+) + +diff --git a/include/net/dst.h b/include/net/dst.h +index 0f303cc602520..08647c99d79c9 100644 +--- a/include/net/dst.h ++++ b/include/net/dst.h +@@ -440,6 +440,15 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout) + dst->expires = expires; + } + ++static inline unsigned int dst_dev_overhead(struct dst_entry *dst, ++ struct sk_buff *skb) ++{ ++ if (likely(dst)) ++ return LL_RESERVED_SPACE(dst->dev); ++ ++ return skb->mac_len; ++} ++ + INDIRECT_CALLABLE_DECLARE(int ip6_output(struct net *, struct sock *, + struct sk_buff *)); + INDIRECT_CALLABLE_DECLARE(int ip_output(struct net *, struct sock *, +-- +2.39.5 + diff --git a/queue-6.12/ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch b/queue-6.12/ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch new file mode 100644 index 0000000000..354c7005f7 --- /dev/null +++ b/queue-6.12/ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch @@ -0,0 +1,47 @@ +From cefa529ae75091cdf02550357f5616f1debc3840 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:10 +0000 +Subject: ipv4: add RCU protection to ip4_dst_hoplimit() + +From: Eric Dumazet + +[ Upstream commit 469308552ca4560176cfc100e7ca84add1bebd7c ] + +ip4_dst_hoplimit() must use RCU protection to make +sure the net structure it reads does not disappear. + +Fixes: fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-3-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + include/net/route.h | 9 +++++++-- + 1 file changed, 7 insertions(+), 2 deletions(-) + +diff --git a/include/net/route.h b/include/net/route.h +index 1789f1e6640b4..da34b6fa9862d 100644 +--- a/include/net/route.h ++++ b/include/net/route.h +@@ -363,10 +363,15 @@ static inline int inet_iif(const struct sk_buff *skb) + static inline int ip4_dst_hoplimit(const struct dst_entry *dst) + { + int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); +- struct net *net = dev_net(dst->dev); + +- if (hoplimit == 0) ++ if (hoplimit == 0) { ++ const struct net *net; ++ ++ rcu_read_lock(); ++ net = dev_net_rcu(dst->dev); + hoplimit = READ_ONCE(net->ipv4.sysctl_ip_default_ttl); ++ rcu_read_unlock(); ++ } + return hoplimit; + } + +-- +2.39.5 + diff --git a/queue-6.12/ipv4-icmp-convert-to-dev_net_rcu.patch b/queue-6.12/ipv4-icmp-convert-to-dev_net_rcu.patch new file mode 100644 index 0000000000..9af481d7a4 --- /dev/null +++ b/queue-6.12/ipv4-icmp-convert-to-dev_net_rcu.patch @@ -0,0 +1,150 @@ +From 989271e21e0151d868a02c4abfffcae6521ba34e Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:16 +0000 +Subject: ipv4: icmp: convert to dev_net_rcu() + +From: Eric Dumazet + +[ Upstream commit 4b8474a0951e605d2a27a2c483da4eb4b8c63760 ] + +__icmp_send() must ensure rcu_read_lock() is held, as spotted +by Jakub. + +Other ICMP uses of dev_net() seem safe, change them to dev_net_rcu() +to get LOCKDEP support. 
+ +Fixes: dde1bc0e6f86 ("[NETNS]: Add namespace for ICMP replying code.") +Closes: https://lore.kernel.org/netdev/20250203153633.46ce0337@kernel.org/ +Reported-by: Jakub Kicinski +Signed-off-by: Eric Dumazet +Link: https://patch.msgid.link/20250205155120.1676781-9-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/icmp.c | 31 +++++++++++++++++-------------- + 1 file changed, 17 insertions(+), 14 deletions(-) + +diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c +index 932bd775fc268..f45bc187a92a7 100644 +--- a/net/ipv4/icmp.c ++++ b/net/ipv4/icmp.c +@@ -399,10 +399,10 @@ static void icmp_push_reply(struct sock *sk, + + static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) + { +- struct ipcm_cookie ipc; + struct rtable *rt = skb_rtable(skb); +- struct net *net = dev_net(rt->dst.dev); ++ struct net *net = dev_net_rcu(rt->dst.dev); + bool apply_ratelimit = false; ++ struct ipcm_cookie ipc; + struct flowi4 fl4; + struct sock *sk; + struct inet_sock *inet; +@@ -610,12 +610,14 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, + struct sock *sk; + + if (!rt) +- goto out; ++ return; ++ ++ rcu_read_lock(); + + if (rt->dst.dev) +- net = dev_net(rt->dst.dev); ++ net = dev_net_rcu(rt->dst.dev); + else if (skb_in->dev) +- net = dev_net(skb_in->dev); ++ net = dev_net_rcu(skb_in->dev); + else + goto out; + +@@ -786,7 +788,8 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, + icmp_xmit_unlock(sk); + out_bh_enable: + local_bh_enable(); +-out:; ++out: ++ rcu_read_unlock(); + } + EXPORT_SYMBOL(__icmp_send); + +@@ -835,7 +838,7 @@ static void icmp_socket_deliver(struct sk_buff *skb, u32 info) + * avoid additional coding at protocol handlers. + */ + if (!pskb_may_pull(skb, iph->ihl * 4 + 8)) { +- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS); ++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS); + return; + } + +@@ -869,7 +872,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb) + struct net *net; + u32 info = 0; + +- net = dev_net(skb_dst(skb)->dev); ++ net = dev_net_rcu(skb_dst(skb)->dev); + + /* + * Incomplete header ? +@@ -980,7 +983,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb) + static enum skb_drop_reason icmp_redirect(struct sk_buff *skb) + { + if (skb->len < sizeof(struct iphdr)) { +- __ICMP_INC_STATS(dev_net(skb->dev), ICMP_MIB_INERRORS); ++ __ICMP_INC_STATS(dev_net_rcu(skb->dev), ICMP_MIB_INERRORS); + return SKB_DROP_REASON_PKT_TOO_SMALL; + } + +@@ -1012,7 +1015,7 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb) + struct icmp_bxm icmp_param; + struct net *net; + +- net = dev_net(skb_dst(skb)->dev); ++ net = dev_net_rcu(skb_dst(skb)->dev); + /* should there be an ICMP stat for ignored echos? 
*/ + if (READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_all)) + return SKB_NOT_DROPPED_YET; +@@ -1041,9 +1044,9 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb) + + bool icmp_build_probe(struct sk_buff *skb, struct icmphdr *icmphdr) + { ++ struct net *net = dev_net_rcu(skb->dev); + struct icmp_ext_hdr *ext_hdr, _ext_hdr; + struct icmp_ext_echo_iio *iio, _iio; +- struct net *net = dev_net(skb->dev); + struct inet6_dev *in6_dev; + struct in_device *in_dev; + struct net_device *dev; +@@ -1182,7 +1185,7 @@ static enum skb_drop_reason icmp_timestamp(struct sk_buff *skb) + return SKB_NOT_DROPPED_YET; + + out_err: +- __ICMP_INC_STATS(dev_net(skb_dst(skb)->dev), ICMP_MIB_INERRORS); ++ __ICMP_INC_STATS(dev_net_rcu(skb_dst(skb)->dev), ICMP_MIB_INERRORS); + return SKB_DROP_REASON_PKT_TOO_SMALL; + } + +@@ -1199,7 +1202,7 @@ int icmp_rcv(struct sk_buff *skb) + { + enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; + struct rtable *rt = skb_rtable(skb); +- struct net *net = dev_net(rt->dst.dev); ++ struct net *net = dev_net_rcu(rt->dst.dev); + struct icmphdr *icmph; + + if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { +@@ -1372,9 +1375,9 @@ int icmp_err(struct sk_buff *skb, u32 info) + struct iphdr *iph = (struct iphdr *)skb->data; + int offset = iph->ihl<<2; + struct icmphdr *icmph = (struct icmphdr *)(skb->data + offset); ++ struct net *net = dev_net_rcu(skb->dev); + int type = icmp_hdr(skb)->type; + int code = icmp_hdr(skb)->code; +- struct net *net = dev_net(skb->dev); + + /* + * Use ping_err to handle all icmp errors except those +-- +2.39.5 + diff --git a/queue-6.12/ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch b/queue-6.12/ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch new file mode 100644 index 0000000000..c4f7463ecc --- /dev/null +++ b/queue-6.12/ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch @@ -0,0 +1,77 @@ +From 6df497a23d547d01649610b3005e4624bb5fd212 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:15 +0000 +Subject: ipv4: use RCU protection in __ip_rt_update_pmtu() + +From: Eric Dumazet + +[ Upstream commit 139512191bd06f1b496117c76372b2ce372c9a41 ] + +__ip_rt_update_pmtu() must use RCU protection to make +sure the net structure it reads does not disappear. 
+ +Fixes: 2fbc6e89b2f1 ("ipv4: Update exception handling for multipath routes via same device") +Fixes: 1de6b15a434c ("Namespaceify min_pmtu sysctl") +Signed-off-by: Eric Dumazet +Link: https://patch.msgid.link/20250205155120.1676781-8-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/route.c | 11 ++++++----- + 1 file changed, 6 insertions(+), 5 deletions(-) + +diff --git a/net/ipv4/route.c b/net/ipv4/route.c +index f707cdb26ff20..41b320f0c20eb 100644 +--- a/net/ipv4/route.c ++++ b/net/ipv4/route.c +@@ -1008,9 +1008,9 @@ out: kfree_skb_reason(skb, reason); + static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) + { + struct dst_entry *dst = &rt->dst; +- struct net *net = dev_net(dst->dev); + struct fib_result res; + bool lock = false; ++ struct net *net; + u32 old_mtu; + + if (ip_mtu_locked(dst)) +@@ -1020,6 +1020,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) + if (old_mtu < mtu) + return; + ++ rcu_read_lock(); ++ net = dev_net_rcu(dst->dev); + if (mtu < net->ipv4.ip_rt_min_pmtu) { + lock = true; + mtu = min(old_mtu, net->ipv4.ip_rt_min_pmtu); +@@ -1027,9 +1029,8 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) + + if (rt->rt_pmtu == mtu && !lock && + time_before(jiffies, dst->expires - net->ipv4.ip_rt_mtu_expires / 2)) +- return; ++ goto out; + +- rcu_read_lock(); + if (fib_lookup(net, fl4, &res, 0) == 0) { + struct fib_nh_common *nhc; + +@@ -1043,14 +1044,14 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) + update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, + jiffies + net->ipv4.ip_rt_mtu_expires); + } +- rcu_read_unlock(); +- return; ++ goto out; + } + #endif /* CONFIG_IP_ROUTE_MULTIPATH */ + nhc = FIB_RES_NHC(res); + update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, + jiffies + net->ipv4.ip_rt_mtu_expires); + } ++out: + rcu_read_unlock(); + } + +-- +2.39.5 + diff --git a/queue-6.12/ipv4-use-rcu-protection-in-inet_select_addr.patch b/queue-6.12/ipv4-use-rcu-protection-in-inet_select_addr.patch new file mode 100644 index 0000000000..c0e5102db3 --- /dev/null +++ b/queue-6.12/ipv4-use-rcu-protection-in-inet_select_addr.patch @@ -0,0 +1,41 @@ +From 2ec2ebabf622d94b9b1318952c34696e2bef9e99 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:14 +0000 +Subject: ipv4: use RCU protection in inet_select_addr() + +From: Eric Dumazet + +[ Upstream commit 719817cd293e4fa389e1f69c396f3f816ed5aa41 ] + +inet_select_addr() must use RCU protection to make +sure the net structure it reads does not disappear. 
+ +Fixes: c4544c724322 ("[NETNS]: Process inet_select_addr inside a namespace.") +Signed-off-by: Eric Dumazet +Link: https://patch.msgid.link/20250205155120.1676781-7-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/devinet.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c +index 7cf5f7d0d0de2..a55e95046984d 100644 +--- a/net/ipv4/devinet.c ++++ b/net/ipv4/devinet.c +@@ -1351,10 +1351,11 @@ __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope) + __be32 addr = 0; + unsigned char localnet_scope = RT_SCOPE_HOST; + struct in_device *in_dev; +- struct net *net = dev_net(dev); ++ struct net *net; + int master_idx; + + rcu_read_lock(); ++ net = dev_net_rcu(dev); + in_dev = __in_dev_get_rcu(dev); + if (!in_dev) + goto no_in_dev; +-- +2.39.5 + diff --git a/queue-6.12/ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch b/queue-6.12/ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch new file mode 100644 index 0000000000..8b04468416 --- /dev/null +++ b/queue-6.12/ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch @@ -0,0 +1,57 @@ +From c95dbd419be896fa93fc9f5f6aca15b4bc2855f3 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:11 +0000 +Subject: ipv4: use RCU protection in ip_dst_mtu_maybe_forward() + +From: Eric Dumazet + +[ Upstream commit 071d8012869b6af352acca346ade13e7be90a49f ] + +ip_dst_mtu_maybe_forward() must use RCU protection to make +sure the net structure it reads does not disappear. + +Fixes: f87c10a8aa1e8 ("ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-4-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + include/net/ip.h | 13 ++++++++++--- + 1 file changed, 10 insertions(+), 3 deletions(-) + +diff --git a/include/net/ip.h b/include/net/ip.h +index d92d3bc3ec0e2..fe4f854381143 100644 +--- a/include/net/ip.h ++++ b/include/net/ip.h +@@ -465,9 +465,12 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, + bool forwarding) + { + const struct rtable *rt = dst_rtable(dst); +- struct net *net = dev_net(dst->dev); +- unsigned int mtu; ++ unsigned int mtu, res; ++ struct net *net; ++ ++ rcu_read_lock(); + ++ net = dev_net_rcu(dst->dev); + if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) || + ip_mtu_locked(dst) || + !forwarding) { +@@ -491,7 +494,11 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, + out: + mtu = min_t(unsigned int, mtu, IP_MAX_MTU); + +- return mtu - lwtunnel_headroom(dst->lwtstate, mtu); ++ res = mtu - lwtunnel_headroom(dst->lwtstate, mtu); ++ ++ rcu_read_unlock(); ++ ++ return res; + } + + static inline unsigned int ip_skb_dst_mtu(struct sock *sk, +-- +2.39.5 + diff --git a/queue-6.12/ipv4-use-rcu-protection-in-ipv4_default_advmss.patch b/queue-6.12/ipv4-use-rcu-protection-in-ipv4_default_advmss.patch new file mode 100644 index 0000000000..f39154d525 --- /dev/null +++ b/queue-6.12/ipv4-use-rcu-protection-in-ipv4_default_advmss.patch @@ -0,0 +1,48 @@ +From 0d06ebdce5e1322d2c3790fdac2d3f10f8a4ae88 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:12 +0000 +Subject: ipv4: use RCU protection in ipv4_default_advmss() + +From: Eric Dumazet + +[ Upstream commit 71b8471c93fa0bcab911fcb65da1eb6c4f5f735f ] + +ipv4_default_advmss() must 
use RCU protection to make +sure the net structure it reads does not disappear. + +Fixes: 2e9589ff809e ("ipv4: Namespaceify min_adv_mss sysctl knob") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-5-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/route.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + +diff --git a/net/ipv4/route.c b/net/ipv4/route.c +index 2a27913588d05..9709ec3e2dce6 100644 +--- a/net/ipv4/route.c ++++ b/net/ipv4/route.c +@@ -1294,10 +1294,15 @@ static void set_class_tag(struct rtable *rt, u32 tag) + + static unsigned int ipv4_default_advmss(const struct dst_entry *dst) + { +- struct net *net = dev_net(dst->dev); + unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr); +- unsigned int advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size, +- net->ipv4.ip_rt_min_advmss); ++ unsigned int advmss; ++ struct net *net; ++ ++ rcu_read_lock(); ++ net = dev_net_rcu(dst->dev); ++ advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size, ++ net->ipv4.ip_rt_min_advmss); ++ rcu_read_unlock(); + + return min(advmss, IPV4_MAX_PMTU - header_size); + } +-- +2.39.5 + diff --git a/queue-6.12/ipv4-use-rcu-protection-in-rt_is_expired.patch b/queue-6.12/ipv4-use-rcu-protection-in-rt_is_expired.patch new file mode 100644 index 0000000000..2090f982af --- /dev/null +++ b/queue-6.12/ipv4-use-rcu-protection-in-rt_is_expired.patch @@ -0,0 +1,44 @@ +From 4a621a026b1c90ff11300cc33a620d2c94dc6535 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:13 +0000 +Subject: ipv4: use RCU protection in rt_is_expired() + +From: Eric Dumazet + +[ Upstream commit dd205fcc33d92d54eee4d7f21bb073af9bd5ce2b ] + +rt_is_expired() must use RCU protection to make +sure the net structure it reads does not disappear. + +Fixes: e84f84f27647 ("netns: place rt_genid into struct net") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-6-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv4/route.c | 8 +++++++- + 1 file changed, 7 insertions(+), 1 deletion(-) + +diff --git a/net/ipv4/route.c b/net/ipv4/route.c +index 9709ec3e2dce6..e31aa5a74ace4 100644 +--- a/net/ipv4/route.c ++++ b/net/ipv4/route.c +@@ -390,7 +390,13 @@ static inline int ip_rt_proc_init(void) + + static inline bool rt_is_expired(const struct rtable *rth) + { +- return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev)); ++ bool res; ++ ++ rcu_read_lock(); ++ res = rth->rt_genid != rt_genid_ipv4(dev_net_rcu(rth->dst.dev)); ++ rcu_read_unlock(); ++ ++ return res; + } + + void rt_cache_flush(struct net *net) +-- +2.39.5 + diff --git a/queue-6.12/ipv6-icmp-convert-to-dev_net_rcu.patch b/queue-6.12/ipv6-icmp-convert-to-dev_net_rcu.patch new file mode 100644 index 0000000000..0e17244912 --- /dev/null +++ b/queue-6.12/ipv6-icmp-convert-to-dev_net_rcu.patch @@ -0,0 +1,191 @@ +From af9d8e5913252d2a98a619e3254dcc5f0f976ca9 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:19 +0000 +Subject: ipv6: icmp: convert to dev_net_rcu() + +From: Eric Dumazet + +[ Upstream commit 34aef2b0ce3aa4eb4ef2e1f5cad3738d527032f5 ] + +icmp6_send() must acquire rcu_read_lock() sooner to ensure +the dev_net() call done from a safe context. + +Other ICMPv6 uses of dev_net() seem safe, change them to +dev_net_rcu() to get LOCKDEP support to catch bugs. 
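+
+This patch also shows the other recurring shape in the series: take the
+read lock at function entry and turn every early "return" into a jump
+to a single unlock label. A stripped-down, purely illustrative outline
+(the function name and the guard condition are invented):
+
+	static void example_reply(struct sk_buff *skb)
+	{
+		struct net *net;
+
+		if (!skb->dev)
+			return;			/* nothing locked yet */
+
+		rcu_read_lock();
+		net = dev_net_rcu(skb->dev);	/* needs the read lock */
+
+		if (skb->len < sizeof(struct icmp6hdr))
+			goto out;		/* was a bare "return" */
+
+		/* ... rate limiting and reply construction using 'net' ... */
+	out:
+		rcu_read_unlock();
+	}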
+ +Fixes: 9a43b709a230 ("[NETNS][IPV6] icmp6 - make icmpv6_socket per namespace") +Signed-off-by: Eric Dumazet +Link: https://patch.msgid.link/20250205155120.1676781-12-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/icmp.c | 42 +++++++++++++++++++++++------------------- + 1 file changed, 23 insertions(+), 19 deletions(-) + +diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c +index a6984a29fdb9d..4d14ab7f7e99f 100644 +--- a/net/ipv6/icmp.c ++++ b/net/ipv6/icmp.c +@@ -76,7 +76,7 @@ static int icmpv6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, + { + /* icmpv6_notify checks 8 bytes can be pulled, icmp6hdr is 8 bytes */ + struct icmp6hdr *icmp6 = (struct icmp6hdr *) (skb->data + offset); +- struct net *net = dev_net(skb->dev); ++ struct net *net = dev_net_rcu(skb->dev); + + if (type == ICMPV6_PKT_TOOBIG) + ip6_update_pmtu(skb, net, info, skb->dev->ifindex, 0, sock_net_uid(net, NULL)); +@@ -473,7 +473,10 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + + if (!skb->dev) + return; +- net = dev_net(skb->dev); ++ ++ rcu_read_lock(); ++ ++ net = dev_net_rcu(skb->dev); + mark = IP6_REPLY_MARK(net, skb->mark); + /* + * Make sure we respect the rules +@@ -496,7 +499,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + !(type == ICMPV6_PARAMPROB && + code == ICMPV6_UNK_OPTION && + (opt_unrec(skb, info)))) +- return; ++ goto out; + + saddr = NULL; + } +@@ -526,7 +529,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + if ((addr_type == IPV6_ADDR_ANY) || (addr_type & IPV6_ADDR_MULTICAST)) { + net_dbg_ratelimited("icmp6_send: addr_any/mcast source [%pI6c > %pI6c]\n", + &hdr->saddr, &hdr->daddr); +- return; ++ goto out; + } + + /* +@@ -535,7 +538,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + if (is_ineligible(skb)) { + net_dbg_ratelimited("icmp6_send: no reply to icmp error [%pI6c > %pI6c]\n", + &hdr->saddr, &hdr->daddr); +- return; ++ goto out; + } + + /* Needed by both icmpv6_global_allow and icmpv6_xmit_lock */ +@@ -582,7 +585,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + np = inet6_sk(sk); + + if (!icmpv6_xrlim_allow(sk, type, &fl6, apply_ratelimit)) +- goto out; ++ goto out_unlock; + + tmp_hdr.icmp6_type = type; + tmp_hdr.icmp6_code = code; +@@ -600,7 +603,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + + dst = icmpv6_route_lookup(net, skb, sk, &fl6); + if (IS_ERR(dst)) +- goto out; ++ goto out_unlock; + + ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst); + +@@ -616,7 +619,6 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + goto out_dst_release; + } + +- rcu_read_lock(); + idev = __in6_dev_get(skb->dev); + + if (ip6_append_data(sk, icmpv6_getfrag, &msg, +@@ -630,13 +632,15 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info, + icmpv6_push_pending_frames(sk, &fl6, &tmp_hdr, + len + sizeof(struct icmp6hdr)); + } +- rcu_read_unlock(); ++ + out_dst_release: + dst_release(dst); +-out: ++out_unlock: + icmpv6_xmit_unlock(sk); + out_bh_enable: + local_bh_enable(); ++out: ++ rcu_read_unlock(); + } + EXPORT_SYMBOL(icmp6_send); + +@@ -679,8 +683,8 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type, + skb_pull(skb2, nhs); + skb_reset_network_header(skb2); + +- rt = rt6_lookup(dev_net(skb->dev), &ipv6_hdr(skb2)->saddr, NULL, 0, +- skb, 0); ++ rt = rt6_lookup(dev_net_rcu(skb->dev), &ipv6_hdr(skb2)->saddr, ++ NULL, 0, skb, 0); + + if (rt && rt->dst.dev) + 
skb2->dev = rt->dst.dev; +@@ -717,7 +721,7 @@ EXPORT_SYMBOL(ip6_err_gen_icmpv6_unreach); + + static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb) + { +- struct net *net = dev_net(skb->dev); ++ struct net *net = dev_net_rcu(skb->dev); + struct sock *sk; + struct inet6_dev *idev; + struct ipv6_pinfo *np; +@@ -832,7 +836,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type, + u8 code, __be32 info) + { + struct inet6_skb_parm *opt = IP6CB(skb); +- struct net *net = dev_net(skb->dev); ++ struct net *net = dev_net_rcu(skb->dev); + const struct inet6_protocol *ipprot; + enum skb_drop_reason reason; + int inner_offset; +@@ -889,7 +893,7 @@ enum skb_drop_reason icmpv6_notify(struct sk_buff *skb, u8 type, + static int icmpv6_rcv(struct sk_buff *skb) + { + enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; +- struct net *net = dev_net(skb->dev); ++ struct net *net = dev_net_rcu(skb->dev); + struct net_device *dev = icmp6_dev(skb); + struct inet6_dev *idev = __in6_dev_get(dev); + const struct in6_addr *saddr, *daddr; +@@ -921,7 +925,7 @@ static int icmpv6_rcv(struct sk_buff *skb) + skb_set_network_header(skb, nh); + } + +- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INMSGS); ++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS); + + saddr = &ipv6_hdr(skb)->saddr; + daddr = &ipv6_hdr(skb)->daddr; +@@ -939,7 +943,7 @@ static int icmpv6_rcv(struct sk_buff *skb) + + type = hdr->icmp6_type; + +- ICMP6MSGIN_INC_STATS(dev_net(dev), idev, type); ++ ICMP6MSGIN_INC_STATS(dev_net_rcu(dev), idev, type); + + switch (type) { + case ICMPV6_ECHO_REQUEST: +@@ -1034,9 +1038,9 @@ static int icmpv6_rcv(struct sk_buff *skb) + + csum_error: + reason = SKB_DROP_REASON_ICMP_CSUM; +- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_CSUMERRORS); ++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_CSUMERRORS); + discard_it: +- __ICMP6_INC_STATS(dev_net(dev), idev, ICMP6_MIB_INERRORS); ++ __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INERRORS); + drop_no_count: + kfree_skb_reason(skb, reason); + return 0; +-- +2.39.5 + diff --git a/queue-6.12/ipv6-mcast-add-rcu-protection-to-mld_newpack.patch b/queue-6.12/ipv6-mcast-add-rcu-protection-to-mld_newpack.patch new file mode 100644 index 0000000000..07accf9da2 --- /dev/null +++ b/queue-6.12/ipv6-mcast-add-rcu-protection-to-mld_newpack.patch @@ -0,0 +1,80 @@ +From b5dfcef3b1bfee0811dc7d0a650074f8c6a261da Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 12 Feb 2025 14:10:21 +0000 +Subject: ipv6: mcast: add RCU protection to mld_newpack() + +From: Eric Dumazet + +[ Upstream commit a527750d877fd334de87eef81f1cb5f0f0ca3373 ] + +mld_newpack() can be called without RTNL or RCU being held. + +Note that we no longer can use sock_alloc_send_skb() because +ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep. + +Instead use alloc_skb() and charge the net->ipv6.igmp_sk +socket under RCU protection. 
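+
+Condensed to its essence, the new allocate-then-charge order is the
+following (illustrative fragment only; sizing and error handling are
+exactly as in the hunk below):
+
+	/* GFP_KERNEL may sleep, so allocate before entering the RCU section */
+	skb = alloc_skb(size, GFP_KERNEL);
+	if (!skb)
+		return NULL;
+
+	rcu_read_lock();
+	net = dev_net_rcu(dev);
+	/* charge the per-netns socket while dev->nd_net cannot change */
+	skb_set_owner_w(skb, net->ipv6.igmp_sk);
+	/* ... build the header while 'net' and the socket are still valid ... */
+	rcu_read_unlock();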
+ +Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Link: https://patch.msgid.link/20250212141021.1663666-1-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/mcast.c | 14 ++++++++++---- + 1 file changed, 10 insertions(+), 4 deletions(-) + +diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c +index 6551648512585..b7b62e5a562e5 100644 +--- a/net/ipv6/mcast.c ++++ b/net/ipv6/mcast.c +@@ -1730,21 +1730,19 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) + struct net_device *dev = idev->dev; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; +- struct net *net = dev_net(dev); + const struct in6_addr *saddr; + struct in6_addr addr_buf; + struct mld2_report *pmr; + struct sk_buff *skb; + unsigned int size; + struct sock *sk; +- int err; ++ struct net *net; + +- sk = net->ipv6.igmp_sk; + /* we assume size > sizeof(ra) here + * Also try to not allocate high-order pages for big MTU + */ + size = min_t(int, mtu, PAGE_SIZE / 2) + hlen + tlen; +- skb = sock_alloc_send_skb(sk, size, 1, &err); ++ skb = alloc_skb(size, GFP_KERNEL); + if (!skb) + return NULL; + +@@ -1752,6 +1750,12 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) + skb_reserve(skb, hlen); + skb_tailroom_reserve(skb, mtu, tlen); + ++ rcu_read_lock(); ++ ++ net = dev_net_rcu(dev); ++ sk = net->ipv6.igmp_sk; ++ skb_set_owner_w(skb, sk); ++ + if (ipv6_get_lladdr(dev, &addr_buf, IFA_F_TENTATIVE)) { + /* : + * use unspecified address as the source address +@@ -1763,6 +1767,8 @@ static struct sk_buff *mld_newpack(struct inet6_dev *idev, unsigned int mtu) + + ip6_mc_hdr(sk, skb, dev, saddr, &mld2_all_mcr, NEXTHDR_HOP, 0); + ++ rcu_read_unlock(); ++ + skb_put_data(skb, ra, sizeof(ra)); + + skb_set_transport_header(skb, skb_tail_pointer(skb) - skb->data); +-- +2.39.5 + diff --git a/queue-6.12/ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch b/queue-6.12/ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch new file mode 100644 index 0000000000..97e96d31b2 --- /dev/null +++ b/queue-6.12/ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch @@ -0,0 +1,105 @@ +From 657a1960fb3a36ea9c1566b4d869c8e03f5dc527 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:40 +0000 +Subject: ipv6: mcast: extend RCU protection in igmp6_send() + +From: Eric Dumazet + +[ Upstream commit 087c1faa594fa07a66933d750c0b2610aa1a2946 ] + +igmp6_send() can be called without RTNL or RCU being held. + +Extend RCU protection so that we can safely fetch the net pointer +and avoid a potential UAF. + +Note that we no longer can use sock_alloc_send_skb() because +ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep. + +Instead use alloc_skb() and charge the net->ipv6.igmp_sk +socket under RCU protection. 
+ +Fixes: b8ad0cbc58f7 ("[NETNS][IPV6] mcast - handle several network namespace") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-9-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/mcast.c | 31 +++++++++++++++---------------- + 1 file changed, 15 insertions(+), 16 deletions(-) + +diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c +index b244dbf61d5f3..6551648512585 100644 +--- a/net/ipv6/mcast.c ++++ b/net/ipv6/mcast.c +@@ -2122,21 +2122,21 @@ static void mld_send_cr(struct inet6_dev *idev) + + static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) + { +- struct net *net = dev_net(dev); +- struct sock *sk = net->ipv6.igmp_sk; ++ const struct in6_addr *snd_addr, *saddr; ++ int err, len, payload_len, full_len; ++ struct in6_addr addr_buf; + struct inet6_dev *idev; + struct sk_buff *skb; + struct mld_msg *hdr; +- const struct in6_addr *snd_addr, *saddr; +- struct in6_addr addr_buf; + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; +- int err, len, payload_len, full_len; + u8 ra[8] = { IPPROTO_ICMPV6, 0, + IPV6_TLV_ROUTERALERT, 2, 0, 0, + IPV6_TLV_PADN, 0 }; +- struct flowi6 fl6; + struct dst_entry *dst; ++ struct flowi6 fl6; ++ struct net *net; ++ struct sock *sk; + + if (type == ICMPV6_MGM_REDUCTION) + snd_addr = &in6addr_linklocal_allrouters; +@@ -2147,19 +2147,21 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) + payload_len = len + sizeof(ra); + full_len = sizeof(struct ipv6hdr) + payload_len; + +- rcu_read_lock(); +- IP6_INC_STATS(net, __in6_dev_get(dev), IPSTATS_MIB_OUTREQUESTS); +- rcu_read_unlock(); ++ skb = alloc_skb(hlen + tlen + full_len, GFP_KERNEL); + +- skb = sock_alloc_send_skb(sk, hlen + tlen + full_len, 1, &err); ++ rcu_read_lock(); + ++ net = dev_net_rcu(dev); ++ idev = __in6_dev_get(dev); ++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS); + if (!skb) { +- rcu_read_lock(); +- IP6_INC_STATS(net, __in6_dev_get(dev), +- IPSTATS_MIB_OUTDISCARDS); ++ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); + rcu_read_unlock(); + return; + } ++ sk = net->ipv6.igmp_sk; ++ skb_set_owner_w(skb, sk); ++ + skb->priority = TC_PRIO_CONTROL; + skb_reserve(skb, hlen); + +@@ -2184,9 +2186,6 @@ static void igmp6_send(struct in6_addr *addr, struct net_device *dev, int type) + IPPROTO_ICMPV6, + csum_partial(hdr, len, 0)); + +- rcu_read_lock(); +- idev = __in6_dev_get(skb->dev); +- + icmpv6_flow_init(sk, &fl6, type, + &ipv6_hdr(skb)->saddr, &ipv6_hdr(skb)->daddr, + skb->dev->ifindex); +-- +2.39.5 + diff --git a/queue-6.12/ipv6-use-rcu-protection-in-ip6_default_advmss.patch b/queue-6.12/ipv6-use-rcu-protection-in-ip6_default_advmss.patch new file mode 100644 index 0000000000..1fae218a50 --- /dev/null +++ b/queue-6.12/ipv6-use-rcu-protection-in-ip6_default_advmss.patch @@ -0,0 +1,49 @@ +From 696b09f558018b55ac46120507790417aebd9c6a Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:18 +0000 +Subject: ipv6: use RCU protection in ip6_default_advmss() + +From: Eric Dumazet + +[ Upstream commit 3c8ffcd248da34fc41e52a46e51505900115fc2a ] + +ip6_default_advmss() needs rcu protection to make +sure the net structure it reads does not disappear. 
+ +Fixes: 5578689a4e3c ("[NETNS][IPV6] route6 - make route6 per namespace") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-11-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/route.c | 7 ++++++- + 1 file changed, 6 insertions(+), 1 deletion(-) + +diff --git a/net/ipv6/route.c b/net/ipv6/route.c +index 8ebfed5d63232..2736dea77575b 100644 +--- a/net/ipv6/route.c ++++ b/net/ipv6/route.c +@@ -3196,13 +3196,18 @@ static unsigned int ip6_default_advmss(const struct dst_entry *dst) + { + struct net_device *dev = dst->dev; + unsigned int mtu = dst_mtu(dst); +- struct net *net = dev_net(dev); ++ struct net *net; + + mtu -= sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + ++ rcu_read_lock(); ++ ++ net = dev_net_rcu(dev); + if (mtu < net->ipv6.sysctl.ip6_rt_min_advmss) + mtu = net->ipv6.sysctl.ip6_rt_min_advmss; + ++ rcu_read_unlock(); ++ + /* + * Maximal non-jumbo IPv6 payload is IPV6_MAXPLEN and + * corresponding MSS is IPV6_MAXPLEN - tcp_header_size. +-- +2.39.5 + diff --git a/queue-6.12/ndisc-extend-rcu-protection-in-ndisc_send_skb.patch b/queue-6.12/ndisc-extend-rcu-protection-in-ndisc_send_skb.patch new file mode 100644 index 0000000000..f796b5fd89 --- /dev/null +++ b/queue-6.12/ndisc-extend-rcu-protection-in-ndisc_send_skb.patch @@ -0,0 +1,72 @@ +From 93046772ca90a811f5ad97d1db80f62b1b7cc2af Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:39 +0000 +Subject: ndisc: extend RCU protection in ndisc_send_skb() + +From: Eric Dumazet + +[ Upstream commit ed6ae1f325d3c43966ec1b62ac1459e2b8e45640 ] + +ndisc_send_skb() can be called without RTNL or RCU held. + +Acquire rcu_read_lock() earlier, so that we can use dev_net_rcu() +and avoid a potential UAF. 
+ +Fixes: 1762f7e88eb3 ("[NETNS][IPV6] ndisc - make socket control per namespace") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-8-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/ndisc.c | 12 ++++++++---- + 1 file changed, 8 insertions(+), 4 deletions(-) + +diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c +index 90f8aa2d7af2e..8699d1a188dc4 100644 +--- a/net/ipv6/ndisc.c ++++ b/net/ipv6/ndisc.c +@@ -471,16 +471,20 @@ static void ip6_nd_hdr(struct sk_buff *skb, + void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr, + const struct in6_addr *saddr) + { ++ struct icmp6hdr *icmp6h = icmp6_hdr(skb); + struct dst_entry *dst = skb_dst(skb); +- struct net *net = dev_net(skb->dev); +- struct sock *sk = net->ipv6.ndisc_sk; + struct inet6_dev *idev; ++ struct net *net; ++ struct sock *sk; + int err; +- struct icmp6hdr *icmp6h = icmp6_hdr(skb); + u8 type; + + type = icmp6h->icmp6_type; + ++ rcu_read_lock(); ++ ++ net = dev_net_rcu(skb->dev); ++ sk = net->ipv6.ndisc_sk; + if (!dst) { + struct flowi6 fl6; + int oif = skb->dev->ifindex; +@@ -488,6 +492,7 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr, + icmpv6_flow_init(sk, &fl6, type, saddr, daddr, oif); + dst = icmp6_dst_alloc(skb->dev, &fl6); + if (IS_ERR(dst)) { ++ rcu_read_unlock(); + kfree_skb(skb); + return; + } +@@ -502,7 +507,6 @@ void ndisc_send_skb(struct sk_buff *skb, const struct in6_addr *daddr, + + ip6_nd_hdr(skb, saddr, daddr, READ_ONCE(inet6_sk(sk)->hop_limit), skb->len); + +- rcu_read_lock(); + idev = __in6_dev_get(dst->dev); + IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTREQUESTS); + +-- +2.39.5 + diff --git a/queue-6.12/ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch b/queue-6.12/ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch new file mode 100644 index 0000000000..2b8ce24adb --- /dev/null +++ b/queue-6.12/ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch @@ -0,0 +1,59 @@ +From a591ecc09cc3f5cb8580aff4cf53e13e3067139e Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:34 +0000 +Subject: ndisc: use RCU protection in ndisc_alloc_skb() + +From: Eric Dumazet + +[ Upstream commit 628e6d18930bbd21f2d4562228afe27694f66da9 ] + +ndisc_alloc_skb() can be called without RTNL or RCU being held. + +Add RCU protection to avoid possible UAF. 
+ +Fixes: de09334b9326 ("ndisc: Introduce ndisc_alloc_skb() helper.") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-3-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/ndisc.c | 10 ++++------ + 1 file changed, 4 insertions(+), 6 deletions(-) + +diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c +index 264b10a947577..90f8aa2d7af2e 100644 +--- a/net/ipv6/ndisc.c ++++ b/net/ipv6/ndisc.c +@@ -418,15 +418,11 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev, + { + int hlen = LL_RESERVED_SPACE(dev); + int tlen = dev->needed_tailroom; +- struct sock *sk = dev_net(dev)->ipv6.ndisc_sk; + struct sk_buff *skb; + + skb = alloc_skb(hlen + sizeof(struct ipv6hdr) + len + tlen, GFP_ATOMIC); +- if (!skb) { +- ND_PRINTK(0, err, "ndisc: %s failed to allocate an skb\n", +- __func__); ++ if (!skb) + return NULL; +- } + + skb->protocol = htons(ETH_P_IPV6); + skb->dev = dev; +@@ -437,7 +433,9 @@ static struct sk_buff *ndisc_alloc_skb(struct net_device *dev, + /* Manually assign socket ownership as we avoid calling + * sock_alloc_send_pskb() to bypass wmem buffer limits + */ +- skb_set_owner_w(skb, sk); ++ rcu_read_lock(); ++ skb_set_owner_w(skb, dev_net_rcu(dev)->ipv6.ndisc_sk); ++ rcu_read_unlock(); + + return skb; + } +-- +2.39.5 + diff --git a/queue-6.12/neighbour-use-rcu-protection-in-__neigh_notify.patch b/queue-6.12/neighbour-use-rcu-protection-in-__neigh_notify.patch new file mode 100644 index 0000000000..6ec0d2d6e3 --- /dev/null +++ b/queue-6.12/neighbour-use-rcu-protection-in-__neigh_notify.patch @@ -0,0 +1,58 @@ +From b666d3ec0cc14264d2b1d2a5c2195418702bac2d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:35 +0000 +Subject: neighbour: use RCU protection in __neigh_notify() + +From: Eric Dumazet + +[ Upstream commit becbd5850c03ed33b232083dd66c6e38c0c0e569 ] + +__neigh_notify() can be called without RTNL or RCU protection. + +Use RCU protection to avoid potential UAF. 
+ +Fixes: 426b5303eb43 ("[NETNS]: Modify the neighbour table code so it handles multiple network namespaces") +Signed-off-by: Eric Dumazet +Reviewed-by: David Ahern +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-4-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/core/neighbour.c | 8 ++++++-- + 1 file changed, 6 insertions(+), 2 deletions(-) + +diff --git a/net/core/neighbour.c b/net/core/neighbour.c +index cc58315a40a79..c7f7ea61b524a 100644 +--- a/net/core/neighbour.c ++++ b/net/core/neighbour.c +@@ -3513,10 +3513,12 @@ static const struct seq_operations neigh_stat_seq_ops = { + static void __neigh_notify(struct neighbour *n, int type, int flags, + u32 pid) + { +- struct net *net = dev_net(n->dev); + struct sk_buff *skb; + int err = -ENOBUFS; ++ struct net *net; + ++ rcu_read_lock(); ++ net = dev_net_rcu(n->dev); + skb = nlmsg_new(neigh_nlmsg_size(), GFP_ATOMIC); + if (skb == NULL) + goto errout; +@@ -3529,9 +3531,11 @@ static void __neigh_notify(struct neighbour *n, int type, int flags, + goto errout; + } + rtnl_notify(skb, net, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC); +- return; ++ goto out; + errout: + rtnl_set_sk_err(net, RTNLGRP_NEIGH, err); ++out: ++ rcu_read_unlock(); + } + + void neigh_app_ns(struct neighbour *n) +-- +2.39.5 + diff --git a/queue-6.12/net-add-dev_net_rcu-helper.patch b/queue-6.12/net-add-dev_net_rcu-helper.patch new file mode 100644 index 0000000000..2c682f4b07 --- /dev/null +++ b/queue-6.12/net-add-dev_net_rcu-helper.patch @@ -0,0 +1,62 @@ +From 04a4d5e517db5ce3250defe23f9d333b36ffab8d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 5 Feb 2025 15:51:09 +0000 +Subject: net: add dev_net_rcu() helper + +From: Eric Dumazet + +[ Upstream commit 482ad2a4ace2740ca0ff1cbc8f3c7f862f3ab507 ] + +dev->nd_net can change, readers should either +use rcu_read_lock() or RTNL. + +We currently use a generic helper, dev_net() with +no debugging support. We probably have many hidden bugs. + +Add dev_net_rcu() helper for callers using rcu_read_lock() +protection. 
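+
+A short usage sketch of the two legal ways to reach dev->nd_net after
+this change (fragments for illustration only):
+
+	/* caller holds RTNL: plain dev_net() remains correct */
+	ASSERT_RTNL();
+	net = dev_net(dev);
+
+	/* caller holds no lock: take the RCU read lock and use the new
+	 * helper, which goes through rcu_dereference() and therefore
+	 * lets lockdep flag unprotected readers.
+	 */
+	rcu_read_lock();
+	net = dev_net_rcu(dev);
+	/* ... read per-netns state ... */
+	rcu_read_unlock();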
+ +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250205155120.1676781-2-edumazet@google.com +Signed-off-by: Jakub Kicinski +Stable-dep-of: 71b8471c93fa ("ipv4: use RCU protection in ipv4_default_advmss()") +Signed-off-by: Sasha Levin +--- + include/linux/netdevice.h | 6 ++++++ + include/net/net_namespace.h | 2 +- + 2 files changed, 7 insertions(+), 1 deletion(-) + +diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h +index 02d3bafebbe77..4f17b786828af 100644 +--- a/include/linux/netdevice.h ++++ b/include/linux/netdevice.h +@@ -2577,6 +2577,12 @@ struct net *dev_net(const struct net_device *dev) + return read_pnet(&dev->nd_net); + } + ++static inline ++struct net *dev_net_rcu(const struct net_device *dev) ++{ ++ return read_pnet_rcu(&dev->nd_net); ++} ++ + static inline + void dev_net_set(struct net_device *dev, struct net *net) + { +diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h +index 9398c8f499536..da93873df4dbd 100644 +--- a/include/net/net_namespace.h ++++ b/include/net/net_namespace.h +@@ -387,7 +387,7 @@ static inline struct net *read_pnet(const possible_net_t *pnet) + #endif + } + +-static inline struct net *read_pnet_rcu(possible_net_t *pnet) ++static inline struct net *read_pnet_rcu(const possible_net_t *pnet) + { + #ifdef CONFIG_NET_NS + return rcu_dereference(pnet->net); +-- +2.39.5 + diff --git a/queue-6.12/net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch b/queue-6.12/net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch new file mode 100644 index 0000000000..5f0624c001 --- /dev/null +++ b/queue-6.12/net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch @@ -0,0 +1,292 @@ +From b97fe83fff886ccd0049b9a3e014c55251140ddd Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 8 Nov 2024 09:34:24 +0000 +Subject: net: ipv4: Cache pmtu for all packet paths if multipath enabled + +From: Vladimir Vdovin + +[ Upstream commit 7d3f3b4367f315a61fc615e3138f3d320da8c466 ] + +Check number of paths by fib_info_num_path(), +and update_or_create_fnhe() for every path. +Problem is that pmtu is cached only for the oif +that has received icmp message "need to frag", +other oifs will still try to use "default" iface mtu. + +An example topology showing the problem: + + | host1 + +---------+ + | dummy0 | 10.179.20.18/32 mtu9000 + +---------+ + +-----------+----------------+ + +---------+ +---------+ + | ens17f0 | 10.179.2.141/31 | ens17f1 | 10.179.2.13/31 + +---------+ +---------+ + | (all here have mtu 9000) | + +------+ +------+ + | ro1 | 10.179.2.140/31 | ro2 | 10.179.2.12/31 + +------+ +------+ + | | +---------+------------+-------------------+------ + | + +-----+ + | ro3 | 10.10.10.10 mtu1500 + +-----+ + | + ======================================== + some networks + ======================================== + | + +-----+ + | eth0| 10.10.30.30 mtu9000 + +-----+ + | host2 + +host1 have enabled multipath and +sysctl net.ipv4.fib_multipath_hash_policy = 1: + +default proto static src 10.179.20.18 + nexthop via 10.179.2.12 dev ens17f1 weight 1 + nexthop via 10.179.2.140 dev ens17f0 weight 1 + +When host1 tries to do pmtud from 10.179.20.18/32 to host2, +host1 receives at ens17f1 iface an icmp packet from ro3 that ro3 mtu=1500. +And host1 caches it in nexthop exceptions cache. + +Problem is that it is cached only for the iface that has received icmp, +and there is no way that ro3 will send icmp msg to host1 via another path. 
+ +Host1 now have this routes to host2: + +ip r g 10.10.30.30 sport 30000 dport 443 +10.10.30.30 via 10.179.2.12 dev ens17f1 src 10.179.20.18 uid 0 + cache expires 521sec mtu 1500 + +ip r g 10.10.30.30 sport 30033 dport 443 +10.10.30.30 via 10.179.2.140 dev ens17f0 src 10.179.20.18 uid 0 + cache + +So when host1 tries again to reach host2 with mtu>1500, +if packet flow is lucky enough to be hashed with oif=ens17f1 its ok, +if oif=ens17f0 it blackholes and still gets icmp msgs from ro3 to ens17f1, +until lucky day when ro3 will send it through another flow to ens17f0. + +Signed-off-by: Vladimir Vdovin +Reviewed-by: Ido Schimmel +Link: https://patch.msgid.link/20241108093427.317942-1-deliran@verdict.gg +Signed-off-by: Jakub Kicinski +Stable-dep-of: 139512191bd0 ("ipv4: use RCU protection in __ip_rt_update_pmtu()") +Signed-off-by: Sasha Levin +--- + net/ipv4/route.c | 13 ++++ + tools/testing/selftests/net/pmtu.sh | 112 +++++++++++++++++++++++----- + 2 files changed, 108 insertions(+), 17 deletions(-) + +diff --git a/net/ipv4/route.c b/net/ipv4/route.c +index e31aa5a74ace4..f707cdb26ff20 100644 +--- a/net/ipv4/route.c ++++ b/net/ipv4/route.c +@@ -1034,6 +1034,19 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) + struct fib_nh_common *nhc; + + fib_select_path(net, &res, fl4, NULL); ++#ifdef CONFIG_IP_ROUTE_MULTIPATH ++ if (fib_info_num_path(res.fi) > 1) { ++ int nhsel; ++ ++ for (nhsel = 0; nhsel < fib_info_num_path(res.fi); nhsel++) { ++ nhc = fib_info_nhc(res.fi, nhsel); ++ update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, ++ jiffies + net->ipv4.ip_rt_mtu_expires); ++ } ++ rcu_read_unlock(); ++ return; ++ } ++#endif /* CONFIG_IP_ROUTE_MULTIPATH */ + nhc = FIB_RES_NHC(res); + update_or_create_fnhe(nhc, fl4->daddr, 0, mtu, lock, + jiffies + net->ipv4.ip_rt_mtu_expires); +diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh +index 6c651c880fe83..66be7699c72c9 100755 +--- a/tools/testing/selftests/net/pmtu.sh ++++ b/tools/testing/selftests/net/pmtu.sh +@@ -197,6 +197,12 @@ + # + # - pmtu_ipv6_route_change + # Same as above but with IPv6 ++# ++# - pmtu_ipv4_mp_exceptions ++# Use the same topology as in pmtu_ipv4, but add routeable addresses ++# on host A and B on lo reachable via both routers. Host A and B ++# addresses have multipath routes to each other, b_r1 mtu = 1500. ++# Check that PMTU exceptions are created for both paths. 
+ + source lib.sh + source net_helper.sh +@@ -266,7 +272,8 @@ tests=" + list_flush_ipv4_exception ipv4: list and flush cached exceptions 1 + list_flush_ipv6_exception ipv6: list and flush cached exceptions 1 + pmtu_ipv4_route_change ipv4: PMTU exception w/route replace 1 +- pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1" ++ pmtu_ipv6_route_change ipv6: PMTU exception w/route replace 1 ++ pmtu_ipv4_mp_exceptions ipv4: PMTU multipath nh exceptions 1" + + # Addressing and routing for tests with routers: four network segments, with + # index SEGMENT between 1 and 4, a common prefix (PREFIX4 or PREFIX6) and an +@@ -343,6 +350,9 @@ tunnel6_a_addr="fd00:2::a" + tunnel6_b_addr="fd00:2::b" + tunnel6_mask="64" + ++host4_a_addr="192.168.99.99" ++host4_b_addr="192.168.88.88" ++ + dummy6_0_prefix="fc00:1000::" + dummy6_1_prefix="fc00:1001::" + dummy6_mask="64" +@@ -984,6 +994,52 @@ setup_ovs_bridge() { + run_cmd ip route add ${prefix6}:${b_r1}::1 via ${prefix6}:${a_r1}::2 + } + ++setup_multipath_new() { ++ # Set up host A with multipath routes to host B host4_b_addr ++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo ++ run_cmd ${ns_a} ip nexthop add id 401 via ${prefix4}.${a_r1}.2 dev veth_A-R1 ++ run_cmd ${ns_a} ip nexthop add id 402 via ${prefix4}.${a_r2}.2 dev veth_A-R2 ++ run_cmd ${ns_a} ip nexthop add id 403 group 401/402 ++ run_cmd ${ns_a} ip route add ${host4_b_addr} src ${host4_a_addr} nhid 403 ++ ++ # Set up host B with multipath routes to host A host4_a_addr ++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo ++ run_cmd ${ns_b} ip nexthop add id 401 via ${prefix4}.${b_r1}.2 dev veth_B-R1 ++ run_cmd ${ns_b} ip nexthop add id 402 via ${prefix4}.${b_r2}.2 dev veth_B-R2 ++ run_cmd ${ns_b} ip nexthop add id 403 group 401/402 ++ run_cmd ${ns_b} ip route add ${host4_a_addr} src ${host4_b_addr} nhid 403 ++} ++ ++setup_multipath_old() { ++ # Set up host A with multipath routes to host B host4_b_addr ++ run_cmd ${ns_a} ip addr add ${host4_a_addr} dev lo ++ run_cmd ${ns_a} ip route add ${host4_b_addr} \ ++ src ${host4_a_addr} \ ++ nexthop via ${prefix4}.${a_r1}.2 weight 1 \ ++ nexthop via ${prefix4}.${a_r2}.2 weight 1 ++ ++ # Set up host B with multipath routes to host A host4_a_addr ++ run_cmd ${ns_b} ip addr add ${host4_b_addr} dev lo ++ run_cmd ${ns_b} ip route add ${host4_a_addr} \ ++ src ${host4_b_addr} \ ++ nexthop via ${prefix4}.${b_r1}.2 weight 1 \ ++ nexthop via ${prefix4}.${b_r2}.2 weight 1 ++} ++ ++setup_multipath() { ++ if [ "$USE_NH" = "yes" ]; then ++ setup_multipath_new ++ else ++ setup_multipath_old ++ fi ++ ++ # Set up routers with routes to dummies ++ run_cmd ${ns_r1} ip route add ${host4_a_addr} via ${prefix4}.${a_r1}.1 ++ run_cmd ${ns_r2} ip route add ${host4_a_addr} via ${prefix4}.${a_r2}.1 ++ run_cmd ${ns_r1} ip route add ${host4_b_addr} via ${prefix4}.${b_r1}.1 ++ run_cmd ${ns_r2} ip route add ${host4_b_addr} via ${prefix4}.${b_r2}.1 ++} ++ + setup() { + [ "$(id -u)" -ne 0 ] && echo " need to run as root" && return $ksft_skip + +@@ -1076,23 +1132,15 @@ link_get_mtu() { + } + + route_get_dst_exception() { +- ns_cmd="${1}" +- dst="${2}" +- dsfield="${3}" ++ ns_cmd="${1}"; shift + +- if [ -z "${dsfield}" ]; then +- dsfield=0 +- fi +- +- ${ns_cmd} ip route get "${dst}" dsfield "${dsfield}" ++ ${ns_cmd} ip route get "$@" + } + + route_get_dst_pmtu_from_exception() { +- ns_cmd="${1}" +- dst="${2}" +- dsfield="${3}" ++ ns_cmd="${1}"; shift + +- mtu_parse "$(route_get_dst_exception "${ns_cmd}" "${dst}" "${dsfield}")" ++ mtu_parse "$(route_get_dst_exception "${ns_cmd}" 
"$@")" + } + + check_pmtu_value() { +@@ -1235,10 +1283,10 @@ test_pmtu_ipv4_dscp_icmp_exception() { + run_cmd "${ns_a}" ping -q -M want -Q "${dsfield}" -c 1 -w 1 -s "${len}" "${dst2}" + + # Check that exceptions have been created with the correct PMTU +- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")" ++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")" + check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1 + +- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")" ++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")" + check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1 + } + +@@ -1285,9 +1333,9 @@ test_pmtu_ipv4_dscp_udp_exception() { + UDP:"${dst2}":50000,tos="${dsfield}" + + # Check that exceptions have been created with the correct PMTU +- pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" "${policy_mark}")" ++ pmtu_1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst1}" dsfield "${policy_mark}")" + check_pmtu_value "1400" "${pmtu_1}" "exceeding MTU" || return 1 +- pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" "${policy_mark}")" ++ pmtu_2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${dst2}" dsfield "${policy_mark}")" + check_pmtu_value "1500" "${pmtu_2}" "exceeding MTU" || return 1 + } + +@@ -2329,6 +2377,36 @@ test_pmtu_ipv6_route_change() { + test_pmtu_ipvX_route_change 6 + } + ++test_pmtu_ipv4_mp_exceptions() { ++ setup namespaces routing multipath || return $ksft_skip ++ ++ trace "${ns_a}" veth_A-R1 "${ns_r1}" veth_R1-A \ ++ "${ns_r1}" veth_R1-B "${ns_b}" veth_B-R1 \ ++ "${ns_a}" veth_A-R2 "${ns_r2}" veth_R2-A \ ++ "${ns_r2}" veth_R2-B "${ns_b}" veth_B-R2 ++ ++ # Set up initial MTU values ++ mtu "${ns_a}" veth_A-R1 2000 ++ mtu "${ns_r1}" veth_R1-A 2000 ++ mtu "${ns_r1}" veth_R1-B 1500 ++ mtu "${ns_b}" veth_B-R1 1500 ++ ++ mtu "${ns_a}" veth_A-R2 2000 ++ mtu "${ns_r2}" veth_R2-A 2000 ++ mtu "${ns_r2}" veth_R2-B 1500 ++ mtu "${ns_b}" veth_B-R2 1500 ++ ++ # Ping and expect two nexthop exceptions for two routes ++ run_cmd ${ns_a} ping -q -M want -i 0.1 -c 1 -s 1800 "${host4_b_addr}" ++ ++ # Check that exceptions have been created with the correct PMTU ++ pmtu_a_R1="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R1)" ++ pmtu_a_R2="$(route_get_dst_pmtu_from_exception "${ns_a}" "${host4_b_addr}" oif veth_A-R2)" ++ ++ check_pmtu_value "1500" "${pmtu_a_R1}" "exceeding MTU (veth_A-R1)" || return 1 ++ check_pmtu_value "1500" "${pmtu_a_R2}" "exceeding MTU (veth_A-R2)" || return 1 ++} ++ + usage() { + echo + echo "$0 [OPTIONS] [TEST]..." +-- +2.39.5 + diff --git a/queue-6.12/net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch b/queue-6.12/net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch new file mode 100644 index 0000000000..18b2d14d08 --- /dev/null +++ b/queue-6.12/net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch @@ -0,0 +1,94 @@ +From 01157585e1e4aefbcfed1d8919812874a7afed8c Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 29 Jan 2025 19:15:19 -0800 +Subject: net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels + +From: Jakub Kicinski + +[ Upstream commit 92191dd1073088753821b862b791dcc83e558e07 ] + +Some lwtunnels have a dst cache for post-transformation dst. +If the packet destination did not change we may end up recording +a reference to the lwtunnel in its own cache, and the lwtunnel +state will never be freed. 
+ +Discovered by the ioam6.sh test, kmemleak was recently fixed +to catch per-cpu memory leaks. I'm not sure if rpl and seg6 +can actually hit this, but in principle I don't see why not. + +Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") +Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") +Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel") +Reviewed-by: Simon Horman +Link: https://patch.msgid.link/20250130031519.2716843-2-kuba@kernel.org +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/ipv6/ioam6_iptunnel.c | 9 ++++++--- + net/ipv6/rpl_iptunnel.c | 9 ++++++--- + net/ipv6/seg6_iptunnel.c | 9 ++++++--- + 3 files changed, 18 insertions(+), 9 deletions(-) + +diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c +index e81b45b1f6555..fb6cb540cd1bc 100644 +--- a/net/ipv6/ioam6_iptunnel.c ++++ b/net/ipv6/ioam6_iptunnel.c +@@ -413,9 +413,12 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + goto drop; + } + +- local_bh_disable(); +- dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr); +- local_bh_enable(); ++ /* cache only if we don't create a dst reference loop */ ++ if (dst->lwtstate != cache_dst->lwtstate) { ++ local_bh_disable(); ++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr); ++ local_bh_enable(); ++ } + + err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev)); + if (unlikely(err)) +diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c +index 7ba22d2f2bfef..be084089ec783 100644 +--- a/net/ipv6/rpl_iptunnel.c ++++ b/net/ipv6/rpl_iptunnel.c +@@ -236,9 +236,12 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) + goto drop; + } + +- local_bh_disable(); +- dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr); +- local_bh_enable(); ++ /* cache only if we don't create a dst reference loop */ ++ if (orig_dst->lwtstate != dst->lwtstate) { ++ local_bh_disable(); ++ dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr); ++ local_bh_enable(); ++ } + + err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); + if (unlikely(err)) +diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c +index 4bf937bfc2633..316dbc2694f2a 100644 +--- a/net/ipv6/seg6_iptunnel.c ++++ b/net/ipv6/seg6_iptunnel.c +@@ -575,9 +575,12 @@ static int seg6_output_core(struct net *net, struct sock *sk, + goto drop; + } + +- local_bh_disable(); +- dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr); +- local_bh_enable(); ++ /* cache only if we don't create a dst reference loop */ ++ if (orig_dst->lwtstate != dst->lwtstate) { ++ local_bh_disable(); ++ dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr); ++ local_bh_enable(); ++ } + + err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); + if (unlikely(err)) +-- +2.39.5 + diff --git a/queue-6.12/net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch b/queue-6.12/net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch new file mode 100644 index 0000000000..3be07a320e --- /dev/null +++ b/queue-6.12/net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch @@ -0,0 +1,190 @@ +From f753a5f7a35a7fd6411ab9a30c1a28ff123f9a37 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Dec 2024 13:49:43 +0100 +Subject: net: ipv6: ioam6_iptunnel: mitigate 2-realloc issue + +From: Justin Iurman + +[ Upstream commit dce525185bc92864e5a318040285ee070563fe34 ] + +This patch mitigates the two-reallocations issue with ioam6_iptunnel by +providing the dst_entry (in the cache) to the first call to +skb_cow_head(). 
As a result, the very first iteration may still trigger +two reallocations (i.e., empty cache), while next iterations would only +trigger a single reallocation. + +Performance tests before/after applying this patch, which clearly shows +the improvement: +- inline mode: + - before: https://ibb.co/LhQ8V63 + - after: https://ibb.co/x5YT2bS +- encap mode: + - before: https://ibb.co/3Cjm5m0 + - after: https://ibb.co/TwpsxTC +- encap mode with tunsrc: + - before: https://ibb.co/Gpy9QPg + - after: https://ibb.co/PW1bZFT + +This patch also fixes an incorrect behavior: after the insertion, the +second call to skb_cow_head() makes sure that the dev has enough +headroom in the skb for layer 2 and stuff. In that case, the "old" +dst_entry was used, which is now fixed. After discussing with Paolo, it +appears that both patches can be merged into a single one -this one- +(for the sake of readability) and target net-next. + +Signed-off-by: Justin Iurman +Signed-off-by: Paolo Abeni +Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") +Signed-off-by: Sasha Levin +--- + net/ipv6/ioam6_iptunnel.c | 73 ++++++++++++++++++++------------------- + 1 file changed, 37 insertions(+), 36 deletions(-) + +diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c +index beb6b4cfc551c..e81b45b1f6555 100644 +--- a/net/ipv6/ioam6_iptunnel.c ++++ b/net/ipv6/ioam6_iptunnel.c +@@ -255,14 +255,15 @@ static int ioam6_do_fill(struct net *net, struct sk_buff *skb) + } + + static int ioam6_do_inline(struct net *net, struct sk_buff *skb, +- struct ioam6_lwt_encap *tuninfo) ++ struct ioam6_lwt_encap *tuninfo, ++ struct dst_entry *cache_dst) + { + struct ipv6hdr *oldhdr, *hdr; + int hdrlen, err; + + hdrlen = (tuninfo->eh.hdrlen + 1) << 3; + +- err = skb_cow_head(skb, hdrlen + skb->mac_len); ++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) + return err; + +@@ -293,7 +294,8 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb, + struct ioam6_lwt_encap *tuninfo, + bool has_tunsrc, + struct in6_addr *tunsrc, +- struct in6_addr *tundst) ++ struct in6_addr *tundst, ++ struct dst_entry *cache_dst) + { + struct dst_entry *dst = skb_dst(skb); + struct ipv6hdr *hdr, *inner_hdr; +@@ -302,7 +304,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb, + hdrlen = (tuninfo->eh.hdrlen + 1) << 3; + len = sizeof(*hdr) + hdrlen; + +- err = skb_cow_head(skb, len + skb->mac_len); ++ err = skb_cow_head(skb, len + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) + return err; + +@@ -336,7 +338,7 @@ static int ioam6_do_encap(struct net *net, struct sk_buff *skb, + + static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + { +- struct dst_entry *dst = skb_dst(skb); ++ struct dst_entry *dst = skb_dst(skb), *cache_dst; + struct in6_addr orig_daddr; + struct ioam6_lwt *ilwt; + int err = -EINVAL; +@@ -354,6 +356,10 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + + orig_daddr = ipv6_hdr(skb)->daddr; + ++ local_bh_disable(); ++ cache_dst = dst_cache_get(&ilwt->cache); ++ local_bh_enable(); ++ + switch (ilwt->mode) { + case IOAM6_IPTUNNEL_MODE_INLINE: + do_inline: +@@ -361,7 +367,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + if (ipv6_hdr(skb)->nexthdr == NEXTHDR_HOP) + goto out; + +- err = ioam6_do_inline(net, skb, &ilwt->tuninfo); ++ err = ioam6_do_inline(net, skb, &ilwt->tuninfo, cache_dst); + if (unlikely(err)) + goto drop; + +@@ -371,7 
+377,7 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + /* Encapsulation (ip6ip6) */ + err = ioam6_do_encap(net, skb, &ilwt->tuninfo, + ilwt->has_tunsrc, &ilwt->tunsrc, +- &ilwt->tundst); ++ &ilwt->tundst, cache_dst); + if (unlikely(err)) + goto drop; + +@@ -389,41 +395,36 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) + goto drop; + } + +- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); +- if (unlikely(err)) +- goto drop; ++ if (unlikely(!cache_dst)) { ++ struct ipv6hdr *hdr = ipv6_hdr(skb); ++ struct flowi6 fl6; ++ ++ memset(&fl6, 0, sizeof(fl6)); ++ fl6.daddr = hdr->daddr; ++ fl6.saddr = hdr->saddr; ++ fl6.flowlabel = ip6_flowinfo(hdr); ++ fl6.flowi6_mark = skb->mark; ++ fl6.flowi6_proto = hdr->nexthdr; ++ ++ cache_dst = ip6_route_output(net, NULL, &fl6); ++ if (cache_dst->error) { ++ err = cache_dst->error; ++ dst_release(cache_dst); ++ goto drop; ++ } + +- if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) { + local_bh_disable(); +- dst = dst_cache_get(&ilwt->cache); ++ dst_cache_set_ip6(&ilwt->cache, cache_dst, &fl6.saddr); + local_bh_enable(); + +- if (unlikely(!dst)) { +- struct ipv6hdr *hdr = ipv6_hdr(skb); +- struct flowi6 fl6; +- +- memset(&fl6, 0, sizeof(fl6)); +- fl6.daddr = hdr->daddr; +- fl6.saddr = hdr->saddr; +- fl6.flowlabel = ip6_flowinfo(hdr); +- fl6.flowi6_mark = skb->mark; +- fl6.flowi6_proto = hdr->nexthdr; +- +- dst = ip6_route_output(net, NULL, &fl6); +- if (dst->error) { +- err = dst->error; +- dst_release(dst); +- goto drop; +- } +- +- local_bh_disable(); +- dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr); +- local_bh_enable(); +- } ++ err = skb_cow_head(skb, LL_RESERVED_SPACE(cache_dst->dev)); ++ if (unlikely(err)) ++ goto drop; ++ } + ++ if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) { + skb_dst_drop(skb); +- skb_dst_set(skb, dst); +- ++ skb_dst_set(skb, cache_dst); + return dst_output(net, sk, skb); + } + out: +-- +2.39.5 + diff --git a/queue-6.12/net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch b/queue-6.12/net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch new file mode 100644 index 0000000000..93e88fd540 --- /dev/null +++ b/queue-6.12/net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch @@ -0,0 +1,154 @@ +From 4d332b0718bdb32b3cae5af26c04ace8e1cc93ea Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Dec 2024 13:49:45 +0100 +Subject: net: ipv6: rpl_iptunnel: mitigate 2-realloc issue + +From: Justin Iurman + +[ Upstream commit 985ec6f5e6235242191370628acb73d7a9f0c0ea ] + +This patch mitigates the two-reallocations issue with rpl_iptunnel by +providing the dst_entry (in the cache) to the first call to +skb_cow_head(). As a result, the very first iteration would still +trigger two reallocations (i.e., empty cache), while next iterations +would only trigger a single reallocation. 
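+
+The reshuffle has the same shape in all three tunnels: fetch the cached
+dst first and hand it to the initial skb_cow_head() through
+dst_dev_overhead(), so only the cache-miss path still pays for a second
+headroom expansion. Roughly (illustrative fragment, locking and error
+handling omitted):
+
+	dst = dst_cache_get(&rlwt->cache);	/* NULL for the first packet */
+
+	/* push the SRH, reserving for the cached device when it is known */
+	err = rpl_do_srh(skb, rlwt, dst);
+
+	if (!dst) {
+		dst = ip6_route_output(net, NULL, &fl6);	/* cache miss */
+		dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr);
+		/* only this slow path needs the second skb_cow_head() */
+		err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+	}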
+ +Performance tests before/after applying this patch, which clearly shows +there is no impact (it even shows improvement): +- before: https://ibb.co/nQJhqwc +- after: https://ibb.co/4ZvW6wV + +Signed-off-by: Justin Iurman +Cc: Alexander Aring +Signed-off-by: Paolo Abeni +Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") +Signed-off-by: Sasha Levin +--- + net/ipv6/rpl_iptunnel.c | 46 ++++++++++++++++++++++------------------- + 1 file changed, 25 insertions(+), 21 deletions(-) + +diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c +index db3c19a42e1ca..7ba22d2f2bfef 100644 +--- a/net/ipv6/rpl_iptunnel.c ++++ b/net/ipv6/rpl_iptunnel.c +@@ -125,7 +125,8 @@ static void rpl_destroy_state(struct lwtunnel_state *lwt) + } + + static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt, +- const struct ipv6_rpl_sr_hdr *srh) ++ const struct ipv6_rpl_sr_hdr *srh, ++ struct dst_entry *cache_dst) + { + struct ipv6_rpl_sr_hdr *isrh, *csrh; + const struct ipv6hdr *oldhdr; +@@ -153,7 +154,7 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt, + + hdrlen = ((csrh->hdrlen + 1) << 3); + +- err = skb_cow_head(skb, hdrlen + skb->mac_len); ++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) { + kfree(buf); + return err; +@@ -186,7 +187,8 @@ static int rpl_do_srh_inline(struct sk_buff *skb, const struct rpl_lwt *rlwt, + return 0; + } + +-static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt) ++static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt, ++ struct dst_entry *cache_dst) + { + struct dst_entry *dst = skb_dst(skb); + struct rpl_iptunnel_encap *tinfo; +@@ -196,7 +198,7 @@ static int rpl_do_srh(struct sk_buff *skb, const struct rpl_lwt *rlwt) + + tinfo = rpl_encap_lwtunnel(dst->lwtstate); + +- return rpl_do_srh_inline(skb, rlwt, tinfo->srh); ++ return rpl_do_srh_inline(skb, rlwt, tinfo->srh, cache_dst); + } + + static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) +@@ -208,14 +210,14 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) + + rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate); + +- err = rpl_do_srh(skb, rlwt); +- if (unlikely(err)) +- goto drop; +- + local_bh_disable(); + dst = dst_cache_get(&rlwt->cache); + local_bh_enable(); + ++ err = rpl_do_srh(skb, rlwt, dst); ++ if (unlikely(err)) ++ goto drop; ++ + if (unlikely(!dst)) { + struct ipv6hdr *hdr = ipv6_hdr(skb); + struct flowi6 fl6; +@@ -237,15 +239,15 @@ static int rpl_output(struct net *net, struct sock *sk, struct sk_buff *skb) + local_bh_disable(); + dst_cache_set_ip6(&rlwt->cache, dst, &fl6.saddr); + local_bh_enable(); ++ ++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); ++ if (unlikely(err)) ++ goto drop; + } + + skb_dst_drop(skb); + skb_dst_set(skb, dst); + +- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); +- if (unlikely(err)) +- goto drop; +- + return dst_output(net, sk, skb); + + drop: +@@ -262,29 +264,31 @@ static int rpl_input(struct sk_buff *skb) + + rlwt = rpl_lwt_lwtunnel(orig_dst->lwtstate); + +- err = rpl_do_srh(skb, rlwt); +- if (unlikely(err)) +- goto drop; +- + local_bh_disable(); + dst = dst_cache_get(&rlwt->cache); ++ local_bh_enable(); ++ ++ err = rpl_do_srh(skb, rlwt, dst); ++ if (unlikely(err)) ++ goto drop; + + if (!dst) { + ip6_route_input(skb); + dst = skb_dst(skb); + if (!dst->error) { ++ local_bh_disable(); + dst_cache_set_ip6(&rlwt->cache, dst, + &ipv6_hdr(skb)->saddr); ++ 
local_bh_enable(); + } ++ ++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); ++ if (unlikely(err)) ++ goto drop; + } else { + skb_dst_drop(skb); + skb_dst_set(skb, dst); + } +- local_bh_enable(); +- +- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); +- if (unlikely(err)) +- goto drop; + + return dst_input(skb); + +-- +2.39.5 + diff --git a/queue-6.12/net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch b/queue-6.12/net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch new file mode 100644 index 0000000000..5d0e6034f4 --- /dev/null +++ b/queue-6.12/net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch @@ -0,0 +1,254 @@ +From 49b5da2d604b153b07f0b042c934c38cf1c379c8 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 3 Dec 2024 13:49:44 +0100 +Subject: net: ipv6: seg6_iptunnel: mitigate 2-realloc issue + +From: Justin Iurman + +[ Upstream commit 40475b63761abb6f8fdef960d03228a08662c9c4 ] + +This patch mitigates the two-reallocations issue with seg6_iptunnel by +providing the dst_entry (in the cache) to the first call to +skb_cow_head(). As a result, the very first iteration would still +trigger two reallocations (i.e., empty cache), while next iterations +would only trigger a single reallocation. + +Performance tests before/after applying this patch, which clearly shows +the improvement: +- before: https://ibb.co/3Cg4sNH +- after: https://ibb.co/8rQ350r + +Signed-off-by: Justin Iurman +Cc: David Lebrun +Signed-off-by: Paolo Abeni +Stable-dep-of: 92191dd10730 ("net: ipv6: fix dst ref loops in rpl, seg6 and ioam6 lwtunnels") +Signed-off-by: Sasha Levin +--- + net/ipv6/seg6_iptunnel.c | 85 ++++++++++++++++++++++++---------------- + 1 file changed, 52 insertions(+), 33 deletions(-) + +diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c +index 098632adc9b5a..4bf937bfc2633 100644 +--- a/net/ipv6/seg6_iptunnel.c ++++ b/net/ipv6/seg6_iptunnel.c +@@ -124,8 +124,8 @@ static __be32 seg6_make_flowlabel(struct net *net, struct sk_buff *skb, + return flowlabel; + } + +-/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */ +-int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) ++static int __seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, ++ int proto, struct dst_entry *cache_dst) + { + struct dst_entry *dst = skb_dst(skb); + struct net *net = dev_net(dst->dev); +@@ -137,7 +137,7 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) + hdrlen = (osrh->hdrlen + 1) << 3; + tot_len = hdrlen + sizeof(*hdr); + +- err = skb_cow_head(skb, tot_len + skb->mac_len); ++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) + return err; + +@@ -197,11 +197,18 @@ int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) + + return 0; + } ++ ++/* encapsulate an IPv6 packet within an outer IPv6 header with a given SRH */ ++int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto) ++{ ++ return __seg6_do_srh_encap(skb, osrh, proto, NULL); ++} + EXPORT_SYMBOL_GPL(seg6_do_srh_encap); + + /* encapsulate an IPv6 packet within an outer IPv6 header with reduced SRH */ + static int seg6_do_srh_encap_red(struct sk_buff *skb, +- struct ipv6_sr_hdr *osrh, int proto) ++ struct ipv6_sr_hdr *osrh, int proto, ++ struct dst_entry *cache_dst) + { + __u8 first_seg = osrh->first_segment; + struct dst_entry *dst = skb_dst(skb); +@@ -230,7 +237,7 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb, + + tot_len = red_hdrlen + 
sizeof(struct ipv6hdr); + +- err = skb_cow_head(skb, tot_len + skb->mac_len); ++ err = skb_cow_head(skb, tot_len + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) + return err; + +@@ -317,8 +324,8 @@ static int seg6_do_srh_encap_red(struct sk_buff *skb, + return 0; + } + +-/* insert an SRH within an IPv6 packet, just after the IPv6 header */ +-int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh) ++static int __seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, ++ struct dst_entry *cache_dst) + { + struct ipv6hdr *hdr, *oldhdr; + struct ipv6_sr_hdr *isrh; +@@ -326,7 +333,7 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh) + + hdrlen = (osrh->hdrlen + 1) << 3; + +- err = skb_cow_head(skb, hdrlen + skb->mac_len); ++ err = skb_cow_head(skb, hdrlen + dst_dev_overhead(cache_dst, skb)); + if (unlikely(err)) + return err; + +@@ -369,9 +376,8 @@ int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh) + + return 0; + } +-EXPORT_SYMBOL_GPL(seg6_do_srh_inline); + +-static int seg6_do_srh(struct sk_buff *skb) ++static int seg6_do_srh(struct sk_buff *skb, struct dst_entry *cache_dst) + { + struct dst_entry *dst = skb_dst(skb); + struct seg6_iptunnel_encap *tinfo; +@@ -384,7 +390,7 @@ static int seg6_do_srh(struct sk_buff *skb) + if (skb->protocol != htons(ETH_P_IPV6)) + return -EINVAL; + +- err = seg6_do_srh_inline(skb, tinfo->srh); ++ err = __seg6_do_srh_inline(skb, tinfo->srh, cache_dst); + if (err) + return err; + break; +@@ -402,9 +408,11 @@ static int seg6_do_srh(struct sk_buff *skb) + return -EINVAL; + + if (tinfo->mode == SEG6_IPTUN_MODE_ENCAP) +- err = seg6_do_srh_encap(skb, tinfo->srh, proto); ++ err = __seg6_do_srh_encap(skb, tinfo->srh, ++ proto, cache_dst); + else +- err = seg6_do_srh_encap_red(skb, tinfo->srh, proto); ++ err = seg6_do_srh_encap_red(skb, tinfo->srh, ++ proto, cache_dst); + + if (err) + return err; +@@ -425,11 +433,13 @@ static int seg6_do_srh(struct sk_buff *skb) + skb_push(skb, skb->mac_len); + + if (tinfo->mode == SEG6_IPTUN_MODE_L2ENCAP) +- err = seg6_do_srh_encap(skb, tinfo->srh, +- IPPROTO_ETHERNET); ++ err = __seg6_do_srh_encap(skb, tinfo->srh, ++ IPPROTO_ETHERNET, ++ cache_dst); + else + err = seg6_do_srh_encap_red(skb, tinfo->srh, +- IPPROTO_ETHERNET); ++ IPPROTO_ETHERNET, ++ cache_dst); + + if (err) + return err; +@@ -444,6 +454,13 @@ static int seg6_do_srh(struct sk_buff *skb) + return 0; + } + ++/* insert an SRH within an IPv6 packet, just after the IPv6 header */ ++int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh) ++{ ++ return __seg6_do_srh_inline(skb, osrh, NULL); ++} ++EXPORT_SYMBOL_GPL(seg6_do_srh_inline); ++ + static int seg6_input_finish(struct net *net, struct sock *sk, + struct sk_buff *skb) + { +@@ -458,31 +475,33 @@ static int seg6_input_core(struct net *net, struct sock *sk, + struct seg6_lwt *slwt; + int err; + +- err = seg6_do_srh(skb); +- if (unlikely(err)) +- goto drop; +- + slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate); + + local_bh_disable(); + dst = dst_cache_get(&slwt->cache); ++ local_bh_enable(); ++ ++ err = seg6_do_srh(skb, dst); ++ if (unlikely(err)) ++ goto drop; + + if (!dst) { + ip6_route_input(skb); + dst = skb_dst(skb); + if (!dst->error) { ++ local_bh_disable(); + dst_cache_set_ip6(&slwt->cache, dst, + &ipv6_hdr(skb)->saddr); ++ local_bh_enable(); + } ++ ++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); ++ if (unlikely(err)) ++ goto drop; + } else { + skb_dst_drop(skb); + skb_dst_set(skb, dst); + } +- local_bh_enable(); +- +- err = 
skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); +- if (unlikely(err)) +- goto drop; + + if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled)) + return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, +@@ -528,16 +547,16 @@ static int seg6_output_core(struct net *net, struct sock *sk, + struct seg6_lwt *slwt; + int err; + +- err = seg6_do_srh(skb); +- if (unlikely(err)) +- goto drop; +- + slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate); + + local_bh_disable(); + dst = dst_cache_get(&slwt->cache); + local_bh_enable(); + ++ err = seg6_do_srh(skb, dst); ++ if (unlikely(err)) ++ goto drop; ++ + if (unlikely(!dst)) { + struct ipv6hdr *hdr = ipv6_hdr(skb); + struct flowi6 fl6; +@@ -559,15 +578,15 @@ static int seg6_output_core(struct net *net, struct sock *sk, + local_bh_disable(); + dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr); + local_bh_enable(); ++ ++ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); ++ if (unlikely(err)) ++ goto drop; + } + + skb_dst_drop(skb); + skb_dst_set(skb, dst); + +- err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); +- if (unlikely(err)) +- goto drop; +- + if (static_branch_unlikely(&nf_hooks_lwtunnel_enabled)) + return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk, skb, + NULL, skb_dst(skb)->dev, dst_output); +-- +2.39.5 + diff --git a/queue-6.12/openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch b/queue-6.12/openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch new file mode 100644 index 0000000000..2ee39b49e2 --- /dev/null +++ b/queue-6.12/openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch @@ -0,0 +1,66 @@ +From 38f927398ade60eb5687c7a62e805ff497f55810 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Fri, 7 Feb 2025 13:58:37 +0000 +Subject: openvswitch: use RCU protection in ovs_vport_cmd_fill_info() + +From: Eric Dumazet + +[ Upstream commit 90b2f49a502fa71090d9f4fe29a2f51fe5dff76d ] + +ovs_vport_cmd_fill_info() can be called without RTNL or RCU. + +Use RCU protection and dev_net_rcu() to avoid potential UAF. 
+ +Fixes: 9354d4520342 ("openvswitch: reliable interface indentification in port dumps") +Signed-off-by: Eric Dumazet +Reviewed-by: Kuniyuki Iwashima +Link: https://patch.msgid.link/20250207135841.1948589-6-edumazet@google.com +Signed-off-by: Jakub Kicinski +Signed-off-by: Sasha Levin +--- + net/openvswitch/datapath.c | 12 +++++++++--- + 1 file changed, 9 insertions(+), 3 deletions(-) + +diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c +index 78d9961fcd446..8d3c01f0e2aa1 100644 +--- a/net/openvswitch/datapath.c ++++ b/net/openvswitch/datapath.c +@@ -2102,6 +2102,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb, + { + struct ovs_header *ovs_header; + struct ovs_vport_stats vport_stats; ++ struct net *net_vport; + int err; + + ovs_header = genlmsg_put(skb, portid, seq, &dp_vport_genl_family, +@@ -2118,12 +2119,15 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb, + nla_put_u32(skb, OVS_VPORT_ATTR_IFINDEX, vport->dev->ifindex)) + goto nla_put_failure; + +- if (!net_eq(net, dev_net(vport->dev))) { +- int id = peernet2id_alloc(net, dev_net(vport->dev), gfp); ++ rcu_read_lock(); ++ net_vport = dev_net_rcu(vport->dev); ++ if (!net_eq(net, net_vport)) { ++ int id = peernet2id_alloc(net, net_vport, GFP_ATOMIC); + + if (nla_put_s32(skb, OVS_VPORT_ATTR_NETNSID, id)) +- goto nla_put_failure; ++ goto nla_put_failure_unlock; + } ++ rcu_read_unlock(); + + ovs_vport_get_stats(vport, &vport_stats); + if (nla_put_64bit(skb, OVS_VPORT_ATTR_STATS, +@@ -2144,6 +2148,8 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb, + genlmsg_end(skb, ovs_header); + return 0; + ++nla_put_failure_unlock: ++ rcu_read_unlock(); + nla_put_failure: + err = -EMSGSIZE; + error: +-- +2.39.5 + diff --git a/queue-6.12/rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch b/queue-6.12/rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch new file mode 100644 index 0000000000..ed2850c879 --- /dev/null +++ b/queue-6.12/rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch @@ -0,0 +1,42 @@ +From 5d7442285575e56ad8f868e8cac77b5a9a0f800d Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Wed, 29 Jan 2025 14:50:02 -0700 +Subject: rust: kbuild: add -fzero-init-padding-bits to bindgen_skip_cflags + +From: Justin M. Forbes + +[ Upstream commit a9c621a217128eb3fb7522cf763992d9437fd5ba ] + +This seems to break the build when building with gcc15: + + Unable to generate bindings: ClangDiagnostic("error: unknown + argument: '-fzero-init-padding-bits=all'\n") + +Thus skip that flag. + +Signed-off-by: Justin M. Forbes +Fixes: dce4aab8441d ("kbuild: Use -fzero-init-padding-bits=all") +Reviewed-by: Kees Cook +Link: https://lore.kernel.org/r/20250129215003.1736127-1-jforbes@fedoraproject.org +[ Slightly reworded commit. - Miguel ] +Signed-off-by: Miguel Ojeda +Signed-off-by: Sasha Levin +--- + rust/Makefile | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/rust/Makefile b/rust/Makefile +index 9f59baacaf773..45779a064fa4f 100644 +--- a/rust/Makefile ++++ b/rust/Makefile +@@ -229,6 +229,7 @@ bindgen_skip_c_flags := -mno-fp-ret-in-387 -mpreferred-stack-boundary=% \ + -fzero-call-used-regs=% -fno-stack-clash-protection \ + -fno-inline-functions-called-once -fsanitize=bounds-strict \ + -fstrict-flex-arrays=% -fmin-function-alignment=% \ ++ -fzero-init-padding-bits=% \ + --param=% --param asan-% + + # Derived from `scripts/Makefile.clang`. 
+-- +2.39.5 + diff --git a/queue-6.12/scsi-ufs-core-introduce-a-new-clock_gating-lock.patch b/queue-6.12/scsi-ufs-core-introduce-a-new-clock_gating-lock.patch new file mode 100644 index 0000000000..1535a6da99 --- /dev/null +++ b/queue-6.12/scsi-ufs-core-introduce-a-new-clock_gating-lock.patch @@ -0,0 +1,336 @@ +From 479bee46067a69f86ef8b10ee99dbc2a7c765a73 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 24 Nov 2024 09:08:07 +0200 +Subject: scsi: ufs: core: Introduce a new clock_gating lock + +From: Avri Altman + +[ Upstream commit 209f4e43b8068c24cde227f464111030430153fa ] + +Introduce a new clock gating lock to serialize access to some of the clock +gating members instead of the host_lock. + +While at it, simplify the code with the guard() macro and co for automatic +cleanup of the new lock. There are some explicit +spin_lock_irqsave()/spin_unlock_irqrestore() snaking instances I left +behind because I couldn't make heads or tails of it. + +Additionally, move the trace_ufshcd_clk_gating() call from inside the +region protected by the lock as it doesn't needs protection. + +Signed-off-by: Avri Altman +Link: https://lore.kernel.org/r/20241124070808.194860-4-avri.altman@wdc.com +Reviewed-by: Bart Van Assche +Signed-off-by: Martin K. Petersen +Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed") +Signed-off-by: Sasha Levin +--- + drivers/ufs/core/ufshcd.c | 109 ++++++++++++++++++-------------------- + include/ufs/ufshcd.h | 9 +++- + 2 files changed, 59 insertions(+), 59 deletions(-) + +diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c +index 217619d64940e..5682fdcbf2da5 100644 +--- a/drivers/ufs/core/ufshcd.c ++++ b/drivers/ufs/core/ufshcd.c +@@ -1840,19 +1840,16 @@ static void ufshcd_exit_clk_scaling(struct ufs_hba *hba) + static void ufshcd_ungate_work(struct work_struct *work) + { + int ret; +- unsigned long flags; + struct ufs_hba *hba = container_of(work, struct ufs_hba, + clk_gating.ungate_work); + + cancel_delayed_work_sync(&hba->clk_gating.gate_work); + +- spin_lock_irqsave(hba->host->host_lock, flags); +- if (hba->clk_gating.state == CLKS_ON) { +- spin_unlock_irqrestore(hba->host->host_lock, flags); +- return; ++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) { ++ if (hba->clk_gating.state == CLKS_ON) ++ return; + } + +- spin_unlock_irqrestore(hba->host->host_lock, flags); + ufshcd_hba_vreg_set_hpm(hba); + ufshcd_setup_clocks(hba, true); + +@@ -1887,7 +1884,7 @@ void ufshcd_hold(struct ufs_hba *hba) + if (!ufshcd_is_clkgating_allowed(hba) || + !hba->clk_gating.is_initialized) + return; +- spin_lock_irqsave(hba->host->host_lock, flags); ++ spin_lock_irqsave(&hba->clk_gating.lock, flags); + hba->clk_gating.active_reqs++; + + start: +@@ -1903,11 +1900,11 @@ void ufshcd_hold(struct ufs_hba *hba) + */ + if (ufshcd_can_hibern8_during_gating(hba) && + ufshcd_is_link_hibern8(hba)) { +- spin_unlock_irqrestore(hba->host->host_lock, flags); ++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags); + flush_result = flush_work(&hba->clk_gating.ungate_work); + if (hba->clk_gating.is_suspended && !flush_result) + return; +- spin_lock_irqsave(hba->host->host_lock, flags); ++ spin_lock_irqsave(&hba->clk_gating.lock, flags); + goto start; + } + break; +@@ -1936,17 +1933,17 @@ void ufshcd_hold(struct ufs_hba *hba) + */ + fallthrough; + case REQ_CLKS_ON: +- spin_unlock_irqrestore(hba->host->host_lock, flags); ++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags); + flush_work(&hba->clk_gating.ungate_work); + /* Make 
sure state is CLKS_ON before returning */ +- spin_lock_irqsave(hba->host->host_lock, flags); ++ spin_lock_irqsave(&hba->clk_gating.lock, flags); + goto start; + default: + dev_err(hba->dev, "%s: clk gating is in invalid state %d\n", + __func__, hba->clk_gating.state); + break; + } +- spin_unlock_irqrestore(hba->host->host_lock, flags); ++ spin_unlock_irqrestore(&hba->clk_gating.lock, flags); + } + EXPORT_SYMBOL_GPL(ufshcd_hold); + +@@ -1954,30 +1951,32 @@ static void ufshcd_gate_work(struct work_struct *work) + { + struct ufs_hba *hba = container_of(work, struct ufs_hba, + clk_gating.gate_work.work); +- unsigned long flags; + int ret; + +- spin_lock_irqsave(hba->host->host_lock, flags); +- /* +- * In case you are here to cancel this work the gating state +- * would be marked as REQ_CLKS_ON. In this case save time by +- * skipping the gating work and exit after changing the clock +- * state to CLKS_ON. +- */ +- if (hba->clk_gating.is_suspended || +- (hba->clk_gating.state != REQ_CLKS_OFF)) { +- hba->clk_gating.state = CLKS_ON; +- trace_ufshcd_clk_gating(dev_name(hba->dev), +- hba->clk_gating.state); +- goto rel_lock; +- } ++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) { ++ /* ++ * In case you are here to cancel this work the gating state ++ * would be marked as REQ_CLKS_ON. In this case save time by ++ * skipping the gating work and exit after changing the clock ++ * state to CLKS_ON. ++ */ ++ if (hba->clk_gating.is_suspended || ++ hba->clk_gating.state != REQ_CLKS_OFF) { ++ hba->clk_gating.state = CLKS_ON; ++ trace_ufshcd_clk_gating(dev_name(hba->dev), ++ hba->clk_gating.state); ++ return; ++ } + +- if (ufshcd_is_ufs_dev_busy(hba) || +- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || +- hba->clk_gating.active_reqs) +- goto rel_lock; ++ if (hba->clk_gating.active_reqs) ++ return; ++ } + +- spin_unlock_irqrestore(hba->host->host_lock, flags); ++ scoped_guard(spinlock_irqsave, hba->host->host_lock) { ++ if (ufshcd_is_ufs_dev_busy(hba) || ++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) ++ return; ++ } + + /* put the link into hibern8 mode before turning off clocks */ + if (ufshcd_can_hibern8_during_gating(hba)) { +@@ -1988,7 +1987,7 @@ static void ufshcd_gate_work(struct work_struct *work) + __func__, ret); + trace_ufshcd_clk_gating(dev_name(hba->dev), + hba->clk_gating.state); +- goto out; ++ return; + } + ufshcd_set_link_hibern8(hba); + } +@@ -2008,32 +2007,34 @@ static void ufshcd_gate_work(struct work_struct *work) + * prevent from doing cancel work multiple times when there are + * new requests arriving before the current cancel work is done. 
+ */ +- spin_lock_irqsave(hba->host->host_lock, flags); ++ guard(spinlock_irqsave)(&hba->clk_gating.lock); + if (hba->clk_gating.state == REQ_CLKS_OFF) { + hba->clk_gating.state = CLKS_OFF; + trace_ufshcd_clk_gating(dev_name(hba->dev), + hba->clk_gating.state); + } +-rel_lock: +- spin_unlock_irqrestore(hba->host->host_lock, flags); +-out: +- return; + } + +-/* host lock must be held before calling this variant */ + static void __ufshcd_release(struct ufs_hba *hba) + { ++ lockdep_assert_held(&hba->clk_gating.lock); ++ + if (!ufshcd_is_clkgating_allowed(hba)) + return; + + hba->clk_gating.active_reqs--; + + if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended || +- hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || +- ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized || ++ !hba->clk_gating.is_initialized || + hba->clk_gating.state == CLKS_OFF) + return; + ++ scoped_guard(spinlock_irqsave, hba->host->host_lock) { ++ if (ufshcd_has_pending_tasks(hba) || ++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) ++ return; ++ } ++ + hba->clk_gating.state = REQ_CLKS_OFF; + trace_ufshcd_clk_gating(dev_name(hba->dev), hba->clk_gating.state); + queue_delayed_work(hba->clk_gating.clk_gating_workq, +@@ -2043,11 +2044,8 @@ static void __ufshcd_release(struct ufs_hba *hba) + + void ufshcd_release(struct ufs_hba *hba) + { +- unsigned long flags; +- +- spin_lock_irqsave(hba->host->host_lock, flags); ++ guard(spinlock_irqsave)(&hba->clk_gating.lock); + __ufshcd_release(hba); +- spin_unlock_irqrestore(hba->host->host_lock, flags); + } + EXPORT_SYMBOL_GPL(ufshcd_release); + +@@ -2062,11 +2060,9 @@ static ssize_t ufshcd_clkgate_delay_show(struct device *dev, + void ufshcd_clkgate_delay_set(struct device *dev, unsigned long value) + { + struct ufs_hba *hba = dev_get_drvdata(dev); +- unsigned long flags; + +- spin_lock_irqsave(hba->host->host_lock, flags); ++ guard(spinlock_irqsave)(&hba->clk_gating.lock); + hba->clk_gating.delay_ms = value; +- spin_unlock_irqrestore(hba->host->host_lock, flags); + } + EXPORT_SYMBOL_GPL(ufshcd_clkgate_delay_set); + +@@ -2094,7 +2090,6 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev, + struct device_attribute *attr, const char *buf, size_t count) + { + struct ufs_hba *hba = dev_get_drvdata(dev); +- unsigned long flags; + u32 value; + + if (kstrtou32(buf, 0, &value)) +@@ -2102,9 +2097,10 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev, + + value = !!value; + +- spin_lock_irqsave(hba->host->host_lock, flags); ++ guard(spinlock_irqsave)(&hba->clk_gating.lock); ++ + if (value == hba->clk_gating.is_enabled) +- goto out; ++ return count; + + if (value) + __ufshcd_release(hba); +@@ -2112,8 +2108,7 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev, + hba->clk_gating.active_reqs++; + + hba->clk_gating.is_enabled = value; +-out: +- spin_unlock_irqrestore(hba->host->host_lock, flags); ++ + return count; + } + +@@ -2155,6 +2150,8 @@ static void ufshcd_init_clk_gating(struct ufs_hba *hba) + INIT_DELAYED_WORK(&hba->clk_gating.gate_work, ufshcd_gate_work); + INIT_WORK(&hba->clk_gating.ungate_work, ufshcd_ungate_work); + ++ spin_lock_init(&hba->clk_gating.lock); ++ + hba->clk_gating.clk_gating_workq = alloc_ordered_workqueue( + "ufs_clk_gating_%d", WQ_MEM_RECLAIM | WQ_HIGHPRI, + hba->host->host_no); +@@ -9194,7 +9191,6 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on) + int ret = 0; + struct ufs_clk_info *clki; + struct list_head *head = &hba->clk_list_head; +- unsigned long flags; + ktime_t start = ktime_get(); + 
bool clk_state_changed = false; + +@@ -9245,11 +9241,10 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on) + clk_disable_unprepare(clki->clk); + } + } else if (!ret && on) { +- spin_lock_irqsave(hba->host->host_lock, flags); +- hba->clk_gating.state = CLKS_ON; ++ scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) ++ hba->clk_gating.state = CLKS_ON; + trace_ufshcd_clk_gating(dev_name(hba->dev), + hba->clk_gating.state); +- spin_unlock_irqrestore(hba->host->host_lock, flags); + } + + if (clk_state_changed) +diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h +index d5e43a1dcff22..47cba116f87b8 100644 +--- a/include/ufs/ufshcd.h ++++ b/include/ufs/ufshcd.h +@@ -402,6 +402,9 @@ enum clk_gating_state { + * delay_ms + * @ungate_work: worker to turn on clocks that will be used in case of + * interrupt context ++ * @clk_gating_workq: workqueue for clock gating work. ++ * @lock: serialize access to some struct ufs_clk_gating members. An outer lock ++ * relative to the host lock + * @state: the current clocks state + * @delay_ms: gating delay in ms + * @is_suspended: clk gating is suspended when set to 1 which can be used +@@ -412,11 +415,14 @@ enum clk_gating_state { + * @is_initialized: Indicates whether clock gating is initialized or not + * @active_reqs: number of requests that are pending and should be waited for + * completion before gating clocks. +- * @clk_gating_workq: workqueue for clock gating work. + */ + struct ufs_clk_gating { + struct delayed_work gate_work; + struct work_struct ungate_work; ++ struct workqueue_struct *clk_gating_workq; ++ ++ spinlock_t lock; ++ + enum clk_gating_state state; + unsigned long delay_ms; + bool is_suspended; +@@ -425,7 +431,6 @@ struct ufs_clk_gating { + bool is_enabled; + bool is_initialized; + int active_reqs; +- struct workqueue_struct *clk_gating_workq; + }; + + /** +-- +2.39.5 + diff --git a/queue-6.12/scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch b/queue-6.12/scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch new file mode 100644 index 0000000000..885173f199 --- /dev/null +++ b/queue-6.12/scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch @@ -0,0 +1,58 @@ +From 12b6e46a31da130d4f8de87ff66b6441ef65db02 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 24 Nov 2024 09:08:05 +0200 +Subject: scsi: ufs: core: Introduce ufshcd_has_pending_tasks() + +From: Avri Altman + +[ Upstream commit e738ba458e7539be1757dcdf85835a5c7b11fad4 ] + +Prepare to remove hba->clk_gating.active_reqs check from +ufshcd_is_ufs_dev_busy(). + +Signed-off-by: Avri Altman +Link: https://lore.kernel.org/r/20241124070808.194860-2-avri.altman@wdc.com +Reviewed-by: Bart Van Assche +Signed-off-by: Martin K. 
Petersen +Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed") +Signed-off-by: Sasha Levin +--- + drivers/ufs/core/ufshcd.c | 13 +++++++++---- + 1 file changed, 9 insertions(+), 4 deletions(-) + +diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c +index b786cba9a270f..94d7992457a3b 100644 +--- a/drivers/ufs/core/ufshcd.c ++++ b/drivers/ufs/core/ufshcd.c +@@ -258,10 +258,16 @@ ufs_get_desired_pm_lvl_for_dev_link_state(enum ufs_dev_pwr_mode dev_state, + return UFS_PM_LVL_0; + } + ++static bool ufshcd_has_pending_tasks(struct ufs_hba *hba) ++{ ++ return hba->outstanding_tasks || hba->active_uic_cmd || ++ hba->uic_async_done; ++} ++ + static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba) + { +- return (hba->clk_gating.active_reqs || hba->outstanding_reqs || hba->outstanding_tasks || +- hba->active_uic_cmd || hba->uic_async_done); ++ return hba->clk_gating.active_reqs || hba->outstanding_reqs || ++ ufshcd_has_pending_tasks(hba); + } + + static const struct ufs_dev_quirk ufs_fixups[] = { +@@ -2023,8 +2029,7 @@ static void __ufshcd_release(struct ufs_hba *hba) + + if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended || + hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || +- hba->outstanding_tasks || !hba->clk_gating.is_initialized || +- hba->active_uic_cmd || hba->uic_async_done || ++ ufshcd_has_pending_tasks(hba) || !hba->clk_gating.is_initialized || + hba->clk_gating.state == CLKS_OFF) + return; + +-- +2.39.5 + diff --git a/queue-6.12/scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch b/queue-6.12/scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch new file mode 100644 index 0000000000..3ee909dd51 --- /dev/null +++ b/queue-6.12/scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch @@ -0,0 +1,61 @@ +From ac2299a3755b2c55968ad57ae2d0676a5d10ade6 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Sun, 24 Nov 2024 09:08:06 +0200 +Subject: scsi: ufs: core: Prepare to introduce a new clock_gating lock + +From: Avri Altman + +[ Upstream commit 7869c6521f5715688b3d1f1c897374a68544eef0 ] + +Remove hba->clk_gating.active_reqs check from ufshcd_is_ufs_dev_busy() +function to separate clock gating logic from general device busy checks. + +Signed-off-by: Avri Altman +Link: https://lore.kernel.org/r/20241124070808.194860-3-avri.altman@wdc.com +Reviewed-by: Bart Van Assche +Signed-off-by: Martin K. 
Petersen +Stable-dep-of: 839a74b5649c ("scsi: ufs: Fix toggling of clk_gating.state when clock gating is not allowed") +Signed-off-by: Sasha Levin +--- + drivers/ufs/core/ufshcd.c | 11 +++++++---- + 1 file changed, 7 insertions(+), 4 deletions(-) + +diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c +index 94d7992457a3b..217619d64940e 100644 +--- a/drivers/ufs/core/ufshcd.c ++++ b/drivers/ufs/core/ufshcd.c +@@ -266,8 +266,7 @@ static bool ufshcd_has_pending_tasks(struct ufs_hba *hba) + + static bool ufshcd_is_ufs_dev_busy(struct ufs_hba *hba) + { +- return hba->clk_gating.active_reqs || hba->outstanding_reqs || +- ufshcd_has_pending_tasks(hba); ++ return hba->outstanding_reqs || ufshcd_has_pending_tasks(hba); + } + + static const struct ufs_dev_quirk ufs_fixups[] = { +@@ -1973,7 +1972,9 @@ static void ufshcd_gate_work(struct work_struct *work) + goto rel_lock; + } + +- if (ufshcd_is_ufs_dev_busy(hba) || hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) ++ if (ufshcd_is_ufs_dev_busy(hba) || ++ hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL || ++ hba->clk_gating.active_reqs) + goto rel_lock; + + spin_unlock_irqrestore(hba->host->host_lock, flags); +@@ -8272,7 +8273,9 @@ static void ufshcd_rtc_work(struct work_struct *work) + hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work); + + /* Update RTC only when there are no requests in progress and UFSHCI is operational */ +- if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) ++ if (!ufshcd_is_ufs_dev_busy(hba) && ++ hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && ++ !hba->clk_gating.active_reqs) + ufshcd_update_rtc(hba); + + if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period) +-- +2.39.5 + diff --git a/queue-6.12/scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch b/queue-6.12/scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch new file mode 100644 index 0000000000..5c00da9200 --- /dev/null +++ b/queue-6.12/scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch @@ -0,0 +1,48 @@ +From 0eb9778926430a8663ccd7169436f98b363a6bf2 Mon Sep 17 00:00:00 2001 +From: Sasha Levin +Date: Tue, 28 Jan 2025 09:12:07 +0200 +Subject: scsi: ufs: Fix toggling of clk_gating.state when clock gating is not + allowed + +From: Avri Altman + +[ Upstream commit 839a74b5649c9f41d939a05059b5ca6b17156d03 ] + +This commit addresses an issue where clk_gating.state is being toggled in +ufshcd_setup_clocks() even if clock gating is not allowed. + +The fix is to add a check for hba->clk_gating.is_initialized before toggling +clk_gating.state in ufshcd_setup_clocks(). + +Since clk_gating.lock is now initialized unconditionally, it can no longer +lead to the spinlock being used before it is properly initialized, but +instead it is mostly for documentation purposes. + +Fixes: 1ab27c9cf8b6 ("ufs: Add support for clock gating") +Reported-by: Geert Uytterhoeven +Tested-by: Geert Uytterhoeven +Signed-off-by: Avri Altman +Link: https://lore.kernel.org/r/20250128071207.75494-3-avri.altman@wdc.com +Reviewed-by: Bart Van Assche +Signed-off-by: Martin K. 
Petersen +Signed-off-by: Sasha Levin +--- + drivers/ufs/core/ufshcd.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c +index 5682fdcbf2da5..a73fffd6c3de4 100644 +--- a/drivers/ufs/core/ufshcd.c ++++ b/drivers/ufs/core/ufshcd.c +@@ -9240,7 +9240,7 @@ static int ufshcd_setup_clocks(struct ufs_hba *hba, bool on) + if (!IS_ERR_OR_NULL(clki->clk) && clki->enabled) + clk_disable_unprepare(clki->clk); + } +- } else if (!ret && on) { ++ } else if (!ret && on && hba->clk_gating.is_initialized) { + scoped_guard(spinlock_irqsave, &hba->clk_gating.lock) + hba->clk_gating.state = CLKS_ON; + trace_ufshcd_clk_gating(dev_name(hba->dev), +-- +2.39.5 + diff --git a/queue-6.12/series b/queue-6.12/series index a470fc6288..34457c5d51 100644 --- a/queue-6.12/series +++ b/queue-6.12/series @@ -91,3 +91,45 @@ orangefs-fix-a-oob-in-orangefs_debug_write.patch kbuild-suppress-stdout-from-merge_config-for-silent-.patch asoc-intel-bytcr_rt5640-add-dmi-quirk-for-vexia-edu-.patch kbuild-use-fzero-init-padding-bits-all.patch +include-net-add-static-inline-dst_dev_overhead-to-ds.patch +net-ipv6-ioam6_iptunnel-mitigate-2-realloc-issue.patch +net-ipv6-seg6_iptunnel-mitigate-2-realloc-issue.patch +net-ipv6-rpl_iptunnel-mitigate-2-realloc-issue.patch +net-ipv6-fix-dst-ref-loops-in-rpl-seg6-and-ioam6-lwt.patch +clocksource-use-pr_info-for-checking-clocksource-syn.patch +clocksource-use-migrate_disable-to-avoid-calling-get.patch +scsi-ufs-core-introduce-ufshcd_has_pending_tasks.patch +scsi-ufs-core-prepare-to-introduce-a-new-clock_gatin.patch +scsi-ufs-core-introduce-a-new-clock_gating-lock.patch +scsi-ufs-fix-toggling-of-clk_gating.state-when-clock.patch +rust-kbuild-add-fzero-init-padding-bits-to-bindgen_s.patch +cpufreq-amd-pstate-call-cppc_set_epp_perf-in-the-ree.patch +cpufreq-amd-pstate-align-offline-flow-of-shared-memo.patch +cpufreq-amd-pstate-refactor-amd_pstate_epp_reenable-.patch +cpufreq-amd-pstate-remove-the-cppc_state-check-in-of.patch +cpufreq-amd-pstate-merge-amd_pstate_epp_cpu_offline-.patch +cpufreq-amd-pstate-convert-mutex-use-to-guard.patch +cpufreq-amd-pstate-fix-cpufreq_policy-ref-counting.patch +ipv4-add-rcu-protection-to-ip4_dst_hoplimit.patch +ipv4-use-rcu-protection-in-ip_dst_mtu_maybe_forward.patch +net-add-dev_net_rcu-helper.patch +ipv4-use-rcu-protection-in-ipv4_default_advmss.patch +ipv4-use-rcu-protection-in-rt_is_expired.patch +ipv4-use-rcu-protection-in-inet_select_addr.patch +net-ipv4-cache-pmtu-for-all-packet-paths-if-multipat.patch +ipv4-use-rcu-protection-in-__ip_rt_update_pmtu.patch +ipv4-icmp-convert-to-dev_net_rcu.patch +flow_dissector-use-rcu-protection-to-fetch-dev_net.patch +ipv6-use-rcu-protection-in-ip6_default_advmss.patch +ipv6-icmp-convert-to-dev_net_rcu.patch +hid-hid-steam-make-sure-rumble-work-is-canceled-on-r.patch +hid-hid-steam-move-hidraw-input-un-registering-to-wo.patch +ndisc-use-rcu-protection-in-ndisc_alloc_skb.patch +neighbour-use-rcu-protection-in-__neigh_notify.patch +arp-use-rcu-protection-in-arp_xmit.patch +openvswitch-use-rcu-protection-in-ovs_vport_cmd_fill.patch +ndisc-extend-rcu-protection-in-ndisc_send_skb.patch +ipv6-mcast-extend-rcu-protection-in-igmp6_send.patch +btrfs-rename-__get_extent_map-and-pass-btrfs_inode.patch +btrfs-fix-stale-page-cache-after-race-between-readah.patch +ipv6-mcast-add-rcu-protection-to-mld_newpack.patch