page_pool: avoid infinite loop to schedule delayed worker
author    Jason Xing <kerneljasonxing@gmail.com>
          Fri, 14 Feb 2025 06:42:50 +0000 (14:42 +0800)
committer Paolo Abeni <pabeni@redhat.com>
          Tue, 18 Feb 2025 11:48:29 +0000 (12:48 +0100)
We noticed that the kworker in page_pool_release_retry() was woken
up repeatedly and endlessly in production because a buggy driver
drove the inflight count below zero, which triggered the warning
in page_pool_inflight()[1].

Once the inflight value goes negative, the page_pool can no longer
be expected to return to normal operation.

This patch mitigates the adverse effect by not rescheduling the
kworker when page_pool_release_retry() detects a negative inflight
count.

[1]
[Mon Feb 10 20:36:11 2025] ------------[ cut here ]------------
[Mon Feb 10 20:36:11 2025] Negative(-51446) inflight packet-pages
...
[Mon Feb 10 20:36:11 2025] Call Trace:
[Mon Feb 10 20:36:11 2025]  page_pool_release_retry+0x23/0x70
[Mon Feb 10 20:36:11 2025]  process_one_work+0x1b1/0x370
[Mon Feb 10 20:36:11 2025]  worker_thread+0x37/0x3a0
[Mon Feb 10 20:36:11 2025]  kthread+0x11a/0x140
[Mon Feb 10 20:36:11 2025]  ? process_one_work+0x370/0x370
[Mon Feb 10 20:36:11 2025]  ? __kthread_cancel_work+0x40/0x40
[Mon Feb 10 20:36:11 2025]  ret_from_fork+0x35/0x40
[Mon Feb 10 20:36:11 2025] ---[ end trace ebffe800f33e7e34 ]---

Note: before this patch, the above call trace flooded dmesg because
the release_dw kworker was rescheduled over and over.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250214064250.85987-1-kerneljasonxing@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
net/core/page_pool.c

index 1c6fec08bc43498c443b4278de0f910b793901db..acef1fcd8ddcfd1853a6f2055c1f1820ab248e8d 100644
@@ -1112,7 +1112,13 @@ static void page_pool_release_retry(struct work_struct *wq)
        int inflight;
 
        inflight = page_pool_release(pool);
-       if (!inflight)
+       /* In rare cases, a driver bug may cause inflight to go negative.
+        * Don't reschedule release if inflight is 0 or negative.
+        * - If 0, the page_pool has been destroyed
+        * - if negative, we will never recover
+        * in both cases no reschedule is necessary.
+        */
+       if (inflight <= 0)
                return;
 
        /* Periodic warning for page pools the user can't see */