attach_global_ctx_data() uses an O(N^2) algorithm to allocate the
context data for each thread. This caused performance problems on large
systems with around 100k threads.
Because kmalloc(GFP_KERNEL) can sleep, it cannot be called under the
RCU lock. So try GFP_NOWAIT first, which does not sleep, so that the
normal case can proceed without traversing the thread list again.
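As a rough illustration of why the fast path helps, here is a minimal
user-space sketch. It is not the kernel code: struct task,
alloc_nowait(), alloc_may_sleep(), attach_all() and count_visits() are
hypothetical stand-ins, malloc() takes the place of both allocation
modes, and the non-sleeping allocation is assumed to always succeed.
It only counts how many list entries are visited when every allocation
forces a restart of the traversal versus when a non-sleeping allocation
is tried first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread that needs context data. */
struct task {
	void *ctx_data;
};

/* Stand-in for a non-sleeping allocation (GFP_NOWAIT in the kernel). */
static void *alloc_nowait(void)
{
	return malloc(16);
}

/* Stand-in for a sleeping allocation (GFP_KERNEL in the kernel). */
static void *alloc_may_sleep(void)
{
	return malloc(16);
}

/*
 * Attach context data to every task and return how many list entries
 * were visited in total. The for loop models the section that runs
 * under the RCU lock, where a sleeping allocation is not allowed.
 */
static size_t attach_all(struct task *tasks, size_t n, bool try_nowait)
{
	size_t visited = 0;
	size_t i;

	assert(tasks != NULL);
again:
	for (i = 0; i < n; i++) {
		visited++;
		if (tasks[i].ctx_data)
			continue;
		if (try_nowait) {
			/* Fast path: allocate without leaving the loop. */
			tasks[i].ctx_data = alloc_nowait();
			if (tasks[i].ctx_data)
				continue;
		}
		/* Slow path: "drop the lock", allocate, then start over. */
		tasks[i].ctx_data = alloc_may_sleep();
		if (!tasks[i].ctx_data)
			return visited;	/* out of memory, give up */
		goto again;
	}
	return visited;
}

/* Run attach_all() on n fresh tasks and clean up afterwards. */
static size_t count_visits(size_t n, bool try_nowait)
{
	struct task *tasks = calloc(n, sizeof(*tasks));
	size_t visited, i;

	if (!tasks)
		return 0;
	visited = attach_all(tasks, n, try_nowait);
	for (i = 0; i < n; i++)
		free(tasks[i].ctx_data);
	free(tasks);
	return visited;
}
```

In this model, count_visits(100, false) visits 5150 entries
(1 + 2 + ... + 100 for the restarted passes, plus a final full pass of
100), while count_visits(100, true) visits 100. The slow path is still
needed in the real code because a GFP_NOWAIT allocation can fail.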
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260211223222.3119790-3-namhyung@kernel.org
 		cd = NULL;
 	}
 	if (!cd) {
+		/*
+		 * Try to allocate context quickly before
+		 * traversing the whole thread list again.
+		 */
+		if (!attach_task_ctx_data(p, ctx_cache, true, GFP_NOWAIT))
+			continue;
 		get_task_struct(p);
 		goto alloc;
 	}