Andrea reported the dl_server getting stuck for him. He tracked it
down to a state where dl_server_start() saw dl_defer_running==1, but
the dl_server's job is no longer valid at the time of
dl_server_start().
In the state diagram this corresponds to [4] D->A (or dl_server_stop()
due to no more runnable tasks) followed by [1], which in case of a
lapsed deadline must then be A->B.
Now our A has dl_defer_running==1, while B demands
dl_defer_running==0, therefore it must get cleared when the CBS wakeup
rules demand a replenish.
Fixes: a110a81c52a9 ("sched/deadline: Deferrable dl server")
Reported-by: Andrea Righi arighi@nvidia.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Andrea Righi arighi@nvidia.com
Link: https://lkml.kernel.org/r/20260123161645.2181752-1-arighi@nvidia.com
Link: https://patch.msgid.link/20260130124100.GC1079264@noisy.programming.kicks-ass.net
return;
}
+ /*
+ * When [4] D->A is followed by [1] A->B, dl_defer_running
+ * needs to be cleared, otherwise it will fail to properly
+ * start the zero-laxity timer.
+ */
+ dl_se->dl_defer_running = 0;
replenish_dl_new_period(dl_se, rq);
} else if (dl_server(dl_se) && dl_se->dl_defer) {
/*
* dl_server_active = 1;
* enqueue_dl_entity()
* update_dl_entity(WAKEUP)
+ * if (dl_time_before() || dl_entity_overflow())
+ * dl_defer_running = 0;
+ * replenish_dl_new_period();
+ * // fwd period
+ * dl_throttled = 1;
+ * dl_defer_armed = 1;
* if (!dl_defer_running)
* dl_defer_armed = 1;
* dl_throttled = 1;