BUG/MEDIUM: hlua_fcn/queue: bad pop_wait sequencing
I assumed that the hlua_yieldk() function used in queue:pop_wait()
function would eventually return when the continuation function would
return.
But this is wrong, the continuation function is simply called back by the
resume after the hlua_yieldk() which does not return in this case. The
caller is no longer the initial calling function, but Lua, so when the
continuation function eventually returns, it does not give the hand back
to the C calling function (queue:pop_wait()), like we're used to, but
directly to Lua which will continue the normal execution of the (Lua)
function that triggered the C-function, effectively bypassing the end
of the C calling function.
Because of this, the queue waiting list cleanup never occurs!
This causes some undesirable effects:
- pop_wait() will slowly leak over the time, because the allocated queue
waiting entry never gets deallocated when the function is finished
- queue:push() will become slower and slower because the wait list will
keep growing indefinitely as a result of the previous leak
- the task that performed at least 1 pop_wait() could suffer from
useless wakeups because it will stay indefinitely in the queue waiting
list, so every queue:push() will try to wake the task, even if the
task is not waiting for new queue items.
- last but not least, if the task that performed at least 1 pop_wait ends
or crashes, the next queue:push() will lead to invalid reads and
process crash because it will try to wakeup a ghost task that doesn't
exist anymore.
To fix this, the pop_wait function was reworked with the assumption that
the hlua_yieldk() with continuation function never returns. Indeed, it is
now the continuation function that will take care of the cleanup, instead
of the parent function.
This must be backported in 2.8 with
86fb22c5 ("MINOR: hlua_fcn: add Queue class")