During a workload involving conversions between lock modes PR and CW,
lock recovery can create a "conversion deadlock" state between locks
that have been recovered. When this occurs, kernel warning messages
are logged, e.g.

  "dlm: WARN: pending deadlock 1e node 0 2 1bf21"
  "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e"

Both deadlocked conversions then remain on the convert queue of the
resource being locked, and neither conversion request completes.

Outside of recovery, a conversion that would produce a deadlock is
resolved immediately: the request returns -EDEADLK, and the lock is
not placed on the convert queue in the deadlocked state.
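
As an illustration of the non-recovery behavior, the following is a
minimal userspace sketch, not the fs/dlm code. The names lock_mode,
compat, conv, conversions_deadlock() and try_convert() are made up for
this example. Two locks that both hold PR and both request CW each block
the other, so the second request is refused with -EDEADLK rather than
queued.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the six modes follow the usual DLM ordering. */
enum lock_mode { MODE_NL, MODE_CR, MODE_CW, MODE_PR, MODE_PW, MODE_EX };

/* compat[granted][requested]: true if the two modes can coexist. */
static const bool compat[6][6] = {
	/*         NL CR CW PR PW EX */
	/* NL */ { 1, 1, 1, 1, 1, 1 },
	/* CR */ { 1, 1, 1, 1, 1, 0 },
	/* CW */ { 1, 1, 1, 0, 0, 0 },
	/* PR */ { 1, 1, 0, 1, 0, 0 },
	/* PW */ { 1, 1, 0, 0, 0, 0 },
	/* EX */ { 1, 0, 0, 0, 0, 0 },
};

/* A lock under conversion: mode held now and mode being requested. */
struct conv {
	enum lock_mode grmode;
	enum lock_mode rqmode;
};

/* Each conversion is blocked by the mode the other one still holds. */
static bool conversions_deadlock(const struct conv *a, const struct conv *b)
{
	return !compat[b->grmode][a->rqmode] &&
	       !compat[a->grmode][b->rqmode];
}

/*
 * Simplified policy for this sketch: refuse the incoming conversion if
 * it would deadlock with any conversion that is already queued.
 * Returns 0 if the conversion could be queued, -EDEADLK otherwise.
 */
static int try_convert(const struct conv *queued, size_t n,
		       const struct conv *incoming)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (conversions_deadlock(incoming, &queued[i]))
			return -EDEADLK;
	}
	return 0;
}
```

With one PR->CW conversion already queued, try_convert() for a second
PR->CW conversion returns -EDEADLK. With a CR->CW conversion queued
instead, it returns 0, because the queued lock's CR does not block the
incoming CW request.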

To fix this problem, an lkb that is converting between PR and CW is
rebuilt during recovery on the new master's granted queue, with only
its currently granted mode, instead of on the new master's convert
queue, with both the granted mode and the newly requested mode.
The in-progress convert is then resent to the new master after
recovery, so the conversion deadlock is processed outside of the
recovery context and handled as described above.
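
The rebuild rule can be sketched as follows. This is a simplified
userspace model under stated assumptions, not the patch itself: the
rcv_lock struct, the RCV_* constants, rcv_is_middle_conversion() and
rcv_rebuild_on_new_master() are hypothetical names, and using -1 as the
"no requested mode" marker is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical constants; RCV_MODE_IV marks "no requested mode". */
enum { RCV_MODE_IV = -1, RCV_MODE_NL, RCV_MODE_CR, RCV_MODE_CW,
       RCV_MODE_PR, RCV_MODE_PW, RCV_MODE_EX };

enum rcv_queue { RCV_QUEUE_WAITING, RCV_QUEUE_GRANTED, RCV_QUEUE_CONVERT };

/* State of one lock as rebuilt on the new master. */
struct rcv_lock {
	int grmode;		/* currently granted mode */
	int rqmode;		/* requested mode, or RCV_MODE_IV */
	enum rcv_queue queue;	/* queue the lock is rebuilt on */
	bool resend_convert;	/* owner must resend the convert later */
};

/* A conversion between PR and CW, in either direction. */
static bool rcv_is_middle_conversion(int grmode, int rqmode)
{
	return (grmode == RCV_MODE_PR && rqmode == RCV_MODE_CW) ||
	       (grmode == RCV_MODE_CW && rqmode == RCV_MODE_PR);
}

/*
 * Decide how a lock is rebuilt on the new master. A PR/CW conversion
 * is rebuilt as a plain granted lock, and the convert is flagged to be
 * resent after recovery. Every other lock keeps its original state.
 */
static struct rcv_lock rcv_rebuild_on_new_master(int grmode, int rqmode,
						 enum rcv_queue old_queue)
{
	struct rcv_lock out = { grmode, rqmode, old_queue, false };

	if (old_queue == RCV_QUEUE_CONVERT &&
	    rcv_is_middle_conversion(grmode, rqmode)) {
		out.rqmode = RCV_MODE_IV;
		out.queue = RCV_QUEUE_GRANTED;
		out.resend_convert = true;
	}
	return out;
}
```

In this model, a lock recovered with granted mode PR and requested mode
CW lands on the granted queue holding PR, with resend_convert set. A
conversion that is not between PR and CW, such as CR to EX, stays on
the convert queue unchanged.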

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>