When a device gets hot-unplugged, we should deactivate and close the
renderer before freeing it.
Specifically, this fixes a problem where plymouth's display "freezes"
when used with AMD GPUs that can be handled by either the radeon or the
amdgpu kms driver and the user has added kernel commandline options to
prefer the amdgpu driver. In this case the following happens:
1) radeon driver gets loaded, creates /dev/dri/card0, generating a udev
   "add /dev/dri/card0" event.
2) radeon driver's "load" method checks if it should bind, comes to the
   conclusion it should not, causing the probe of the device to be
   aborted and /dev/dri/card0 to be unregistered, generating a udev
   "remove /dev/dri/card0" event.
3) amdgpu driver loads, creates /dev/dri/card0, generating another udev
   "add /dev/dri/card0" event.
4) plymouth starts while udev is still catching up with kernel events;
   plymouth sees the first "add" event, opens /dev/dri/card0 (which is
   already managed by amdgpu at this time), finds outputs, shows splash.
5) plymouth gets the "remove" event and tears down the renderer, but
   since deactivate and close were not called before this commit,
   plymouth keeps the master rights attached to the old fd, which is
   leaked at this point.
6) plymouth gets the second "add" event and brings everything back up,
   but it cannot get master rights, since there can only be one master
   per device and the leaked fd is the master (see the sketch after
   this list). This leads to lots of "Couldn't set scan out buffer"
   errors, and plymouth appears frozen because of this; this is
   especially bad with the disk encryption unlock screen.
7) When gdm starts, it deactivates plymouth, but does not yet tell it to
   quit, to avoid flickering. The deactivate makes plymouth drop its
   master rights, but this happens on the new, broken fd which never got
   master rights. This also makes it impossible for gdm to get master
   rights, breaking gdm too, with the following error in the logs:
   "(EE) wl_drm@4: error 0: authenicate failed"
Note that in theory calling ply_renderer_close() on teardown (from
ply_device_manager_free()) should be fine too, but for some reason this
causes the framebuffer to be all messed up (it looks like wrong pitch /
tiling) when done after gdm has taken over the framebuffer. Whereas just
exiting without calling drmModeRmFB, letting the kernel clean up behind
us, does not trigger this. This feels like a kernel bug and I will file
a bug for it. But for now we should not call ply_renderer_close() on
teardown to avoid this issue (which would be a regression).
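In terms of the new "close" parameter introduced below, the policy
therefore is (the calls are from the patch; the comments are my
summary):

    /* Hot-unplug ("remove" udev event): the device is gone, so release
     * it fully: deactivate (drop master rights), close (give back the
     * buffers and the fd), then free. */
    free_devices_from_device_path (manager, device_path, true);

    /* Teardown (ply_device_manager_free): skip ply_renderer_close (),
     * just exit and let the kernel clean up behind us, avoiding the
     * framebuffer corruption described above. */
    free_devices_from_device_path (manager, device_path, false);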
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1490490
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
 static void
 free_devices_from_device_path (ply_device_manager_t *manager,
-                               const char           *device_path)
+                               const char           *device_path,
+                               bool                  close)
 {
   void *key = NULL;
   void *renderer = NULL;

   if (!ply_hashtable_lookup_full (manager->renderers,
                                   (void *) device_path,
                                   &key, &renderer))
     return;

   ply_hashtable_remove (manager->renderers, (void *) device_path);

   free (key);
+
+  if (manager->renderers_activated)
+    ply_renderer_deactivate (renderer);
+
+  if (close)
+    ply_renderer_close (renderer);
+
   ply_renderer_free (renderer);
 }
   device_path = udev_device_get_devnode (device);
   if (device_path != NULL)
-    free_devices_from_device_path (manager, device_path);
+    free_devices_from_device_path (manager, device_path, true);
 }
 static bool

                ply_renderer_t       *renderer,
                ply_device_manager_t *manager)
 {
-  free_devices_from_device_path (manager, device_path);
+  free_devices_from_device_path (manager, device_path, false);
 }

 static void