From: Michael Kelley Date: Wed, 14 May 2025 22:55:08 +0000 (-0700) Subject: Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir X-Git-Tag: v6.16-rc1~68^2~1 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=e89f91222ab3aef84af316567ebc815a10313f3a;p=thirdparty%2Fkernel%2Flinux.git Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir The VMBus driver code has some inherent races in the creation of the "channels" sysfs subdirectory and its per-channel numbered subdirectories. These races have not generally been recognized or understood. Add some comments to call them out. No code changes. Signed-off-by: Michael Kelley Link: https://lore.kernel.org/r/20250514225508.52629-1-mhklinux@outlook.com Signed-off-by: Wei Liu Message-ID: <20250514225508.52629-1-mhklinux@outlook.com> --- diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index d3321700ded66..c236081d0a872 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -714,7 +714,30 @@ static const struct hv_vmbus_device_id *hv_vmbus_get_id(const struct hv_driver * return id; } -/* vmbus_add_dynid - add a new device ID to this driver and re-probe devices */ +/* vmbus_add_dynid - add a new device ID to this driver and re-probe devices + * + * This function can race with vmbus_device_register(). This function is + * typically running on a user thread in response to writing to the "new_id" + * sysfs entry for a driver. vmbus_device_register() is running on a + * workqueue thread in response to the Hyper-V host offering a device to the + * guest. This function calls driver_attach(), which looks for an existing + * device matching the new id, and attaches the driver to which the new id + * has been assigned. vmbus_device_register() calls device_register(), which + * looks for a driver that matches the device being registered. If both + * operations are running simultaneously, the device driver probe function runs + * on whichever thread establishes the linkage between the driver and device. + * + * In most cases, it doesn't matter which thread runs the driver probe + * function. But if vmbus_device_register() does not find a matching driver, + * it proceeds to create the "channels" subdirectory and numbered per-channel + * subdirectory in sysfs. While that multi-step creation is in progress, this + * function could run the driver probe function. If the probe function checks + * for, or operates on, entries in the "channels" subdirectory, including by + * calling hv_create_ring_sysfs(), the operation may or may not succeed + * depending on the race. The race can't create a kernel failure in VMBus + * or device subsystem code, but probe functions in VMBus drivers doing such + * operations must be prepared for the failure case. + */ static int vmbus_add_dynid(struct hv_driver *drv, guid_t *guid) { struct vmbus_dynid *dynid; @@ -1928,7 +1951,8 @@ static const struct kobj_type vmbus_chan_ktype = { * ring for userspace to use. * Note: Race conditions can happen with userspace and it is not encouraged to create new * use-cases for this. This was added to maintain backward compatibility, while solving - * one of the race conditions in uio_hv_generic while creating sysfs. + * one of the race conditions in uio_hv_generic while creating sysfs. See comments with + * vmbus_add_dynid() and vmbus_device_register(). * * Returns 0 on success or error code on failure. */ @@ -2062,6 +2086,20 @@ int vmbus_device_register(struct hv_device *child_device_obj) return ret; } + /* + * If device_register() found a driver to assign to the device, the + * driver's probe function has already run at this point. If that + * probe function accesses or operates on the "channels" subdirectory + * in sysfs, those operations will have failed because the "channels" + * subdirectory doesn't exist until the code below runs. Or if the + * probe function creates a /dev entry, a user space program could + * find and open the /dev entry, and then create a race by accessing + * the "channels" subdirectory while the creation steps are in progress + * here. The race can't result in a kernel failure, but the user space + * program may get an error in accessing "channels" or its + * subdirectories. See also comments with vmbus_add_dynid() about a + * related race condition. + */ child_device_obj->channels_kset = kset_create_and_add("channels", NULL, kobj); if (!child_device_obj->channels_kset) {