From: Sean Christopherson Date: Thu, 22 May 2025 23:52:19 +0000 (-0700) Subject: KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=2cdd64cbf9906f9d2d52ef96e1471992ff6c27ec;p=thirdparty%2Flinux.git KVM: Disallow binding multiple irqfds to an eventfd with a priority waiter Disallow binding an irqfd to an eventfd that already has a priority waiter, i.e. to an eventfd that already has an attached irqfd. KVM always operates in exclusive mode for EPOLL_IN (unconditionally returns '1'), i.e. only the first waiter will be notified. KVM already disallows binding multiple irqfds to an eventfd in a single VM, but doesn't guard against multiple VMs binding to an eventfd. Adding the extra protection reduces the pain of a userspace VMM bug, e.g. if userspace fails to de-assign before re-assigning when transferring state for intra-host migration, then the migration will explicitly fail as opposed to dropping IRQs on the destination VM. Temporarily keep KVM's manual check on irqfds.items, but add a WARN, e.g. to allow sanity checking the waitqueue enforcement. Cc: Oliver Upton Cc: David Matlack Tested-by: K Prateek Nayak Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20250522235223.3178519-10-seanjc@google.com Signed-off-by: Sean Christopherson --- diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f8c2486f95d59..ab9e3cfb81bfd 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -291,38 +291,57 @@ static void kvm_irqfd_register(struct file *file, wait_queue_head_t *wqh, struct kvm_kernel_irqfd *tmp; struct kvm *kvm = p->kvm; + /* + * Note, irqfds.lock protects the irqfd's irq_entry, i.e. its routing, + * and irqfds.items. It does NOT protect registering with the eventfd. + */ spin_lock_irq(&kvm->irqfds.lock); - list_for_each_entry(tmp, &kvm->irqfds.items, list) { - if (irqfd->eventfd != tmp->eventfd) - continue; - /* This fd is used for another irq already. */ - p->ret = -EBUSY; - spin_unlock_irq(&kvm->irqfds.lock); - return; - } - + /* + * Initialize the routing information prior to adding the irqfd to the + * eventfd's waitqueue, as irqfd_wakeup() can be invoked as soon as the + * irqfd is registered. + */ irqfd_update(kvm, irqfd); - list_add_tail(&irqfd->list, &kvm->irqfds.items); - /* * Add the irqfd as a priority waiter on the eventfd, with a custom * wake-up handler, so that KVM *and only KVM* is notified whenever the - * underlying eventfd is signaled. Temporarily lie to lockdep about - * holding irqfds.lock to avoid a false positive regarding potential - * deadlock with irqfd_wakeup() (see irqfd_wakeup() for details). + * underlying eventfd is signaled. */ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup); + /* + * Temporarily lie to lockdep about holding irqfds.lock to avoid a + * false positive regarding potential deadlock with irqfd_wakeup() + * (see irqfd_wakeup() for details). + * + * Adding to the wait queue will fail if there is already a priority + * waiter, i.e. if the eventfd is associated with another irqfd (in any + * VM). Note, kvm_irqfd_deassign() waits for all in-flight shutdown + * jobs to complete, i.e. ensures the irqfd has been removed from the + * eventfd's waitqueue before returning to userspace. + */ spin_release(&kvm->irqfds.lock.dep_map, _RET_IP_); - irqfd->wait.flags |= WQ_FLAG_EXCLUSIVE; - add_wait_queue_priority(wqh, &irqfd->wait); + p->ret = add_wait_queue_priority_exclusive(wqh, &irqfd->wait); spin_acquire(&kvm->irqfds.lock.dep_map, 0, 0, _RET_IP_); + if (p->ret) + goto out; - spin_unlock_irq(&kvm->irqfds.lock); + list_for_each_entry(tmp, &kvm->irqfds.items, list) { + if (irqfd->eventfd != tmp->eventfd) + continue; - p->ret = 0; + WARN_ON_ONCE(1); + /* This fd is used for another irq already. */ + p->ret = -EBUSY; + goto out; + } + + list_add_tail(&irqfd->list, &kvm->irqfds.items); + +out: + spin_unlock_irq(&kvm->irqfds.lock); } #if IS_ENABLED(CONFIG_HAVE_KVM_IRQ_BYPASS)