From: Paolo Bonzini
Date: Mon, 4 May 2026 08:44:52 +0000 (-0400)
Subject: Merge branch 'kvm-mbec' into HEAD
X-Git-Url: http://git.ipfire.org/gitweb/?a=commitdiff_plain;h=2be108307eae241359bb32ee259ba0b5378156aa;p=thirdparty%2Fkernel%2Flinux.git

Merge branch 'kvm-mbec' into HEAD

This topic branch introduces support for two related features that
Hyper-V uses in its implementation of Virtual Secure Mode; these are
Intel Mode-Based Execute Control (MBEC) and AMD Guest Mode Execution
Trap (GMET).

Both MBEC and GMET allow more granular control over execute permissions,
with different levels of separation between supervisor and user mode.
MBEC provides support for separate supervisor and user-mode bits in the
PTEs; GMET instead lacks supervisor-mode only execution (with NX=0,
"both" is represented by U=0 and user-mode only by U=1).  GMET was
clearly inspired by SMEP, though with some differences and annoyances.

The implementation starts from two changes to core MMU code, both of
which help make the actual feature almost trivial to implement:

- first, I'm cleaning up the implementation of nVMX exec-only, by
  properly adding read permissions to the ACC_* constants and to the
  permission bitmask machinery.  Jon also had to add a fourth ACC_* bit,
  but used it only in the special case of nested MBEC; here instead
  ACC_READ_MASK is the normal case, which simplifies testing a lot and
  removes gratuitous complexity.

- second, I'm enforcing that KVM runs with MBEC/GMET enabled even in
  non-nested mode, if it wants to provide the feature to nested
  hypervisors.  This makes the creation of SPTEs look exactly the same
  for L1 and L2 guests, despite only the latter using MBEC/GMET fully;
  the difference lies only in the input access permissions.

The complexity that this strategy adds to the core is limited, and it
provides almost entirely seamless support for nested hypervisors.

Later patches have to use slightly different meanings for ACC_* in Intel
and AMD.

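As an illustration of the execute-permission rules described above, here
is a minimal sketch in C.  It is not KVM code: every name prefixed with
sk_ or SK_ is hypothetical, and the functions only restate the rules as
given in this message (separate XS/XU bits for MBEC; NX plus U for GMET).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in KVM. */
enum sk_exec_mode { SK_EXEC_SUPERVISOR, SK_EXEC_USER };

/*
 * MBEC: the PTE has one execute bit for supervisor mode (XS) and one
 * for user mode (XU), so all four combinations are representable.
 */
bool sk_mbec_can_exec(bool xs, bool xu, enum sk_exec_mode mode)
{
	return mode == SK_EXEC_USER ? xu : xs;
}

/*
 * GMET, as described in the text: with NX=0, U=0 means "both" and U=1
 * means user-mode only.  Supervisor-only execution cannot be expressed.
 */
bool sk_gmet_can_exec(bool nx, bool user, enum sk_exec_mode mode)
{
	if (nx)
		return false;
	return !user || mode == SK_EXEC_USER;
}

/* Small self-check of the cases spelled out in the text. */
void sk_exec_rules_selfcheck(void)
{
	/* MBEC supervisor-only page: XS=1, XU=0. */
	assert(sk_mbec_can_exec(true, false, SK_EXEC_SUPERVISOR));
	assert(!sk_mbec_can_exec(true, false, SK_EXEC_USER));

	/* GMET user-only page: NX=0, U=1. */
	assert(sk_gmet_can_exec(false, true, SK_EXEC_USER));
	assert(!sk_gmet_can_exec(false, true, SK_EXEC_SUPERVISOR));
}
```

The asymmetry is visible in the signatures: the MBEC helper can return
true for supervisor mode alone, while no (NX, U) input to the GMET helper
does so.
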
On the Intel side, some work is needed in order to split shadow_x_mask
and ACC_EXEC_MASK in two; now that there is an actual ACC_READ_MASK to
be used for exec-only pages, ACC_USER_MASK is unused and can be reused
as ACC_USER_EXEC_MASK.  However, unlike the older ACC_USER_MASK hack,
these differences are backed by concrete concepts of the page table
format, and there is always a 1:1 mapping from ACC_* bits to PT_*_MASK
or shadow_*_mask:

                        Intel                AMD
  --------------------  -------------------  -------------------
  ACC_READ_MASK         PT_PRESENT_MASK      PT_PRESENT_MASK
  ACC_WRITE_MASK        PT_WRITABLE_MASK     PT_WRITABLE_MASK
  ACC_EXEC_MASK         shadow_xs_mask       shadow_nx_mask
  ACC_USER_MASK         ---                  shadow_user_mask
  ACC_USER_EXEC_MASK    shadow_xu_mask       ---

On Intel, ACC_EXEC_MASK is used for kernel-mode execution and is tied to
shadow_xs_mask (when MBEC is disabled, ACC_USER_EXEC_MASK and the XU bit
are computed but ineffective).  update_permission_bitmask() precomputes
all the necessary conditions.

On the AMD side, the U bit maps to ACC_USER_MASK but nNPT adjusts the
permission bitmask to ignore it for reads and writes when GMET is
active.  Although the changes are smaller in scale than for MBEC, some
are still needed in order to use GMET for L1 guests, because the page
tables have to be created with U=0.  This means that the root page has
role.access != ACC_ALL and its permissions have to be propagated down.

Note that with MBEC the user/supervisor distinction depends on the U bit
of the page tables rather than the CPL.  Processors provide this
information to the hypervisor through the "advanced EPT violation vmexit
info" feature, which is a requirement for KVM to use MBEC, and
kvm-intel.ko passes it to the MMU in PFERR_USER_MASK (unlike kvm-amd.ko
which computes it from the CPL).  This needs a small change to pass the
effective XWU permissions of the page tables down to
translate_nested_gpa().

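The 1:1 mapping from ACC_* bits to page table bits can be sketched as
follows.  This is a hedged illustration, not the code in this branch: the
SK_ACC_* encodings are invented for the example, and the bit positions
(EPT R/W/X in bits 0-2 with user-execute in bit 10; NPT P/W/U in bits 0-2
with NX in bit 63) are assumptions taken from my reading of the
architecture manuals rather than from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ACC_* encodings for this sketch; KVM's values may differ. */
#define SK_ACC_READ      (1u << 0)
#define SK_ACC_WRITE     (1u << 1)
#define SK_ACC_EXEC      (1u << 2)
#define SK_ACC_USER      (1u << 3)
/* Per the text, Intel reuses the ACC_USER_MASK bit as ACC_USER_EXEC_MASK. */
#define SK_ACC_USER_EXEC SK_ACC_USER

/* Assumed EPT entry bits. */
#define SK_EPT_R  (1ull << 0)
#define SK_EPT_W  (1ull << 1)
#define SK_EPT_XS (1ull << 2)	/* supervisor execute when MBEC is on */
#define SK_EPT_XU (1ull << 10)	/* user execute when MBEC is on */

/* Assumed NPT entry bits. */
#define SK_NPT_P  (1ull << 0)
#define SK_NPT_W  (1ull << 1)
#define SK_NPT_U  (1ull << 2)
#define SK_NPT_NX (1ull << 63)

/* Intel column of the table: each ACC_* bit sets exactly one EPT bit. */
uint64_t sk_intel_spte_bits(unsigned int acc)
{
	uint64_t spte = 0;

	if (acc & SK_ACC_READ)
		spte |= SK_EPT_R;
	if (acc & SK_ACC_WRITE)
		spte |= SK_EPT_W;
	if (acc & SK_ACC_EXEC)
		spte |= SK_EPT_XS;
	if (acc & SK_ACC_USER_EXEC)
		spte |= SK_EPT_XU;
	return spte;
}

/* AMD column: NX has inverted polarity, so it is set when exec is absent. */
uint64_t sk_amd_spte_bits(unsigned int acc)
{
	uint64_t spte = 0;

	if (acc & SK_ACC_READ)
		spte |= SK_NPT_P;
	if (acc & SK_ACC_WRITE)
		spte |= SK_NPT_W;
	if (acc & SK_ACC_USER)
		spte |= SK_NPT_U;
	if (!(acc & SK_ACC_EXEC))
		spte |= SK_NPT_NX;
	return spte;
}

/* An exec-only access mask yields an EPT entry with only XS set. */
void sk_acc_selfcheck(void)
{
	assert(sk_intel_spte_bits(SK_ACC_EXEC) == SK_EPT_XS);
}
```

With a real read bit in the access mask, exec-only pages need no special
case in the sketch: the entry simply has R clear and XS set.
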
The former "smep_andnot_wp" bit of cpu_role.base, now named "cr4_smep",
is repurposed for nested TDP to indicate that MBEC/GMET is on.  The
minor pessimization for shadow page tables (toggling CR4.SMEP now always
forces building a separate version of the shadow page tables, even
though that's technically unnecessary if CR4.WP=1) is not really worth
fretting about; in practice, guests are not going to flip CR4.SMEP in a
way that would prevent efficient reuse of shadow page tables.

Signed-off-by: Paolo Bonzini
---
2be108307eae241359bb32ee259ba0b5378156aa
diff --cc arch/x86/kvm/hyperv.c
index 4438ecac9a89b,f35fae3a7b3dd..015c6947b462e
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@@ -2040,8 -2040,10 +2040,10 @@@ static u64 kvm_hv_flush_tlb(struct kvm_
  	 * flush). Translate the address here so the memory can be uniformly
  	 * read with kvm_read_guest().
  	 */
 -	if (!hc->fast && is_guest_mode(vcpu)) {
 +	if (!hc->fast && mmu_is_nested(vcpu)) {
- 		hc->ingpa = translate_nested_gpa(vcpu, hc->ingpa, 0, NULL);
+ 		hc->ingpa = kvm_x86_ops.nested_ops->translate_nested_gpa(
+ 				vcpu, hc->ingpa,
+ 				PFERR_GUEST_FINAL_MASK, NULL, 0);
  		if (unlikely(hc->ingpa == INVALID_GPA))
  			return HV_STATUS_INVALID_HYPERCALL_INPUT;
  	}