From stable+bounces-189245-greg=kroah.com@vger.kernel.org Fri Oct 24 20:17:13 2025
From: Babu Moger <babu.moger@amd.com>
Date: Fri, 24 Oct 2025 13:13:11 -0500
Subject: x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
To: <stable@vger.kernel.org>
Message-ID: <20251024181311.146536-1-babu.moger@amd.com>

From: Babu Moger <babu.moger@amd.com>

[ Upstream commit 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 ]

Users can create as many monitoring groups as there are RMIDs supported by
the hardware. However, on AMD systems, only a limited number of RMIDs are
guaranteed to be actively tracked by the hardware. RMIDs that exceed this
limit are placed in an "Unavailable" state.

When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again, the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on the first read after tracking restarts and is clear on all
subsequent reads as long as the RMID is tracked.

resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because the
hardware resets the counter to zero when it starts counting again, but
resctrl still compares the new count against the counter value stored from
the previous time the RMID was tracked.

This results in resctrl computing an event value that is either an
undercount (when the new counter value is greater than the stored value) or
a mistaken overflow (when the new counter value is less than the stored
value).

Reset the stored value of MSR_IA32_QM_CTR (arch_mbm_state::prev_msr) to
zero whenever the RMID is in the "Unavailable" state. This ensures accurate
counting once the hardware resets the counter to zero and starts tracking
the RMID again.

Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
   monitoring group.

   $ mount -t resctrl resctrl /sys/fs/resctrl
   $ mkdir /sys/fs/resctrl/mon_groups/test1/
   $ echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks

   $ cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   21323            <- Total bytes on domain 0
   "Unavailable"    <- Total bytes on domain 1

   Task is running on domain 0. Counter on domain 1 is "Unavailable".

2. The task runs on domain 0 for a while and then moves to domain 1. The
   counter starts incrementing on domain 1.

   $ cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   7345357          <- Total bytes on domain 0
   4545             <- Total bytes on domain 1

3. At some point, the RMID in domain 0 transitions to the "Unavailable"
   state because the task is no longer executing in that domain.

   $ cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   "Unavailable"    <- Total bytes on domain 0
   434341           <- Total bytes on domain 1

4. Since the task continues to migrate between domains, it may eventually
   return to domain 0.

   $ cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   17592178699059   <- Overflow on domain 0
   3232332          <- Total bytes on domain 1

In this case, the RMID on domain 0 transitions from the "Unavailable" state
to the active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62)
when the counter is read and begins tracking the RMID, counting from 0.

Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, resctrl's overflow logic is
triggered: it compares the previous value (7345357) with the new, smaller
value, incorrectly interprets this as a counter overflow, and adds a large
delta.

In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to the
active state.

Here is the text from the APM [1], available from [2]:

"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
    Bandwidth (MBM).

  [ bp: Split commit message into smaller paragraph chunks for better
    consumption. ]

Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
(cherry picked from commit 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92)
[babu.moger@amd.com: Fix conflict for v6.1 stable]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/resctrl/monitor.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -224,11 +224,15 @@ int resctrl_arch_rmid_read(struct rdt_re
 	if (!cpumask_test_cpu(smp_processor_id(), &d->cpu_mask))
 		return -EINVAL;
 
+	am = get_arch_mbm_state(hw_dom, rmid, eventid);
+
 	ret = __rmid_read(rmid, eventid, &msr_val);
-	if (ret)
+	if (ret) {
+		if (am && ret == -EINVAL)
+			am->prev_msr = 0;
 		return ret;
+	}
 
-	am = get_arch_mbm_state(hw_dom, rmid, eventid);
 	if (am) {
 		am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
 						 hw_res->mbm_width);