From stable+bounces-188200-greg=kroah.com@vger.kernel.org Mon Oct 20 18:41:02 2025
From: Sasha Levin <sashal@kernel.org>
Date: Mon, 20 Oct 2025 12:38:53 -0400
Subject: x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID
To: stable@vger.kernel.org
Cc: Babu Moger <babu.moger@amd.com>, "Borislav Petkov (AMD)" <bp@alien8.de>, Reinette Chatre <reinette.chatre@intel.com>, Sasha Levin <sashal@kernel.org>
Message-ID: <20251020163853.1841192-2-sashal@kernel.org>

From: Babu Moger <babu.moger@amd.com>

[ Upstream commit 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 ]

Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.

When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again, the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on the first read after tracking restarts and is clear on all
subsequent reads as long as the RMID is tracked.
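
As a rough illustration of how a raw counter read might be classified, the
sketch below decodes the status bits described above. The helper name and the
macro names are hypothetical. Bit 63 as the error bit and -EIO as its error
code are assumptions not stated in this message; -EINVAL for "Unavailable"
matches the value the patch tests for.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical names for the status bits of a raw MSR_IA32_QM_CTR read. */
#define QM_CTR_ERROR   (1ULL << 63)  /* assumed: invalid RMID or event */
#define QM_CTR_UNAVAIL (1ULL << 62)  /* "Unavailable": RMID not tracked */

/*
 * Classify a raw counter read.
 * Returns 0 and stores the count on success, or a negative errno when the
 * hardware flags the read as an error or as unavailable.
 */
static int qm_ctr_decode(uint64_t raw, uint64_t *count)
{
	if (raw & QM_CTR_ERROR)
		return -EIO;
	if (raw & QM_CTR_UNAVAIL)
		return -EINVAL;

	*count = raw;
	return 0;
}
```

The point of keeping the two failure cases distinct is that only the
"Unavailable" case implies the hardware counter restarts from zero.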

resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because, when the
hardware starts counting again after resetting the counter to zero, resctrl
compares the new count against the counter value stored from the previous
time the RMID was tracked.

This results in resctrl computing an event value that either undercounts
(when the new counter is more than the stored counter) or reflects a
mistaken overflow (when the new counter is less than the stored counter).

Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the hardware counter resets to zero when the RMID starts to
be tracked again.
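
The effect of the reset can be modelled in user space. The following is a
minimal sketch, not the kernel code: struct mbm_state, mbm_update(), the
apply_fix switch and the 44-bit counter width are all assumptions made for
illustration. Only the idea of zeroing the stored value on an "Unavailable"
read comes from this patch.

```c
#include <stdint.h>

#define CTR_WIDTH 44  /* assumed counter width in bits */

/* Simplified stand-in for the per-RMID software state. */
struct mbm_state {
	uint64_t prev_msr;  /* last raw counter value that was read */
	uint64_t chunks;    /* running total accumulated by software */
};

/* Width-limited difference, so a genuine wraparound still counts correctly. */
static uint64_t ctr_delta(uint64_t prev, uint64_t cur)
{
	unsigned int shift = 64 - CTR_WIDTH;

	return ((cur << shift) - (prev << shift)) >> shift;
}

/*
 * Process one counter read. A non-zero 'unavailable' models a read with
 * the Unavailable bit set. 'apply_fix' selects the behaviour with or
 * without zeroing the stored value.
 */
static void mbm_update(struct mbm_state *s, uint64_t cur,
		       int unavailable, int apply_fix)
{
	if (unavailable) {
		if (apply_fix)
			s->prev_msr = 0;  /* hardware restarts from zero */
		return;
	}

	s->chunks += ctr_delta(s->prev_msr, cur);
	s->prev_msr = cur;
}
```

Feeding both variants the sequence 1000, "Unavailable", 1500 gives a total
of 2500 with the reset and 1500 without it, which is the undercounting case
described above.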

Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
   monitoring group.

   $mount -t resctrl resctrl /sys/fs/resctrl
   $mkdir /sys/fs/resctrl/mon_groups/test1/
   $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   21323            <- Total bytes on domain 0
   "Unavailable"    <- Total bytes on domain 1

   Task is running on domain 0. Counter on domain 1 is "Unavailable".

2. The task runs on domain 0 for a while and then moves to domain 1. The
   counter starts incrementing on domain 1.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   7345357          <- Total bytes on domain 0
   4545             <- Total bytes on domain 1

3. At some point, the RMID in domain 0 transitions to the "Unavailable"
   state because the task is no longer executing in that domain.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   "Unavailable"    <- Total bytes on domain 0
   434341           <- Total bytes on domain 1

4. Since the task continues to migrate between domains, it may eventually
   return to domain 0.

   $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
   17592178699059   <- Overflow on domain 0
   3232332          <- Total bytes on domain 1

In this case, the RMID on domain 0 transitions from the "Unavailable" state
to the active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62)
when the counter is read and begins tracking the RMID, counting from 0.

Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, resctrl's overflow logic is
triggered: it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.

In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
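
The size of the bogus delta can be checked with a little arithmetic. The
value reported in step 4 equals 2^44 - 7345357, which is what a wraparound
calculation on a 44-bit counter yields for a fresh read of 0 against the
stale value 7345357. The 44-bit width and the helper name below are
assumptions for illustration, and the units in the scenario are taken at
face value.

```c
#include <stdint.h>

/*
 * Difference between two reads of a counter that is 'width' bits wide,
 * computed modulo 2^width so that a wrapped counter yields a positive
 * delta. Hypothetical helper, not the kernel function.
 */
static uint64_t wrapped_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
	unsigned int shift = 64 - width;

	return ((cur << shift) - (prev << shift)) >> shift;
}
```

With the stale value, wrapped_delta(7345357, 0, 44) evaluates to
17592178699059. With the stored value reset to zero, wrapped_delta(0, 0, 44)
evaluates to 0 and later reads accumulate normally.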

Here is the text from APM [1] available from [2].

"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."

[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
    Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
    Bandwidth (MBM).

  [ bp: Split commit message into smaller paragraph chunks for better
    consumption. ]

Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: stable@vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/kernel/cpu/resctrl/monitor.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -337,7 +337,9 @@ int resctrl_arch_rmid_read(struct rdt_re
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *ignored)
 {
+	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	int cpu = cpumask_any(&d->hdr.cpu_mask);
+	struct arch_mbm_state *am;
 	u64 msr_val;
 	u32 prmid;
 	int ret;
@@ -346,12 +348,16 @@ int resctrl_arch_rmid_read(struct rdt_re
 
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 	ret = __rmid_read_phys(prmid, eventid, &msr_val);
-	if (ret)
-		return ret;
 
-	*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+	if (!ret) {
+		*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+	} else if (ret == -EINVAL) {
+		am = get_arch_mbm_state(hw_dom, rmid, eventid);
+		if (am)
+			am->prev_msr = 0;
+	}
 
-	return 0;
+	return ret;
 }
 
 static void limbo_release_entry(struct rmid_entry *entry)