From: Zhongqiu Han Date: Sun, 19 Apr 2026 13:26:54 +0000 (+0800) Subject: cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=24fc5870808dea4290e9563746cb5f2146043c6c;p=thirdparty%2Flinux.git cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load When ignore_nice_load is toggled from 0 to 1 via sysfs, dbs_update() may run concurrently and observe the new tunable value while prev_cpu_nice still holds a stale baseline, producing a spurious massive idle_time that results in an incorrect CPU load value. The race can be illustrated with two concurrent paths: Path A (sysfs write, holds attr_set->update_lock): governor_store() mutex_lock(&attr_set->update_lock) ignore_nice_load_store() dbs_data->ignore_nice_load = 1 /* (A1) */ gov_update_cpu_data(dbs_data) mutex_lock(&policy_dbs->update_mutex) /* (A2) */ j_cdbs->prev_cpu_nice = kcpustat_field(...) mutex_unlock(&policy_dbs->update_mutex) mutex_unlock(&attr_set->update_lock) Path B (work queue, wins the race between A1 and A2): dbs_work_handler() mutex_lock(&policy_dbs->update_mutex) /* acquired before A2 */ dbs_update() ignore_nice = dbs_data->ignore_nice_load /* sees new value: 1 */ cur_nice = kcpustat_field(...) idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, ..) /* stale */ j_cdbs->prev_cpu_nice = cur_nice mutex_unlock(&policy_dbs->update_mutex) Fix this by unconditionally sampling cur_nice and advancing prev_cpu_nice in dbs_update() on every call, regardless of ignore_nice. With prev_cpu_nice always reflecting the most recent sample, enabling ignore_nice_load can never produce a stale-baseline spike: the delta will always be the nice time accumulated in the last sampling interval, not since boot. The additional kcpustat_field() call per CPU per sample is negligible given that the sampling path already reads idle and load accounting. To keep prev_cpu_nice handling consistent with the always-tracking semantics introduced above: - gov_update_cpu_data() unconditionally resets prev_cpu_nice alongside prev_cpu_idle, so both baselines share the same timestamp when io_is_busy changes. This prevents an interval mismatch between idle_time and nice_delta on the next dbs_update() when ignore_nice_load is enabled. - cpufreq_dbs_governor_start() unconditionally initializes prev_cpu_nice so the baseline is always valid from the first dbs_update() call; remove the ignore_nice guard and the now-unused ignore_nice variable. Fixes: ee88415caf736b ("[CPUFREQ] Cleanup locking in conservative governor") Fixes: 5a75c82828e7c0 ("[CPUFREQ] Cleanup locking in ondemand governor") Fixes: 326c86deaed54a ("[CPUFREQ] Remove unneeded locks") Signed-off-by: Zhongqiu Han Link: https://patch.msgid.link/20260419132655.3800673-3-zhongqiu.han@oss.qualcomm.com Signed-off-by: Rafael J. Wysocki --- diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index fc6f705c5a9c7..8a85bd32defe7 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -92,6 +92,12 @@ EXPORT_SYMBOL_GPL(sampling_rate_store); * * Call under the @dbs_data->attr_set.update_lock. The per-policy * update_mutex is acquired and released internally for each policy. + * + * Note: prev_cpu_nice is reset here unconditionally alongside prev_cpu_idle. + * When io_is_busy changes, both baselines must be advanced to the same + * timestamp so that the next dbs_update() computes idle_time and nice_delta + * over the same interval, preventing an artificially inflated idle_time when + * ignore_nice_load is enabled. */ void gov_update_cpu_data(struct dbs_data *dbs_data) { @@ -106,8 +112,7 @@ void gov_update_cpu_data(struct dbs_data *dbs_data) j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_update_time, dbs_data->io_is_busy); - if (dbs_data->ignore_nice_load) - j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); + j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); } mutex_unlock(&policy_dbs->update_mutex); } @@ -121,6 +126,7 @@ unsigned int dbs_update(struct cpufreq_policy *policy) unsigned int ignore_nice = dbs_data->ignore_nice_load; unsigned int max_load = 0, idle_periods = UINT_MAX; unsigned int sampling_rate, io_busy, j; + u64 cur_nice; /* * Sometimes governors may use an additional multiplier to increase @@ -167,12 +173,18 @@ unsigned int dbs_update(struct cpufreq_policy *policy) j_cdbs->prev_cpu_idle = cur_idle_time; - if (ignore_nice) { - u64 cur_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); - + /* + * Always sample cur_nice and advance prev_cpu_nice, regardless + * of ignore_nice. This keeps prev_cpu_nice current so that + * enabling ignore_nice_load via sysfs never produces a + * stale-baseline spike (the delta will be at most one sampling + * interval of accumulated nice time, not since boot). + */ + cur_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); + if (ignore_nice) idle_time += div_u64(cur_nice - j_cdbs->prev_cpu_nice, NSEC_PER_USEC); - j_cdbs->prev_cpu_nice = cur_nice; - } + + j_cdbs->prev_cpu_nice = cur_nice; if (unlikely(!time_elapsed)) { /* @@ -519,7 +531,7 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy) struct dbs_governor *gov = dbs_governor_of(policy); struct policy_dbs_info *policy_dbs = policy->governor_data; struct dbs_data *dbs_data = policy_dbs->dbs_data; - unsigned int sampling_rate, ignore_nice, j; + unsigned int sampling_rate, j; unsigned int io_busy; if (!policy->cur) @@ -529,7 +541,6 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy) policy_dbs->rate_mult = 1; sampling_rate = dbs_data->sampling_rate; - ignore_nice = dbs_data->ignore_nice_load; mutex_lock(&policy_dbs->update_mutex); io_busy = dbs_data->io_is_busy; @@ -541,9 +552,7 @@ int cpufreq_dbs_governor_start(struct cpufreq_policy *policy) * Make the first invocation of dbs_update() compute the load. */ j_cdbs->prev_load = 0; - - if (ignore_nice) - j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); + j_cdbs->prev_cpu_nice = kcpustat_field(&kcpustat_cpu(j), CPUTIME_NICE, j); } mutex_unlock(&policy_dbs->update_mutex);