Patch series "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe".
DAMON accounts time using its own jiffies-like time counter, namely
damon_ctx->passed_sample_intervals. The counter is incremented on each
iteration of the kdamond_fn() main loop, which sleeps for at least one
sampling interval; hence the name.
DAMON has time-periodic operations, including monitoring results
aggregation and DAMOS action application. DAMON sets the next time to do
each such operation in passed_sample_intervals units. It does the
operation when the counter becomes equal to or larger than the pre-set
value, and then updates the next time for the operation. Note that the
operation is done not only when the values exactly match but also when the
time has passed, because the values can be updated for online-committed
DAMON parameters.
The counter is of 'unsigned long' type, and the comparisons are done
using the plain comparison operators, which are not overflow-safe. This
can cause rare and limited but odd situations.
Let's suppose there is an operation that should be executed every 20
sampling intervals, and the passed_sample_intervals value for the next
execution of the operation is ULONG_MAX - 3. Once
passed_sample_intervals reaches ULONG_MAX - 3, the operation is
executed, and the next time value for the operation becomes 16
(ULONG_MAX - 3 + 20, wrapped around). In the next iteration of the
kdamond_fn() main loop, passed_sample_intervals is larger than the next
operation time value, so the operation is executed again. It keeps being
executed on every iteration, until passed_sample_intervals also wraps
around.
Note that this will not be common or problematic in the real world. The
sampling interval, which is the time taken for each
passed_sample_intervals increment, is 5 ms by default, and is usually
[auto-]tuned to hundreds of milliseconds. That means it takes about 248
days or 4,971 days to have the overflow on 32 bit machines when the
sampling interval is 5 ms or 100 ms, respectively
((1 << 32) * sampling_interval_in_seconds / 3600 / 24). On 64 bit
machines, the numbers become 2924712086.77536 and 58494241735.5072
years, respectively. So the real user impact is negligible. Still, this
is better to be fixed as long as the fix is simple and efficient.
Fix this by simply replacing the overflow-unsafe plain comparison
operators with the existing overflow-safe time comparison helpers.
The first patch only cleans up the next DAMOS action application time
setup, for consistency and less code. The second and the third patches
make the DAMOS action application time comparison and the remaining
comparisons overflow-safe, respectively.
This patch (of 3):
There is a helper function for damos->next_apply_sis setup, but some
places open-code the same logic. Consistently use the helper.
Link: https://lkml.kernel.org/r/20260307194915.203169-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
struct damon_target *t;
struct damon_region *r;
struct damos *s;
- unsigned long sample_interval = c->attrs.sample_interval ?
- c->attrs.sample_interval : 1;
bool has_schemes_to_apply = false;
damon_for_each_scheme(s, c) {
if (c->passed_sample_intervals < s->next_apply_sis)
continue;
damos_walk_complete(c, s);
- s->next_apply_sis = c->passed_sample_intervals +
- (s->apply_interval_us ? s->apply_interval_us :
- c->attrs.aggr_interval) / sample_interval;
+ damos_set_next_apply_sis(s, c);
s->last_applied = NULL;
damos_trace_stat(c, s);
}
{
unsigned long sample_interval = ctx->attrs.sample_interval ?
ctx->attrs.sample_interval : 1;
- unsigned long apply_interval;
struct damos *scheme;
ctx->passed_sample_intervals = 0;
ctx->attrs.intervals_goal.aggrs;
damon_for_each_scheme(scheme, ctx) {
- apply_interval = scheme->apply_interval_us ?
- scheme->apply_interval_us : ctx->attrs.aggr_interval;
- scheme->next_apply_sis = apply_interval / sample_interval;
+ damos_set_next_apply_sis(scheme, ctx);
damos_set_filters_default_reject(scheme);
}
}