MINOR: activity: collect CPU time spent on memory allocations for each task
When task profiling is enabled, the pool alloc/free code will measure the
time it takes to perform memory allocation after a cache miss or memory
freeing to the shared cache or OS. The time taken with the thread-local
cache is never measured as measuring that time is very expensive compared
to the pool access time. Here doing so costs around 2% performance at 2M
req/s, only when task profiling is enabled, so this remains reasonable.
The scheduler takes care of collecting that time and updating the
sched_activity entry corresponding to the current task when task profiling
is enabled.
The goal clearly is to track places that are wasting CPU time allocating
and releasing too often, or causing large evictions. This appears like
this in "show profiling tasks aggr":
Tasks activity over 11.428 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lkw_avg lkd_avg mem_avg lat_avg
process_stream
44183891 16.47m 22.36us 491.0ns 1.154us 1.000ns 101.1us
h1_io_cb
57386064 4.011m 4.193us 20.00ns 16.00ns - 29.47us
sc_conn_io_cb
42088024 49.04s 1.165us - - - 54.67us
h1_timeout_task 438171 196.5ms 448.0ns - - - 100.1us
srv_cleanup_toremove_conns 65 1.468ms 22.58us 184.0ns 87.00ns - 101.3us
task_process_applet 3 508.0us 169.3us - 107.0us 1.847us 29.67us
srv_cleanup_idle_conns 6 225.3us 37.55us 15.74us 36.84us - 49.47us
accept_queue_process 2 45.62us 22.81us - - 4.949us 54.33us