1 .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
2 .\" Various pieces from the old sched_setscheduler(2) page
3 .\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
4 .\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
5 .\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
6 .\"
7 .\" %%%LICENSE_START(GPLv2+_DOC_FULL)
8 .\" This is free documentation; you can redistribute it and/or
9 .\" modify it under the terms of the GNU General Public License as
10 .\" published by the Free Software Foundation; either version 2 of
11 .\" the License, or (at your option) any later version.
12 .\"
13 .\" The GNU General Public License's references to "object code"
14 .\" and "executables" are to be interpreted as the output of any
15 .\" document formatting or typesetting system, including
16 .\" intermediate and printed output.
17 .\"
18 .\" This manual is distributed in the hope that it will be useful,
19 .\" but WITHOUT ANY WARRANTY; without even the implied warranty of
20 .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 .\" GNU General Public License for more details.
22 .\"
23 .\" You should have received a copy of the GNU General Public
24 .\" License along with this manual; if not, see
25 .\" <http://www.gnu.org/licenses/>.
26 .\" %%%LICENSE_END
27 .\"
28 .\" Worth looking at: http://rt.wiki.kernel.org/index.php
29 .\"
30 .TH SCHED 7 2014-04-28 "Linux" "Linux Programmer's Manual"
31 .SH NAME
32 sched \- overview of scheduling APIs
33 .SH DESCRIPTION
34 .SS API summary
35 The Linux scheduling APIs are as follows:
36 .TP
37 .BR sched_setscheduler (2)
38 Set the scheduling policy and parameters of a specified thread.
39 .TP
40 .BR sched_getscheduler (2)
41 Return the scheduling policy of a specified thread.
42 .TP
43 .BR sched_setparam (2)
44 Set the scheduling parameters of a specified thread.
45 .TP
46 .BR sched_getparam (2)
47 Fetch the scheduling parameters of a specified thread.
48 .TP
49 .BR sched_get_priority_max (2)
50 Return the maximum priority available in a specified scheduling policy.
51 .TP
52 .BR sched_get_priority_min (2)
53 Return the minimum priority available in a specified scheduling policy.
54 .TP
55 .BR sched_rr_get_interval (2)
56 Fetch the quantum used for threads that are scheduled under
57 the "round-robin" scheduling policy.
58 .TP
59 .BR sched_yield (2)
60 Cause the caller to relinquish the CPU,
61 so that some other thread can be executed.
62 .TP
63 .BR sched_setaffinity (2)
64 (Linux-specific)
65 Set the CPU affinity of a specified thread.
66 .TP
67 .BR sched_getaffinity (2)
68 (Linux-specific)
69 Get the CPU affinity of a specified thread.
70 .TP
71 .BR sched_setattr (2)
72 Set the scheduling policy and parameters of a specified thread.
73 This (Linux-specific) system call provides a superset of the functionality of
74 .BR sched_setscheduler (2)
75 and
76 .BR sched_setparam (2).
77 .TP
78 .BR sched_getattr (2)
79 Fetch the scheduling policy and parameters of a specified thread.
80 This (Linux-specific) system call provides a superset of the functionality of
81 .BR sched_getscheduler (2)
82 and
83 .BR sched_getparam (2).
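.PP
As a minimal sketch of how the query calls above fit together,
the following C fragment fetches the policy and static priority of a thread.
The helper names \fIpolicy_name\fP() and \fIquery_thread\fP() are
hypothetical and exist only for this illustration;
the fallback definition of the reset-on-fork constant is an assumption
for headers that do not expose it.
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000  /* assumed fallback value */
#endif

/* Map a policy constant to its name. The reset-on-fork bit, which
   sched_getscheduler(2) may OR into its result, is masked off first. */
const char *policy_name(int policy)
{
    policy &= ~SCHED_RESET_ON_FORK;
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
#ifdef SCHED_IDLE
    case SCHED_IDLE:  return "SCHED_IDLE";
#endif
    default:          return "unknown";
    }
}

/* Fetch the policy and static priority of thread pid (0 = caller).
   Returns 0 on success, -1 on error with errno set. */
int query_thread(pid_t pid, int *policy, int *priority)
{
    struct sched_param param;
    int pol = sched_getscheduler(pid);

    if (pol == -1)
        return -1;
    if (sched_getparam(pid, &param) == -1)
        return -1;
    *policy = pol;
    *priority = param.sched_priority;
    return 0;
}
```
.fi
.in
.PP
A caller would typically pass 0 as the thread ID to inspect itself
and then print the result of \fIpolicy_name\fP().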
84 .\"
85 .SS Scheduling policies
86 The scheduler is the kernel component that decides which runnable thread
87 will be executed by the CPU next.
88 Each thread has an associated scheduling policy and a \fIstatic\fP
89 scheduling priority,
90 .IR sched_priority .
91 The scheduler makes its decisions based on knowledge of the scheduling
92 policy and static priority of all threads on the system.
93
94 For threads scheduled under one of the normal scheduling policies
95 (\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
96 \fIsched_priority\fP is not used in scheduling
97 decisions (it must be specified as 0).
98
99 Threads scheduled under one of the real-time policies
100 (\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
101 \fIsched_priority\fP value in the range 1 (low) to 99 (high).
102 (As the numbers imply, real-time threads always have higher priority
103 than normal threads.)
104 Note well: POSIX.1-2001 requires an implementation to support only a
105 minimum of 32 distinct priority levels for the real-time policies,
106 and some systems supply just this minimum.
107 Portable programs should use
108 .BR sched_get_priority_min (2)
109 and
110 .BR sched_get_priority_max (2)
111 to find the range of priorities supported for a particular policy.
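.PP
A portable program might discover the range at run time rather than
hard-coding 1 and 99.
The following is a small sketch of that approach;
\fIpriority_range\fP() and \fIscaled_priority\fP() are hypothetical helpers,
not part of any library.
.in +4n
.nf
```c
#include <sched.h>

/* Store the static-priority range of policy in *min and *max.
   Returns 0 on success, -1 on error (errno is set). */
int priority_range(int policy, int *min, int *max)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    *min = lo;
    *max = hi;
    return 0;
}

/* Pick a priority that lies percent (0..100) of the way through
   the range supported for policy. Returns -1 on error. */
int scaled_priority(int policy, int percent)
{
    int lo, hi;

    if (percent < 0 || percent > 100)
        return -1;
    if (priority_range(policy, &lo, &hi) == -1)
        return -1;
    return lo + (hi - lo) * percent / 100;
}
```
.fi
.in
.PP
Expressing a priority as a position within the supported range keeps the
relative ordering of an application's threads intact on systems that
provide fewer levels.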
112
113 Conceptually, the scheduler maintains a list of runnable
114 threads for each possible \fIsched_priority\fP value.
115 In order to determine which thread runs next, the scheduler looks for
116 the nonempty list with the highest static priority and selects the
117 thread at the head of this list.
118
119 A thread's scheduling policy determines
120 where it will be inserted into the list of threads
121 with equal static priority and how it will move inside this list.
122
123 All scheduling is preemptive: if a thread with a higher static
124 priority becomes ready to run, the currently running thread
125 will be preempted and
126 returned to the wait list for its static priority level.
127 The scheduling policy determines the
128 ordering only within the list of runnable threads with equal static
129 priority.
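.PP
The conceptual model above can be sketched as a toy data structure.
This is purely illustrative and is not how the kernel is implemented;
the type \fIstruct runqueue\fP, the constants, and both functions are
hypothetical names chosen for this example.
.in +4n
.nf
```c
#include <stddef.h>

#define NPRIO 100   /* static priorities 0..99 */
#define MAXQ  16    /* capacity of each list in this toy model */

/* One list of runnable thread IDs per static priority. */
struct runqueue {
    int ids[NPRIO][MAXQ];
    size_t len[NPRIO];
};

/* Append a thread to the end of the list for its priority.
   Returns 0 on success, -1 if prio is out of range or the list is full. */
int rq_enqueue(struct runqueue *rq, int prio, int id)
{
    if (prio < 0 || prio >= NPRIO || rq->len[prio] == MAXQ)
        return -1;
    rq->ids[prio][rq->len[prio]++] = id;
    return 0;
}

/* Select the thread at the head of the nonempty list with the highest
   static priority. Returns its ID, or -1 if nothing is runnable. */
int rq_pick_next(const struct runqueue *rq)
{
    for (int prio = NPRIO - 1; prio >= 0; prio--)
        if (rq->len[prio] > 0)
            return rq->ids[prio][0];
    return -1;
}
```
.fi
.in
.PP
In this model a thread queued at priority 50 is always chosen ahead of
any thread at priority 0, regardless of how long the latter has waited.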
130 .SS SCHED_FIFO: First in-first out scheduling
131 \fBSCHED_FIFO\fP can be used only with static priorities higher than
132 0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
133 it will always immediately preempt any currently running
134 \fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
135 \fBSCHED_FIFO\fP is a simple scheduling
136 algorithm without time slicing.
137 For threads scheduled under the
138 \fBSCHED_FIFO\fP policy, the following rules apply:
139 .IP * 3
140 A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
141 higher priority will stay at the head of the list for its priority and
142 will resume execution as soon as all threads of higher priority are
143 blocked again.
144 .IP *
145 When a \fBSCHED_FIFO\fP thread becomes runnable, it
146 will be inserted at the end of the list for its priority.
147 .IP *
148 A call to
149 .BR sched_setscheduler (2),
150 .BR sched_setparam (2),
151 or
152 .BR sched_setattr (2)
153 will put the
154 \fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
155 \fIpid\fP at the start of the list if it was runnable.
156 As a consequence, it may preempt the currently running thread if
157 it has the same priority.
158 (POSIX.1-2001 specifies that the thread should go to the end
159 of the list.)
160 .\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
161 .\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
162 .IP *
163 A thread calling
164 .BR sched_yield (2)
165 will be put at the end of the list.
166 .PP
167 No other events will move a thread
168 scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
169 runnable threads with equal static priority.
170
171 A \fBSCHED_FIFO\fP
172 thread runs until either it is blocked by an I/O request, it is
173 preempted by a higher priority thread, or it calls
174 .BR sched_yield (2).
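.PP
A thread might request the
.B SCHED_FIFO
policy along the following lines.
This is a sketch only: \fIbecome_fifo\fP() and \fIbecome_other\fP() are
hypothetical helper names, and the choice of the lowest real-time
priority is an assumption made to keep the example conservative.
.in +4n
.nf
```c
#include <errno.h>
#include <sched.h>

/* Ask the kernel to schedule the calling thread under SCHED_FIFO at
   the lowest real-time priority. Returns 0 on success, or the errno
   value on failure (typically EPERM for an unprivileged caller). */
int become_fifo(void)
{
    struct sched_param param = { 0 };
    int prio = sched_get_priority_min(SCHED_FIFO);

    if (prio == -1)
        return errno;
    param.sched_priority = prio;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;
    return 0;
}

/* Return to the default time-sharing policy at static priority 0.
   Returns 0 on success, or the errno value on failure. */
int become_other(void)
{
    struct sched_param param = { 0 };

    if (sched_setscheduler(0, SCHED_OTHER, &param) == -1)
        return errno;
    return 0;
}
```
.fi
.in
.PP
Because a
.B SCHED_FIFO
thread is not time-sliced, code that runs after a successful
\fIbecome_fifo\fP() should block or yield regularly.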
175 .SS SCHED_RR: Round-robin scheduling
176 \fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
177 Everything
178 described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
179 except that each thread is allowed to run only for a maximum time
180 quantum.
181 If a \fBSCHED_RR\fP thread has been running for a time
182 period equal to or longer than the time quantum, it will be put at the
183 end of the list for its priority.
184 A \fBSCHED_RR\fP thread that has
185 been preempted by a higher priority thread and subsequently resumes
186 execution as a running thread will complete the unexpired portion of
187 its round-robin time quantum.
188 The length of the time quantum can be
189 retrieved using
190 .BR sched_rr_get_interval (2).
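.PP
The quantum can be read as sketched below.
The helpers \fItimespec_to_ns\fP() and \fIrr_quantum_ns\fP() are
hypothetical; what the call reports for a thread that is not scheduled under
.B SCHED_RR
is not covered here.
.in +4n
.nf
```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Convert a timespec to a nanosecond count. */
long long timespec_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Return the round-robin quantum of thread pid (0 = caller) in
   nanoseconds, or -1 on error with errno set. */
long long rr_quantum_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return timespec_to_ns(&ts);
}
```
.fi
.in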
191 .\" On Linux 2.4, the length of the RR interval is influenced
192 .\" by the process nice value -- MTK
193 .\"
194 .SS SCHED_OTHER: Default Linux time-sharing scheduling
195 \fBSCHED_OTHER\fP can be used only at static priority 0.
196 \fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
197 intended for all threads that do not require the special
198 real-time mechanisms.
199 The thread to run is chosen from the static
200 priority 0 list based on a \fIdynamic\fP priority that is determined only
201 inside this list.
202 The dynamic priority is based on the nice value (set by
203 .BR nice (2),
204 .BR setpriority (2),
205 or
206 .BR sched_setattr (2))
207 and is increased for each time quantum during which the thread is ready to run
208 but is denied the CPU by the scheduler.
209 This ensures fair progress among all \fBSCHED_OTHER\fP threads.
210 .\"
211 .SS SCHED_BATCH: Scheduling batch processes
212 (Since Linux 2.6.16.)
213 \fBSCHED_BATCH\fP can be used only at static priority 0.
214 This policy is similar to \fBSCHED_OTHER\fP in that it schedules
215 the thread according to its dynamic priority
216 (based on the nice value).
217 The difference is that this policy
218 will cause the scheduler to always assume
219 that the thread is CPU-intensive.
220 Consequently, the scheduler will apply a small scheduling
221 penalty with respect to wakeup behavior,
222 so that this thread is mildly disfavored in scheduling decisions.
223
224 .\" The following paragraph is drawn largely from the text that
225 .\" accompanied Ingo Molnar's patch for the implementation of
226 .\" SCHED_BATCH.
227 .\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
228 This policy is useful for workloads that are noninteractive,
229 but do not want to lower their nice value,
230 and for workloads that want a deterministic scheduling policy without
231 interactivity causing extra preemptions (between the workload's tasks).
232 .\"
233 .SS SCHED_IDLE: Scheduling very low priority jobs
234 (Since Linux 2.6.23.)
235 \fBSCHED_IDLE\fP can be used only at static priority 0;
236 the process nice value has no influence for this policy.
237
238 This policy is intended for running jobs at extremely low
239 priority (lower even than a +19 nice value with the
240 .B SCHED_OTHER
241 or
242 .B SCHED_BATCH
243 policies).
244 .\"
245 .SS Resetting scheduling policy for child processes
246 Each thread has a reset-on-fork scheduling flag.
247 When this flag is set, children created by
248 .BR fork (2)
249 do not inherit privileged scheduling policies.
250 The reset-on-fork flag can be set by either:
251 .IP * 3
252 ORing the
253 .B SCHED_RESET_ON_FORK
254 flag into the
255 .I policy
256 argument when calling
257 .BR sched_setscheduler (2)
258 (since Linux 2.6.32);
259 or
260 .IP *
261 specifying the
262 .B SCHED_FLAG_RESET_ON_FORK
263 flag in
264 .IR attr.sched_flags
265 when calling
266 .BR sched_setattr (2).
267 .PP
268 Note that the constants used with these two APIs have different names.
269 The state of the reset-on-fork flag can analogously be retrieved using
270 .BR sched_getscheduler (2)
271 and
272 .BR sched_getattr (2).
273
274 The reset-on-fork feature is intended for media-playback applications,
275 and can be used to prevent applications evading the
276 .BR RLIMIT_RTTIME
277 resource limit (see
278 .BR getrlimit (2))
279 by creating multiple child processes.
280
281 More precisely, if the reset-on-fork flag is set,
282 the following rules apply for subsequently created children:
283 .IP * 3
284 If the calling thread has a scheduling policy of
285 .B SCHED_FIFO
286 or
287 .BR SCHED_RR ,
288 the policy is reset to
289 .BR SCHED_OTHER
290 in child processes.
291 .IP *
292 If the calling process has a negative nice value,
293 the nice value is reset to zero in child processes.
294 .PP
295 After the reset-on-fork flag has been enabled,
296 it can be reset only if the thread has the
297 .BR CAP_SYS_NICE
298 capability.
299 This flag is disabled in child processes created by
300 .BR fork (2).
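.PP
Setting and checking the flag through
.BR sched_setscheduler (2)
and
.BR sched_getscheduler (2)
might look like the following sketch.
The function names are hypothetical,
and the fallback definition of the constant is an assumption for C
libraries whose headers do not expose it.
.in +4n
.nf
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000  /* assumed fallback value */
#endif

/* Set the reset-on-fork flag on the calling thread while keeping its
   current policy and priority. Returns 0 on success, else errno. */
int enable_reset_on_fork(void)
{
    struct sched_param param;
    int policy = sched_getscheduler(0);

    if (policy == -1 || sched_getparam(0, &param) == -1)
        return errno;
    if (sched_setscheduler(0, policy | SCHED_RESET_ON_FORK, &param) == -1)
        return errno;
    return 0;
}

/* Return 1 if the flag is set for the caller, 0 if not, -1 on error. */
int reset_on_fork_enabled(void)
{
    int policy = sched_getscheduler(0);

    if (policy == -1)
        return -1;
    return (policy & SCHED_RESET_ON_FORK) != 0;
}
```
.fi
.in
.PP
Since the flag can be cleared again only with
.BR CAP_SYS_NICE ,
an unprivileged program should treat \fIenable_reset_on_fork\fP()
as a one-way operation.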
301 .\"
302 .SS Privileges and resource limits
303 In Linux kernels before 2.6.12, only privileged
304 .RB ( CAP_SYS_NICE )
305 threads can set a nonzero static priority (i.e., set a real-time
306 scheduling policy).
307 The only change that an unprivileged thread can make is to set the
308 .B SCHED_OTHER
309 policy, and this can be done only if the effective user ID of the caller
310 matches the real or effective user ID of the target thread
311 (i.e., the thread specified by
312 .IR pid )
313 whose policy is being changed.
314
315 Since Linux 2.6.12, the
316 .B RLIMIT_RTPRIO
317 resource limit defines a ceiling on an unprivileged thread's
318 static priority for the
319 .B SCHED_RR
320 and
321 .B SCHED_FIFO
322 policies.
323 The rules for changing scheduling policy and priority are as follows:
324 .IP * 3
325 If an unprivileged thread has a nonzero
326 .B RLIMIT_RTPRIO
327 soft limit, then it can change its scheduling policy and priority,
328 subject to the restriction that the priority cannot be set to a
329 value higher than the maximum of its current priority and its
330 .B RLIMIT_RTPRIO
331 soft limit.
332 .IP *
333 If the
334 .B RLIMIT_RTPRIO
335 soft limit is 0, then the only permitted changes are to lower the priority,
336 or to switch to a non-real-time policy.
337 .IP *
338 Subject to the same rules,
339 another unprivileged thread can also make these changes,
340 as long as the effective user ID of the thread making the change
341 matches the real or effective user ID of the target thread.
342 .IP *
343 Special rules apply for the
344 .BR SCHED_IDLE
345 policy.
346 In Linux kernels before 2.6.39,
347 an unprivileged thread operating under this policy cannot
348 change its policy, regardless of the value of its
349 .BR RLIMIT_RTPRIO
350 resource limit.
351 In Linux kernels since 2.6.39,
352 .\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
353 an unprivileged thread can switch to either the
354 .BR SCHED_BATCH
355 or the
356 .BR SCHED_OTHER
357 policy so long as its nice value falls within the range permitted by its
358 .BR RLIMIT_NICE
359 resource limit (see
360 .BR getrlimit (2)).
361 .PP
362 Privileged
363 .RB ( CAP_SYS_NICE )
364 threads ignore the
365 .B RLIMIT_RTPRIO
366 limit; as with older kernels,
367 they can make arbitrary changes to scheduling policy and priority.
368 See
369 .BR getrlimit (2)
370 for further information on
371 .BR RLIMIT_RTPRIO .
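.PP
The ceiling rule for unprivileged threads can be written down as a small
predicate.
This sketch models only the first two rules in the list above
(it ignores user-ID matching and the
.B SCHED_IDLE
special case);
the function names are hypothetical, and clamping an unlimited soft limit
to 99 is an assumption based on the Linux priority range.
.in +4n
.nf
```c
#include <sys/resource.h>

/* Highest real-time priority an unprivileged thread may request:
   the maximum of its current priority (0..99) and its RLIMIT_RTPRIO
   soft limit. With a soft limit of 0 this is the current priority,
   so the thread can only lower its priority or leave the
   real-time policies. */
int rt_priority_ceiling(int current_prio, rlim_t rtprio_soft)
{
    if (rtprio_soft > (rlim_t)current_prio)
        return rtprio_soft > 99 ? 99 : (int)rtprio_soft;
    return current_prio;
}

/* Nonzero if an unprivileged thread may switch to priority requested. */
int may_request(int current_prio, rlim_t rtprio_soft, int requested)
{
    return requested >= 0 &&
           requested <= rt_priority_ceiling(current_prio, rtprio_soft);
}

/* Read the caller's RLIMIT_RTPRIO soft limit.
   Returns 0 on success, -1 on error with errno set. */
int rtprio_soft_limit(rlim_t *soft)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) == -1)
        return -1;
    *soft = rl.rlim_cur;
    return 0;
}
```
.fi
.in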
372 .SS Limiting the CPU usage of real-time and deadline processes
373 A nonblocking infinite loop in a thread scheduled under the
374 .BR SCHED_FIFO ,
375 .BR SCHED_RR ,
376 or
377 .BR SCHED_DEADLINE
378 policy will block all threads with lower
379 priority forever.
380 Prior to Linux 2.6.25, the only way of preventing a runaway real-time
381 process from freezing the system was to run (at the console)
382 a shell scheduled under a higher static priority than the tested application.
383 This allows an emergency kill of tested
384 real-time applications that do not block or terminate as expected.
385
386 Since Linux 2.6.25, there are other techniques for dealing with runaway
387 real-time and deadline processes.
388 One of these is to use the
389 .BR RLIMIT_RTTIME
390 resource limit to set a ceiling on the CPU time that
391 a real-time process may consume.
392 See
393 .BR getrlimit (2)
394 for details.
395
396 Since version 2.6.25, Linux also provides two
397 .I /proc
398 files that can be used to reserve a certain amount of CPU time
399 to be used by non-real-time processes.
400 Reserving some CPU time in this fashion allows some CPU time to be
401 allocated to (say) a root shell that can be used to kill a runaway process.
402 Both of these files specify time values in microseconds:
403 .TP
404 .IR /proc/sys/kernel/sched_rt_period_us
405 This file specifies a scheduling period that is equivalent to
406 100% CPU bandwidth.
407 The value in this file can range from 1 to
408 .BR INT_MAX ,
409 giving an operating range of 1 microsecond to around 35 minutes.
410 The default value in this file is 1,000,000 (1 second).
411 .TP
412 .IR /proc/sys/kernel/sched_rt_runtime_us
413 The value in this file specifies how much of the "period" time
414 can be used by all real-time and deadline scheduled processes
415 on the system.
416 The value in this file can range from \-1 to
417 .BR INT_MAX \-1.
418 Specifying \-1 makes the runtime the same as the period;
419 that is, no CPU time is set aside for non-real-time processes
420 (which was the Linux behavior before kernel 2.6.25).
421 The default value in this file is 950,000 (0.95 seconds),
422 meaning that 5% of the CPU time is reserved for processes that
423 don't run under a real-time or deadline scheduling policy.
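.PP
The arithmetic behind these two files can be sketched as follows.
The function names are hypothetical,
and treating a runtime that is at least as large as the period as
"nothing reserved" is an assumption of this example.
.in +4n
.nf
```c
#include <limits.h>

/* Percentage of each period left for non-real-time processes, given
   the values of sched_rt_period_us and sched_rt_runtime_us.
   A runtime of -1 means "same as the period", so nothing is reserved.
   Returns a negative value if the arguments are out of range. */
double reserved_percent(int period_us, int runtime_us)
{
    if (period_us < 1 || runtime_us < -1)
        return -1.0;
    if (runtime_us == -1 || runtime_us >= period_us)
        return 0.0;
    return 100.0 * (period_us - runtime_us) / period_us;
}

/* Convert a microsecond count to minutes. */
double us_to_minutes(int us)
{
    return us / 60e6;
}

/* Longest period that sched_rt_period_us can express, in minutes. */
double max_period_minutes(void)
{
    return us_to_minutes(INT_MAX);
}
```
.fi
.in
.PP
With the default values, a period of 1,000,000 and a runtime of 950,000,
\fIreserved_percent\fP() yields 5.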
424 .PP
425 .SS Response time
426 A blocked high priority thread waiting for I/O has a certain
427 response time before it is scheduled again.
428 The device driver writer
429 can greatly reduce this response time by using a "slow interrupt"
430 interrupt handler.
431 .\" as described in
432 .\" .BR request_irq (9).
433 .SS Miscellaneous
434 Child processes inherit the scheduling policy and parameters across a
435 .BR fork (2).
436 The scheduling policy and parameters are preserved across
437 .BR execve (2).
438
439 Memory locking is usually needed for real-time processes to avoid
440 paging delays; this can be done with
441 .BR mlock (2)
442 or
443 .BR mlockall (2).
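.PP
A real-time program might lock its memory during start-up,
as in the following sketch.
The wrapper names are hypothetical;
locking both current and future mappings is one possible choice,
not a requirement.
.in +4n
.nf
```c
#include <errno.h>
#include <sys/mman.h>

/* Lock all current and future pages of the calling process into RAM.
   Returns 0 on success, or the errno value on failure (for example
   when an unprivileged caller exceeds its locked-memory limit). */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return errno;
    return 0;
}

/* Undo lock_all_memory(). Returns 0 on success, else errno. */
int unlock_all_memory(void)
{
    if (munlockall() == -1)
        return errno;
    return 0;
}
```
.fi
.in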
444 .SH NOTES
445 .PP
446 Originally, standard Linux was intended as a general-purpose operating
447 system able to handle background processes, interactive
448 applications, and less demanding real-time applications (applications that
449 usually need to meet timing deadlines).
450 Although the Linux kernel 2.6
451 allowed for kernel preemption and the newly introduced O(1) scheduler
452 ensures that the time needed to schedule is fixed and deterministic
453 irrespective of the number of active tasks, true real-time computing
454 was not possible up to kernel version 2.6.17.
455 .SS Real-time features in the mainline Linux kernel
456 .\" FIXME . Probably this text will need some minor tweaking
457 .\" by about the time of 2.6.30; ask Carsten Emde about this then.
458 From kernel version 2.6.18 onward, however, Linux is gradually
459 becoming equipped with real-time capabilities,
460 most of which are derived from the former
461 .I realtime-preempt
462 patches developed by Ingo Molnar, Thomas Gleixner,
463 Steven Rostedt, and others.
464 Until the patches have been completely merged into the
465 mainline kernel
466 (this is expected to be around kernel version 2.6.30),
467 they must be installed to achieve the best real-time performance.
468 These patches are named:
469 .in +4n
470 .nf
471
472 patch-\fIkernelversion\fP-rt\fIpatchversion\fP
473 .fi
474 .in
475 .PP
476 and can be downloaded from
477 .UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
478 .UE .
479
480 Without the patches and prior to their full inclusion into the mainline
481 kernel, the kernel configuration offers only the three preemption classes
482 .BR CONFIG_PREEMPT_NONE ,
483 .BR CONFIG_PREEMPT_VOLUNTARY ,
484 and
485 .B CONFIG_PREEMPT_DESKTOP
486 which respectively provide no, some, and considerable
487 reduction of the worst-case scheduling latency.
488
489 With the patches applied or after their full inclusion into the mainline
490 kernel, the additional configuration item
491 .B CONFIG_PREEMPT_RT
492 becomes available.
493 If this is selected, Linux is transformed into a regular
494 real-time operating system.
495 The FIFO and RR scheduling policies are then used to run a thread
496 with true real-time priority and a minimum worst-case scheduling latency.
497 .SH SEE ALSO
498 .ad l
499 .nh
500 .BR chrt (1),
501 .BR getpriority (2),
502 .BR mlock (2),
503 .BR mlockall (2),
504 .BR munlock (2),
505 .BR munlockall (2),
506 .BR nice (2),
507 .BR sched_get_priority_max (2),
508 .BR sched_get_priority_min (2),
509 .BR sched_getscheduler (2),
510 .BR sched_getaffinity (2),
511 .BR sched_getparam (2),
512 .BR sched_rr_get_interval (2),
513 .BR sched_setaffinity (2),
514 .BR sched_setscheduler (2),
515 .BR sched_setparam (2),
516 .BR sched_yield (2),
517 .BR setpriority (2),
518 .BR pthread_getaffinity_np (3),
519 .BR pthread_setaffinity_np (3),
520 .BR sched_getcpu (3),
521 .BR capabilities (7),
522 .BR cpuset (7)
523 .ad
524 .PP
525 .I Programming for the real world \- POSIX.4
526 by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
527 .PP
528 The Linux kernel source files
529 .IR Documentation/scheduler/sched-deadline.txt ,
530 .IR Documentation/scheduler/sched-rt-group.txt ,
531 .IR Documentation/scheduler/sched-design-CFS.txt ,
532 and
533 .IR Documentation/scheduler/sched-nice-design.txt