1 .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
2 .\" Various pieces from the old sched_setscheduler(2) page
3 .\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
4 .\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
5 .\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
6 .\"
7 .\" %%%LICENSE_START(GPLv2+_DOC_FULL)
8 .\" This is free documentation; you can redistribute it and/or
9 .\" modify it under the terms of the GNU General Public License as
10 .\" published by the Free Software Foundation; either version 2 of
11 .\" the License, or (at your option) any later version.
12 .\"
13 .\" The GNU General Public License's references to "object code"
14 .\" and "executables" are to be interpreted as the output of any
15 .\" document formatting or typesetting system, including
16 .\" intermediate and printed output.
17 .\"
18 .\" This manual is distributed in the hope that it will be useful,
19 .\" but WITHOUT ANY WARRANTY; without even the implied warranty of
20 .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 .\" GNU General Public License for more details.
22 .\"
23 .\" You should have received a copy of the GNU General Public
24 .\" License along with this manual; if not, see
25 .\" <http://www.gnu.org/licenses/>.
26 .\" %%%LICENSE_END
27 .\"
28 .\" Worth looking at: http://rt.wiki.kernel.org/index.php
29 .\"
30 .TH SCHED 7 2014-04-28 "Linux" "Linux Programmer's Manual"
31 .SH NAME
32 sched \- overview of scheduling APIs
33 .SH DESCRIPTION
34 .SS API summary
35 The Linux scheduling APIs are as follows:
36 .TP
37 .BR sched_setscheduler (2)
38 Set the scheduling policy and parameters of a specified thread.
39 .TP
40 .BR sched_getscheduler (2)
41 Return the scheduling policy of a specified thread.
42 .TP
43 .BR sched_setparam (2)
44 Set the scheduling parameters of a specified thread.
45 .TP
46 .BR sched_getparam (2)
47 Fetch the scheduling parameters of a specified thread.
48 .TP
49 .BR sched_get_priority_max (2)
50 Return the maximum priority available in a specified scheduling policy.
51 .TP
52 .BR sched_get_priority_min (2)
53 Return the minimum priority available in a specified scheduling policy.
54 .TP
55 .BR sched_rr_get_interval (2)
56 Fetch the quantum used for threads that are scheduled under
57 the "round-robin" scheduling policy.
58 .TP
59 .BR sched_yield (2)
60 Cause the caller to relinquish the CPU,
61 so that some other thread can be executed.
62 .TP
63 .BR sched_setaffinity (2)
64 (Linux-specific)
65 Set the CPU affinity of a specified thread.
66 .TP
67 .BR sched_getaffinity (2)
68 (Linux-specific)
69 Get the CPU affinity of a specified thread.
70 .TP
71 .BR sched_setattr (2)
72 (Linux-specific)
73 A generalized API for setting the scheduling policy and parameters
74 of a specified thread.
75 .TP
76 .BR sched_getattr (2)
77 (Linux-specific)
78 A generalized API for fetching the scheduling policy and parameters
79 of a specified thread.
80 .\"
81 .SS Scheduling policies
82 The scheduler is the kernel component that decides which runnable thread
83 will be executed by the CPU next.
84 Each thread has an associated scheduling policy and a \fIstatic\fP
85 scheduling priority, \fIsched_priority\fP; these are the settings
86 that are modified by
87 .BR sched_setscheduler (2).
88 The scheduler makes its decisions based on knowledge of the scheduling
89 policy and static priority of all threads on the system.
90
91 For threads scheduled under one of the normal scheduling policies
92 (\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
93 \fIsched_priority\fP is not used in scheduling
94 decisions (it must be specified as 0).
95
96 Threads scheduled under one of the real-time policies
97 (\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
98 \fIsched_priority\fP value in the range 1 (low) to 99 (high).
99 (As the numbers imply, real-time threads always have higher priority
100 than normal threads.)
101 Note well: POSIX.1-2001 requires an implementation to support only a
102 minimum of 32 distinct priority levels for the real-time policies,
103 and some systems supply just this minimum.
104 Portable programs should use
105 .BR sched_get_priority_min (2)
106 and
107 .BR sched_get_priority_max (2)
108 to find the range of priorities supported for a particular policy.
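.PP
As a minimal sketch of this portable approach, the helper below queries
the range at run time instead of assuming 1 to 99.
The name \fIpriority_range\fP() is hypothetical and not part of any library;
only the two system calls it wraps come from the text above.
.in +4n
.nf
```c
#include <assert.h>
#include <stddef.h>
#include <sched.h>

/* Hypothetical helper: store the priority range of `policy` in *lo and *hi.
 * Returns 0 on success, -1 if either query fails (errno is then set). */
int priority_range(int policy, int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);

    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);
    if (min == -1 || max == -1)
        return -1;

    *lo = min;
    *hi = max;
    return 0;
}
```
.fi
.in
.PP
A caller would typically pass \fBSCHED_FIFO\fP or \fBSCHED_RR\fP and pick
a priority relative to the returned bounds rather than a hard-coded number.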
109
110 Conceptually, the scheduler maintains a list of runnable
111 threads for each possible \fIsched_priority\fP value.
112 In order to determine which thread runs next, the scheduler looks for
113 the nonempty list with the highest static priority and selects the
114 thread at the head of this list.
115
116 A thread's scheduling policy determines
117 where it will be inserted into the list of threads
118 with equal static priority and how it will move inside this list.
119
120 All scheduling is preemptive: if a thread with a higher static
121 priority becomes ready to run, the currently running thread
122 will be preempted and
123 returned to the wait list for its static priority level.
124 The scheduling policy determines the
125 ordering only within the list of runnable threads with equal static
126 priority.
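.PP
The conceptual model above can be illustrated with a toy data structure.
This is only a sketch of the description in this section,
not the kernel's actual implementation;
the type \fIstruct runlists\fP, the functions, and the fixed capacity
are all assumptions made for the example.
.in +4n
.nf
```c
#include <assert.h>
#include <stddef.h>

#define NPRIO 100   /* static priorities 0..99, as described above */
#define QCAP  16    /* arbitrary per-list capacity for this sketch */

/* One FIFO list of runnable thread IDs per static priority. */
struct runlists {
    int ids[NPRIO][QCAP];
    int len[NPRIO];
};

/* Append a thread to the tail of the list for its priority.
 * Returns 0 on success, -1 if prio is out of range or the list is full. */
int enqueue_tail(struct runlists *rl, int prio, int tid)
{
    assert(rl != NULL);
    if (prio < 0 || prio >= NPRIO || rl->len[prio] == QCAP)
        return -1;
    rl->ids[prio][rl->len[prio]++] = tid;
    return 0;
}

/* Return the thread at the head of the highest-priority nonempty list,
 * or -1 if no thread is runnable. The thread is not removed. */
int pick_next(const struct runlists *rl)
{
    assert(rl != NULL);
    for (int prio = NPRIO - 1; prio >= 0; prio--)
        if (rl->len[prio] > 0)
            return rl->ids[prio][0];
    return -1;
}
```
.fi
.in
.PP
In this model, enqueuing a thread at a higher priority changes the result of
\fIpick_next\fP() immediately, which corresponds to preemption;
the policy-specific rules only decide where in a list a thread is placed.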
127 .SS SCHED_FIFO: First in-first out scheduling
128 \fBSCHED_FIFO\fP can be used only with static priorities higher than
129 0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
130 it will always immediately preempt any currently running
131 \fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
132 \fBSCHED_FIFO\fP is a simple scheduling
133 algorithm without time slicing.
134 For threads scheduled under the
135 \fBSCHED_FIFO\fP policy, the following rules apply:
136 .IP * 3
137 A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
138 higher priority will stay at the head of the list for its priority and
139 will resume execution as soon as all threads of higher priority are
140 blocked again.
141 .IP *
142 When a \fBSCHED_FIFO\fP thread becomes runnable, it
143 will be inserted at the end of the list for its priority.
144 .IP *
145 A call to
146 .BR sched_setscheduler (2)
147 or
148 .BR sched_setparam (2)
149 will put the
150 \fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
151 \fIpid\fP at the start of the list if it was runnable.
152 As a consequence, it may preempt the currently running thread if
153 it has the same priority.
154 (POSIX.1-2001 specifies that the thread should go to the end
155 of the list.)
156 .\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
157 .\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
158 .IP *
159 A thread calling
160 .BR sched_yield (2)
161 will be put at the end of the list.
162 .PP
163 No other events will move a thread
164 scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
165 runnable threads with equal static priority.
166
167 A \fBSCHED_FIFO\fP
168 thread runs until either it is blocked by an I/O request, it is
169 preempted by a higher priority thread, or it calls
170 .BR sched_yield (2).
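.PP
A thread can request this policy for itself roughly as follows.
This is a sketch under stated assumptions:
the helper name \fIbecome_fifo\fP() is hypothetical,
a \fIpid\fP of 0 refers to the calling thread,
and the call is expected to fail for callers that lack
the necessary privilege or resource limit.
.in +4n
.nf
```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * given static priority. Returns 0 on success, -1 with errno set. */
int become_fifo(int priority)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (lo == -1 || hi == -1)
        return -1;

    /* Reject priorities outside the range this system supports. */
    if (priority < lo || priority > hi) {
        errno = EINVAL;
        return -1;
    }

    struct sched_param sp = { .sched_priority = priority };
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```
.fi
.in
.PP
Because a runaway \fBSCHED_FIFO\fP thread is never time-sliced,
a program using such a helper should make sure the thread
blocks or yields regularly.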
171 .SS SCHED_RR: Round-robin scheduling
172 \fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
173 Everything
174 described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
175 except that each thread is allowed to run only for a maximum time
176 quantum.
177 If a \fBSCHED_RR\fP thread has been running for a time
178 period equal to or longer than the time quantum, it will be put at the
179 end of the list for its priority.
180 A \fBSCHED_RR\fP thread that has
181 been preempted by a higher priority thread and subsequently resumes
182 execution as a running thread will complete the unexpired portion of
183 its round-robin time quantum.
184 The length of the time quantum can be
185 retrieved using
186 .BR sched_rr_get_interval (2).
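.PP
A short sketch of retrieving the quantum follows.
The helper name \fIrr_quantum_ns\fP() is an assumption for illustration;
a \fIpid\fP of 0 refers to the calling thread,
and the result is meaningful mainly when that thread runs under \fBSCHED_RR\fP.
.in +4n
.nf
```c
#include <sched.h>
#include <time.h>

/* Hypothetical helper: return the round-robin quantum reported for the
 * calling thread, in nanoseconds, or -1 if the query fails. */
long long rr_quantum_ns(void)
{
    struct timespec ts;

    if (sched_rr_get_interval(0, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```
.fi
.in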
187 .\" On Linux 2.4, the length of the RR interval is influenced
188 .\" by the process nice value -- MTK
189 .\"
190 .SS SCHED_OTHER: Default Linux time-sharing scheduling
191 \fBSCHED_OTHER\fP can be used only at static priority 0.
192 \fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
193 intended for all threads that do not require the special
194 real-time mechanisms.
195 The thread to run is chosen from the static
196 priority 0 list based on a \fIdynamic\fP priority that is determined only
197 inside this list.
198 The dynamic priority is based on the nice value (set by
199 .BR nice (2)
200 or
201 .BR setpriority (2))
202 and is increased for each time quantum in which the thread is ready to run
203 but is denied the CPU by the scheduler.
204 This ensures fair progress among all \fBSCHED_OTHER\fP threads.
205 .\"
206 .SS SCHED_BATCH: Scheduling batch processes
207 (Since Linux 2.6.16.)
208 \fBSCHED_BATCH\fP can be used only at static priority 0.
209 This policy is similar to \fBSCHED_OTHER\fP in that it schedules
210 the thread according to its dynamic priority
211 (based on the nice value).
212 The difference is that this policy
213 will cause the scheduler to always assume
214 that the thread is CPU-intensive.
215 Consequently, the scheduler will apply a small scheduling
216 penalty with respect to wakeup behavior,
217 so that this thread is mildly disfavored in scheduling decisions.
218
219 .\" The following paragraph is drawn largely from the text that
220 .\" accompanied Ingo Molnar's patch for the implementation of
221 .\" SCHED_BATCH.
222 .\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
223 This policy is useful for workloads that are noninteractive,
224 but do not want to lower their nice value,
225 and for workloads that want a deterministic scheduling policy without
226 interactivity causing extra preemptions (between the workload's tasks).
227 .\"
228 .SS SCHED_IDLE: Scheduling very low priority jobs
229 (Since Linux 2.6.23.)
230 \fBSCHED_IDLE\fP can be used only at static priority 0;
231 the process nice value has no influence for this policy.
232
233 This policy is intended for running jobs at extremely low
234 priority (lower even than a +19 nice value with the
235 .B SCHED_OTHER
236 or
237 .B SCHED_BATCH
238 policies).
239 .\"
240 .SS Resetting scheduling policy for child processes
241 Each thread has a reset-on-fork scheduling flag.
242 When this flag is set, children created by
243 .BR fork (2)
244 do not inherit privileged scheduling policies.
245 The reset-on-fork flag can be set by either:
246 .IP * 3
247 ORing the
248 .B SCHED_RESET_ON_FORK
249 flag into the
250 .I policy
251 argument when calling
252 .BR sched_setscheduler (2)
253 (since Linux 2.6.32);
254 or
255 .IP *
256 specifying the
257 .B SCHED_FLAG_RESET_ON_FORK
258 flag in
259 .IR attr.sched_flags
260 when calling
261 .BR sched_setattr (2).
262 .PP
263 Note that the constants used with these two APIs have different names.
264 The state of the reset-on-fork flag can analogously be retrieved using
265 .BR sched_getscheduler (2)
266 and
267 .BR sched_getattr (2).
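.PP
The first of the two methods above might look like the following sketch.
The helper names are hypothetical.
The fallback definition of \fBSCHED_RESET_ON_FORK\fP is an assumption
for headers that do not expose the constant;
its value matches the one used by the Linux kernel headers.
.in +4n
.nf
```c
#define _GNU_SOURCE   /* asks glibc to expose SCHED_RESET_ON_FORK */
#include <sched.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000   /* assumed fallback value */
#endif

/* Hypothetical helper: set the reset-on-fork flag on the calling thread
 * while keeping its current policy and priority.
 * Returns 0 on success, -1 with errno set on failure. */
int enable_reset_on_fork(void)
{
    int policy = sched_getscheduler(0);
    if (policy == -1)
        return -1;

    struct sched_param sp;
    if (sched_getparam(0, &sp) == -1)
        return -1;

    return sched_setscheduler(0, policy | SCHED_RESET_ON_FORK, &sp);
}

/* Returns 1 if the flag is set on the calling thread, 0 if it is clear,
 * and -1 on error. */
int reset_on_fork_enabled(void)
{
    int policy = sched_getscheduler(0);
    if (policy == -1)
        return -1;
    return (policy & SCHED_RESET_ON_FORK) != 0;
}
```
.fi
.in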
268
269 The reset-on-fork feature is intended for media-playback applications,
270 and can be used to prevent applications from evading the
271 .BR RLIMIT_RTTIME
272 resource limit (see
273 .BR getrlimit (2))
274 by creating multiple child processes.
275
276 More precisely, if the reset-on-fork flag is set,
277 the following rules apply for subsequently created children:
278 .IP * 3
279 If the calling thread has a scheduling policy of
280 .B SCHED_FIFO
281 or
282 .BR SCHED_RR ,
283 the policy is reset to
284 .BR SCHED_OTHER
285 in child processes.
286 .IP *
287 If the calling process has a negative nice value,
288 the nice value is reset to zero in child processes.
289 .PP
290 After the reset-on-fork flag has been enabled,
291 it can be reset only if the thread has the
292 .BR CAP_SYS_NICE
293 capability.
294 This flag is disabled in child processes created by
295 .BR fork (2).
296 .\"
297 .SS Privileges and resource limits
298 In Linux kernels before 2.6.12, only privileged
299 .RB ( CAP_SYS_NICE )
300 threads can set a nonzero static priority (i.e., set a real-time
301 scheduling policy).
302 The only change that an unprivileged thread can make is to set the
303 .B SCHED_OTHER
304 policy, and this can be done only if the effective user ID of the caller of
305 .BR sched_setscheduler (2)
306 matches the real or effective user ID of the target thread
307 (i.e., the thread specified by
308 .IR pid )
309 whose policy is being changed.
310
311 Since Linux 2.6.12, the
312 .B RLIMIT_RTPRIO
313 resource limit defines a ceiling on an unprivileged thread's
314 static priority for the
315 .B SCHED_RR
316 and
317 .B SCHED_FIFO
318 policies.
319 The rules for changing scheduling policy and priority are as follows:
320 .IP * 3
321 If an unprivileged thread has a nonzero
322 .B RLIMIT_RTPRIO
323 soft limit, then it can change its scheduling policy and priority,
324 subject to the restriction that the priority cannot be set to a
325 value higher than the maximum of its current priority and its
326 .B RLIMIT_RTPRIO
327 soft limit.
328 .IP *
329 If the
330 .B RLIMIT_RTPRIO
331 soft limit is 0, then the only permitted changes are to lower the priority,
332 or to switch to a non-real-time policy.
333 .IP *
334 Subject to the same rules,
335 another unprivileged thread can also make these changes,
336 as long as the effective user ID of the thread making the change
337 matches the real or effective user ID of the target thread.
338 .IP *
339 Special rules apply for the
340 .BR SCHED_IDLE " policy."
341 In Linux kernels before 2.6.39,
342 an unprivileged thread operating under this policy cannot
343 change its policy, regardless of the value of its
344 .BR RLIMIT_RTPRIO
345 resource limit.
346 In Linux kernels since 2.6.39,
347 .\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
348 an unprivileged thread can switch to either the
349 .BR SCHED_BATCH
350 or the
351 .BR SCHED_OTHER
352 policy so long as its nice value falls within the range permitted by its
353 .BR RLIMIT_NICE
354 resource limit (see
355 .BR getrlimit (2)).
356 .PP
357 Privileged
358 .RB ( CAP_SYS_NICE )
359 threads ignore the
360 .B RLIMIT_RTPRIO
361 limit; as with older kernels,
362 they can make arbitrary changes to scheduling policy and priority.
363 See
364 .BR getrlimit (2)
365 for further information on
366 .BR RLIMIT_RTPRIO .
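.PP
The ceiling rule for unprivileged threads can be sketched as below.
Both function names are hypothetical, and the code only restates
the rule given above; it does not check for \fBCAP_SYS_NICE\fP.
The result can exceed the maximum priority of the policy
(for example when the soft limit is unlimited),
so a caller would still clamp it using
.BR sched_get_priority_max (2).
.in +4n
.nf
```c
#include <sys/resource.h>

/* Pure form of the rule: an unprivileged thread may not raise its
 * priority above max(current priority, RLIMIT_RTPRIO soft limit). */
rlim_t rtprio_ceiling(rlim_t current_prio, rlim_t soft_limit)
{
    return current_prio > soft_limit ? current_prio : soft_limit;
}

/* Query the soft limit of the calling process and apply the rule.
 * Returns 0 and stores the ceiling on success, -1 if getrlimit() fails. */
int my_rtprio_ceiling(rlim_t current_prio, rlim_t *ceiling)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) == -1)
        return -1;
    *ceiling = rtprio_ceiling(current_prio, rl.rlim_cur);
    return 0;
}
```
.fi
.in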
367 .SS Response time
368 A blocked high priority thread waiting for I/O has a certain
369 response time before it is scheduled again.
370 The device driver writer
371 can greatly reduce this response time by using a "slow interrupt"
372 interrupt handler.
373 .\" as described in
374 .\" .BR request_irq (9).
375 .SS Miscellaneous
376 Child processes inherit the scheduling policy and parameters across a
377 .BR fork (2).
378 The scheduling policy and parameters are preserved across
379 .BR execve (2).
380
381 Memory locking is usually needed for real-time processes to avoid
382 paging delays; this can be done with
383 .BR mlock (2)
384 or
385 .BR mlockall (2).
386
387 Since a nonblocking infinite loop in a thread scheduled under
388 \fBSCHED_FIFO\fP or \fBSCHED_RR\fP will block all threads with lower
389 priority forever, a software developer should always keep available on
390 the console a shell scheduled under a higher static priority than the
391 tested application.
392 This will allow an emergency kill of tested
393 real-time applications that do not block or terminate as expected.
394 See also the description of the
395 .BR RLIMIT_RTTIME
396 resource limit in
397 .BR getrlimit (2).
398 .SH NOTES
399 .PP
400 Originally, standard Linux was intended as a general-purpose operating
401 system capable of handling background processes, interactive
402 applications, and less demanding real-time applications (applications that
403 usually need to meet timing deadlines).
404 Although the Linux kernel 2.6
405 allowed for kernel preemption and the newly introduced O(1) scheduler
406 ensures that the time needed to schedule is fixed and deterministic
407 irrespective of the number of active tasks, true real-time computing
408 was not possible up to kernel version 2.6.17.
409 .SS Real-time features in the mainline Linux kernel
410 .\" FIXME . Probably this text will need some minor tweaking
411 .\" by about the time of 2.6.30; ask Carsten Emde about this then.
412 From kernel version 2.6.18 onward, however, Linux is gradually
413 becoming equipped with real-time capabilities,
414 most of which are derived from the former
415 .I realtime-preempt
416 patches developed by Ingo Molnar, Thomas Gleixner,
417 Steven Rostedt, and others.
418 Until the patches have been completely merged into the
419 mainline kernel
420 (this is expected to be around kernel version 2.6.30),
421 they must be installed to achieve the best real-time performance.
422 These patches are named:
423 .in +4n
424 .nf
425
426 patch-\fIkernelversion\fP-rt\fIpatchversion\fP
427 .fi
428 .in
429 .PP
430 and can be downloaded from
431 .UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
432 .UE .
433
434 Without the patches and prior to their full inclusion into the mainline
435 kernel, the kernel configuration offers only the three preemption classes
436 .BR CONFIG_PREEMPT_NONE ,
437 .BR CONFIG_PREEMPT_VOLUNTARY ,
438 and
439 .B CONFIG_PREEMPT_DESKTOP
440 which respectively provide no, some, and considerable
441 reduction of the worst-case scheduling latency.
442
443 With the patches applied or after their full inclusion into the mainline
444 kernel, the additional configuration item
445 .B CONFIG_PREEMPT_RT
446 becomes available.
447 If this is selected, Linux is transformed into a regular
448 real-time operating system.
449 The FIFO and RR scheduling policies that can be selected using
450 .BR sched_setscheduler (2)
451 are then used to run a thread
452 with true real-time priority and a minimum worst-case scheduling latency.
453 .SH SEE ALSO
454 .ad l
455 .nh
456 .BR chrt (1),
457 .BR getpriority (2),
458 .BR mlock (2),
459 .BR mlockall (2),
460 .BR munlock (2),
461 .BR munlockall (2),
462 .BR nice (2),
463 .BR sched_get_priority_max (2),
464 .BR sched_get_priority_min (2),
465 .BR sched_getscheduler (2),
466 .BR sched_getaffinity (2),
467 .BR sched_getparam (2),
468 .BR sched_rr_get_interval (2),
469 .BR sched_setaffinity (2),
470 .BR sched_setscheduler (2),
471 .BR sched_setparam (2),
472 .BR sched_yield (2),
473 .BR setpriority (2),
474 .BR pthread_getaffinity_np (3),
475 .BR pthread_setaffinity_np (3),
476 .BR sched_getcpu (3),
477 .BR capabilities (7),
478 .BR cpuset (7)
479 .ad
480 .PP
481 .I Programming for the real world \- POSIX.4
482 by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
483 .PP
484 The Linux kernel source files
485 .IR Documentation/scheduler/sched-deadline.txt ,
486 .IR Documentation/scheduler/sched-rt-group.txt ,
487 .IR Documentation/scheduler/sched-design-CFS.txt ,
488 and
489 .IR Documentation/scheduler/sched-nice-design.txt