1 .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
2 .\" Various pieces from the old sched_setscheduler(2) page
3 .\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
4 .\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
5 .\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
6 .\"
7 .\" %%%LICENSE_START(GPLv2+_DOC_FULL)
8 .\" This is free documentation; you can redistribute it and/or
9 .\" modify it under the terms of the GNU General Public License as
10 .\" published by the Free Software Foundation; either version 2 of
11 .\" the License, or (at your option) any later version.
12 .\"
13 .\" The GNU General Public License's references to "object code"
14 .\" and "executables" are to be interpreted as the output of any
15 .\" document formatting or typesetting system, including
16 .\" intermediate and printed output.
17 .\"
18 .\" This manual is distributed in the hope that it will be useful,
19 .\" but WITHOUT ANY WARRANTY; without even the implied warranty of
20 .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
21 .\" GNU General Public License for more details.
22 .\"
23 .\" You should have received a copy of the GNU General Public
24 .\" License along with this manual; if not, see
25 .\" <http://www.gnu.org/licenses/>.
26 .\" %%%LICENSE_END
27 .\"
28 .\" Worth looking at: http://rt.wiki.kernel.org/index.php
29 .\"
30 .TH SCHED 7 2014-04-28 "Linux" "Linux Programmer's Manual"
31 .SH NAME
32 sched \- overview of scheduling APIs
33 .SH DESCRIPTION
34 .SS API summary
35 The Linux scheduling APIs are as follows:
36 .TP
37 .BR sched_setscheduler (2)
38 Set the scheduling policy and parameters of a specified thread.
39 .TP
40 .BR sched_getscheduler (2)
41 Return the scheduling policy of a specified thread.
42 .TP
43 .BR sched_setparam (2)
44 Set the scheduling parameters of a specified thread.
45 .TP
46 .BR sched_getparam (2)
47 Fetch the scheduling parameters of a specified thread.
48 .TP
49 .BR sched_get_priority_max (2)
50 Return the maximum priority available in a specified scheduling policy.
51 .TP
52 .BR sched_get_priority_min (2)
53 Return the minimum priority available in a specified scheduling policy.
54 .TP
55 .BR sched_rr_get_interval (2)
56 Fetch the quantum used for threads that are scheduled under
57 the "round-robin" scheduling policy.
58 .TP
59 .BR sched_yield (2)
60 Cause the caller to relinquish the CPU,
61 so that some other thread can be executed.
62 .TP
63 .BR sched_setaffinity (2)
64 (Linux-specific)
65 Set the CPU affinity of a specified thread.
66 .TP
67 .BR sched_getaffinity (2)
68 (Linux-specific)
69 Get the CPU affinity of a specified thread.
70 .TP
71 .BR sched_setattr (2)
72 Set the scheduling policy and parameters of a specified thread.
73 This (Linux-specific) system call provides a superset of the functionality of
74 .BR sched_setscheduler (2)
75 and
76 .BR sched_setparam (2).
77 .TP
78 .BR sched_getattr (2)
79 Fetch the scheduling policy and parameters of a specified thread.
80 This (Linux-specific) system call provides a superset of the functionality of
81 .BR sched_getscheduler (2)
82 and
83 .BR sched_getparam (2).
84 .\"
85 .SS Scheduling policies
86 The scheduler is the kernel component that decides which runnable thread
87 will be executed by the CPU next.
88 Each thread has an associated scheduling policy and a \fIstatic\fP
89 scheduling priority, \fIsched_priority\fP; these are the settings
90 that are modified by
91 .BR sched_setscheduler ().
92 The scheduler makes its decisions based on knowledge of the scheduling
93 policy and static priority of all threads on the system.
94
95 For threads scheduled under one of the normal scheduling policies
96 (\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
97 \fIsched_priority\fP is not used in scheduling
98 decisions (it must be specified as 0).
99
100 Threads scheduled under one of the real-time policies
101 (\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
102 \fIsched_priority\fP value in the range 1 (low) to 99 (high).
103 (As the numbers imply, real-time threads always have higher priority
104 than normal threads.)
105 Note well: POSIX.1-2001 requires an implementation to support only a
106 minimum of 32 distinct priority levels for the real-time policies,
107 and some systems supply just this minimum.
108 Portable programs should use
109 .BR sched_get_priority_min (2)
110 and
111 .BR sched_get_priority_max (2)
112 to find the range of priorities supported for a particular policy.
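.PP
As a minimal sketch of the portable approach just described, the following C fragment queries the supported range at run time instead of hard-coding 1 to 99.
The helper names priority_range() and clamp_priority() are hypothetical and exist only for this illustration; the fragment assumes that valid priorities are nonnegative, as they are on Linux, so that \-1 can serve as an error value.

```c
#include <sched.h>

/* Hypothetical helper: store the priority range supported for `policy`
   in *lo and *hi.  Returns 0 on success, or -1 if either query fails
   (errno is set by the failing call). */
int priority_range(int policy, int *lo, int *hi)
{
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);

    if (min == -1 || max == -1)
        return -1;
    *lo = min;
    *hi = max;
    return 0;
}

/* Hypothetical helper: clamp a requested priority into the supported
   range for `policy`.  Returns the clamped value, or -1 on error. */
int clamp_priority(int policy, int wanted)
{
    int lo, hi;

    if (priority_range(policy, &lo, &hi) == -1)
        return -1;
    if (wanted < lo)
        return lo;
    if (wanted > hi)
        return hi;
    return wanted;
}
```

A caller might pass the result of clamp_priority(SCHED_FIFO, wanted) as the sched_priority value given to sched_setscheduler(2).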
113
114 Conceptually, the scheduler maintains a list of runnable
115 threads for each possible \fIsched_priority\fP value.
116 In order to determine which thread runs next, the scheduler looks for
117 the nonempty list with the highest static priority and selects the
118 thread at the head of this list.
119
120 A thread's scheduling policy determines
121 where it will be inserted into the list of threads
122 with equal static priority and how it will move inside this list.
123
124 All scheduling is preemptive: if a thread with a higher static
125 priority becomes ready to run, the currently running thread
126 will be preempted and
127 returned to the wait list for its static priority level.
128 The scheduling policy determines the
129 ordering only within the list of runnable threads with equal static
130 priority.
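.PP
The conceptual model above can be sketched as a toy data structure in C.
This is an illustration of the description only, not the kernel's implementation: the type and function names, the fixed list capacity, and the use of plain integers as thread IDs are all assumptions made for the example.

```c
#include <stddef.h>

#define MAX_PRIO  99   /* static priorities 0..99, as described above */
#define LIST_CAP  8    /* arbitrary capacity chosen for this toy model */

/* One list of runnable threads per static priority; tids[0] is the head. */
struct prio_list {
    int    tids[LIST_CAP];
    size_t len;
};

struct run_queues {
    struct prio_list lists[MAX_PRIO + 1];
};

/* Append `tid` at the end of the list for `prio`.
   Returns 0 on success, -1 if prio is out of range or the list is full. */
int rq_enqueue_tail(struct run_queues *rq, int prio, int tid)
{
    struct prio_list *l;

    if (prio < 0 || prio > MAX_PRIO)
        return -1;
    l = &rq->lists[prio];
    if (l->len == LIST_CAP)
        return -1;
    l->tids[l->len++] = tid;
    return 0;
}

/* Return the thread at the head of the nonempty list with the highest
   static priority, or -1 if nothing is runnable.  Nothing is removed. */
int rq_pick_next(const struct run_queues *rq)
{
    for (int prio = MAX_PRIO; prio >= 0; prio--) {
        if (rq->lists[prio].len > 0)
            return rq->lists[prio].tids[0];
    }
    return -1;
}
```

In this model, preemption follows directly from the selection rule: as soon as a thread is enqueued at a higher static priority, rq_pick_next() returns that thread instead of the one that was running.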
131 .SS SCHED_FIFO: First in-first out scheduling
132 \fBSCHED_FIFO\fP can be used only with static priorities higher than
133 0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
134 it will always immediately preempt any currently running
135 \fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
136 \fBSCHED_FIFO\fP is a simple scheduling
137 algorithm without time slicing.
138 For threads scheduled under the
139 \fBSCHED_FIFO\fP policy, the following rules apply:
140 .IP * 3
141 A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
142 higher priority will stay at the head of the list for its priority and
143 will resume execution as soon as all threads of higher priority are
144 blocked again.
145 .IP *
146 When a \fBSCHED_FIFO\fP thread becomes runnable, it
147 will be inserted at the end of the list for its priority.
148 .IP *
149 A call to
150 .BR sched_setscheduler ()
151 or
152 .BR sched_setparam (2)
153 will put the
154 \fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
155 \fIpid\fP at the start of the list if it was runnable.
156 As a consequence, it may preempt the currently running thread if
157 it has the same priority.
158 (POSIX.1-2001 specifies that the thread should go to the end
159 of the list.)
160 .\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
161 .\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
162 .IP *
163 A thread calling
164 .BR sched_yield (2)
165 will be put at the end of the list.
166 .PP
167 No other events will move a thread
168 scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
169 runnable threads with equal static priority.
170
171 A \fBSCHED_FIFO\fP
172 thread runs until either it is blocked by an I/O request, it is
173 preempted by a higher priority thread, or it calls
174 .BR sched_yield (2).
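.PP
The list-movement rules above can be modeled for a single priority level with a few lines of C.
This is a sketch under stated assumptions rather than kernel code: struct fifo_list and both functions are hypothetical names, thread IDs are plain integers, and the list capacity is arbitrary.

```c
#include <stddef.h>
#include <string.h>

#define LIST_CAP 8   /* arbitrary capacity chosen for this sketch */

/* Runnable threads that share one static priority.
   tids[0] is the head, that is, the thread that runs next. */
struct fifo_list {
    int    tids[LIST_CAP];
    size_t len;
};

/* Rule: a thread that becomes runnable is inserted at the end of the
   list.  Returns 0 on success, -1 if the list is full. */
int fifo_make_runnable(struct fifo_list *l, int tid)
{
    if (l->len == LIST_CAP)
        return -1;
    l->tids[l->len++] = tid;
    return 0;
}

/* Rule: a thread calling sched_yield(2) is put at the end of the list.
   The caller is the running thread, which is the head of the list. */
void fifo_yield(struct fifo_list *l)
{
    int head;

    if (l->len < 2)
        return;                /* nobody else to yield to */
    head = l->tids[0];
    memmove(&l->tids[0], &l->tids[1], (l->len - 1) * sizeof l->tids[0]);
    l->tids[l->len - 1] = head;
}

/* Rule: preemption by a higher-priority thread does not touch this
   list, so the preempted thread is still at the head when it resumes.
   No function is needed to model that case. */
```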
175 .SS SCHED_RR: Round-robin scheduling
176 \fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
177 Everything
178 described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
179 except that each thread is allowed to run only for a maximum time
180 quantum.
181 If a \fBSCHED_RR\fP thread has been running for a time
182 period equal to or longer than the time quantum, it will be put at the
183 end of the list for its priority.
184 A \fBSCHED_RR\fP thread that has
185 been preempted by a higher priority thread and subsequently resumes
186 execution as a running thread will complete the unexpired portion of
187 its round-robin time quantum.
188 The length of the time quantum can be
189 retrieved using
190 .BR sched_rr_get_interval (2).
191 .\" On Linux 2.4, the length of the RR interval is influenced
192 .\" by the process nice value -- MTK
193 .\"
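.PP
For illustration, the quantum mentioned above might be read as follows.
The wrapper name rr_quantum_ns() is hypothetical; it only converts the struct timespec filled in by sched_rr_get_interval(2) into nanoseconds.

```c
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: return the round-robin quantum of the thread
   identified by `pid` (0 means the calling thread) in nanoseconds,
   or -1 if sched_rr_get_interval(2) fails. */
long long rr_quantum_ns(pid_t pid)
{
    struct timespec ts;

    if (sched_rr_get_interval(pid, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

The value is meaningful for threads scheduled under SCHED_RR; what the call reports for threads under other policies depends on the kernel version.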
194 .SS SCHED_OTHER: Default Linux time-sharing scheduling
195 \fBSCHED_OTHER\fP can be used only at static priority 0.
196 \fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
197 intended for all threads that do not require the special
198 real-time mechanisms.
199 The thread to run is chosen from the static
200 priority 0 list based on a \fIdynamic\fP priority that is determined only
201 inside this list.
202 The dynamic priority is based on the nice value (set by
203 .BR nice (2)
204 or
205 .BR setpriority (2))
206 and is increased for each time quantum in which the thread is ready to run
207 but is denied the CPU by the scheduler.
208 This ensures fair progress among all \fBSCHED_OTHER\fP threads.
209 .\"
210 .SS SCHED_BATCH: Scheduling batch processes
211 (Since Linux 2.6.16.)
212 \fBSCHED_BATCH\fP can be used only at static priority 0.
213 This policy is similar to \fBSCHED_OTHER\fP in that it schedules
214 the thread according to its dynamic priority
215 (based on the nice value).
216 The difference is that this policy
217 will cause the scheduler to always assume
218 that the thread is CPU-intensive.
219 Consequently, the scheduler will apply a small scheduling
220 penalty with respect to wakeup behaviour,
221 so that this thread is mildly disfavored in scheduling decisions.
222
223 .\" The following paragraph is drawn largely from the text that
224 .\" accompanied Ingo Molnar's patch for the implementation of
225 .\" SCHED_BATCH.
226 .\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
227 This policy is useful for workloads that are noninteractive,
228 but do not want to lower their nice value,
229 and for workloads that want a deterministic scheduling policy without
230 interactivity causing extra preemptions (between the workload's tasks).
231 .\"
232 .SS SCHED_IDLE: Scheduling very low priority jobs
233 (Since Linux 2.6.23.)
234 \fBSCHED_IDLE\fP can be used only at static priority 0;
235 the process nice value has no influence for this policy.
236
237 This policy is intended for running jobs at extremely low
238 priority (lower even than a +19 nice value with the
239 .B SCHED_OTHER
240 or
241 .B SCHED_BATCH
242 policies).
243 .\"
244 .SS Resetting scheduling policy for child processes
245 Each thread has a reset-on-fork scheduling flag.
246 When this flag is set, children created by
247 .BR fork (2)
248 do not inherit privileged scheduling policies.
249 The reset-on-fork flag can be set by either:
250 .IP * 3
251 ORing the
252 .B SCHED_RESET_ON_FORK
253 flag into the
254 .I policy
255 argument when calling
256 .BR sched_setscheduler (2)
257 (since Linux 2.6.32);
258 or
259 .IP *
260 specifying the
261 .B SCHED_FLAG_RESET_ON_FORK
262 flag in
263 .IR attr.sched_flags
264 when calling
265 .BR sched_setattr (2).
266 .PP
267 Note that the constants used with these two APIs have different names.
268 The state of the reset-on-fork flag can analogously be retrieved using
269 .BR sched_getscheduler (2)
270 and
271 .BR sched_getattr (2).
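.PP
Because sched_getscheduler(2) reports the reset-on-fork flag ORed into the returned policy, a caller has to separate the two.
The following is a minimal sketch of that step; split_policy() is a hypothetical helper, and the fallback definition of SCHED_RESET_ON_FORK (0x40000000, the value in the Linux UAPI headers) is there on the assumption that the C library exposes the constant only when _GNU_SOURCE is in effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* ask glibc for the Linux-specific constants */
#endif
#include <sched.h>
#include <stdbool.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000   /* fallback: Linux UAPI value */
#endif

/* Hypothetical helper: split a value returned by sched_getscheduler(2)
   into the scheduling policy (the return value) and the state of the
   reset-on-fork flag (stored in *reset_on_fork). */
int split_policy(int raw, bool *reset_on_fork)
{
    *reset_on_fork = (raw & SCHED_RESET_ON_FORK) != 0;
    return raw & ~SCHED_RESET_ON_FORK;
}
```

A typical use would be split_policy(sched_getscheduler(0), &flag), after first checking that sched_getscheduler(2) did not return \-1.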
272
273 The reset-on-fork feature is intended for media-playback applications,
274 and can be used to prevent applications evading the
275 .BR RLIMIT_RTTIME
276 resource limit (see
277 .BR getrlimit (2))
278 by creating multiple child processes.
279
280 More precisely, if the reset-on-fork flag is set,
281 the following rules apply for subsequently created children:
282 .IP * 3
283 If the calling thread has a scheduling policy of
284 .B SCHED_FIFO
285 or
286 .BR SCHED_RR ,
287 the policy is reset to
288 .BR SCHED_OTHER
289 in child processes.
290 .IP *
291 If the calling process has a negative nice value,
292 the nice value is reset to zero in child processes.
293 .PP
294 After the reset-on-fork flag has been enabled,
295 it can be reset only if the thread has the
296 .BR CAP_SYS_NICE
297 capability.
298 This flag is disabled in child processes created by
299 .BR fork (2).
300 .\"
301 .SS Privileges and resource limits
302 In Linux kernels before 2.6.12, only privileged
303 .RB ( CAP_SYS_NICE )
304 threads can set a nonzero static priority (i.e., set a real-time
305 scheduling policy).
306 The only change that an unprivileged thread can make is to set the
307 .B SCHED_OTHER
308 policy, and this can be done only if the effective user ID of the caller of
309 .BR sched_setscheduler ()
310 matches the real or effective user ID of the target thread
311 (i.e., the thread specified by
312 .IR pid )
313 whose policy is being changed.
314
315 Since Linux 2.6.12, the
316 .B RLIMIT_RTPRIO
317 resource limit defines a ceiling on an unprivileged thread's
318 static priority for the
319 .B SCHED_RR
320 and
321 .B SCHED_FIFO
322 policies.
323 The rules for changing scheduling policy and priority are as follows:
324 .IP * 3
325 If an unprivileged thread has a nonzero
326 .B RLIMIT_RTPRIO
327 soft limit, then it can change its scheduling policy and priority,
328 subject to the restriction that the priority cannot be set to a
329 value higher than the maximum of its current priority and its
330 .B RLIMIT_RTPRIO
331 soft limit.
332 .IP *
333 If the
334 .B RLIMIT_RTPRIO
335 soft limit is 0, then the only permitted changes are to lower the priority,
336 or to switch to a non-real-time policy.
337 .IP *
338 Subject to the same rules,
339 another unprivileged thread can also make these changes,
340 as long as the effective user ID of the thread making the change
341 matches the real or effective user ID of the target thread.
342 .IP *
343 Special rules apply for the
344 .BR SCHED_IDLE
345 policy.
346 In Linux kernels before 2.6.39,
347 an unprivileged thread operating under this policy cannot
348 change its policy, regardless of the value of its
349 .BR RLIMIT_RTPRIO
350 resource limit.
351 In Linux kernels since 2.6.39,
352 .\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
353 an unprivileged thread can switch to either the
354 .BR SCHED_BATCH
355 or the
356 .BR SCHED_NORMAL
357 policy so long as its nice value falls within the range permitted by its
358 .BR RLIMIT_NICE
359 resource limit (see
360 .BR getrlimit (2)).
361 .PP
362 Privileged
363 .RB ( CAP_SYS_NICE )
364 threads ignore the
365 .B RLIMIT_RTPRIO
366 limit; as with older kernels,
367 they can make arbitrary changes to scheduling policy and priority.
368 See
369 .BR getrlimit (2)
370 for further information on
371 .BR RLIMIT_RTPRIO .
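.PP
The priority ceiling for unprivileged threads can be expressed as a small predicate.
The sketch below models only the first two rules listed above (the ceiling is the maximum of the current priority and the RLIMIT_RTPRIO soft limit); it deliberately ignores the user ID check, the SCHED_IDLE special case, and any other checks the kernel performs.
Both function names are hypothetical.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Model of the ceiling rule for an unprivileged thread: a requested
   real-time priority is permitted if it does not exceed the maximum of
   the thread's current priority and its RLIMIT_RTPRIO soft limit.
   With a soft limit of 0 this reduces to "only lowering is allowed". */
bool rt_priority_permitted(int current_prio, rlim_t soft_limit, int requested)
{
    rlim_t ceiling = soft_limit;

    if (current_prio > 0 && (rlim_t)current_prio > ceiling)
        ceiling = (rlim_t)current_prio;
    return requested >= 0 && (rlim_t)requested <= ceiling;
}

/* Apply the model using the caller's actual soft limit.
   Returns 1 if permitted, 0 if not, -1 if getrlimit(2) fails. */
int rt_priority_permitted_now(int current_prio, int requested)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) == -1)
        return -1;
    return rt_priority_permitted(current_prio, rl.rlim_cur, requested) ? 1 : 0;
}
```

For example, a thread running at priority 10 with a soft limit of 0 may lower its priority to 5 but may not raise it to 11, whereas with a soft limit of 50 it may go as high as 50.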
372 .SS Response time
373 A blocked high priority thread waiting for I/O has a certain
374 response time before it is scheduled again.
375 The device driver writer
376 can greatly reduce this response time by using a "slow interrupt"
377 interrupt handler.
378 .\" as described in
379 .\" .BR request_irq (9).
380 .SS Miscellaneous
381 Child processes inherit the scheduling policy and parameters across a
382 .BR fork (2).
383 The scheduling policy and parameters are preserved across
384 .BR execve (2).
385
386 Memory locking is usually needed for real-time processes to avoid
387 paging delays; this can be done with
388 .BR mlock (2)
389 or
390 .BR mlockall (2).
391
392 Since a nonblocking infinite loop in a thread scheduled under
393 \fBSCHED_FIFO\fP or \fBSCHED_RR\fP will block all threads with lower
394 priority forever, a software developer should always keep available on
395 the console a shell scheduled under a higher static priority than the
396 tested application.
397 This will allow an emergency kill of tested
398 real-time applications that do not block or terminate as expected.
399 See also the description of the
400 .BR RLIMIT_RTTIME
401 resource limit in
402 .BR getrlimit (2).
403 .SH NOTES
404 .PP
405 Originally, standard Linux was intended as a general-purpose operating
406 system able to handle background processes, interactive
407 applications, and less demanding real-time applications (applications that
408 usually need to meet timing deadlines).
409 Although the Linux kernel 2.6
410 allowed for kernel preemption and the newly introduced O(1) scheduler
411 ensured that the time needed to schedule is fixed and deterministic
412 irrespective of the number of active tasks, true real-time computing
413 was not possible up to kernel version 2.6.17.
414 .SS Real-time features in the mainline Linux kernel
415 .\" FIXME . Probably this text will need some minor tweaking
416 .\" by about the time of 2.6.30; ask Carsten Emde about this then.
417 From kernel version 2.6.18 onward, however, Linux is gradually
418 becoming equipped with real-time capabilities,
419 most of which are derived from the former
420 .I realtime-preempt
421 patches developed by Ingo Molnar, Thomas Gleixner,
422 Steven Rostedt, and others.
423 Until the patches have been completely merged into the
424 mainline kernel
425 (this is expected to be around kernel version 2.6.30),
426 they must be installed to achieve the best real-time performance.
427 These patches are named:
428 .in +4n
429 .nf
430
431 patch-\fIkernelversion\fP-rt\fIpatchversion\fP
432 .fi
433 .in
434 .PP
435 and can be downloaded from
436 .UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
437 .UE .
438
439 Without the patches and prior to their full inclusion into the mainline
440 kernel, the kernel configuration offers only the three preemption classes
441 .BR CONFIG_PREEMPT_NONE ,
442 .BR CONFIG_PREEMPT_VOLUNTARY ,
443 and
444 .B CONFIG_PREEMPT_DESKTOP
445 which respectively provide no, some, and considerable
446 reduction of the worst-case scheduling latency.
447
448 With the patches applied or after their full inclusion into the mainline
449 kernel, the additional configuration item
450 .B CONFIG_PREEMPT_RT
451 becomes available.
452 If this is selected, Linux is transformed into a regular
453 real-time operating system.
454 The FIFO and RR scheduling policies that can be selected using
455 .BR sched_setscheduler ()
456 are then used to run a thread
457 with true real-time priority and a minimum worst-case scheduling latency.
458 .SH SEE ALSO
459 .ad l
460 .nh
461 .BR chrt (1),
462 .BR getpriority (2),
463 .BR mlock (2),
464 .BR mlockall (2),
465 .BR munlock (2),
466 .BR munlockall (2),
467 .BR nice (2),
468 .BR sched_get_priority_max (2),
469 .BR sched_get_priority_min (2),
470 .BR sched_getscheduler (2),
471 .BR sched_getaffinity (2),
472 .BR sched_getparam (2),
473 .BR sched_rr_get_interval (2),
474 .BR sched_setaffinity (2),
475 .BR sched_setscheduler (2),
476 .BR sched_setparam (2),
477 .BR sched_yield (2),
478 .BR setpriority (2),
479 .BR pthread_getaffinity_np (3),
480 .BR pthread_setaffinity_np (3),
481 .BR sched_getcpu (3),
482 .BR capabilities (7),
483 .BR cpuset (7)
484 .ad
485 .PP
486 .I Programming for the real world \- POSIX.4
487 by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
488 .PP
489 The Linux kernel source files
490 .IR Documentation/scheduler/sched-deadline.txt ,
491 .IR Documentation/scheduler/sched-rt-group.txt ,
492 .IR Documentation/scheduler/sched-design-CFS.txt ,
493 and
494 .IR Documentation/scheduler/sched-nice-design.txt