.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 Peter Zijlstra <peterz@infradead.org>
.\" and Copyright (C) 2014 Juri Lelli <juri.lelli@gmail.com>
.\" Various pieces from the old sched_setscheduler(2) page
.\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
.\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\"
.\" Worth looking at: http://rt.wiki.kernel.org/index.php
.\"
.TH SCHED 7 2014-05-12 "Linux" "Linux Programmer's Manual"
.SH NAME
sched \- overview of scheduling APIs
.SH DESCRIPTION
The Linux scheduling APIs are as follows:
.TP
.BR sched_setscheduler (2)
Set the scheduling policy and parameters of a specified thread.
.TP
.BR sched_getscheduler (2)
Return the scheduling policy of a specified thread.
.TP
.BR sched_setparam (2)
Set the scheduling parameters of a specified thread.
.TP
.BR sched_getparam (2)
Fetch the scheduling parameters of a specified thread.
.TP
.BR sched_get_priority_max (2)
Return the maximum priority available in a specified scheduling policy.
.TP
.BR sched_get_priority_min (2)
Return the minimum priority available in a specified scheduling policy.
.TP
.BR sched_rr_get_interval (2)
Fetch the quantum used for threads that are scheduled under
the "round-robin" scheduling policy.
.TP
.BR sched_yield (2)
Cause the caller to relinquish the CPU,
so that some other thread can be executed.
.TP
.BR sched_setaffinity (2)
Set the CPU affinity of a specified thread.
.TP
.BR sched_getaffinity (2)
Get the CPU affinity of a specified thread.
.TP
.BR sched_setattr (2)
Set the scheduling policy and parameters of a specified thread.
This (Linux-specific) system call provides a superset of the functionality of
.BR sched_setscheduler (2)
and
.BR sched_setparam (2).
.TP
.BR sched_getattr (2)
Fetch the scheduling policy and parameters of a specified thread.
This (Linux-specific) system call provides a superset of the functionality of
.BR sched_getscheduler (2)
and
.BR sched_getparam (2).
.SS Scheduling policies
The scheduler is the kernel component that decides which runnable thread
will be executed by the CPU next.
Each thread has an associated scheduling policy and a \fIstatic\fP
scheduling priority, \fIsched_priority\fP.
The scheduler makes its decisions based on knowledge of the scheduling
policy and static priority of all threads on the system.
.PP
For threads scheduled under one of the normal scheduling policies
(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
\fIsched_priority\fP is not used in scheduling
decisions (it must be specified as 0).
.PP
Threads scheduled under one of the real-time policies
(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
\fIsched_priority\fP value in the range 1 (low) to 99 (high).
(As the numbers imply, real-time threads always have higher priority
than normal threads.)
Note well: POSIX.1-2001 requires an implementation to support only a
minimum of 32 distinct priority levels for the real-time policies,
and some systems supply just this minimum.
Portable programs should use
.BR sched_get_priority_min (2)
and
.BR sched_get_priority_max (2)
to find the range of priorities supported for a particular policy.
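.PP
As an illustration only, a portable program might query the range
along the following lines.
The helper name
.I rt_priority_range
is hypothetical and is not part of any library;
this is a minimal sketch rather than a recommended interface.

```c
#include <sched.h>

/* Hypothetical helper (not a library function): fetch the priority
   range that the running system supports for 'policy'.
   Returns 0 on success, or -1 if either query fails (errno is set). */
int
rt_priority_range(int policy, int *lo, int *hi)
{
    int min = sched_get_priority_min(policy);
    int max = sched_get_priority_max(policy);

    if (min == -1 || max == -1)
        return -1;

    *lo = min;
    *hi = max;
    return 0;
}
```

A caller would then clamp any requested real-time priority
to the returned range instead of hard-coding 1 and 99.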
.PP
Conceptually, the scheduler maintains a list of runnable
threads for each possible \fIsched_priority\fP value.
In order to determine which thread runs next, the scheduler looks for
the nonempty list with the highest static priority and selects the
thread at the head of this list.
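.PP
The conceptual selection rule can be modeled in a few lines of C.
This is a toy model of the description above, not the kernel's data
structures; the names
.I NPRIO
and
.I highest_nonempty
are invented for the example.

```c
#include <stddef.h>

#define NPRIO 100   /* static priorities 0..99, as described in the text */

/* Illustrative model only: nrunnable[p] is the number of runnable
   threads queued at static priority p.  Returns the priority whose
   list the scheduler would pick from, or -1 if nothing is runnable. */
int
highest_nonempty(const size_t nrunnable[NPRIO])
{
    for (int p = NPRIO - 1; p >= 0; p--)
        if (nrunnable[p] > 0)
            return p;
    return -1;
}
```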
.PP
A thread's scheduling policy determines
where it will be inserted into the list of threads
with equal static priority and how it will move inside this list.
.PP
All scheduling is preemptive: if a thread with a higher static
priority becomes ready to run, the currently running thread
will be preempted and
returned to the wait list for its static priority level.
The scheduling policy determines the
ordering only within the list of runnable threads with equal static
priority.
.SS SCHED_FIFO: First in-first out scheduling
\fBSCHED_FIFO\fP can be used only with static priorities higher than
0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable,
it will always immediately preempt any currently running
\fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
\fBSCHED_FIFO\fP is a simple scheduling
algorithm without time slicing.
For threads scheduled under the
\fBSCHED_FIFO\fP policy, the following rules apply:
.IP * 3
A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
higher priority will stay at the head of the list for its priority and
will resume execution as soon as all threads of higher priority are
blocked again.
.IP *
When a \fBSCHED_FIFO\fP thread becomes runnable, it
will be inserted at the end of the list for its priority.
.IP *
A call to
.BR sched_setscheduler (2),
.BR sched_setparam (2),
or
.BR sched_setattr (2)
will put the
\fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
\fIpid\fP at the start of the list if it was runnable.
As a consequence, it may preempt the currently running thread if
it has the same priority.
(POSIX.1-2001 specifies that the thread should go to the end
of the list.)
.\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
.\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
.IP *
A thread calling
.BR sched_yield (2)
will be put at the end of the list.
.PP
No other events will move a thread
scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
runnable threads with equal static priority.
.PP
A \fBSCHED_FIFO\fP
thread runs until either it is blocked by an I/O request, it is
preempted by a higher priority thread, or it calls
.BR sched_yield (2).
.SS SCHED_RR: Round-robin scheduling
\fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
Everything
described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
except that each thread is allowed to run only for a maximum time
quantum.
If a \fBSCHED_RR\fP thread has been running for a time
period equal to or longer than the time quantum, it will be put at the
end of the list for its priority.
A \fBSCHED_RR\fP thread that has
been preempted by a higher priority thread and subsequently resumes
execution as a running thread will complete the unexpired portion of
its round-robin time quantum.
The length of the time quantum can be
retrieved using
.BR sched_rr_get_interval (2).
.\" On Linux 2.4, the length of the RR interval is influenced
.\" by the process nice value -- MTK
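.PP
A minimal sketch of retrieving the quantum for the calling thread
follows.
The wrapper name
.I rr_quantum_ns
is hypothetical.
The value is meaningful mainly when the caller runs under
\fBSCHED_RR\fP; what is reported for other policies is
implementation-dependent.

```c
#include <sched.h>
#include <time.h>

/* Hypothetical wrapper: return the round-robin quantum of the calling
   thread in nanoseconds, or -1 if sched_rr_get_interval() fails. */
long long
rr_quantum_ns(void)
{
    struct timespec ts;

    if (sched_rr_get_interval(0, &ts) == -1)   /* pid 0: the caller */
        return -1;

    return (long long) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```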
.SS SCHED_DEADLINE: Sporadic task model deadline scheduling
Since version 3.14, Linux provides a deadline scheduling policy
.RB ( SCHED_DEADLINE ).
This policy is currently implemented using
GEDF (Global Earliest Deadline First)
in conjunction with CBS (Constant Bandwidth Server).
To set and fetch this policy and associated attributes,
one must use the Linux-specific
.BR sched_setattr (2)
and
.BR sched_getattr (2)
system calls.
.PP
A sporadic task is one that has a sequence of jobs, where each
job is activated at most once per period.
Each job also has a
.IR "relative deadline" ,
before which it should finish execution, and a
.IR "computation time" ,
which is the CPU time necessary for executing the job.
The moment when a task wakes up
because a new job has to be executed is called the
.I "arrival time"
(also referred to as the request time or release time).
The
.I "start time"
is the time at which a task starts its execution.
The
.I "absolute deadline"
is thus obtained by adding the relative deadline to the arrival time.
.PP
The following diagram clarifies these terms:
.in +4n
.nf

arrival/wakeup                    absolute deadline
     |    start time                    |
     |        |                         |
     v        v                         v
-----x--------xooooooooooooooooo--------x--------x---
              |<- comp. time ->|
     |<------- relative deadline ------>|
     |<-------------- period ------------------->|
.fi
.in
.PP
When setting a
.B SCHED_DEADLINE
policy for a thread using
.BR sched_setattr (2),
one can specify three parameters:
.IR Runtime ,
.IR Deadline ,
and
.IR Period .
These parameters do not necessarily correspond to the aforementioned terms:
usual practice is to set Runtime to something bigger than the average
computation time (or worst-case execution time for hard real-time tasks),
Deadline to the relative deadline, and Period to the period of the task.
.in +4n
.nf

arrival/wakeup                    absolute deadline
     |    start time                    |
     |        |                         |
     v        v                         v
-----x--------xooooooooooooooooo--------x--------x---
              |<-- Runtime ------->|
     |<----------- Deadline ----------->|
     |<-------------- Period ------------------->|
.fi
.in
.PP
The three deadline-scheduling parameters correspond to the
.IR sched_runtime ,
.IR sched_deadline ,
and
.I sched_period
fields of the
.I sched_attr
structure; see
.BR sched_setattr (2).
These fields express values in nanoseconds.
.\" FIXME It looks as though specifying sched_period as 0 means
.\" "make sched_period the same as sched_deadline", right?
.\" This needs to be documented.
If
.I sched_period
is specified as 0, then it is made the same as
.IR sched_deadline .
.PP
The kernel requires that:
.PP
.in +4n
sched_runtime <= sched_deadline <= sched_period
.in
.PP
.\" See __checkparam_dl in kernel/sched/core.c
In addition, under the current implementation,
all of the parameter values must be at least 1024
(i.e., just over one microsecond,
which is the resolution of the implementation), and less than 2^63.
If any of these checks fails,
.BR sched_setattr (2)
fails with the error
.BR EINVAL .
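.PP
The checks described above can be mirrored in user space before
calling
.BR sched_setattr (2).
The following is a sketch of the rules as stated in this page,
not a copy of the kernel's code;
the function name
.I dl_params_check
and the two macros are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define DL_MIN_NS 1024ULL        /* lower bound stated in the text */
#define DL_MAX_NS (1ULL << 63)   /* values must be less than 2^63 */

/* Hypothetical pre-check of SCHED_DEADLINE parameters (nanoseconds).
   Returns 0 if the documented constraints hold, otherwise EINVAL. */
int
dl_params_check(uint64_t runtime, uint64_t deadline, uint64_t period)
{
    if (period == 0)
        period = deadline;   /* 0 means "same as sched_deadline" */

    if (runtime < DL_MIN_NS || deadline < DL_MIN_NS || period < DL_MIN_NS)
        return EINVAL;
    if (runtime >= DL_MAX_NS || deadline >= DL_MAX_NS || period >= DL_MAX_NS)
        return EINVAL;
    if (runtime > deadline || deadline > period)
        return EINVAL;

    return 0;
}
```

For instance, a runtime of 10 ms with a deadline of 30 ms and a period
of 100 ms passes, whereas a runtime of 40 ms with the same deadline
does not.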
.PP
The CBS guarantees non-interference between tasks, by throttling
threads that attempt to over-run their specified Runtime.
.PP
To ensure deadline scheduling guarantees,
the kernel must prevent situations where the set of
.B SCHED_DEADLINE
threads is not feasible (schedulable) within the given constraints.
The kernel thus performs an admission test when setting or changing
.B SCHED_DEADLINE
policy and attributes.
This admission test calculates whether the change is feasible;
if it is not,
.BR sched_setattr (2)
fails with the error
.BR EBUSY .
.PP
For example, it is required (but not necessarily sufficient) for
the total utilization to be less than or equal to the total number of
CPUs available, where, since each thread can maximally run for
Runtime per Period, that thread's utilization is its
Runtime divided by its Period.
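.PP
The utilization condition can be written out as a short calculation.
This sketch checks only the necessary condition described above
and is not the kernel's admission test;
.I struct dl_task
and
.I dl_utilization_ok
are names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one deadline thread. */
struct dl_task {
    uint64_t runtime_ns;
    uint64_t period_ns;   /* assumed to be nonzero */
};

/* Return 1 if the sum of Runtime/Period over all tasks is no greater
   than the number of CPUs, 0 otherwise.  Passing this check is
   necessary, but not necessarily sufficient, for schedulability. */
int
dl_utilization_ok(const struct dl_task *tasks, size_t ntasks,
                  unsigned ncpus)
{
    double total = 0.0;

    for (size_t i = 0; i < ntasks; i++)
        total += (double) tasks[i].runtime_ns /
                 (double) tasks[i].period_ns;

    return total <= (double) ncpus;
}
```

Three threads that each need half of every period have a total
utilization of 1.5, so they cannot fit on one CPU but may fit on two.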
.PP
In order to fulfil the guarantees that are made when
a thread is admitted to the
.B SCHED_DEADLINE
policy,
.B SCHED_DEADLINE
threads are the highest priority (user controllable) threads in the
system; if any
.B SCHED_DEADLINE
thread is runnable,
it will preempt any thread scheduled under one of the other policies.
.PP
A call to
.BR fork (2)
by a thread scheduled under the
.B SCHED_DEADLINE
policy will fail with the error
.BR EAGAIN ,
unless the thread has its reset-on-fork flag set (see below).
.PP
A
.B SCHED_DEADLINE
thread that calls
.BR sched_yield (2)
will yield the current job and wait for a new period to begin.
.\"
.\" FIXME Calling sched_getparam() on a SCHED_DEADLINE thread
.\" fails with EINVAL, but sched_getscheduler() succeeds.
.\" Is that intended? (Why?)
.\"
.SS SCHED_OTHER: Default Linux time-sharing scheduling
\fBSCHED_OTHER\fP can be used only at static priority 0.
\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
intended for all threads that do not require the special
real-time mechanisms.
The thread to run is chosen from the static
priority 0 list based on a \fIdynamic\fP priority that is determined only
inside this list.
The dynamic priority is based on the nice value (set by
.BR nice (2),
.BR setpriority (2),
or
.BR sched_setattr (2))
and is increased for each time quantum in which the thread is ready to run
but is denied the CPU by the scheduler.
This ensures fair progress among all \fBSCHED_OTHER\fP threads.
.SS SCHED_BATCH: Scheduling batch processes
(Since Linux 2.6.16.)
\fBSCHED_BATCH\fP can be used only at static priority 0.
This policy is similar to \fBSCHED_OTHER\fP in that it schedules
the thread according to its dynamic priority
(based on the nice value).
The difference is that this policy
will cause the scheduler to always assume
that the thread is CPU-intensive.
Consequently, the scheduler will apply a small scheduling
penalty with respect to wakeup behavior,
so that this thread is mildly disfavored in scheduling decisions.
.PP
.\" The following paragraph is drawn largely from the text that
.\" accompanied Ingo Molnar's patch for the implementation of
.\" SCHED_BATCH.
.\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
This policy is useful for workloads that are noninteractive,
but do not want to lower their nice value,
and for workloads that want a deterministic scheduling policy without
interactivity causing extra preemptions (between the workload's tasks).
.SS SCHED_IDLE: Scheduling very low priority jobs
(Since Linux 2.6.23.)
\fBSCHED_IDLE\fP can be used only at static priority 0;
the process nice value has no influence for this policy.
.PP
This policy is intended for running jobs at extremely low
priority (lower even than a +19 nice value with the
.B SCHED_OTHER
or
.B SCHED_BATCH
policies).
.SS Resetting scheduling policy for child processes
Each thread has a reset-on-fork scheduling flag.
When this flag is set, children created by
.BR fork (2)
do not inherit privileged scheduling policies.
The reset-on-fork flag can be set by either:
.IP * 3
ORing the
.B SCHED_RESET_ON_FORK
flag into the
.I policy
argument when calling
.BR sched_setscheduler (2)
(since Linux 2.6.32);
or
.IP *
specifying the
.B SCHED_FLAG_RESET_ON_FORK
flag in
.I attr.sched_flags
when calling
.BR sched_setattr (2).
.PP
Note that the constants used with these two APIs have different names.
The state of the reset-on-fork flag can analogously be retrieved using
.BR sched_getscheduler (2)
and
.BR sched_getattr (2).
.PP
The reset-on-fork feature is intended for media-playback applications,
and can be used to prevent applications evading the
.B RLIMIT_RTTIME
resource limit (see
.BR getrlimit (2))
by creating multiple child processes.
.PP
More precisely, if the reset-on-fork flag is set,
the following rules apply for subsequently created children:
.IP * 3
If the calling thread has a scheduling policy of
.B SCHED_FIFO
or
.BR SCHED_RR ,
the policy is reset to
.B SCHED_OTHER
in child processes.
.IP *
If the calling process has a negative nice value,
the nice value is reset to zero in child processes.
.PP
After the reset-on-fork flag has been enabled,
it can be reset only if the thread has the
.B CAP_SYS_NICE
capability.
This flag is disabled in child processes created by
.BR fork (2).
.SS Privileges and resource limits
In Linux kernels before 2.6.12, only privileged
.RB ( CAP_SYS_NICE )
threads can set a nonzero static priority (i.e., set a real-time
scheduling policy).
The only change that an unprivileged thread can make is to set the
.B SCHED_OTHER
policy, and this can be done only if the effective user ID of the caller
matches the real or effective user ID of the target thread
(i.e., the thread specified by
.IR pid )
whose policy is being changed.
.PP
A thread must be privileged
.RB ( CAP_SYS_NICE )
in order to set or modify a
.B SCHED_DEADLINE
policy.
.PP
Since Linux 2.6.12, the
.B RLIMIT_RTPRIO
resource limit defines a ceiling on an unprivileged thread's
static priority for the
.B SCHED_RR
and
.B SCHED_FIFO
policies.
The rules for changing scheduling policy and priority are as follows:
.IP * 3
If an unprivileged thread has a nonzero
.B RLIMIT_RTPRIO
soft limit, then it can change its scheduling policy and priority,
subject to the restriction that the priority cannot be set to a
value higher than the maximum of its current priority and its
.B RLIMIT_RTPRIO
soft limit.
.IP *
If the
.B RLIMIT_RTPRIO
soft limit is 0, then the only permitted changes are to lower the priority,
or to switch to a non-real-time policy.
.IP *
Subject to the same rules,
another unprivileged thread can also make these changes,
as long as the effective user ID of the thread making the change
matches the real or effective user ID of the target thread.
.IP *
Special rules apply for the
.B SCHED_IDLE
policy.
In Linux kernels before 2.6.39,
an unprivileged thread operating under this policy cannot
change its policy, regardless of the value of its
.B RLIMIT_RTPRIO
resource limit.
In Linux kernels since 2.6.39,
.\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
an unprivileged thread can switch to either the
.B SCHED_BATCH
or the
.B SCHED_NORMAL
policy so long as its nice value falls within the range permitted by its
.B RLIMIT_NICE
resource limit (see
.BR getrlimit (2)).
.PP
Privileged
.RB ( CAP_SYS_NICE )
threads ignore the
.B RLIMIT_RTPRIO
limit; as with older kernels,
they can make arbitrary changes to scheduling policy and priority.
See
.BR getrlimit (2)
for further information on
.BR RLIMIT_RTPRIO .
.SS Limiting the CPU usage of real-time and deadline processes
A nonblocking infinite loop in a thread scheduled under the
.BR SCHED_FIFO ,
.BR SCHED_RR ,
or
.B SCHED_DEADLINE
policy will block all threads with lower
priority forever.
Prior to Linux 2.6.25, the only way of preventing a runaway real-time
process from freezing the system was to run (at the console)
a shell scheduled under a higher static priority than the tested application.
This allows an emergency kill of tested
real-time applications that do not block or terminate as expected.
.PP
Since Linux 2.6.25, there are other techniques for dealing with runaway
real-time and deadline processes.
One of these is to use the
.B RLIMIT_RTTIME
resource limit to set a ceiling on the CPU time that
a real-time process may consume.
See
.BR getrlimit (2)
for details.
.PP
Since version 2.6.25, Linux also provides two
.I /proc
files that can be used to reserve a certain amount of CPU time
to be used by non-real-time processes.
Reserving some CPU time in this fashion allows some CPU time to be
allocated to (say) a root shell that can be used to kill a runaway process.
Both of these files specify time values in microseconds:
.TP
.I /proc/sys/kernel/sched_rt_period_us
This file specifies a scheduling period that is equivalent to
100% CPU bandwidth.
The value in this file can range from 1 to
.BR INT_MAX ,
giving an operating range of 1 microsecond to around 35 minutes.
The default value in this file is 1,000,000 (1 second).
.TP
.I /proc/sys/kernel/sched_rt_runtime_us
The value in this file specifies how much of the "period" time
can be used by all real-time and deadline scheduled processes
on the system.
The value in this file can range from \-1 to
.BR INT_MAX \-1.
Specifying \-1 makes the runtime the same as the period;
that is, no CPU time is set aside for non-real-time processes
(which was the Linux behavior before kernel 2.6.25).
The default value in this file is 950,000 (0.95 seconds),
meaning that 5% of the CPU time is reserved for processes that
don't run under a real-time or deadline scheduling policy.
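.PP
The relationship between the two values can be expressed as a small
calculation.
The function
.I rt_reserved_percent
is a hypothetical helper written for this example;
treating a runtime larger than the period as invalid is an assumption
of the sketch, not something stated above.

```c
/* Hypothetical helper: given the values of sched_rt_period_us and
   sched_rt_runtime_us (in microseconds), return the percentage of
   each period left for non-real-time processes.
   Returns -1.0 for input that the sketch treats as invalid. */
double
rt_reserved_percent(long period_us, long runtime_us)
{
    if (period_us <= 0 || runtime_us < -1 || runtime_us > period_us)
        return -1.0;
    if (runtime_us == -1)   /* runtime is the same as the period */
        return 0.0;

    return 100.0 * (double) (period_us - runtime_us) / (double) period_us;
}
```

With the default values of 1,000,000 and 950,000 the result is 5.0,
matching the 5% figure given above.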
.PP
A blocked high priority thread waiting for I/O has a certain
response time before it is scheduled again.
The device driver writer
can greatly reduce this response time by using a "slow interrupt"
interrupt handler.
.\" as described in
.\" .BR request_irq (9).
.PP
Child processes inherit the scheduling policy and parameters across a
.BR fork (2).
The scheduling policy and parameters are preserved across
.BR execve (2).
.PP
Memory locking is usually needed for real-time processes to avoid
paging delays; this can be done with
.BR mlock (2)
or
.BR mlockall (2).
.PP
Originally, standard Linux was intended as a general-purpose operating
system able to handle background processes, interactive
applications, and less demanding real-time applications (applications that
usually need to meet timing deadlines).
Although Linux kernel 2.6
allowed for kernel preemption, and its newly introduced O(1) scheduler
ensured that the time needed to schedule was fixed and deterministic
irrespective of the number of active tasks, true real-time computing
was not possible up to kernel version 2.6.17.
.SS Real-time features in the mainline Linux kernel
.\" FIXME . Probably this text will need some minor tweaking
.\" by about the time of 2.6.30; ask Carsten Emde about this then.
From kernel version 2.6.18 onward, however, Linux is gradually
becoming equipped with real-time capabilities,
most of which are derived from the former
.I realtime-preempt
patches developed by Ingo Molnar, Thomas Gleixner,
Steven Rostedt, and others.
Until the patches have been completely merged into the
mainline kernel
(this is expected to be around kernel version 2.6.30),
they must be installed to achieve the best real-time performance.
These patches are named:
.in +4n
.nf

patch-\fIkernelversion\fP-rt\fIpatchversion\fP
.fi
.in
.PP
and can be downloaded from
.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
.UE .
.PP
Without the patches and prior to their full inclusion into the mainline
kernel, the kernel configuration offers only the three preemption classes
.BR CONFIG_PREEMPT_NONE ,
.BR CONFIG_PREEMPT_VOLUNTARY ,
and
.BR CONFIG_PREEMPT_DESKTOP ,
which respectively provide no, some, and considerable
reduction of the worst-case scheduling latency.
.PP
With the patches applied or after their full inclusion into the mainline
kernel, the additional configuration item
.B CONFIG_PREEMPT_RT
becomes available.
If this is selected, Linux is transformed into a regular
real-time operating system.
The FIFO and RR scheduling policies are then used to run a thread
with true real-time priority and a minimum worst-case scheduling latency.
.SH SEE ALSO
.BR sched_get_priority_max (2),
.BR sched_get_priority_min (2),
.BR sched_getscheduler (2),
.BR sched_getaffinity (2),
.BR sched_getparam (2),
.BR sched_rr_get_interval (2),
.BR sched_setaffinity (2),
.BR sched_setscheduler (2),
.BR sched_setparam (2),
.BR pthread_getaffinity_np (3),
.BR pthread_setaffinity_np (3),
.BR sched_getcpu (3),
.BR capabilities (7)
.PP
.I Programming for the real world \- POSIX.4
by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
.PP
The Linux kernel source files
.IR Documentation/scheduler/sched-deadline.txt ,
.IR Documentation/scheduler/sched-rt-group.txt ,
.IR Documentation/scheduler/sched-design-CFS.txt ,
and
.I Documentation/scheduler/sched-nice-design.txt