Commit | Line | Data |
---|---|---|
59c06be3 | 1 | .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com> |
7a0d1838 PZ |
2 | .\" and Copyright (C) 2014 Peter Zijlstra <peterz@infradead.org> |
3 | .\" and Copyright (C) 2014 Juri Lelli <juri.lelli@gmail.com> | |
59c06be3 MK |
4 | .\" Various pieces from the old sched_setscheduler(2) page |
5 | .\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999 | |
6 | .\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org> | |
7 | .\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com> | |
8 | .\" | |
9 | .\" %%%LICENSE_START(GPLv2+_DOC_FULL) | |
10 | .\" This is free documentation; you can redistribute it and/or | |
11 | .\" modify it under the terms of the GNU General Public License as | |
12 | .\" published by the Free Software Foundation; either version 2 of | |
13 | .\" the License, or (at your option) any later version. | |
14 | .\" | |
15 | .\" The GNU General Public License's references to "object code" | |
16 | .\" and "executables" are to be interpreted as the output of any | |
17 | .\" document formatting or typesetting system, including | |
18 | .\" intermediate and printed output. | |
19 | .\" | |
20 | .\" This manual is distributed in the hope that it will be useful, | |
21 | .\" but WITHOUT ANY WARRANTY; without even the implied warranty of | |
22 | .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
23 | .\" GNU General Public License for more details. | |
24 | .\" | |
25 | .\" You should have received a copy of the GNU General Public | |
26 | .\" License along with this manual; if not, see | |
27 | .\" <http://www.gnu.org/licenses/>. | |
28 | .\" %%%LICENSE_END | |
29 | .\" | |
30 | .\" Worth looking at: http://rt.wiki.kernel.org/index.php | |
31 | .\" | |
bc65e772 | 32 | .TH SCHED 7 2014-10-02 "Linux" "Linux Programmer's Manual" |
59c06be3 MK |
33 | .SH NAME |
34 | sched \- overview of scheduling APIs | |
35 | .SH DESCRIPTION | |
b16695a3 MK |
36 | .SS API summary |
37 | The Linux scheduling APIs are as follows: | |
38 | .TP | |
39 | .BR sched_setscheduler (2) | |
40 | Set the scheduling policy and parameters of a specified thread. | |
41 | .TP | |
42 | .BR sched_getscheduler (2) | |
43 | Return the scheduling policy of a specified thread. | |
44 | .TP | |
45 | .BR sched_setparam (2) | |
46 | Set the scheduling parameters of a specified thread. | |
47 | .TP | |
48 | .BR sched_getparam (2) | |
49 | Fetch the scheduling parameters of a specified thread. | |
50 | .TP | |
51 | .BR sched_get_priority_max (2) | |
52 | Return the maximum priority available in a specified scheduling policy. |
53 | .TP |
54 | .BR sched_get_priority_min (2) |
55 | Return the minimum priority available in a specified scheduling policy. |
56 | .TP | |
5813ff92 | 57 | .BR sched_rr_get_interval (2) |
b16695a3 MK |
58 | Fetch the quantum used for threads that are scheduled under |
59 | the "round-robin" scheduling policy. | |
60 | .TP | |
61 | .BR sched_yield (2) | |
62 | Cause the caller to relinquish the CPU, | |
63 | so that some other thread can be executed. |
64 | .TP | |
65 | .BR sched_setaffinity (2) | |
66 | (Linux-specific) | |
67 | Set the CPU affinity of a specified thread. | |
68 | .TP | |
69 | .BR sched_getaffinity (2) | |
70 | (Linux-specific) | |
91f5e870 | 71 | Get the CPU affinity of a specified thread. |
b16695a3 MK |
72 | .TP |
73 | .BR sched_setattr (2) | |
77dab50a MK |
74 | Set the scheduling policy and parameters of a specified thread. |
75 | This (Linux-specific) system call provides a superset of the functionality of | |
76 | .BR sched_setscheduler (2) | |
77 | and | |
78 | .BR sched_setparam (2). | |
b16695a3 MK |
79 | .TP |
80 | .BR sched_getattr (2) | |
77dab50a MK |
81 | Fetch the scheduling policy and parameters of a specified thread. |
82 | This (Linux-specific) system call provides a superset of the functionality of | |
83 | .BR sched_getscheduler (2) | |
84 | and | |
85 | .BR sched_getparam (2). | |
b16695a3 | 86 | .\" |
59c06be3 MK |
87 | .SS Scheduling policies |
88 | The scheduler is the kernel component that decides which runnable thread | |
89 | will be executed by the CPU next. | |
90 | Each thread has an associated scheduling policy and a \fIstatic\fP | |
2e868ccc MK |
91 | scheduling priority, |
92 | .IR sched_priority . | |
961df2a8 | 93 | The scheduler makes its decisions based on knowledge of the scheduling |
59c06be3 MK |
94 | policy and static priority of all threads on the system. |
95 | ||
96 | For threads scheduled under one of the normal scheduling policies | |
97 | (\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP), | |
98 | \fIsched_priority\fP is not used in scheduling | |
99 | decisions (it must be specified as 0). | |
100 | ||
101 | Threads scheduled under one of the real-time policies |
102 | (\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a | |
103 | \fIsched_priority\fP value in the range 1 (low) to 99 (high). | |
104 | (As the numbers imply, real-time threads always have higher priority | |
105 | than normal threads.) | |
106 | Note well: POSIX.1-2001 requires an implementation to support only a | |
107 | minimum of 32 distinct priority levels for the real-time policies, |
108 | and some systems supply just this minimum. | |
109 | Portable programs should use | |
110 | .BR sched_get_priority_min (2) | |
111 | and | |
112 | .BR sched_get_priority_max (2) | |
113 | to find the range of priorities supported for a particular policy. | |
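As a rough illustration of the portability advice above, the following sketch queries the supported range through the Python `os` module's wrappers for sched_get_priority_min(2) and sched_get_priority_max(2). The helper name `priority_range` is an assumption made for this example only.

```python
import os

def priority_range(policy):
    """Return (min, max) static priority for a policy, as reported by the system."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)

if __name__ == "__main__":
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
        lo, hi = priority_range(getattr(os, name))
        print(f"{name}: {lo}..{hi}")
```

A portable program would then pick its priorities relative to the returned bounds rather than hard-coding values such as 1 or 99.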
114 | ||
115 | Conceptually, the scheduler maintains a list of runnable | |
116 | threads for each possible \fIsched_priority\fP value. | |
117 | In order to determine which thread runs next, the scheduler looks for | |
118 | the nonempty list with the highest static priority and selects the | |
119 | thread at the head of this list. | |
120 | ||
121 | A thread's scheduling policy determines | |
122 | where it will be inserted into the list of threads | |
123 | with equal static priority and how it will move inside this list. | |
124 | ||
125 | All scheduling is preemptive: if a thread with a higher static | |
126 | priority becomes ready to run, the currently running thread | |
127 | will be preempted and | |
128 | returned to the wait list for its static priority level. | |
129 | The scheduling policy determines the | |
130 | ordering only within the list of runnable threads with equal static | |
131 | priority. | |
132 | .SS SCHED_FIFO: First in-first out scheduling | |
133 | \fBSCHED_FIFO\fP can be used only with static priorities higher than | |
134 | 0, which means that when a \fBSCHED_FIFO\fP thread becomes runnable, |
135 | it will always immediately preempt any currently running | |
136 | \fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread. | |
137 | \fBSCHED_FIFO\fP is a simple scheduling | |
138 | algorithm without time slicing. | |
139 | For threads scheduled under the | |
140 | \fBSCHED_FIFO\fP policy, the following rules apply: | |
141 | .IP * 3 | |
142 | A \fBSCHED_FIFO\fP thread that has been preempted by another thread of | |
143 | higher priority will stay at the head of the list for its priority and | |
144 | will resume execution as soon as all threads of higher priority are | |
145 | blocked again. | |
146 | .IP * | |
147 | When a \fBSCHED_FIFO\fP thread becomes runnable, it | |
148 | will be inserted at the end of the list for its priority. | |
149 | .IP * | |
150 | A call to | |
4c2eb0c2 MK |
151 | .BR sched_setscheduler (2), |
152 | .BR sched_setparam (2), | |
59c06be3 | 153 | or |
4c2eb0c2 | 154 | .BR sched_setattr (2) |
59c06be3 MK |
155 | will put the |
156 | \fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by | |
157 | \fIpid\fP at the start of the list if it was runnable. | |
158 | As a consequence, it may preempt the currently running thread if | |
159 | it has the same priority. | |
160 | (POSIX.1-2001 specifies that the thread should go to the end | |
161 | of the list.) | |
162 | .\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue | |
163 | .\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK | |
164 | .IP * | |
165 | A thread calling | |
166 | .BR sched_yield (2) | |
167 | will be put at the end of the list. | |
168 | .PP | |
169 | No other events will move a thread | |
170 | scheduled under the \fBSCHED_FIFO\fP policy in the wait list of | |
171 | runnable threads with equal static priority. | |
172 | ||
173 | A \fBSCHED_FIFO\fP | |
174 | thread runs until either it is blocked by an I/O request, it is | |
175 | preempted by a higher priority thread, or it calls | |
176 | .BR sched_yield (2). | |
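The list handling described for SCHED_FIFO can be sketched as a toy model. This is only a conceptual illustration of the rules above, not how the kernel is implemented; the class `FifoRunQueues` and its method names are hypothetical.

```python
from collections import deque

class FifoRunQueues:
    """Toy model of the per-priority lists described for SCHED_FIFO."""

    def __init__(self):
        self.lists = {}  # static priority -> deque of thread names

    def wake(self, name, prio):
        # A thread that becomes runnable goes to the end of its list.
        self.lists.setdefault(prio, deque()).append(name)

    def pick(self):
        # Head of the nonempty list with the highest static priority.
        prios = [p for p, q in self.lists.items() if q]
        return self.lists[max(prios)][0] if prios else None

    def block(self, name, prio):
        # A blocked thread is no longer runnable.
        self.lists[prio].remove(name)

    def yield_(self, name, prio):
        # sched_yield(2): move the caller to the end of its list.
        q = self.lists[prio]
        q.remove(name)
        q.append(name)

if __name__ == "__main__":
    rq = FifoRunQueues()
    rq.wake("a", 10)
    rq.wake("b", 10)
    print(rq.pick())      # a
    rq.wake("hi", 50)     # higher priority preempts "a"
    print(rq.pick())      # hi
    rq.block("hi", 50)    # "a" is still at the head of list 10
    print(rq.pick())      # a
    rq.yield_("a", 10)
    print(rq.pick())      # b
```

Note that the preempted thread "a" never leaves the head of its list; only blocking, yielding, or a policy/parameter change moves it.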
177 | .SS SCHED_RR: Round-robin scheduling | |
178 | \fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP. | |
179 | Everything | |
180 | described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP, | |
181 | except that each thread is allowed to run only for a maximum time | |
182 | quantum. | |
183 | If a \fBSCHED_RR\fP thread has been running for a time | |
184 | period equal to or longer than the time quantum, it will be put at the | |
185 | end of the list for its priority. | |
186 | A \fBSCHED_RR\fP thread that has | |
187 | been preempted by a higher priority thread and subsequently resumes | |
188 | execution as a running thread will complete the unexpired portion of | |
189 | its round-robin time quantum. | |
190 | The length of the time quantum can be | |
191 | retrieved using | |
192 | .BR sched_rr_get_interval (2). | |
193 | .\" On Linux 2.4, the length of the RR interval is influenced | |
194 | .\" by the process nice value -- MTK | |
195 | .\" | |
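The effect of the time quantum on threads of equal priority can be illustrated with a small simulation. It is a sketch under simplifying assumptions (no blocking, no higher-priority threads, abstract time units); the function name `round_robin` is hypothetical.

```python
from collections import deque

def round_robin(work, quantum):
    """Simulate equal-priority SCHED_RR threads.

    work maps a thread name to the CPU time it still needs.
    Returns the sequence of (name, time_run) slices.
    """
    queue = deque(work.items())
    slices = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(remaining, quantum)
        slices.append((name, ran))
        if remaining > ran:
            # Quantum used up: back to the end of the list.
            queue.append((name, remaining - ran))
    return slices

if __name__ == "__main__":
    print(round_robin({"a": 250, "b": 100}, quantum=100))
```

On a real system the quantum would come from sched_rr_get_interval(2), which Python exposes as `os.sched_rr_get_interval()`.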
7a0d1838 | 196 | .SS SCHED_DEADLINE: Sporadic task model deadline scheduling |
9cc1fa25 MK |
197 | Since version 3.14, Linux provides a deadline scheduling policy |
198 | .RB ( SCHED_DEADLINE ). | |
199 | This policy is currently implemented using | |
200 | GEDF (Global Earliest Deadline First) | |
201 | in conjunction with CBS (Constant Bandwidth Server). | |
202 | To set and fetch this policy and associated attributes, | |
203 | one must use the Linux-specific | |
204 | .BR sched_setattr (2) | |
205 | and | |
206 | .BR sched_getattr (2) | |
207 | system calls. | |
7a0d1838 PZ |
208 | |
209 | A sporadic task is one that has a sequence of jobs, where each | |
91c98da6 | 210 | job is activated at most once per period. |
9cc1fa25 MK |
211 | Each job also has a |
212 | .IR "relative deadline" , | |
91c98da6 | 213 | before which it should finish execution, and a |
9cc1fa25 MK |
214 | .IR "computation time" , |
215 | which is the CPU time necessary for executing the job. | |
216 | The moment when a task wakes up | |
217 | because a new job has to be executed is called the | |
218 | .IR "arrival time" | |
219 | (also referred to as the request time or release time). | |
220 | The | |
221 | .IR "start time" | |
222 | is the time at which a task starts its execution. | |
223 | The | |
0da5e58a | 224 | .I "absolute deadline" |
9cc1fa25 | 225 | is thus obtained by adding the relative deadline to the arrival time. |
7a0d1838 PZ |
226 | |
227 | The following diagram clarifies these terms: | |
228 | ||
91c98da6 | 229 | .in +4n |
7a0d1838 | 230 | .nf |
7bd7f43e MK |
231 | arrival/wakeup                    absolute deadline |
232 |      |    start time                    | |
233 |      |        |                         | |
234 |      v        v                         v |
235 | -----x--------xooooooooooooooooo--------x--------x--- |
0756f58f | 236 |               |<- comp. time ->| |
7bd7f43e MK |
237 |      |<------- relative deadline ------>| |
238 |      |<-------------- period ------------------->| | |
7a0d1838 | 239 | .fi |
91c98da6 | 240 | .in |
7a0d1838 | 241 | |
9cc1fa25 | 242 | When setting a |
91c98da6 | 243 | .B SCHED_DEADLINE |
9cc1fa25 MK |
244 | policy for a thread using |
245 | .BR sched_setattr (2), | |
246 | one can specify three parameters: | |
247 | .IR Runtime , | |
248 | .IR Deadline , | |
249 | and | |
250 | .IR Period . | |
251 | These parameters do not necessarily correspond to the aforementioned terms: | |
252 | usual practice is to set Runtime to something bigger than the average | |
253 | computation time (or worst-case execution time for hard real-time tasks), | |
254 | Deadline to the relative deadline, and Period to the period of the task. | |
255 | Thus, for | |
256 | .BR SCHED_DEADLINE | |
257 | scheduling, we have: | |
7a0d1838 | 258 | |
91c98da6 | 259 | .in +4n |
7a0d1838 | 260 | .nf |
7bd7f43e MK |
261 | arrival/wakeup                    absolute deadline |
262 |      |    start time                    | |
263 |      |        |                         | |
264 |      v        v                         v |
265 | -----x--------xooooooooooooooooo--------x--------x--- |
266 |               |<-- Runtime ------->| |
267 |      |<----------- Deadline ----------->| |
268 |      |<-------------- Period ------------------->| | |
7a0d1838 | 269 | .fi |
91c98da6 | 270 | .in |
7a0d1838 | 271 | |
9cc1fa25 MK |
272 | The three deadline-scheduling parameters correspond to the |
273 | .IR sched_runtime , | |
274 | .IR sched_deadline , | |
275 | and | |
276 | .IR sched_period | |
277 | fields of the | |
278 | .I sched_attr | |
279 | structure; see | |
280 | .BR sched_setattr (2). | |
281 | These fields express values in nanoseconds. |
282 | .\" FIXME It looks as though specifying sched_period as 0 means | |
bea08fec | 283 | .\" "make sched_period the same as sched_deadline". |
9cc1fa25 MK |
284 | .\" This needs to be documented. |
285 | If | |
286 | .IR sched_period | |
287 | is specified as 0, then it is made the same as | |
288 | .IR sched_deadline . | |
289 | ||
290 | The kernel requires that: | |
291 | ||
292 | sched_runtime <= sched_deadline <= sched_period | |
293 | ||
294 | .\" See __checkparam_dl in kernel/sched/core.c | |
295 | In addition, under the current implementation, | |
296 | all of the parameter values must be at least 1024 | |
297 | (i.e., just over one microsecond, | |
7bd7f43e | 298 | which is the resolution of the implementation), and less than 2^63. |
9cc1fa25 MK |
299 | If any of these checks fails, |
300 | .BR sched_setattr (2) | |
301 | fails with the error | |
302 | .BR EINVAL . | |
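The parameter checks described above can be restated as a small predicate. This is a sketch of the rules as documented here, not a copy of the kernel's code; the names `DL_MIN`, `DL_MAX`, and `check_deadline_params` are assumptions for illustration.

```python
DL_MIN = 1024      # lower bound stated in the text, in nanoseconds
DL_MAX = 2 ** 63   # exclusive upper bound stated in the text

def check_deadline_params(runtime, deadline, period=0):
    """Return True if the values pass the checks described above (all in ns)."""
    if period == 0:
        period = deadline  # a zero period is made the same as the deadline
    in_range = all(DL_MIN <= v < DL_MAX for v in (runtime, deadline, period))
    return in_range and runtime <= deadline <= period

if __name__ == "__main__":
    # 10 ms of runtime, 30 ms deadline, 100 ms period
    print(check_deadline_params(10_000_000, 30_000_000, 100_000_000))
```

A combination rejected by this predicate corresponds to the case where sched_setattr(2) is documented to fail with EINVAL.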
7a0d1838 PZ |
303 | |
304 | The CBS guarantees non-interference between tasks, by throttling | |
9cc1fa25 | 305 | threads that attempt to over-run their specified Runtime. |
7a0d1838 | 306 | |
9cc1fa25 | 307 | To ensure deadline scheduling guarantees, |
0da5e58a | 308 | the kernel must prevent situations where the set of |
91c98da6 | 309 | .B SCHED_DEADLINE |
9cc1fa25 MK |
310 | threads is not feasible (schedulable) within the given constraints. |
311 | The kernel thus performs an admission test when setting or changing |
91c98da6 | 312 | .B SCHED_DEADLINE |
9cc1fa25 MK |
313 | policy and attributes. |
314 | This admission test calculates whether the change is feasible; | |
8e8cd193 | 315 | if it is not, |
91c98da6 | 316 | .BR sched_setattr (2) |
9cc1fa25 MK |
317 | fails with the error |
318 | .BR EBUSY . | |
7a0d1838 PZ |
319 | |
320 | For example, it is required (but not necessarily sufficient) for | |
9cc1fa25 MK |
321 | the total utilization to be less than or equal to the total number of |
322 | CPUs available, where, since each thread can maximally run for | |
323 | Runtime per Period, that thread's utilization is its | |
324 | Runtime divided by its Period. | |
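The necessary condition on total utilization can be worked through numerically. The sketch below only checks that one condition (the real admission test may reject more); the function names and the representation of a task as a `(runtime, period)` pair are assumptions for this example.

```python
from fractions import Fraction

def total_utilization(tasks):
    """Sum Runtime/Period over (runtime, period) pairs, exactly."""
    return sum((Fraction(r, p) for r, p in tasks), Fraction(0))

def passes_necessary_test(tasks, ncpus):
    """Necessary (not sufficient) condition: total utilization <= CPU count."""
    return total_utilization(tasks) <= ncpus

if __name__ == "__main__":
    tasks = [(10, 100), (30, 100)]          # utilizations 0.1 and 0.3
    print(total_utilization(tasks))         # 2/5
    print(passes_necessary_test(tasks, 1))  # True
```

Two threads that each need 60 units of Runtime per 100-unit Period have a total utilization of 6/5, so they cannot both be admitted on a single CPU.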
7a0d1838 | 325 | |
9cc1fa25 MK |
326 | In order to fulfil the guarantees that are made when |
327 | a thread is admitted to the | |
328 | .BR SCHED_DEADLINE | |
329 | policy, | |
91c98da6 | 330 | .BR SCHED_DEADLINE |
9cc1fa25 MK |
331 | threads are the highest priority (user controllable) threads in the |
332 | system; if any | |
91c98da6 | 333 | .BR SCHED_DEADLINE |
9cc1fa25 MK |
334 | thread is runnable, |
335 | it will preempt any thread scheduled under one of the other policies. | |
7a0d1838 | 336 | |
9cc1fa25 | 337 | A call to |
91c98da6 | 338 | .BR fork (2) |
9cc1fa25 MK |
339 | by a thread scheduled under the |
340 | .B SCHED_DEADLINE | |
341 | policy will fail with the error | |
342 | .BR EAGAIN , | |
343 | unless the thread has its reset-on-fork flag set (see below). | |
7a0d1838 | 344 | |
91c98da6 MK |
345 | A |
346 | .B SCHED_DEADLINE | |
9cc1fa25 | 347 | thread that calls |
91c98da6 | 348 | .BR sched_yield (2) |
9cc1fa25 MK |
349 | will yield the current job and wait for a new period to begin. |
350 | .\" | |
351 | .\" FIXME Calling sched_getparam() on a SCHED_DEADLINE thread | |
352 | .\" fails with EINVAL, but sched_getscheduler() succeeds. | |
353 | .\" Is that intended? (Why?) | |
91c98da6 | 354 | .\" |
59c06be3 MK |
355 | .SS SCHED_OTHER: Default Linux time-sharing scheduling |
356 | \fBSCHED_OTHER\fP can be used only at static priority 0. |
357 | \fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is | |
358 | intended for all threads that do not require the special | |
359 | real-time mechanisms. | |
360 | The thread to run is chosen from the static | |
361 | priority 0 list based on a \fIdynamic\fP priority that is determined only | |
362 | inside this list. | |
363 | The dynamic priority is based on the nice value (set by | |
1608b671 MK |
364 | .BR nice (2), |
365 | .BR setpriority (2), | |
59c06be3 | 366 | or |
1608b671 | 367 | .BR sched_setattr (2)) |
59c06be3 MK |
368 | and increased for each time quantum the thread is ready to run, |
369 | but denied to run by the scheduler. | |
370 | This ensures fair progress among all \fBSCHED_OTHER\fP threads. | |
371 | .\" | |
372 | .SS SCHED_BATCH: Scheduling batch processes | |
373 | (Since Linux 2.6.16.) | |
374 | \fBSCHED_BATCH\fP can be used only at static priority 0. | |
375 | This policy is similar to \fBSCHED_OTHER\fP in that it schedules | |
376 | the thread according to its dynamic priority | |
377 | (based on the nice value). | |
378 | The difference is that this policy | |
379 | will cause the scheduler to always assume | |
380 | that the thread is CPU-intensive. | |
381 | Consequently, the scheduler will apply a small scheduling | |
a1fa36af | 382 | penalty with respect to wakeup behavior, |
59c06be3 MK |
383 | so that this thread is mildly disfavored in scheduling decisions. |
384 | ||
385 | .\" The following paragraph is drawn largely from the text that | |
386 | .\" accompanied Ingo Molnar's patch for the implementation of | |
387 | .\" SCHED_BATCH. | |
388 | .\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785 | |
389 | This policy is useful for workloads that are noninteractive, | |
390 | but do not want to lower their nice value, | |
391 | and for workloads that want a deterministic scheduling policy without | |
392 | interactivity causing extra preemptions (between the workload's tasks). | |
393 | .\" | |
394 | .SS SCHED_IDLE: Scheduling very low priority jobs | |
395 | (Since Linux 2.6.23.) | |
396 | \fBSCHED_IDLE\fP can be used only at static priority 0; | |
397 | the process nice value has no influence for this policy. | |
398 | ||
399 | This policy is intended for running jobs at extremely low | |
400 | priority (lower even than a +19 nice value with the | |
401 | .B SCHED_OTHER | |
402 | or | |
403 | .B SCHED_BATCH | |
404 | policies). | |
405 | .\" | |
406 | .SS Resetting scheduling policy for child processes | |
005eaa8f MK |
407 | Each thread has a reset-on-fork scheduling flag. |
408 | When this flag is set, children created by | |
409 | .BR fork (2) | |
410 | do not inherit privileged scheduling policies. | |
411 | The reset-on-fork flag can be set by either: | |
412 | .IP * 3 | |
413 | ORing the | |
59c06be3 | 414 | .B SCHED_RESET_ON_FORK |
005eaa8f | 415 | flag into the |
59c06be3 | 416 | .I policy |
005eaa8f MK |
417 | argument when calling |
418 | .BR sched_setscheduler (2) | |
419 | (since Linux 2.6.32); | |
420 | or | |
421 | .IP * | |
422 | specifying the | |
423 | .B SCHED_FLAG_RESET_ON_FORK | |
424 | flag in | |
425 | .IR attr.sched_flags | |
59c06be3 | 426 | when calling |
005eaa8f MK |
427 | .BR sched_setattr (2). |
428 | .PP | |
429 | Note that the constants used with these two APIs have different names. | |
430 | The state of the reset-on-fork flag can analogously be retrieved using | |
431 | .BR sched_getscheduler (2) | |
432 | and | |
433 | .BR sched_getattr (2). | |
434 | ||
435 | The reset-on-fork feature is intended for media-playback applications, | |
59c06be3 MK |
436 | and can be used to prevent applications evading the |
437 | .BR RLIMIT_RTTIME | |
438 | resource limit (see | |
439 | .BR getrlimit (2)) | |
440 | by creating multiple child processes. | |
441 | ||
005eaa8f | 442 | More precisely, if the reset-on-fork flag is set, |
59c06be3 MK |
443 | the following rules apply for subsequently created children: |
444 | .IP * 3 | |
445 | If the calling thread has a scheduling policy of | |
446 | .B SCHED_FIFO | |
447 | or | |
448 | .BR SCHED_RR , | |
449 | the policy is reset to | |
450 | .BR SCHED_OTHER | |
451 | in child processes. | |
452 | .IP * | |
453 | If the calling process has a negative nice value, | |
454 | the nice value is reset to zero in child processes. | |
455 | .PP | |
005eaa8f | 456 | After the reset-on-fork flag has been enabled, |
59c06be3 MK |
457 | it can be reset only if the thread has the |
458 | .BR CAP_SYS_NICE | |
459 | capability. | |
460 | This flag is disabled in child processes created by | |
461 | .BR fork (2). | |
59c06be3 MK |
462 | .\" |
463 | .SS Privileges and resource limits | |
464 | In Linux kernels before 2.6.12, only privileged | |
465 | .RB ( CAP_SYS_NICE ) | |
466 | threads can set a nonzero static priority (i.e., set a real-time | |
467 | scheduling policy). | |
468 | The only change that an unprivileged thread can make is to set the | |
469 | .B SCHED_OTHER | |
759e1210 | 470 | policy, and this can be done only if the effective user ID of the caller |
59c06be3 MK |
471 | matches the real or effective user ID of the target thread |
472 | (i.e., the thread specified by | |
473 | .IR pid ) | |
474 | whose policy is being changed. | |
475 | ||
9cc1fa25 MK |
476 | A thread must be privileged |
477 | .RB ( CAP_SYS_NICE ) | |
0da5e58a | 478 | in order to set or modify a |
9cc1fa25 MK |
479 | .BR SCHED_DEADLINE |
480 | policy. | |
481 | ||
59c06be3 MK |
482 | Since Linux 2.6.12, the |
483 | .B RLIMIT_RTPRIO | |
484 | resource limit defines a ceiling on an unprivileged thread's | |
485 | static priority for the | |
486 | .B SCHED_RR | |
487 | and | |
488 | .B SCHED_FIFO | |
489 | policies. | |
490 | The rules for changing scheduling policy and priority are as follows: | |
491 | .IP * 3 | |
492 | If an unprivileged thread has a nonzero | |
493 | .B RLIMIT_RTPRIO | |
494 | soft limit, then it can change its scheduling policy and priority, | |
495 | subject to the restriction that the priority cannot be set to a | |
496 | value higher than the maximum of its current priority and its | |
497 | .B RLIMIT_RTPRIO | |
498 | soft limit. | |
499 | .IP * | |
500 | If the | |
501 | .B RLIMIT_RTPRIO | |
502 | soft limit is 0, then the only permitted changes are to lower the priority, | |
503 | or to switch to a non-real-time policy. | |
504 | .IP * | |
505 | Subject to the same rules, | |
506 | another unprivileged thread can also make these changes, | |
507 | as long as the effective user ID of the thread making the change | |
508 | matches the real or effective user ID of the target thread. | |
509 | .IP * | |
510 | Special rules apply for the | |
f7a858b4 MK |
511 | .BR SCHED_IDLE |
512 | policy. | |
59c06be3 MK |
513 | In Linux kernels before 2.6.39, |
514 | an unprivileged thread operating under this policy cannot | |
515 | change its policy, regardless of the value of its | |
516 | .BR RLIMIT_RTPRIO | |
517 | resource limit. | |
518 | In Linux kernels since 2.6.39, | |
519 | .\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8 | |
520 | an unprivileged thread can switch to either the | |
521 | .BR SCHED_BATCH | |
522 | or the | |
523 | .BR SCHED_NORMAL | |
524 | policy so long as its nice value falls within the range permitted by its | |
525 | .BR RLIMIT_NICE | |
526 | resource limit (see | |
527 | .BR getrlimit (2)). | |
528 | .PP | |
529 | Privileged | |
530 | .RB ( CAP_SYS_NICE ) | |
531 | threads ignore the | |
532 | .B RLIMIT_RTPRIO | |
533 | limit; as with older kernels, | |
534 | they can make arbitrary changes to scheduling policy and priority. | |
535 | See | |
536 | .BR getrlimit (2) | |
537 | for further information on | |
538 | .BR RLIMIT_RTPRIO . | |
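The ceiling that RLIMIT_RTPRIO places on an unprivileged thread can be expressed as a one-line rule. The sketch below restates only the first two bullet points above (it ignores user ID matching and the SCHED_IDLE special cases); both function names are hypothetical.

```python
def max_allowed_priority(current_prio, rtprio_soft_limit):
    """Highest real-time priority an unprivileged thread may request."""
    return max(current_prio, rtprio_soft_limit)

def may_set_priority(requested, current_prio, rtprio_soft_limit):
    """True if the requested priority does not exceed the ceiling."""
    return requested <= max_allowed_priority(current_prio, rtprio_soft_limit)

if __name__ == "__main__":
    # Soft limit 0: the thread may only keep or lower its priority.
    print(may_set_priority(5, current_prio=10, rtprio_soft_limit=0))   # True
    print(may_set_priority(20, current_prio=10, rtprio_soft_limit=0))  # False
```

On Linux, the soft limit itself can be read in Python with `resource.getrlimit(resource.RLIMIT_RTPRIO)`.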
0c055c75 MK |
539 | .SS Limiting the CPU usage of real-time and deadline processes |
540 | A nonblocking infinite loop in a thread scheduled under the | |
541 | .BR SCHED_FIFO , | |
542 | .BR SCHED_RR , | |
543 | or | |
544 | .BR SCHED_DEADLINE | |
545 | policy will block all threads with lower | |
546 | priority forever. | |
547 | Prior to Linux 2.6.25, the only way of preventing a runaway real-time | |
548 | process from freezing the system was to run (at the console) | |
549 | a shell scheduled under a higher static priority than the tested application. | |
550 | This allows an emergency kill of tested | |
551 | real-time applications that do not block or terminate as expected. | |
552 | ||
553 | Since Linux 2.6.25, there are other techniques for dealing with runaway | |
554 | real-time and deadline processes. | |
555 | One of these is to use the | |
556 | .BR RLIMIT_RTTIME | |
557 | resource limit to set a ceiling on the CPU time that | |
558 | a real-time process may consume. | |
559 | See | |
560 | .BR getrlimit (2) | |
561 | for details. | |
562 | ||
563 | Since version 2.6.25, Linux also provides two | |
564 | .I /proc | |
565 | files that can be used to reserve a certain amount of CPU time | |
566 | to be used by non-real-time processes. | |
567 | Reserving some CPU time in this fashion allows some CPU time to be | |
568 | allocated to (say) a root shell that can be used to kill a runaway process. | |
569 | Both of these files specify time values in microseconds: | |
570 | .TP | |
571 | .IR /proc/sys/kernel/sched_rt_period_us | |
572 | This file specifies a scheduling period that is equivalent to | |
573 | 100% CPU bandwidth. | |
574 | The value in this file can range from 1 to | |
575 | .BR INT_MAX , | |
576 | giving an operating range of 1 microsecond to around 35 minutes. | |
577 | The default value in this file is 1,000,000 (1 second). | |
578 | .TP | |
579 | .IR /proc/sys/kernel/sched_rt_runtime_us | |
580 | The value in this file specifies how much of the "period" time | |
581 | can be used by all real-time and deadline scheduled processes | |
582 | on the system. | |
583 | The value in this file can range from \-1 to | |
584 | .BR INT_MAX \-1. | |
585 | Specifying \-1 makes the runtime the same as the period; | |
586 | that is, no CPU time is set aside for non-real-time processes | |
587 | (which was the Linux behavior before kernel 2.6.25). | |
588 | The default value in this file is 950,000 (0.95 seconds), | |
589 | meaning that 5% of the CPU time is reserved for processes that | |
590 | don't run under a real-time or deadline scheduling policy. | |
591 | .PP | |
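The relationship between the two files can be checked with a little arithmetic. The following sketch computes the share of CPU time left for non-real-time processes; `reserved_fraction` and `read_setting` are hypothetical helper names, and `read_setting` only works on a Linux system where these files exist.

```python
def reserved_fraction(runtime_us, period_us):
    """Fraction of CPU time left for non-real-time processes."""
    if runtime_us == -1:
        return 0.0  # -1 makes the runtime the same as the period
    return 1.0 - runtime_us / period_us

def read_setting(name):
    # Reads a file under /proc/sys/kernel; only meaningful on Linux.
    with open(f"/proc/sys/kernel/{name}") as f:
        return int(f.read())

if __name__ == "__main__":
    # Default values quoted in the text: 950,000 out of 1,000,000 microseconds.
    print(round(reserved_fraction(950_000, 1_000_000), 2))  # 0.05
```

For the live values one could call, for example, `reserved_fraction(read_setting("sched_rt_runtime_us"), read_setting("sched_rt_period_us"))`.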
59c06be3 | 592 | .SS Response time |
1154a064 | 593 | A blocked high priority thread waiting for I/O has a certain |
59c06be3 MK |
594 | response time before it is scheduled again. |
595 | The device driver writer | |
596 | can greatly reduce this response time by using a "slow interrupt" | |
597 | interrupt handler. | |
598 | .\" as described in | |
599 | .\" .BR request_irq (9). | |
600 | .SS Miscellaneous | |
601 | Child processes inherit the scheduling policy and parameters across a | |
602 | .BR fork (2). | |
603 | The scheduling policy and parameters are preserved across | |
604 | .BR execve (2). | |
605 | ||
606 | Memory locking is usually needed for real-time processes to avoid | |
607 | paging delays; this can be done with | |
608 | .BR mlock (2) | |
609 | or | |
610 | .BR mlockall (2). | |
59c06be3 | 611 | .SH NOTES |
59c06be3 MK |
612 | .PP |
613 | Originally, standard Linux was intended as a general-purpose operating |
614 | system able to handle background processes, interactive |
615 | applications, and less demanding real-time applications (applications that |
616 | usually need to meet timing deadlines). |
617 | Although the Linux kernel 2.6 | |
618 | allowed for kernel preemption and the newly introduced O(1) scheduler | |
619 | ensures that the time needed to schedule is fixed and deterministic | |
620 | irrespective of the number of active tasks, true real-time computing | |
621 | was not possible up to kernel version 2.6.17. | |
622 | .SS Real-time features in the mainline Linux kernel | |
623 | .\" FIXME . Probably this text will need some minor tweaking | |
624 | .\" by about the time of 2.6.30; ask Carsten Emde about this then. | |
625 | From kernel version 2.6.18 onward, however, Linux is gradually | |
626 | becoming equipped with real-time capabilities, | |
627 | most of which are derived from the former | |
628 | .I realtime-preempt | |
629 | patches developed by Ingo Molnar, Thomas Gleixner, | |
630 | Steven Rostedt, and others. | |
631 | Until the patches have been completely merged into the | |
632 | mainline kernel | |
633 | (this is expected to be around kernel version 2.6.30), | |
634 | they must be installed to achieve the best real-time performance. | |
635 | These patches are named: | |
636 | .in +4n | |
637 | .nf | |
638 | ||
639 | patch-\fIkernelversion\fP-rt\fIpatchversion\fP | |
640 | .fi | |
641 | .in | |
642 | .PP | |
643 | and can be downloaded from | |
644 | .UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/ | |
645 | .UE . | |
646 | ||
647 | Without the patches and prior to their full inclusion into the mainline | |
648 | kernel, the kernel configuration offers only the three preemption classes | |
649 | .BR CONFIG_PREEMPT_NONE , | |
650 | .BR CONFIG_PREEMPT_VOLUNTARY , | |
651 | and | |
652 | .B CONFIG_PREEMPT_DESKTOP | |
653 | which respectively provide no, some, and considerable | |
654 | reduction of the worst-case scheduling latency. | |
655 | ||
656 | With the patches applied or after their full inclusion into the mainline | |
657 | kernel, the additional configuration item | |
658 | .B CONFIG_PREEMPT_RT | |
659 | becomes available. | |
660 | If this is selected, Linux is transformed into a regular | |
661 | real-time operating system. | |
759e1210 | 662 | The FIFO and RR scheduling policies are then used to run a thread |
59c06be3 | 663 | with true real-time priority and a minimum worst-case scheduling latency. |
59c06be3 MK |
664 | .SH SEE ALSO |
665 | .ad l | |
666 | .nh | |
667 | .BR chrt (1), | |
f19db853 | 668 | .BR taskset (1), |
59c06be3 MK |
669 | .BR getpriority (2), |
670 | .BR mlock (2), | |
671 | .BR mlockall (2), | |
672 | .BR munlock (2), | |
673 | .BR munlockall (2), | |
674 | .BR nice (2), | |
675 | .BR sched_get_priority_max (2), | |
676 | .BR sched_get_priority_min (2), | |
720a5280 | 677 | .BR sched_getscheduler (2), |
59c06be3 MK |
678 | .BR sched_getaffinity (2), |
679 | .BR sched_getparam (2), | |
680 | .BR sched_rr_get_interval (2), | |
681 | .BR sched_setaffinity (2), | |
720a5280 | 682 | .BR sched_setscheduler (2), |
59c06be3 MK |
683 | .BR sched_setparam (2), |
684 | .BR sched_yield (2), | |
685 | .BR setpriority (2), | |
720a5280 MK |
686 | .BR pthread_getaffinity_np (3), |
687 | .BR pthread_setaffinity_np (3), | |
688 | .BR sched_getcpu (3), | |
59c06be3 MK |
689 | .BR capabilities (7), |
690 | .BR cpuset (7) | |
691 | .ad | |
692 | .PP | |
693 | .I Programming for the real world \- POSIX.4 | |
694 | by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0. | |
695 | .PP | |
b963d0e3 MK |
696 | The Linux kernel source files |
697 | .IR Documentation/scheduler/sched-deadline.txt , | |
698 | .IR Documentation/scheduler/sched-rt-group.txt , | |
458689ed | 699 | .IR Documentation/scheduler/sched-design-CFS.txt , |
b963d0e3 | 700 | and |
d630434e | 701 | .IR Documentation/scheduler/sched-nice-design.txt |