Commit | Line | Data |
---|---|---|
b06cd070 AC |
1 | .\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com> |
2 | .\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com> | |
3 | .\" Copyright (C) , Andries Brouwer <aeb@cwi.nl> | |
4 | .\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org> | |
5 | .\" | |
6 | .\" SPDX-License-Identifier: GPL-3.0-or-later | |
7 | .\" | |
8 | .TH proc_sys_vm 5 (date) "Linux man-pages (unreleased)" | |
9 | .SH NAME | |
10 | /proc/sys/vm/ \- virtual memory subsystem | |
11 | .SH DESCRIPTION | |
12 | .TP | |
13 | .I /proc/sys/vm/ | |
14 | This directory contains files for memory management tuning | |
15 | and for buffer and cache management. | |
16 | .TP | |
17 | .IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)" | |
18 | .\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009 | |
19 | This file defines the amount of free memory (in KiB) on the system that | |
20 | should be reserved for users with the capability | |
21 | .BR CAP_SYS_ADMIN . | |
22 | .IP | |
23 | The default value in this file is the minimum of [3% of free pages, 8MiB] | |
24 | expressed as KiB. | |
25 | The default is intended to provide enough for the superuser | |
26 | to log in and kill a process, if necessary, | |
27 | under the default overcommit 'guess' mode (i.e., 0 in | |
28 | .IR /proc/sys/vm/overcommit_memory ). | |
29 | .IP | |
30 | Systems running in "overcommit never" mode (i.e., 2 in | |
31 | .IR /proc/sys/vm/overcommit_memory ) | |
32 | should increase the value in this file to account | |
33 | for the full virtual memory size of the programs used to recover (e.g., | |
34 | .BR login (1), | |
35 | .BR ssh (1), | |
36 | and | |
37 | .BR top (1)). | |
38 | Otherwise, the superuser may not be able to log in to recover the system. | |
39 | For example, on x86-64 a suitable value is 131072 (128MiB reserved). | |
40 | .IP | |
41 | Changing the value in this file takes effect whenever | |
42 | an application requests memory. | |
43 | .TP | |
44 | .IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)" | |
45 | When 1 is written to this file, all zones are compacted such that free | |
46 | memory is available in contiguous blocks where possible. | |
47 | The effect of this action can be seen by examining | |
48 | .IR /proc/buddyinfo . | |
49 | .IP | |
50 | Present only if the kernel was configured with | |
51 | .BR CONFIG_COMPACTION . | |
52 | .TP | |
53 | .IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)" | |
54 | Writing to this file causes the kernel to drop clean caches, dentries, and | |
55 | inodes from memory, causing that memory to become free. | |
56 | This can be useful for memory management testing and | |
57 | performing reproducible filesystem benchmarks. | |
58 | Because writing to this file causes the benefits of caching to be lost, | |
59 | it can degrade overall system performance. | |
60 | .IP | |
61 | To free pagecache, use: | |
62 | .IP | |
63 | .in +4n | |
64 | .EX | |
65 | echo 1 > /proc/sys/vm/drop_caches | |
66 | .EE | |
67 | .in | |
68 | .IP | |
69 | To free dentries and inodes, use: | |
70 | .IP | |
71 | .in +4n | |
72 | .EX | |
73 | echo 2 > /proc/sys/vm/drop_caches | |
74 | .EE | |
75 | .in | |
76 | .IP | |
77 | To free pagecache, dentries, and inodes, use: | |
78 | .IP | |
79 | .in +4n | |
80 | .EX | |
81 | echo 3 > /proc/sys/vm/drop_caches | |
82 | .EE | |
83 | .in | |
84 | .IP | |
85 | Because writing to this file is a nondestructive operation and dirty objects | |
86 | are not freeable, the | |
87 | user should run | |
88 | .BR sync (1) | |
89 | first. | |
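The sequence described above (flush dirty data, then drop the caches) can be sketched as a small shell function. This is only an illustration: the function name drop_caches and its optional second argument (an alternative target path, useful for a dry run) are assumptions and are not part of any kernel interface. Writing to the real file requires superuser privileges.

```shell
# Hypothetical helper: flush dirty objects, then ask the kernel to drop caches.
# Usage: drop_caches MODE [TARGET]
#   MODE   1 = pagecache, 2 = dentries and inodes, 3 = both
#   TARGET file to write to (defaults to /proc/sys/vm/drop_caches)
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}

    # Accept only the three values documented above.
    case $mode in
        1|2|3) ;;
        *) echo "drop_caches: mode must be 1, 2, or 3" >&2; return 1 ;;
    esac

    # Dirty objects are not freeable, so write them back first.
    sync

    echo "$mode" > "$target"
}
```

As root, `drop_caches 3` would then free pagecache, dentries, and inodes after syncing.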
90 | .TP | |
91 | .IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)" | |
92 | This writable file contains a group ID that is allowed | |
93 | to allocate memory using huge pages. | |
94 | If a process has a filesystem group ID or any supplementary group ID that | |
95 | matches this group ID, | |
96 | then it can make huge-page allocations without holding the | |
97 | .B CAP_IPC_LOCK | |
98 | capability; see | |
99 | .BR memfd_create (2), | |
100 | .BR mmap (2), | |
101 | and | |
102 | .BR shmget (2). | |
103 | .TP | |
104 | .IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)" | |
105 | .\" The following is from Documentation/filesystems/proc.txt | |
106 | If nonzero, this disables the new 32-bit memory-mapping layout; | |
107 | the kernel will use the legacy (2.4) layout for all processes. | |
108 | .TP | |
109 | .IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)" | |
110 | .\" The following is based on the text in Documentation/sysctl/vm.txt | |
111 | Control how to kill processes when an uncorrected memory error | |
112 | (typically a 2-bit error in a memory module) | |
113 | that cannot be handled by the kernel | |
114 | is detected in the background by hardware. | |
115 | In some cases (like the page still having a valid copy on disk), | |
116 | the kernel will handle the failure | |
117 | transparently without affecting any applications. | |
118 | But if there is no other up-to-date copy of the data, | |
119 | it will kill processes to prevent any data corruptions from propagating. | |
120 | .IP | |
121 | The file has one of the following values: | |
122 | .RS | |
123 | .TP | |
124 | .B 1 | |
125 | Kill all processes that have the corrupted-and-not-reloadable page mapped | |
126 | as soon as the corruption is detected. | |
127 | Note that this is not supported for a few types of pages, | |
128 | such as kernel internally | |
129 | allocated data or the swap cache, but works for the majority of user pages. | |
130 | .TP | |
131 | .B 0 | |
132 | Unmap the corrupted page from all processes and kill a process | |
133 | only if it tries to access the page. | |
134 | .RE | |
135 | .IP | |
136 | The kill is performed using a | |
137 | .B SIGBUS | |
138 | signal with | |
139 | .I si_code | |
140 | set to | |
141 | .BR BUS_MCEERR_AO . | |
142 | Processes can handle this if they want to; see | |
143 | .BR sigaction (2) | |
144 | for more details. | |
145 | .IP | |
146 | This feature is active only on architectures/platforms with advanced machine | |
147 | check handling and depends on the hardware capabilities. | |
148 | .IP | |
149 | Applications can override the | |
150 | .I memory_failure_early_kill | |
151 | setting individually with the | |
152 | .BR prctl (2) | |
153 | .B PR_MCE_KILL | |
154 | operation. | |
155 | .IP | |
156 | Present only if the kernel was configured with | |
157 | .BR CONFIG_MEMORY_FAILURE . | |
158 | .TP | |
159 | .IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)" | |
160 | .\" The following is based on the text in Documentation/sysctl/vm.txt | |
161 | Enable memory failure recovery (when supported by the platform). | |
162 | .RS | |
163 | .TP | |
164 | .B 1 | |
165 | Attempt recovery. | |
166 | .TP | |
167 | .B 0 | |
168 | Always panic on a memory failure. | |
169 | .RE | |
170 | .IP | |
171 | Present only if the kernel was configured with | |
172 | .BR CONFIG_MEMORY_FAILURE . | |
173 | .TP | |
174 | .IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)" | |
175 | .\" The following is from Documentation/sysctl/vm.txt | |
176 | Enables a system-wide task dump (excluding kernel threads) to be | |
177 | produced when the kernel performs an OOM-killing. | |
178 | The dump includes the following information | |
179 | for each task (thread, process): | |
180 | thread ID, real user ID, thread group ID (process ID), | |
181 | virtual memory size, resident set size, | |
182 | the CPU that the task is scheduled on, | |
183 | oom_adj score (see the description of | |
184 | .IR /proc/ pid /oom_adj ), | |
185 | and command name. | |
186 | This is helpful to determine why the OOM-killer was invoked | |
187 | and to identify the rogue task that caused it. | |
188 | .IP | |
189 | If this contains the value zero, this information is suppressed. | |
190 | On very large systems with thousands of tasks, | |
191 | it may not be feasible to dump the memory state information for each one. | |
192 | Such systems should not be forced to incur a performance penalty in | |
193 | OOM situations when the information may not be desired. | |
194 | .IP | |
195 | If this is set to nonzero, this information is shown whenever the | |
196 | OOM-killer actually kills a memory-hogging task. | |
197 | .IP | |
198 | The default value is 0. | |
199 | .TP | |
200 | .IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)" | |
201 | .\" The following is from Documentation/sysctl/vm.txt | |
202 | This enables or disables killing the OOM-triggering task in | |
203 | out-of-memory situations. | |
204 | .IP | |
205 | If this is set to zero, the OOM-killer will scan through the entire | |
206 | tasklist and select a task based on heuristics to kill. | |
207 | This normally selects a rogue memory-hogging task that | |
208 | frees up a large amount of memory when killed. | |
209 | .IP | |
210 | If this is set to nonzero, the OOM-killer simply kills the task that | |
211 | triggered the out-of-memory condition. | |
212 | This avoids a possibly expensive tasklist scan. | |
213 | .IP | |
214 | If | |
215 | .I /proc/sys/vm/panic_on_oom | |
216 | is nonzero, it takes precedence over whatever value is used in | |
217 | .IR /proc/sys/vm/oom_kill_allocating_task . | |
218 | .IP | |
219 | The default value is 0. | |
220 | .TP | |
221 | .IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)" | |
222 | .\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7 | |
223 | This writable file provides an alternative to | |
224 | .I /proc/sys/vm/overcommit_ratio | |
225 | for controlling the | |
226 | .I CommitLimit | |
227 | when | |
228 | .I /proc/sys/vm/overcommit_memory | |
229 | has the value 2. | |
230 | It allows the amount of memory overcommitting to be specified as | |
231 | an absolute value (in kB), | |
232 | rather than as a percentage, as is done with | |
233 | .IR overcommit_ratio . | |
234 | This allows for finer-grained control of | |
235 | .I CommitLimit | |
236 | on systems with extremely large memory sizes. | |
237 | .IP | |
238 | Only one of | |
239 | .I overcommit_kbytes | |
240 | or | |
241 | .I overcommit_ratio | |
242 | can have an effect: | |
243 | if | |
244 | .I overcommit_kbytes | |
245 | has a nonzero value, then it is used to calculate | |
246 | .IR CommitLimit , | |
247 | otherwise | |
248 | .I overcommit_ratio | |
249 | is used. | |
250 | Writing a value to either of these files causes the | |
251 | value in the other file to be set to zero. | |
252 | .TP | |
253 | .I /proc/sys/vm/overcommit_memory | |
254 | This file contains the kernel virtual memory accounting mode. | |
255 | Values are: | |
256 | .RS | |
257 | .IP | |
258 | 0: heuristic overcommit (this is the default) | |
259 | .br | |
260 | 1: always overcommit, never check | |
261 | .br | |
262 | 2: always check, never overcommit | |
263 | .RE | |
264 | .IP | |
265 | In mode 0, calls of | |
266 | .BR mmap (2) | |
267 | with | |
268 | .B MAP_NORESERVE | |
269 | are not checked, and the default check is very weak, | |
270 | leading to the risk of getting a process "OOM-killed". | |
271 | .IP | |
272 | In mode 1, the kernel pretends there is always enough memory, | |
273 | until memory actually runs out. | |
274 | One use case for this mode is scientific computing applications | |
275 | that employ large sparse arrays. | |
276 | Before Linux 2.6.0, any nonzero value implied mode 1. | |
277 | .IP | |
278 | In mode 2 (available since Linux 2.6), the total virtual address space | |
279 | that can be allocated | |
280 | .RI ( CommitLimit | |
281 | in | |
282 | .IR /proc/meminfo ) | |
283 | is calculated as | |
284 | .IP | |
285 | .in +4n | |
286 | .EX | |
287 | CommitLimit = (total_RAM \- total_huge_TLB) * | |
288 | overcommit_ratio / 100 + total_swap | |
289 | .EE | |
290 | .in | |
291 | .IP | |
292 | where: | |
293 | .RS | |
294 | .IP \[bu] 3 | |
295 | .I total_RAM | |
296 | is the total amount of RAM on the system; | |
297 | .IP \[bu] | |
298 | .I total_huge_TLB | |
299 | is the amount of memory set aside for huge pages; | |
300 | .IP \[bu] | |
301 | .I overcommit_ratio | |
302 | is the value in | |
303 | .IR /proc/sys/vm/overcommit_ratio ; | |
304 | and | |
305 | .IP \[bu] | |
306 | .I total_swap | |
307 | is the amount of swap space. | |
308 | .RE | |
309 | .IP | |
310 | For example, on a system with 16 GB of physical RAM, 16 GB | |
311 | of swap, no space dedicated to huge pages, and an | |
312 | .I overcommit_ratio | |
313 | of 50, this formula yields a | |
314 | .I CommitLimit | |
315 | of 24 GB. | |
316 | .IP | |
317 | Since Linux 3.14, if the value in | |
318 | .I /proc/sys/vm/overcommit_kbytes | |
319 | is nonzero, then | |
320 | .I CommitLimit | |
321 | is instead calculated as: | |
322 | .IP | |
323 | .in +4n | |
324 | .EX | |
325 | CommitLimit = overcommit_kbytes + total_swap | |
326 | .EE | |
327 | .in | |
328 | .IP | |
329 | See also the description of | |
330 | .I /proc/sys/vm/admin_reserve_kbytes | |
331 | and | |
332 | .IR /proc/sys/vm/user_reserve_kbytes . | |
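The two CommitLimit formulas above can be checked with a short shell sketch. The function name commit_limit_kb and its argument order are assumptions made for illustration. The caller supplies the inputs (in kB, plus the ratio as a percentage), and integer arithmetic is used, so rounding may differ slightly from the value the kernel reports in /proc/meminfo.

```shell
# Sketch of the mode 2 CommitLimit calculation described above.
# All sizes are in kB; the ratio is a percentage.
# Usage: commit_limit_kb TOTAL_RAM TOTAL_HUGE_TLB TOTAL_SWAP RATIO [OVERCOMMIT_KBYTES]
commit_limit_kb() {
    total_ram=$1
    total_huge_tlb=$2
    total_swap=$3
    overcommit_ratio=$4
    overcommit_kbytes=${5:-0}

    if [ "$overcommit_kbytes" -ne 0 ]; then
        # A nonzero overcommit_kbytes replaces the ratio-based term.
        echo $(( overcommit_kbytes + total_swap ))
    else
        echo $(( (total_ram - total_huge_tlb) * overcommit_ratio / 100 + total_swap ))
    fi
}

# The example from the text: 16 GB RAM, 16 GB swap, no huge pages, ratio 50.
commit_limit_kb 16777216 0 16777216 50    # prints 25165824 (24 GB expressed in kB)
```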
333 | .TP | |
334 | .IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)" | |
335 | This writable file defines a percentage by which memory | |
336 | can be overcommitted. | |
337 | The default value in the file is 50. | |
338 | See the description of | |
339 | .IR /proc/sys/vm/overcommit_memory . | |
340 | .TP | |
341 | .IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)" | |
342 | .\" The following is adapted from Documentation/sysctl/vm.txt | |
343 | This enables or disables a kernel panic in | |
344 | an out-of-memory situation. | |
345 | .IP | |
346 | If this file is set to the value 0, | |
347 | the kernel's OOM-killer will kill some rogue process. | |
348 | Usually, the OOM-killer is able to kill a rogue process and the | |
349 | system will survive. | |
350 | .IP | |
351 | If this file is set to the value 1, | |
352 | then the kernel normally panics when out-of-memory happens. | |
353 | However, if a process limits allocations to certain nodes | |
354 | using memory policies | |
355 | .RB ( mbind (2) | |
356 | .BR MPOL_BIND ) | |
357 | or cpusets | |
358 | .RB ( cpuset (7)) | |
359 | and those nodes reach memory exhaustion status, | |
360 | one process may be killed by the OOM-killer. | |
361 | No panic occurs in this case: | |
362 | because other nodes' memory may be free, | |
363 | this means the system as a whole may not have reached | |
364 | an out-of-memory situation yet. | |
365 | .IP | |
366 | If this file is set to the value 2, | |
367 | the kernel always panics when an out-of-memory condition occurs. | |
368 | .IP | |
369 | The default value is 0. | |
370 | The values 1 and 2 are intended for failover in clustered systems. | |
371 | Select either according to your failover policy. | |
372 | .TP | |
373 | .I /proc/sys/vm/swappiness | |
374 | .\" The following is from Documentation/sysctl/vm.txt | |
375 | The value in this file controls how aggressively the kernel will swap | |
376 | memory pages. | |
377 | Higher values increase aggressiveness, lower values | |
378 | decrease aggressiveness. | |
379 | The default value is 60. | |
380 | .TP | |
381 | .IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)" | |
382 | .\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd | |
383 | Specifies an amount of memory (in KiB) to reserve for user processes. | |
384 | This is intended to prevent a user from starting a single memory hogging | |
385 | process, such that they cannot recover (kill the hog). | |
386 | The value in this file has an effect only when | |
387 | .I /proc/sys/vm/overcommit_memory | |
388 | is set to 2 ("overcommit never" mode). | |
389 | In this case, the system reserves an amount of memory that is the minimum | |
390 | of [3% of current process size, | |
391 | .IR user_reserve_kbytes ]. | |
392 | .IP | |
393 | The default value in this file is the minimum of [3% of free pages, 128MiB] | |
394 | expressed as KiB. | |
395 | .IP | |
396 | If the value in this file is set to zero, | |
397 | then a user will be allowed to allocate all free memory with a single process | |
398 | (minus the amount reserved by | |
399 | .IR /proc/sys/vm/admin_reserve_kbytes ). | |
400 | Any subsequent attempts to execute a command will result in | |
401 | "fork: Cannot allocate memory". | |
402 | .IP | |
403 | Changing the value in this file takes effect whenever | |
404 | an application requests memory. | |
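Both "minimum of [3% of ..., ...]" rules above have the same shape, which the following shell sketch illustrates. The function name min_3pct_kb is hypothetical, and the 3% figure is computed here as a plain integer percentage. That is an assumption, and the kernel's own rounding may differ.

```shell
# Hypothetical helper: print the smaller of (3% of BASE_KB) and CAP_KB.
# Usage: min_3pct_kb BASE_KB CAP_KB
min_3pct_kb() {
    base_kb=$1
    cap_kb=$2
    three_pct_kb=$(( base_kb * 3 / 100 ))

    if [ "$three_pct_kb" -lt "$cap_kb" ]; then
        echo "$three_pct_kb"
    else
        echo "$cap_kb"
    fi
}

# Reserve for a 1000000 KiB process with user_reserve_kbytes set to 131072:
min_3pct_kb 1000000 131072     # prints 30000
# Default file value with 10000000 KiB free (capped at 128MiB = 131072 KiB):
min_3pct_kb 10000000 131072    # prints 131072
```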
405 | .TP | |
406 | .IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)" | |
407 | .\" cefdca0a86be517bc390fc4541e3674b8e7803b0 | |
408 | This (writable) file exposes a flag that controls whether | |
409 | unprivileged processes are allowed to employ | |
410 | .BR userfaultfd (2). | |
411 | If this file has the value 1, then unprivileged processes may use | |
412 | .BR userfaultfd (2). | |
413 | If this file has the value 0, then only processes that have the | |
414 | .B CAP_SYS_PTRACE | |
415 | capability may employ | |
416 | .BR userfaultfd (2). | |
417 | The default value in this file is 1. | |
418 | .SH SEE ALSO | |
419 | .BR proc (5), | |
420 | .BR proc_sys (5) |