.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com>
.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl>
.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org>
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.\"
.TH proc_sys_vm 5 (date) "Linux man-pages (unreleased)"
.SH NAME
/proc/sys/vm/ \- virtual memory subsystem
.SH DESCRIPTION
.TP
.I /proc/sys/vm/
This directory contains files for memory management tuning,
as well as buffer and cache management.
.TP
.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)"
.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009
This file defines the amount of free memory (in KiB) on the system that
should be reserved for users with the capability
.BR CAP_SYS_ADMIN .
.IP
The default value in this file is the minimum of [3% of free pages, 8MiB]
expressed as KiB.
The default is intended to provide enough for the superuser
to log in and kill a process, if necessary,
under the default overcommit 'guess' mode (i.e., 0 in
.IR /proc/sys/vm/overcommit_memory ).
.IP
Systems running in "overcommit never" mode (i.e., 2 in
.IR /proc/sys/vm/overcommit_memory )
should increase the value in this file to account
for the full virtual memory size of the programs used to recover (e.g.,
.BR login (1),
.BR ssh (1),
and
.BR top (1)).
Otherwise, the superuser may not be able to log in to recover the system.
For example, on x86-64 a suitable value is 131072 (128MiB reserved).
.IP
Changing the value in this file takes effect whenever
an application requests memory.
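.IP
As an illustrative sketch (assuming a root shell),
the 128MiB reservation mentioned above could be applied
and then read back as follows:
.IP
.in +4n
.EX
echo 131072 > /proc/sys/vm/admin_reserve_kbytes
cat /proc/sys/vm/admin_reserve_kbytes
.EE
.in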
.TP
.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)"
When 1 is written to this file, all zones are compacted such that free
memory is available in contiguous blocks where possible.
The effect of this action can be seen by examining
.IR /proc/buddyinfo .
.IP
Present only if the kernel was configured with
.BR CONFIG_COMPACTION .
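.IP
For instance (a sketch, assuming a root shell),
the effect of compaction can be observed by comparing
the free-block counts before and after the write:
.IP
.in +4n
.EX
cat /proc/buddyinfo
echo 1 > /proc/sys/vm/compact_memory
cat /proc/buddyinfo
.EE
.in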
.TP
.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)"
Writing to this file causes the kernel to drop clean caches, dentries, and
inodes from memory, causing that memory to become free.
This can be useful for memory management testing and
performing reproducible filesystem benchmarks.
Because writing to this file causes the benefits of caching to be lost,
it can degrade overall system performance.
.IP
To free pagecache, use:
.IP
.in +4n
.EX
echo 1 > /proc/sys/vm/drop_caches
.EE
.in
.IP
To free dentries and inodes, use:
.IP
.in +4n
.EX
echo 2 > /proc/sys/vm/drop_caches
.EE
.in
.IP
To free pagecache, dentries, and inodes, use:
.IP
.in +4n
.EX
echo 3 > /proc/sys/vm/drop_caches
.EE
.in
.IP
Because writing to this file is a nondestructive operation and dirty objects
are not freeable, the
user should run
.BR sync (1)
first.
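.IP
Putting these steps together,
a benchmark run might (as a sketch, assuming a root shell)
flush dirty data first and then drop all three kinds of cache:
.IP
.in +4n
.EX
sync
echo 3 > /proc/sys/vm/drop_caches
.EE
.in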
.TP
.IR /proc/sys/vm/hugetlb_shm_group " (since Linux 2.6.7)"
This writable file contains a group ID that is allowed
to allocate memory using huge pages.
If a process has a filesystem group ID or any supplementary group ID that
matches this group ID,
then it can make huge-page allocations without holding the
.B CAP_IPC_LOCK
capability; see
.BR memfd_create (2),
.BR mmap (2),
and
.BR shmget (2).
.TP
.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)"
.\" The following is from Documentation/filesystems/proc.txt
If nonzero, this disables the new 32-bit memory-mapping layout;
the kernel will use the legacy (2.4) layout for all processes.
.TP
.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)"
.\" The following is based on the text in Documentation/sysctl/vm.txt
Control how to kill processes when an uncorrected memory error
(typically a 2-bit error in a memory module)
that cannot be handled by the kernel
is detected in the background by hardware.
In some cases (like the page still having a valid copy on disk),
the kernel will handle the failure
transparently without affecting any applications.
But if there is no other up-to-date copy of the data,
it will kill processes to prevent any data corruption from propagating.
.IP
The file has one of the following values:
.RS
.TP
.B 1
Kill all processes that have the corrupted-and-not-reloadable page mapped
as soon as the corruption is detected.
Note that this is not supported for a few types of pages,
such as kernel internally
allocated data or the swap cache, but works for the majority of user pages.
.TP
.B 0
Unmap the corrupted page from all processes and kill a process
only if it tries to access the page.
.RE
.IP
The kill is performed using a
.B SIGBUS
signal with
.I si_code
set to
.BR BUS_MCEERR_AO .
Processes can handle this if they want to; see
.BR sigaction (2)
for more details.
.IP
This feature is active only on architectures/platforms with advanced machine
check handling and depends on the hardware capabilities.
.IP
Applications can override the
.I memory_failure_early_kill
setting individually with the
.BR prctl (2)
.B PR_MCE_KILL
operation.
.IP
Present only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)"
.\" The following is based on the text in Documentation/sysctl/vm.txt
Enable memory failure recovery (when supported by the platform).
.RS
.TP
.B 1
Attempt recovery.
.TP
.B 0
Always panic on a memory failure.
.RE
.IP
Present only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)"
.\" The following is from Documentation/sysctl/vm.txt
Enables a system-wide task dump (excluding kernel threads) to be
produced when the kernel performs an OOM-killing.
The dump includes the following information
for each task (thread, process):
thread ID, real user ID, thread group ID (process ID),
virtual memory size, resident set size,
the CPU that the task is scheduled on,
oom_adj score (see the description of
.IR /proc/ pid /oom_adj ),
and command name.
This is helpful to determine why the OOM-killer was invoked
and to identify the rogue task that caused it.
.IP
If this contains the value zero, this information is suppressed.
On very large systems with thousands of tasks,
it may not be feasible to dump the memory state information for each one.
Such systems should not be forced to incur a performance penalty in
OOM situations when the information may not be desired.
.IP
If this is set to nonzero, this information is shown whenever the
OOM-killer actually kills a memory-hogging task.
.IP
The default value is 0.
.TP
.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)"
.\" The following is from Documentation/sysctl/vm.txt
This enables or disables killing the OOM-triggering task in
out-of-memory situations.
.IP
If this is set to zero, the OOM-killer will scan through the entire
tasklist and select a task based on heuristics to kill.
This normally selects a rogue memory-hogging task that
frees up a large amount of memory when killed.
.IP
If this is set to nonzero, the OOM-killer simply kills the task that
triggered the out-of-memory condition.
This avoids a possibly expensive tasklist scan.
.IP
If
.I /proc/sys/vm/panic_on_oom
is nonzero, it takes precedence over whatever value is used in
.IR /proc/sys/vm/oom_kill_allocating_task .
.IP
The default value is 0.
.TP
.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)"
.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7
This writable file provides an alternative to
.I /proc/sys/vm/overcommit_ratio
for controlling the
.I CommitLimit
when
.I /proc/sys/vm/overcommit_memory
has the value 2.
It allows the amount of memory overcommitting to be specified as
an absolute value (in kB),
rather than as a percentage, as is done with
.IR overcommit_ratio .
This allows for finer-grained control of
.I CommitLimit
on systems with extremely large memory sizes.
.IP
Only one of
.I overcommit_kbytes
or
.I overcommit_ratio
can have an effect:
if
.I overcommit_kbytes
has a nonzero value, then it is used to calculate
.IR CommitLimit ,
otherwise
.I overcommit_ratio
is used.
Writing a value to either of these files causes the
value in the other file to be set to zero.
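.IP
As a sketch of this interaction (assuming a root shell;
the figure 8388608 is an arbitrary illustration,
not a recommended setting),
writing to
.I overcommit_kbytes
should leave
.I overcommit_ratio
reading as zero:
.IP
.in +4n
.EX
echo 8388608 > /proc/sys/vm/overcommit_kbytes
cat /proc/sys/vm/overcommit_ratio
.EE
.in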
.TP
.I /proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode.
Values are:
.RS
.IP
0: heuristic overcommit (this is the default)
.br
1: always overcommit, never check
.br
2: always check, never overcommit
.RE
.IP
In mode 0, calls of
.BR mmap (2)
with
.B MAP_NORESERVE
are not checked, and the default check is very weak,
leading to the risk of getting a process "OOM-killed".
.IP
In mode 1, the kernel pretends there is always enough memory,
until memory actually runs out.
One use case for this mode is scientific computing applications
that employ large sparse arrays.
Before Linux 2.6.0, any nonzero value implies mode 1.
.IP
In mode 2 (available since Linux 2.6), the total virtual address space
that can be allocated
.RI ( CommitLimit
in
.IR /proc/meminfo )
is calculated as
.IP
.in +4n
.EX
CommitLimit = (total_RAM \- total_huge_TLB) *
              overcommit_ratio / 100 + total_swap
.EE
.in
.IP
where:
.RS
.IP \[bu] 3
.I total_RAM
is the total amount of RAM on the system;
.IP \[bu]
.I total_huge_TLB
is the amount of memory set aside for huge pages;
.IP \[bu]
.I overcommit_ratio
is the value in
.IR /proc/sys/vm/overcommit_ratio ;
and
.IP \[bu]
.I total_swap
is the amount of swap space.
.RE
.IP
For example, on a system with 16 GB of physical RAM, 16 GB
of swap, no space dedicated to huge pages, and an
.I overcommit_ratio
of 50, this formula yields a
.I CommitLimit
of 24 GB.
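.IP
The arithmetic behind this example can be checked with
shell integer arithmetic (all quantities in GB);
the first command below prints 24.
The limit actually in force on a running system,
which will generally differ from this illustrative figure,
is shown in the
.I CommitLimit
field of
.IR /proc/meminfo :
.IP
.in +4n
.EX
# (total_RAM \- total_huge_TLB) * overcommit_ratio / 100 + total_swap
echo $(( (16 \- 0) * 50 / 100 + 16 ))
grep CommitLimit /proc/meminfo
.EE
.in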
.IP
Since Linux 3.14, if the value in
.I /proc/sys/vm/overcommit_kbytes
is nonzero, then
.I CommitLimit
is instead calculated as:
.IP
.in +4n
.EX
CommitLimit = overcommit_kbytes + total_swap
.EE
.in
.IP
See also the description of
.I /proc/sys/vm/admin_reserve_kbytes
and
.IR /proc/sys/vm/user_reserve_kbytes .
.TP
.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)"
This writable file defines a percentage by which memory
can be overcommitted.
The default value in the file is 50.
See the description of
.IR /proc/sys/vm/overcommit_memory .
.TP
.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)"
.\" The following is adapted from Documentation/sysctl/vm.txt
This enables or disables a kernel panic in
an out-of-memory situation.
.IP
If this file is set to the value 0,
the kernel's OOM-killer will kill some rogue process.
Usually, the OOM-killer is able to kill a rogue process and the
system will survive.
.IP
If this file is set to the value 1,
then the kernel normally panics when out-of-memory happens.
However, if a process limits allocations to certain nodes
using memory policies
.RB ( mbind (2)
.BR MPOL_BIND )
or cpusets
.RB ( cpuset (7))
and those nodes reach memory exhaustion status,
one process may be killed by the OOM-killer.
No panic occurs in this case:
because other nodes' memory may be free,
this means the system as a whole may not have reached
an out-of-memory situation yet.
.IP
If this file is set to the value 2,
the kernel always panics when an out-of-memory condition occurs.
.IP
The default value is 0.
The values 1 and 2 are intended for cluster failover;
select either according to your failover policy.
.TP
.I /proc/sys/vm/swappiness
.\" The following is from Documentation/sysctl/vm.txt
The value in this file controls how aggressively the kernel will swap
memory pages.
Higher values increase aggressiveness, lower values
decrease aggressiveness.
The default value is 60.
.TP
.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)"
.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd
Specifies an amount of memory (in KiB) to reserve for user processes.
This is intended to prevent a user from starting a single memory-hogging
process, such that they cannot recover (kill the hog).
The value in this file has an effect only when
.I /proc/sys/vm/overcommit_memory
is set to 2 ("overcommit never" mode).
In this case, the system reserves an amount of memory that is the minimum
of [3% of current process size,
.IR user_reserve_kbytes ].
.IP
The default value in this file is the minimum of [3% of free pages, 128MiB]
expressed as KiB.
.IP
If the value in this file is set to zero,
then a user will be allowed to allocate all free memory with a single process
(minus the amount reserved by
.IR /proc/sys/vm/admin_reserve_kbytes ).
Any subsequent attempts to execute a command will result in
"fork: Cannot allocate memory".
.IP
Changing the value in this file takes effect whenever
an application requests memory.
.TP
.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)"
.\" cefdca0a86be517bc390fc4541e3674b8e7803b0
This (writable) file exposes a flag that controls whether
unprivileged processes are allowed to employ
.BR userfaultfd (2).
If this file has the value 1, then unprivileged processes may use
.BR userfaultfd (2).
If this file has the value 0, then only processes that have the
.B CAP_SYS_PTRACE
capability may employ
.BR userfaultfd (2).
The default value in this file is 1.
.SH SEE ALSO
.BR proc (5),
.BR proc_sys (5)
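.IP
For example, an administrator wishing to restrict
.BR userfaultfd (2)
to processes that have the
.B CAP_SYS_PTRACE
capability might (as a sketch, assuming a root shell) write:
.IP
.in +4n
.EX
echo 0 > /proc/sys/vm/unprivileged_userfaultfd
.EE
.in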
420.BR proc_sys (5)