.\" Copyright (C) 2001 David Gómez <davidge@jazzfree.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Based on comments from mm/filemap.c. Last modified on 10-06-2001
.\" Modified, 25 Feb 2002, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added notes on MADV_DONTNEED
.\" 2010-06-19, mtk, Added documentation of MADV_MERGEABLE and
.\" MADV_UNMERGEABLE
.\" 2010-06-15, Andi Kleen, Add documentation of MADV_HWPOISON.
.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
.\" 2011-09-18, Doug Goldstein <cardoe@cardoe.com>
.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
.\"
.TH MADVISE 2 2014-12-31 "Linux" "Linux Programmer's Manual"
.SH NAME
madvise \- give advice about use of memory
.SH SYNOPSIS
.B #include <sys/mman.h>
.sp
.BI "int madvise(void *" addr ", size_t " length ", int " advice );
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.sp
.BR madvise ():
_BSD_SOURCE
.SH DESCRIPTION
The
.BR madvise ()
system call is used to give advice or directions to the kernel
about the address range beginning at address
.I addr
and with size
.I length
bytes.
Initially, the system call supported a set of "conventional"
.I advice
values, which are also available on several other implementations.
(Note, though, that
.BR madvise ()
is not specified in POSIX.)
Subsequently, a number of Linux-specific
.IR advice
values have been added.
.\"
.\" ======================================================================
.\"
.SS Conventional advice values
The
.I advice
values listed below
allow an application to tell the kernel how it expects to use
some mapped or shared memory areas, so that the kernel can choose
appropriate read-ahead and caching techniques.
These
.I advice
values do not influence the semantics of the application
(except in the case of
.BR MADV_DONTNEED ),
but may influence its performance.
All of the
.I advice
values listed here have analogs in the POSIX-specified
.BR posix_madvise (3)
function, and the values have the same meanings, with the exception of
.BR MADV_DONTNEED .
.LP
The advice is indicated in the
.I advice
argument, which is one of the following:
.TP
.B MADV_NORMAL
No special treatment.
This is the default.
.TP
.B MADV_RANDOM
Expect page references in random order.
(Hence, read ahead may be less useful than normal.)
.TP
.B MADV_SEQUENTIAL
Expect page references in sequential order.
(Hence, pages in the given range can be aggressively read ahead,
and may be freed soon after they are accessed.)
.TP
.B MADV_WILLNEED
Expect access in the near future.
(Hence, it might be a good idea to read some pages ahead.)
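
The following C sketch illustrates these hints on a file mapping.
It maps a whole file read-only, passes
.B MADV_SEQUENTIAL
and
.B MADV_WILLNEED
for the entire range, and then reads the file from start to end.
The helper name
.I sum_file_bytes
is hypothetical, and the sketch assumes a nonempty regular file;
since both values are only hints, failures of
.BR madvise ()
are deliberately ignored.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a nonempty file read-only, hint sequential
 * access (MADV_SEQUENTIAL) and imminent use (MADV_WILLNEED), then return
 * the sum of its bytes, or -1 on error. */
long sum_file_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t) st.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping survives close() */
    if (p == MAP_FAILED)
        return -1;

    /* Hints only: a failure here does not affect correctness. */
    (void) madvise(p, len, MADV_SEQUENTIAL);
    (void) madvise(p, len, MADV_WILLNEED);

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}
```

Removing the two
.BR madvise ()
calls would not change the value returned;
only read-ahead and caching behavior may differ.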
.TP
.B MADV_DONTNEED
Do not expect access in the near future.
(For the time being, the application is finished with the given range,
so the kernel can free resources associated with it.)

After a successful
.B MADV_DONTNEED
operation,
the semantics of memory access in the specified region are changed:
subsequent accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the
up-to-date contents of the underlying mapped file
(for shared file mappings, shared anonymous mappings,
and shmem-based techniques such as System V shared memory segments)
or zero-fill-on-demand pages for anonymous private mappings.

Note that, when applied to shared mappings,
.BR MADV_DONTNEED
might not lead to immediate freeing of the pages in the range.
The kernel is free to delay freeing the pages until an appropriate moment.
However, the resident set size (RSS) of the calling process will be
reduced immediately.

.B MADV_DONTNEED
cannot be applied to locked pages or Huge TLB pages.
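
The zero-fill-on-demand behavior for private anonymous mappings can be
observed with a minimal C sketch.
The helper name
.I byte_after_dontneed
is hypothetical; it fills one page with the byte 'x', discards the page with
.BR MADV_DONTNEED ,
and then reads the first byte again.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of a private anonymous page
 * after it was filled with 'x' and then discarded with MADV_DONTNEED,
 * or -1 on error. */
int byte_after_dontneed(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 'x', page);            /* populate the page */

    int result = -1;
    if (madvise(p, page, MADV_DONTNEED) == 0)
        result = (unsigned char) p[0];   /* refaulted as a zero page */

    munmap(p, page);
    return result;
}
```

On Linux the helper is expected to return 0 rather than the code of 'x',
which is the semantic change that sets
.B MADV_DONTNEED
apart from the other conventional values.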
.\"
.\" ======================================================================
.\"
.SS Linux-specific advice values
The following Linux-specific
.I advice
values have no counterparts in the POSIX-specified
.BR posix_madvise (3),
and may or may not have counterparts in the
.BR madvise ()
interface available on other implementations.
Note that some of these operations change the semantics of memory accesses.
.TP
.BR MADV_REMOVE " (since Linux 2.6.16)"
.\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
Free up a given range of pages
and its associated backing store.
This is equivalent to punching a hole in the corresponding byte
range of the backing store (see
.BR fallocate (2)).
Subsequent accesses in the specified address range will see
bytes containing zero.
.\" Databases want to use this feature to drop a section of their
.\" bufferpool (shared memory segments) - without writing back to
.\" disk/swap space. This feature is also useful for supporting
.\" hot-plug memory on UML.

The specified address range must be mapped shared and writable.
This flag cannot be applied to locked pages or Huge TLB pages.

In the initial implementation, only shmfs/tmpfs supported
.BR MADV_REMOVE ,
but since Linux 3.5,
.\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
any filesystem that supports the
.BR fallocate (2)
.BR FALLOC_FL_PUNCH_HOLE
mode also supports
.BR MADV_REMOVE .
Other filesystems fail with the error
.BR EOPNOTSUPP .
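
A minimal C sketch of
.B MADV_REMOVE
follows.
It assumes that
.BR memfd_create (2)
is available (Linux 3.17, glibc 2.27) and uses it only as a convenient
tmpfs-backed file for the shared writable mapping;
the helper name
.I byte_after_remove
is hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: write 'x' through a shared writable mapping of a
 * tmpfs-backed memfd (an assumption made for convenience), punch a hole
 * with MADV_REMOVE, and return the first byte read afterward, or -1. */
int byte_after_remove(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    int fd = memfd_create("madv-remove-demo", 0);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t) page) == -1) {
        close(fd);
        return -1;
    }

    /* MADV_REMOVE requires a shared, writable mapping. */
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 'x', page);

    int result = -1;
    if (madvise(p, page, MADV_REMOVE) == 0)
        result = (unsigned char) p[0];   /* the hole reads as zero */

    munmap(p, page);
    return result;
}
```

If the call succeeds, the first byte is expected to read back as 0;
on a filesystem without hole punching,
.BR madvise ()
would instead fail with
.B EOPNOTSUPP
and the helper returns \-1.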
.TP
.BR MADV_DONTFORK " (since Linux 2.6.16)"
.\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4
.\" See http://lwn.net/Articles/171941/
Do not make the pages in this range available to the child after a
.BR fork (2).
This is useful to prevent copy-on-write semantics from changing
the physical location of a page if the parent writes to it after a
.BR fork (2).
(Such page relocations cause problems for hardware that
DMAs into the page.)
.\" [PATCH] madvise MADV_DONTFORK/MADV_DOFORK
.\" Currently, copy-on-write may change the physical address of
.\" a page even if the user requested that the page is pinned in
.\" memory (either by mlock or by get_user_pages). This happens
.\" if the process forks meanwhile, and the parent writes to that
.\" page. As a result, the page is orphaned: in case of
.\" get_user_pages, the application will never see any data hardware
.\" DMA's into this page after the COW. In case of mlock'd memory,
.\" the parent is not getting the realtime/security benefits of mlock.
.\"
.\" In particular, this affects the Infiniband modules which do DMA from
.\" and into user pages all the time.
.\"
.\" This patch adds madvise options to control whether memory range is
.\" inherited across fork. Useful e.g. for when hardware is doing DMA
.\" from/into these pages. Could also be useful to an application
.\" wanting to speed up its forks by cutting large areas out of
.\" consideration.
.\"
.\" SEE ALSO: http://lwn.net/Articles/171941/
.\" "Tweaks to madvise() and posix_fadvise()", 14 Feb 2006
.TP
.BR MADV_DOFORK " (since Linux 2.6.16)"
Undo the effect of
.BR MADV_DONTFORK ,
restoring the default behavior, whereby a mapping is inherited across
.BR fork (2).
.TP
.BR MADV_HWPOISON " (since Linux 2.6.32)"
.\" commit 9893e49d64a4874ea67849ee2cfbf3f3d6817573
Poison a page and handle it like a hardware memory corruption.
This operation is available only for privileged
.RB ( CAP_SYS_ADMIN )
processes.
This operation may result in the calling process receiving a
.B SIGBUS
and the page being unmapped.

This feature is intended for testing of memory error-handling code;
it is available only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.BR MADV_SOFT_OFFLINE " (since Linux 2.6.33)"
.\" commit afcf938ee0aac4ef95b1a23bac704c6fbeb26de6
Soft offline the pages in the range specified by
.I addr
and
.IR length .
The memory of each page in the specified range is preserved
(i.e., when next accessed, the same content will be visible,
but in a new physical page frame),
and the original page is offlined
(i.e., no longer used, and taken out of normal memory management).
The effect of the
.B MADV_SOFT_OFFLINE
operation is invisible to (i.e., does not change the semantics of)
the calling process.

This feature is intended for testing of memory error-handling code;
it is available only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.BR MADV_MERGEABLE " (since Linux 2.6.32)"
.\" commit f8af4da3b4c14e7267c4ffb952079af3912c51c5
Enable Kernel Samepage Merging (KSM) for the pages in the range specified by
.I addr
and
.IR length .
The kernel regularly scans those areas of user memory that have
been marked as mergeable,
looking for pages with identical content.
These are replaced by a single write-protected page (which is automatically
copied if a process later wants to update the content of the page).
KSM merges only private anonymous pages (see
.BR mmap (2)).

The KSM feature is intended for applications that generate many
instances of the same data (e.g., virtualization systems such as KVM).
It can consume a lot of processing power; use with care.
See the Linux kernel source file
.I Documentation/vm/ksm.txt
for more details.

The
.BR MADV_MERGEABLE
and
.BR MADV_UNMERGEABLE
operations are available only if the kernel was configured with
.BR CONFIG_KSM .
.TP
.BR MADV_UNMERGEABLE " (since Linux 2.6.32)"
Undo the effect of an earlier
.BR MADV_MERGEABLE
operation on the specified address range;
KSM unmerges whatever pages it had merged in the address range specified by
.IR addr
and
.IR length .
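
A minimal C sketch of opting a region in to KSM and back out again follows.
The helper name
.I try_ksm
is hypothetical, and it assumes that
.I addr
is page-aligned, so that an
.B EINVAL
failure from
.B MADV_MERGEABLE
can be attributed to a kernel built without
.BR CONFIG_KSM .

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: offer [addr, addr + length) to KSM, then withdraw
 * it.  Assumes addr is page-aligned, so EINVAL from MADV_MERGEABLE is
 * taken to mean a kernel without CONFIG_KSM.  Returns 1 if both calls
 * succeeded, 0 if KSM is unavailable, or -1 on any other error. */
int try_ksm(void *addr, size_t length)
{
    if (madvise(addr, length, MADV_MERGEABLE) == -1)
        return errno == EINVAL ? 0 : -1;

    /* ... the region would be used here while KSM may merge it ... */

    if (madvise(addr, length, MADV_UNMERGEABLE) == -1)
        return -1;
    return 1;
}
```

Whether any pages are actually merged also depends on the KSM scanner
being enabled (for example, via
.IR /sys/kernel/mm/ksm/run ),
as described in
.IR Documentation/vm/ksm.txt .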
.TP
.BR MADV_HUGEPAGE " (since Linux 2.6.38)"
.\" commit 0af4e98b6b095c74588af04872f83d333c958c32
.\" http://lwn.net/Articles/358904/
.\" https://lwn.net/Articles/423584/
Enable Transparent Huge Pages (THP) for pages in the range specified by
.I addr
and
.IR length .
Currently, Transparent Huge Pages work only with private anonymous pages (see
.BR mmap (2)).
The kernel will regularly scan the areas marked as huge page candidates
to replace them with huge pages.
The kernel will also allocate huge pages directly when the region is
naturally aligned to the huge page size (see
.BR posix_memalign (3)).

This feature is primarily aimed at applications that use large mappings of
data and access large regions of that memory at a time (e.g., virtualization
systems such as QEMU).
It can very easily waste memory (e.g., a 2MB mapping that only ever accesses
1 byte will result in 2MB of wired memory instead of one 4KB page).
See the Linux kernel source file
.I Documentation/vm/transhuge.txt
for more details.

The
.BR MADV_HUGEPAGE
and
.BR MADV_NOHUGEPAGE
operations are available only if the kernel was configured with
.BR CONFIG_TRANSPARENT_HUGEPAGE .
.TP
.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
Ensures that memory in the address range specified by
.IR addr
and
.IR length
will not be collapsed into huge pages.
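
The following C sketch requests Transparent Huge Pages for a buffer
obtained from
.BR posix_memalign (3).
It assumes a huge page size of 2 MiB (typical on x86-64, but
architecture dependent), and the helper name
.I alloc_thp_buffer
is hypothetical.
Because the advice is only a request, the buffer remains usable
even if
.BR madvise ()
fails.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages, as is typical on x86-64. */
#define ASSUMED_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: allocate a buffer aligned to the assumed huge page
 * size and request THP for it.  *thp_ok is set to 1 if MADV_HUGEPAGE was
 * accepted and 0 otherwise (e.g., no CONFIG_TRANSPARENT_HUGEPAGE).  The
 * buffer is usable either way and must be released with free(). */
void *alloc_thp_buffer(size_t length, int *thp_ok)
{
    void *buf;
    if (posix_memalign(&buf, ASSUMED_HUGE_PAGE_SIZE, length) != 0)
        return NULL;

    *thp_ok = madvise(buf, length, MADV_HUGEPAGE) == 0;

    memset(buf, 0, length);          /* first touch may fault in huge pages */
    return buf;
}
```

A region that should never be collapsed into huge pages would instead be
passed
.BR MADV_NOHUGEPAGE .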
.TP
.BR MADV_DONTDUMP " (since Linux 3.4)"
.\" commit 909af768e88867016f427264ae39d27a57b6a8ed
.\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
Exclude from a core dump those pages in the range specified by
.I addr
and
.IR length .
This is useful in applications that have large areas of memory
that are known not to be useful in a core dump.
The effect of
.BR MADV_DONTDUMP
takes precedence over the bit mask that is set via the
.I /proc/PID/coredump_filter
file (see
.BR core (5)).
.TP
.BR MADV_DODUMP " (since Linux 3.4)"
Undo the effect of an earlier
.BR MADV_DONTDUMP .
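
A minimal C sketch that keeps a region of sensitive data out of core dumps
follows; the helper name
.I alloc_nodump
is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous region for sensitive data and
 * exclude it from core dumps.  Returns NULL on failure; release the
 * region with munmap(). */
void *alloc_nodump(size_t length)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (madvise(p, length, MADV_DONTDUMP) == -1) {
        munmap(p, length);
        return NULL;
    }
    return p;
}
```

Calling
.BR madvise ()
with
.B MADV_DODUMP
on the same range makes the pages eligible for core dumps again.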
.SH RETURN VALUE
On success,
.BR madvise ()
returns zero.
On error, it returns \-1 and
.I errno
is set appropriately.
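
The usual error-checking pattern is sketched below in C.
The wrapper name
.I advise_or_report
is hypothetical; it returns 0 on success or the saved
.I errno
value on failure, copying
.I errno
before any later library call can overwrite it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical wrapper: returns 0 on success, or the errno value left by
 * a failed madvise() after reporting it on standard error. */
int advise_or_report(void *addr, size_t length, int advice)
{
    if (madvise(addr, length, advice) == -1) {
        int err = errno;        /* saved: later library calls may change errno */
        perror("madvise");      /* e.g., "madvise: Invalid argument" */
        return err;
    }
    return 0;
}
```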
.SH ERRORS
.TP
.B EACCES
.I advice
is
.BR MADV_REMOVE ,
but the specified address range is not a shared writable mapping.
.TP
.B EAGAIN
A kernel resource was temporarily unavailable.
.TP
.B EBADF
The map exists, but the area maps something that isn't a file.
.TP
.B EINVAL
.I addr
is not page-aligned or
.I length
is negative.
.\" .I length
.\" is zero,
.TP
.B EINVAL
.I advice
is not valid.
.TP
.B EINVAL
.I advice
is
.B MADV_DONTNEED
or
.BR MADV_REMOVE
and the specified address range includes locked or Huge TLB pages.
.TP
.B EINVAL
.I advice
is
.BR MADV_MERGEABLE
or
.BR MADV_UNMERGEABLE ,
but the kernel was not configured with
.BR CONFIG_KSM .
.TP
.B EIO
(for
.BR MADV_WILLNEED )
Paging in this area would exceed the process's
maximum resident set size.
.TP
.B ENOMEM
(for
.BR MADV_WILLNEED )
Not enough memory: paging in failed.
.TP
.B ENOMEM
Addresses in the specified range are not currently
mapped, or are outside the address space of the process.
.TP
.B EPERM
.I advice
is
.BR MADV_HWPOISON ,
but the caller does not have the
.B CAP_SYS_ADMIN
capability.
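
The following C sketch provokes two of these errors on purpose:
.B EINVAL
for an address that is not page-aligned, and
.B ENOMEM
for a range that has just been unmapped.
The helper name
.I provoke_errors
is hypothetical; the sketch assumes a single-threaded caller, so that
nothing else is mapped into the freed page before the second call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: store in *align_err the errno seen for a
 * misaligned addr, and in *unmapped_err the errno seen for a range that
 * is no longer mapped (0 if a call unexpectedly succeeded).  Returns 0,
 * or -1 if the scratch mapping could not be created. */
int provoke_errors(int *align_err, int *unmapped_err)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* addr is not page-aligned: expected EINVAL. */
    *align_err = madvise(p + 1, page, MADV_NORMAL) == -1 ? errno : 0;

    /* Unmap the second page, then advise on it alone: expected ENOMEM. */
    munmap(p + page, page);
    *unmapped_err = madvise(p + page, page, MADV_NORMAL) == -1 ? errno : 0;

    munmap(p, page);
    return 0;
}
```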
.SH VERSIONS
Since Linux 3.18,
.\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
support for this system call is optional,
depending on the setting of the
.B CONFIG_ADVISE_SYSCALLS
configuration option.
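
A program that must cope with such kernels could probe for the system
call as in the following C sketch.
It assumes that a system call that is not built into the kernel fails with
.BR ENOSYS ;
the helper name
.I madvise_available
is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if madvise() works, 0 if it fails with
 * ENOSYS (assumed here to mean a kernel built without
 * CONFIG_ADVISE_SYSCALLS), or -1 on any other failure. */
int madvise_available(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int result = 1;
    if (madvise(p, page, MADV_NORMAL) == -1)
        result = errno == ENOSYS ? 0 : -1;   /* read errno before munmap() */

    munmap(p, page);
    return result;
}
```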
.SH CONFORMING TO
.BR madvise ()
is not specified by any standards.
Versions of this system call, implementing a wide variety of
.I advice
values, exist on many other implementations.
Other implementations typically implement at least the flags listed
above under
.IR "Conventional advice values" ,
albeit with some variation in semantics.

POSIX.1-2001 describes
.BR posix_madvise (3)
with constants
.BR POSIX_MADV_NORMAL ,
.BR POSIX_MADV_RANDOM ,
.BR POSIX_MADV_SEQUENTIAL ,
.BR POSIX_MADV_WILLNEED ,
and
.BR POSIX_MADV_DONTNEED ,
with behavior close to the similarly named flags listed above.
(POSIX.1-2008 adds a further flag,
.BR POSIX_MADV_NOREUSE ,
that has no analog in
.BR madvise (2).)
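
Portable code can use
.BR posix_madvise (3)
instead, as in the C sketch below (the helper name
.I hint_sequential_portable
is hypothetical).
Unlike
.BR madvise (),
.BR posix_madvise (3)
reports failure through its return value rather than through
.IR errno :
it returns 0 on success or an error number.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: the portable counterpart of MADV_SEQUENTIAL.
 * Returns 0 on success or an error number (errno is not the channel). */
int hint_sequential_portable(void *addr, size_t length)
{
    return posix_madvise(addr, length, POSIX_MADV_SEQUENTIAL);
}
```

With glibc on Linux, passing an address that is not page-aligned is expected
to make
.BR posix_madvise (3)
return
.B EINVAL
directly.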
.SH NOTES
.SS Linux notes
The Linux implementation requires that the address
.I addr
be page-aligned, and allows
.I length
to be zero.
If there are some parts of the specified address range
that are not mapped, the Linux version of
.BR madvise ()
ignores them and applies the call to the rest (but returns
.B ENOMEM
from the system call, as it should).
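
The following C sketch demonstrates this behavior.
The helper name
.I dontneed_across_hole
is hypothetical; it fills three pages, unmaps the middle one, and applies
.B MADV_DONTNEED
to the whole three-page range.
The call is expected to fail with
.BR ENOMEM ,
yet the two outer pages are still discarded and read back as zero.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map three pages, fill them with 'x', unmap the
 * middle one, and apply MADV_DONTNEED to the whole three-page span.
 * Stores the resulting errno (0 on success) in *err and returns the sum
 * of the first bytes of the two outer pages, or -1 on setup failure. */
int dontneed_across_hole(int *err)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 'x', 3 * page);

    if (munmap(p + page, page) == -1) {
        munmap(p, 3 * page);
        return -1;
    }

    *err = madvise(p, 3 * page, MADV_DONTNEED) == -1 ? errno : 0;

    /* 0 if both outer pages were discarded and refaulted as zero pages. */
    int sum = (unsigned char) p[0] + (unsigned char) p[2 * page];

    munmap(p, page);
    munmap(p + 2 * page, page);
    return sum;
}
```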
.\" .SH HISTORY
.\" The
.\" .BR madvise ()
.\" function first appeared in 4.4BSD.
.SH SEE ALSO
.BR getrlimit (2),
.BR mincore (2),
.BR mmap (2),
.BR mprotect (2),
.BR msync (2),
.BR munmap (2),
.BR posix_fadvise (2),
.BR prctl (2),
.BR core (5)