.\" Copyright (C) 2001 David Gómez <davidge@jazzfree.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Based on comments from mm/filemap.c. Last modified on 10-06-2001
.\" Modified, 25 Feb 2002, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added notes on MADV_DONTNEED
.\" 2010-06-19, mtk, Added documentation of MADV_MERGEABLE and
.\" MADV_UNMERGEABLE
.\" 2010-06-15, Andi Kleen, Add documentation of MADV_HWPOISON.
.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
.\" 2011-09-18, Doug Goldstein <cardoe@cardoe.com>
.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
.\"
.TH MADVISE 2 2015-02-21 "Linux" "Linux Programmer's Manual"
.SH NAME
madvise \- give advice about use of memory
.SH SYNOPSIS
.B #include <sys/mman.h>
.sp
.BI "int madvise(void *" addr ", size_t " length ", int " advice );
.sp
.in -4n
Feature Test Macro Requirements for glibc (see
.BR feature_test_macros (7)):
.in
.sp
.BR madvise ():
_BSD_SOURCE
.SH DESCRIPTION
The
.BR madvise ()
system call is used to give advice or directions to the kernel
about the address range beginning at address
.I addr
and with size
.I length
bytes.
Initially, the system call supported a set of "conventional"
.I advice
values, which are also available on several other implementations.
(Note, though, that
.BR madvise ()
is not specified in POSIX.)
Subsequently, a number of Linux-specific
.IR advice
values have been added.
.\"
.\" ======================================================================
.\"
.SS Conventional advice values
The
.I advice
values listed below
allow an application to tell the kernel how it expects to use
some mapped or shared memory areas, so that the kernel can choose
appropriate read-ahead and caching techniques.
These
.I advice
values do not influence the semantics of the application
(except in the case of
.BR MADV_DONTNEED ),
but may influence its performance.
All of the
.I advice
values listed here have analogs in the POSIX-specified
.BR posix_madvise (3)
function, and the values have the same meanings, with the exception of
.BR MADV_DONTNEED .
.LP
The advice is indicated in the
.I advice
argument, which is one of the following:
.TP
.B MADV_NORMAL
No special treatment.
This is the default.
.TP
.B MADV_RANDOM
Expect page references in random order.
(Hence, read ahead may be less useful than normally.)
.TP
.B MADV_SEQUENTIAL
Expect page references in sequential order.
(Hence, pages in the given range can be aggressively read ahead,
and may be freed soon after they are accessed.)
.TP
.B MADV_WILLNEED
Expect access in the near future.
(Hence, it might be a good idea to read some pages ahead.)
.TP
.B MADV_DONTNEED
Do not expect access in the near future.
(For the time being, the application is finished with the given range,
so the kernel can free resources associated with it.)

After a successful
.B MADV_DONTNEED
operation,
the semantics of memory access in the specified region are changed:
subsequent accesses of pages in the range will succeed, but will result
in either repopulating the memory contents from the
up-to-date contents of the underlying mapped file
(for shared file mappings, shared anonymous mappings,
and shmem-based techniques such as System V shared memory segments)
or zero-fill-on-demand pages for anonymous private mappings.

Note that, when applied to shared mappings,
.BR MADV_DONTNEED
might not lead to immediate freeing of the pages in the range.
The kernel is free to delay freeing the pages until an appropriate moment.
The resident set size (RSS) of the calling process will be immediately
reduced however.

.B MADV_DONTNEED
cannot be applied to locked pages, Huge TLB pages, or
.BR VM_PFNMAP
pages.
(Pages marked with the kernel-internal
.B VM_PFNMAP
.\" http://lwn.net/Articles/162860/
flag are special memory areas that are not managed
by the virtual memory subsystem.
Such pages are typically created by device drivers that
map the pages into user space.)
.\"
.\" ======================================================================
.\"
.SS Linux-specific advice values
The following Linux-specific
.I advice
values have no counterparts in the POSIX-specified
.BR posix_madvise (3),
and may or may not have counterparts in the
.BR madvise ()
interface available on other implementations.
Note that some of these operations change the semantics of memory accesses.
.TP
.BR MADV_REMOVE " (since Linux 2.6.16)"
.\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349
Free up a given range of pages
and its associated backing store.
This is equivalent to punching a hole in the corresponding byte
range of the backing store (see
.BR fallocate (2)).
Subsequent accesses in the specified address range will see
bytes containing zero.
.\" Databases want to use this feature to drop a section of their
.\" bufferpool (shared memory segments) - without writing back to
.\" disk/swap space. This feature is also useful for supporting
.\" hot-plug memory on UML.

The specified address range must be mapped shared and writable.
This flag cannot be applied to locked pages, Huge TLB pages, or
.BR VM_PFNMAP
pages.

In the initial implementation, only shmfs/tmpfs supported
.BR MADV_REMOVE ;
but since Linux 3.5,
.\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17
any filesystem which supports the
.BR fallocate (2)
.BR FALLOC_FL_PUNCH_HOLE
mode also supports
.BR MADV_REMOVE .
Other filesystems fail with the error
.BR EOPNOTSUPP .
.TP
.BR MADV_DONTFORK " (since Linux 2.6.16)"
.\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4
.\" See http://lwn.net/Articles/171941/
Do not make the pages in this range available to the child after a
.BR fork (2).
This is useful to prevent copy-on-write semantics from changing
the physical location of a page if the parent writes to it after a
.BR fork (2).
(Such page relocations cause problems for hardware that
DMAs into the page.)
.\" [PATCH] madvise MADV_DONTFORK/MADV_DOFORK
.\" Currently, copy-on-write may change the physical address of
.\" a page even if the user requested that the page is pinned in
.\" memory (either by mlock or by get_user_pages). This happens
.\" if the process forks meanwhile, and the parent writes to that
.\" page. As a result, the page is orphaned: in case of
.\" get_user_pages, the application will never see any data hardware
.\" DMA's into this page after the COW. In case of mlock'd memory,
.\" the parent is not getting the realtime/security benefits of mlock.
.\"
.\" In particular, this affects the Infiniband modules which do DMA from
.\" and into user pages all the time.
.\"
.\" This patch adds madvise options to control whether memory range is
.\" inherited across fork. Useful e.g. for when hardware is doing DMA
.\" from/into these pages. Could also be useful to an application
.\" wanting to speed up its forks by cutting large areas out of
.\" consideration.
.\"
.\" SEE ALSO: http://lwn.net/Articles/171941/
.\" "Tweaks to madvise() and posix_fadvise()", 14 Feb 2006
.TP
.BR MADV_DOFORK " (since Linux 2.6.16)"
Undo the effect of
.BR MADV_DONTFORK ,
restoring the default behavior, whereby a mapping is inherited across
.BR fork (2).
.TP
.BR MADV_HWPOISON " (since Linux 2.6.32)"
.\" commit 9893e49d64a4874ea67849ee2cfbf3f3d6817573
Poison a page and handle it like a hardware memory corruption.
This operation is available only for privileged
.RB ( CAP_SYS_ADMIN )
processes.
This operation may result in the calling process receiving a
.B SIGBUS
and the page being unmapped.

This feature is intended for testing of memory error-handling code;
it is available only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.BR MADV_SOFT_OFFLINE " (since Linux 2.6.33)"
.\" commit afcf938ee0aac4ef95b1a23bac704c6fbeb26de6
Soft offline the pages in the range specified by
.I addr
and
.IR length .
The memory of each page in the specified range is preserved
(i.e., when next accessed, the same content will be visible,
but in a new physical page frame),
and the original page is offlined
(i.e., no longer used, and taken out of normal memory management).
The effect of the
.B MADV_SOFT_OFFLINE
operation is invisible to (i.e., does not change the semantics of)
the calling process.

This feature is intended for testing of memory error-handling code;
it is available only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.BR MADV_MERGEABLE " (since Linux 2.6.32)"
.\" commit f8af4da3b4c14e7267c4ffb952079af3912c51c5
Enable Kernel Samepage Merging (KSM) for the pages in the range specified by
.I addr
and
.IR length .
The kernel regularly scans those areas of user memory that have
been marked as mergeable,
looking for pages with identical content.
These are replaced by a single write-protected page (which is automatically
copied if a process later wants to update the content of the page).
KSM merges only private anonymous pages (see
.BR mmap (2)).

The KSM feature is intended for applications that generate many
instances of the same data (e.g., virtualization systems such as KVM).
It can consume a lot of processing power; use with care.
See the Linux kernel source file
.I Documentation/vm/ksm.txt
for more details.

The
.BR MADV_MERGEABLE
and
.BR MADV_UNMERGEABLE
operations are available only if the kernel was configured with
.BR CONFIG_KSM .
.TP
.BR MADV_UNMERGEABLE " (since Linux 2.6.32)"
Undo the effect of an earlier
.BR MADV_MERGEABLE
operation on the specified address range;
KSM unmerges whatever pages it had merged in the address range specified by
.IR addr
and
.IR length .
.TP
.BR MADV_HUGEPAGE " (since Linux 2.6.38)"
.\" commit 0af4e98b6b095c74588af04872f83d333c958c32
.\" http://lwn.net/Articles/358904/
.\" https://lwn.net/Articles/423584/
Enable Transparent Huge Pages (THP) for pages in the range specified by
.I addr
and
.IR length .
Currently, Transparent Huge Pages work only with private anonymous pages (see
.BR mmap (2)).
The kernel will regularly scan the areas marked as huge page candidates
to replace them with huge pages.
The kernel will also allocate huge pages directly when the region is
naturally aligned to the huge page size (see
.BR posix_memalign (3)).

This feature is primarily aimed at applications that use large mappings of
data and access large regions of that memory at a time (e.g., virtualization
systems such as QEMU).
It can very easily waste memory (e.g., a 2MB mapping in which only
1 byte is ever accessed will result in 2MB of wired memory instead of
one 4KB page).
See the Linux kernel source file
.I Documentation/vm/transhuge.txt
for more details.

The
.BR MADV_HUGEPAGE
and
.BR MADV_NOHUGEPAGE
operations are available only if the kernel was configured with
.BR CONFIG_TRANSPARENT_HUGEPAGE .
.TP
.BR MADV_NOHUGEPAGE " (since Linux 2.6.38)"
Ensure that memory in the address range specified by
.IR addr
and
.IR length
will not be collapsed into huge pages.
.TP
.BR MADV_DONTDUMP " (since Linux 3.4)"
.\" commit 909af768e88867016f427264ae39d27a57b6a8ed
.\" commit accb61fe7bb0f5c2a4102239e4981650f9048519
Exclude from a core dump those pages in the range specified by
.I addr
and
.IR length .
This is useful in applications that have large areas of memory
that are known not to be useful in a core dump.
The effect of
.BR MADV_DONTDUMP
takes precedence over the bit mask that is set via the
.I /proc/PID/coredump_filter
file (see
.BR core (5)).
.TP
.BR MADV_DODUMP " (since Linux 3.4)"
Undo the effect of an earlier
.BR MADV_DONTDUMP .
.SH RETURN VALUE
On success,
.BR madvise ()
returns zero.
On error, it returns \-1 and
.I errno
is set appropriately.
.SH ERRORS
.TP
.B EACCES
.I advice
is
.BR MADV_REMOVE ,
but the specified address range is not a shared writable mapping.
.TP
.B EAGAIN
A kernel resource was temporarily unavailable.
.TP
.B EBADF
The map exists, but the area maps something that isn't a file.
.TP
.B EINVAL
.I addr
is not page-aligned or
.I length
is negative.
.\" .I length
.\" is zero,
.TP
.B EINVAL
.I advice
is not valid.
.TP
.B EINVAL
.I advice
is
.B MADV_DONTNEED
or
.BR MADV_REMOVE
and the specified address range includes locked pages, Huge TLB pages, or
.B VM_PFNMAP
pages.
.TP
.B EINVAL
.I advice
is
.BR MADV_MERGEABLE
or
.BR MADV_UNMERGEABLE ,
but the kernel was not configured with
.BR CONFIG_KSM .
.TP
.B EIO
(for
.BR MADV_WILLNEED )
Paging in this area would exceed the process's
maximum resident set size.
.TP
.B ENOMEM
(for
.BR MADV_WILLNEED )
Not enough memory: paging in failed.
.TP
.B ENOMEM
Addresses in the specified range are not currently
mapped, or are outside the address space of the process.
.TP
.B EPERM
.I advice
is
.BR MADV_HWPOISON ,
but the caller does not have the
.B CAP_SYS_ADMIN
capability.
.SH VERSIONS
Since Linux 3.18,
.\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb
support for this system call is optional,
depending on the setting of the
.B CONFIG_ADVISE_SYSCALLS
configuration option.
.SH CONFORMING TO
.BR madvise ()
is not specified by any standards.
Versions of this system call, implementing a wide variety of
.I advice
values, exist on many other implementations.
Other implementations typically implement at least the flags listed
above under
.IR "Conventional advice values" ,
albeit with some variation in semantics.

POSIX.1-2001 describes
.BR posix_madvise (3)
with constants
.BR POSIX_MADV_NORMAL ,
.BR POSIX_MADV_RANDOM ,
.BR POSIX_MADV_SEQUENTIAL ,
.BR POSIX_MADV_WILLNEED ,
and
.BR POSIX_MADV_DONTNEED ,
with behavior close to the similarly named flags listed above.
(POSIX.1-2008 adds a further flag,
.BR POSIX_MADV_NOREUSE ,
that has no analog in
.BR madvise (2).)
.SH NOTES
.SS Linux notes
The Linux implementation requires that the address
.I addr
be page-aligned, and allows
.I length
to be zero.
If there are some parts of the specified address range
that are not mapped, the Linux version of
.BR madvise ()
ignores them and applies the call to the rest (but returns
.B ENOMEM
from the system call, as it should).
.\" .SH HISTORY
.\" The
.\" .BR madvise ()
.\" function first appeared in 4.4BSD.
.SH SEE ALSO
.BR getrlimit (2),
.BR mincore (2),
.BR mmap (2),
.BR mprotect (2),
.BR msync (2),
.BR munmap (2),
.BR posix_fadvise (2),
.BR prctl (2),
.BR core (5)