Commit | Line | Data |
---|---|---|
e00c3a07 | 1 | .\" Copyright (C) 2001 David Gómez <davidge@jazzfree.com> |
fea681da | 2 | .\" |
5fbde956 | 3 | .\" SPDX-License-Identifier: Linux-man-pages-copyleft |
fea681da MK |
4 | .\" |
5 | .\" Based on comments from mm/filemap.c. Last modified on 10-06-2001 | |
c11b1abf | 6 | .\" Modified, 25 Feb 2002, Michael Kerrisk, <mtk.manpages@gmail.com> |
fea681da | 7 | .\" Added notes on MADV_DONTNEED |
5baa8f09 MK |
8 | .\" 2010-06-19, mtk, Added documentation of MADV_MERGEABLE and |
9 | .\" MADV_UNMERGEABLE | |
f5321b14 MK |
10 | .\" 2010-06-15, Andi Kleen, Add documentation of MADV_HWPOISON. |
11 | .\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE. | |
3d4b49b0 MK |
12 | .\" 2011-09-18, Doug Goldstein <cardoe@cardoe.com> |
13 | .\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE | |
347e325b | 14 | .\" |
45186a5d | 15 | .TH MADVISE 2 2021-03-22 "Linux man-pages (unreleased)" |
fea681da MK |
16 | .SH NAME |
17 | madvise \- give advice about use of memory | |
f934f70e AC |
18 | .SH LIBRARY |
19 | Standard C library | |
8fc3b2cf | 20 | .RI ( libc ", " \-lc ) |
fea681da | 21 | .SH SYNOPSIS |
c7db92b9 | 22 | .nf |
fea681da | 23 | .B #include <sys/mman.h> |
68e4db0a | 24 | .PP |
14f5ae6d | 25 | .BI "int madvise(void *" addr ", size_t " length ", int " advice ); |
c7db92b9 | 26 | .fi |
68e4db0a | 27 | .PP |
d39ad78f | 28 | .RS -4 |
cc4615cc MK |
29 | Feature Test Macro Requirements for glibc (see |
30 | .BR feature_test_macros (7)): | |
d39ad78f | 31 | .RE |
68e4db0a | 32 | .PP |
cc4615cc | 33 | .BR madvise (): |
9d2adbae MK |
34 | .nf |
35 | Since glibc 2.19: | |
36 | _DEFAULT_SOURCE | |
37 | Up to and including glibc 2.19: | |
38 | _BSD_SOURCE | |
39 | .fi | |
fea681da MK |
40 | .SH DESCRIPTION |
41 | The | |
e511ffb6 | 42 | .BR madvise () |
845c8bea MK |
43 | system call is used to give advice or directions to the kernel |
44 | about the address range beginning at address | |
14f5ae6d | 45 | .I addr |
fea681da | 46 | and with size |
756761bf MK |
47 | .IR length . |
48 | .BR madvise () | |
49 | only operates on whole pages, therefore | |
50 | .I addr | |
51 | must be page-aligned. | |
52 | The value of | |
fea681da | 53 | .I length |
756761bf | 54 | is rounded up to a multiple of page size. |
a8db50d3 MK |
55 | In most cases, |
56 | the goal of such advice is to improve system or application performance. | |
efeece04 | 57 | .PP |
845c8bea MK |
58 | Initially, the system call supported a set of "conventional" |
59 | .I advice | |
60 | values, which are also available on several other implementations. | |
61 | (Note, though, that | |
62 | .BR madvise () | |
63 | is not specified in POSIX.) | |
64 | Subsequently, a number of Linux-specific | |
1ae6b2c7 | 65 | .I advice |
845c8bea MK |
66 | values have been added. |
67 | .\" | |
68 | .\" ====================================================================== | |
69 | .\" | |
70 | .SS Conventional advice values | |
71 | The | |
72 | .I advice | |
73 | values listed below | |
74 | allow an application to tell the kernel how it expects to use | |
fea681da MK |
75 | some mapped or shared memory areas, so that the kernel can choose |
76 | appropriate read-ahead and caching techniques. | |
845c8bea MK |
77 | These |
78 | .I advice | |
79 | values do not influence the semantics of the application | |
fea681da MK |
80 | (except in the case of |
81 | .BR MADV_DONTNEED ), | |
845c8bea | 82 | but may influence its performance. |
845c8bea MK |
83 | All of the |
84 | .I advice | |
85 | values listed here have analogs in the POSIX-specified | |
86 | .BR posix_madvise (3) | |
87 | function, and the values have the same meanings, with the exception of | |
88 | .BR MADV_DONTNEED . | |
dd3568a1 | 89 | .PP |
c13182ef | 90 | The advice is indicated in the |
fea681da | 91 | .I advice |
95467f1d | 92 | argument, which is one of the following: |
fea681da MK |
93 | .TP |
94 | .B MADV_NORMAL | |
c13182ef MK |
95 | No special treatment. |
96 | This is the default. | |
fea681da MK |
97 | .TP |
98 | .B MADV_RANDOM | |
99 | Expect page references in random order. | |
100 | (Hence, read ahead may be less useful than normal.) | |
101 | .TP | |
102 | .B MADV_SEQUENTIAL | |
103 | Expect page references in sequential order. | |
104 | (Hence, pages in the given range can be aggressively read ahead, | |
105 | and may be freed soon after they are accessed.) | |
106 | .TP | |
107 | .B MADV_WILLNEED | |
108 | Expect access in the near future. | |
109 | (Hence, it might be a good idea to read some pages ahead.) | |
110 | .TP | |
111 | .B MADV_DONTNEED | |
112 | Do not expect access in the near future. | |
113 | (For the time being, the application is finished with the given range, | |
114 | so the kernel can free resources associated with it.) | |
efeece04 | 115 | .IP |
a727d7cc MK |
116 | After a successful |
117 | .B MADV_DONTNEED | |
118 | operation, | |
119 | the semantics of memory access in the specified region are changed: | |
120 | subsequent accesses of pages in the range will succeed, but will result | |
d5e9c9bb MK |
121 | in either repopulating the memory contents from the |
122 | up-to-date contents of the underlying mapped file | |
cd15218e MK |
123 | (for shared file mappings, shared anonymous mappings, |
124 | and shmem-based techniques such as System V shared memory segments) | |
125 | or zero-fill-on-demand pages for anonymous private mappings. | |
efeece04 | 126 | .IP |
d5e9c9bb | 127 | Note that, when applied to shared mappings, |
1ae6b2c7 | 128 | .B MADV_DONTNEED |
d5e9c9bb MK |
129 | might not lead to immediate freeing of the pages in the range. |
130 | The kernel is free to delay freeing the pages until an appropriate moment. | |
131 | The resident set size (RSS) of the calling process will be immediately | |
132 | reduced, however. | |
efeece04 | 133 | .IP |
a727d7cc | 134 | .B MADV_DONTNEED |
756761bf | 135 | cannot be applied to locked pages, or |
1ae6b2c7 | 136 | .B VM_PFNMAP |
36e5bc92 MK |
137 | pages. |
138 | (Pages marked with the kernel-internal | |
139 | .B VM_PFNMAP | |
140 | .\" http://lwn.net/Articles/162860/ | |
141 | flag are special memory areas that are not managed | |
142 | by the virtual memory subsystem. | |
143 | Such pages are typically created by device drivers that | |
144 | map the pages into user space.) | |
756761bf MK |
145 | .IP |
146 | Support for Huge TLB pages was added in Linux v5.18. | |
147 | Addresses within a mapping backed by Huge TLB pages must be aligned | |
148 | to the underlying Huge TLB page size, | |
149 | and the range length is rounded up | |
150 | to a multiple of the underlying Huge TLB page size. | |
845c8bea MK |
151 | .\" |
152 | .\" ====================================================================== | |
153 | .\" | |
154 | .SS Linux-specific advice values | |
155 | The following Linux-specific | |
156 | .I advice | |
157 | values have no counterparts in the POSIX-specified | |
158 | .BR posix_madvise (3), | |
159 | and may or may not have counterparts in the | |
160 | .BR madvise () | |
fb2bb886 MK |
161 | interface available on other implementations. |
162 | Note that some of these operations change the semantics of memory accesses. | |
835c4d5c | 163 | .TP |
31c1f2b0 | 164 | .BR MADV_REMOVE " (since Linux 2.6.16)" |
498f9213 | 165 | .\" commit f6b3ec238d12c8cc6cc71490c6e3127988460349 |
835c4d5c | 166 | Free up a given range of pages |
c13182ef | 167 | and its associated backing store. |
756761bf | 168 | This is equivalent to punching a hole in the corresponding |
49170db5 MK |
169 | range of the backing store (see |
170 | .BR fallocate (2)). | |
171 | Subsequent accesses in the specified address range will see | |
756761bf | 172 | data with a value of zero. |
bc6eb5ef MK |
173 | .\" Databases want to use this feature to drop a section of their |
174 | .\" bufferpool (shared memory segments) - without writing back to | |
175 | .\" disk/swap space. This feature is also useful for supporting | |
176 | .\" hot-plug memory on UML. | |
efeece04 | 177 | .IP |
5575818d | 178 | The specified address range must be mapped shared and writable. |
756761bf | 179 | This flag cannot be applied to locked pages, or |
1ae6b2c7 | 180 | .B VM_PFNMAP |
36e5bc92 | 181 | pages. |
efeece04 | 182 | .IP |
4e07c70f MK |
183 | In the initial implementation, only |
184 | .BR tmpfs (5) | |
756761bf | 185 | supported |
deb99649 MK |
186 | .BR MADV_REMOVE ; |
187 | but since Linux 3.5, | |
188 | .\" commit 3f31d07571eeea18a7d34db9af21d2285b807a17 | |
f7282b7b | 189 | any filesystem which supports the |
deb99649 | 190 | .BR fallocate (2) |
1ae6b2c7 | 191 | .B FALLOC_FL_PUNCH_HOLE |
95467f1d | 192 | mode also supports |
f7282b7b | 193 | .BR MADV_REMOVE . |
756761bf MK |
194 | Filesystems which do not support |
195 | .B MADV_REMOVE | |
196 | fail with the error | |
deb99649 | 197 | .BR EOPNOTSUPP . |
756761bf MK |
198 | .IP |
199 | Support for the Huge TLB filesystem was added in Linux v4.3. | |
835c4d5c | 200 | .TP |
31c1f2b0 | 201 | .BR MADV_DONTFORK " (since Linux 2.6.16)" |
498f9213 | 202 | .\" commit f822566165dd46ff5de9bf895cfa6c51f53bb0c4 |
835c4d5c MK |
203 | .\" See http://lwn.net/Articles/171941/ |
204 | Do not make the pages in this range available to the child after a | |
205 | .BR fork (2). | |
206 | This is useful to prevent copy-on-write semantics from changing | |
95467f1d | 207 | the physical location of a page if the parent writes to it after a |
835c4d5c MK |
208 | .BR fork (2). |
209 | (Such page relocations cause problems for hardware that | |
95467f1d | 210 | DMAs into the page.) |
835c4d5c | 211 | .\" [PATCH] madvise MADV_DONTFORK/MADV_DOFORK |
c13182ef MK |
212 | .\" Currently, copy-on-write may change the physical address of |
213 | .\" a page even if the user requested that the page is pinned in | |
214 | .\" memory (either by mlock or by get_user_pages). This happens | |
215 | .\" if the process forks meanwhile, and the parent writes to that | |
216 | .\" page. As a result, the page is orphaned: in case of | |
217 | .\" get_user_pages, the application will never see any data hardware | |
218 | .\" DMA's into this page after the COW. In case of mlock'd memory, | |
835c4d5c | 219 | .\" the parent is not getting the realtime/security benefits of mlock. |
c13182ef MK |
220 | .\" |
221 | .\" In particular, this affects the Infiniband modules which do DMA from | |
835c4d5c | 222 | .\" and into user pages all the time. |
c13182ef MK |
223 | .\" |
224 | .\" This patch adds madvise options to control whether memory range is | |
225 | .\" inherited across fork. Useful e.g. for when hardware is doing DMA | |
226 | .\" from/into these pages. Could also be useful to an application | |
227 | .\" wanting to speed up its forks by cutting large areas out of | |
835c4d5c | 228 | .\" consideration. |
49237f3d MK |
229 | .\" |
230 | .\" SEE ALSO: http://lwn.net/Articles/171941/ | |
231 | .\" "Tweaks to madvise() and posix_fadvise()", 14 Feb 2006 | |
835c4d5c | 232 | .TP |
31c1f2b0 | 233 | .BR MADV_DOFORK " (since Linux 2.6.16)" |
835c4d5c MK |
234 | Undo the effect of |
235 | .BR MADV_DONTFORK , | |
d9bfdb9c | 236 | restoring the default behavior, whereby a mapping is inherited across |
835c4d5c | 237 | .BR fork (2). |
523c2f67 | 238 | .TP |
9bfc9cb1 | 239 | .BR MADV_HWPOISON " (since Linux 2.6.32)" |
498f9213 | 240 | .\" commit 9893e49d64a4874ea67849ee2cfbf3f3d6817573 |
11c25e24 MK |
241 | Poison the pages in the range specified by |
242 | .I addr | |
243 | and | |
1ae6b2c7 | 244 | .I length |
11c25e24 MK |
245 | and handle subsequent references to those pages |
246 | like a hardware memory corruption. | |
33a0ccb2 | 247 | This operation is available only for privileged |
523c2f67 AK |
248 | .RB ( CAP_SYS_ADMIN ) |
249 | processes. | |
250 | This operation may result in the calling process receiving a | |
251 | .B SIGBUS | |
252 | and the page being unmapped. | |
efeece04 | 253 | .IP |
ae24c212 | 254 | This feature is intended for testing of memory error-handling code; |
33a0ccb2 | 255 | it is available only if the kernel was configured with |
ae24c212 AK |
256 | .BR CONFIG_MEMORY_FAILURE . |
257 | .TP | |
5baa8f09 | 258 | .BR MADV_MERGEABLE " (since Linux 2.6.32)" |
498f9213 | 259 | .\" commit f8af4da3b4c14e7267c4ffb952079af3912c51c5 |
5baa8f09 MK |
260 | Enable Kernel Samepage Merging (KSM) for the pages in the range specified by |
261 | .I addr | |
262 | and | |
e5963382 | 263 | .IR length . |
3b18c59b | 264 | The kernel regularly scans those areas of user memory that have |
5baa8f09 MK |
265 | been marked as mergeable, |
266 | looking for pages with identical content. | |
267 | These are replaced by a single write-protected page (which is automatically | |
268 | copied if a process later wants to update the content of the page). | |
33a0ccb2 | 269 | KSM merges only private anonymous pages (see |
5baa8f09 | 270 | .BR mmap (2)). |
efeece04 | 271 | .IP |
5baa8f09 MK |
272 | The KSM feature is intended for applications that generate many |
273 | instances of the same data (e.g., virtualization systems such as KVM). | |
274 | It can consume a lot of processing power; use with care. | |
66a9882e | 275 | See the Linux kernel source file |
b49c2acb | 276 | .I Documentation/admin\-guide/mm/ksm.rst |
5baa8f09 | 277 | for more details. |
efeece04 | 278 | .IP |
5baa8f09 | 279 | The |
1ae6b2c7 | 280 | .B MADV_MERGEABLE |
5baa8f09 | 281 | and |
1ae6b2c7 | 282 | .B MADV_UNMERGEABLE |
33a0ccb2 | 283 | operations are available only if the kernel was configured with |
8c3fb604 | 284 | .BR CONFIG_KSM . |
5baa8f09 MK |
285 | .TP |
286 | .BR MADV_UNMERGEABLE " (since Linux 2.6.32)" | |
287 | Undo the effect of an earlier | |
1ae6b2c7 | 288 | .B MADV_MERGEABLE |
5baa8f09 | 289 | operation on the specified address range; |
ff24dd19 | 290 | KSM unmerges whatever pages it had merged in the address range specified by |
1ae6b2c7 | 291 | .I addr |
5baa8f09 MK |
292 | and |
293 | .IR length . | |
e8dd3ed2 | 294 | .TP |
9bfc9cb1 | 295 | .BR MADV_SOFT_OFFLINE " (since Linux 2.6.33)" |
6b1e34f2 MK |
296 | .\" commit afcf938ee0aac4ef95b1a23bac704c6fbeb26de6 |
297 | Soft offline the pages in the range specified by | |
298 | .I addr | |
299 | and | |
300 | .IR length . | |
301 | The memory of each page in the specified range is preserved | |
302 | (i.e., when next accessed, the same content will be visible, | |
303 | but in a new physical page frame), | |
304 | and the original page is offlined | |
305 | (i.e., no longer used, and taken out of normal memory management). | |
306 | The effect of the | |
307 | .B MADV_SOFT_OFFLINE | |
308 | operation is invisible to (i.e., does not change the semantics of) | |
309 | the calling process. | |
efeece04 | 310 | .IP |
6b1e34f2 MK |
311 | This feature is intended for testing of memory error-handling code; |
312 | it is available only if the kernel was configured with | |
313 | .BR CONFIG_MEMORY_FAILURE . | |
314 | .TP | |
e8dd3ed2 | 315 | .BR MADV_HUGEPAGE " (since Linux 2.6.38)" |
498f9213 | 316 | .\" commit 0af4e98b6b095c74588af04872f83d333c958c32 |
3d4b49b0 MK |
317 | .\" http://lwn.net/Articles/358904/ |
318 | .\" https://lwn.net/Articles/423584/ | |
95467f1d | 319 | Enable Transparent Huge Pages (THP) for pages in the range specified by |
e8dd3ed2 DG |
320 | .I addr |
321 | and | |
322 | .IR length . | |
33a0ccb2 | 323 | Currently, Transparent Huge Pages work only with private anonymous pages (see |
e8dd3ed2 DG |
324 | .BR mmap (2)). |
325 | The kernel will regularly scan the areas marked as huge page candidates | |
326 | to replace them with huge pages. | |
327 | The kernel will also allocate huge pages directly when the region is | |
3d4b49b0 | 328 | naturally aligned to the huge page size (see |
e8dd3ed2 | 329 | .BR posix_memalign (3)). |
efeece04 | 330 | .IP |
c0e140e6 | 331 | This feature is primarily aimed at applications that use large mappings of |
e9dedcd2 | 332 | data and access large regions of that memory at a time (e.g., virtualization |
c0e140e6 | 333 | systems such as QEMU). |
ee8655b5 MK |
334 | It can very easily waste memory (e.g., a 2\ MB mapping that only ever accesses |
335 | 1 byte will result in 2\ MB of wired memory instead of one 4\ KB page). | |
66a9882e | 336 | See the Linux kernel source file |
b49c2acb | 337 | .I Documentation/admin\-guide/mm/transhuge.rst |
e8dd3ed2 | 338 | for more details. |
efeece04 | 339 | .IP |
38b08118 MK |
340 | Most common kernel configurations provide | |
341 | .BR MADV_HUGEPAGE -style | |
342 | behavior by default, and thus | |
1ae6b2c7 | 343 | .B MADV_HUGEPAGE |
38b08118 MK |
344 | is normally not necessary. |
345 | It is mostly intended for embedded systems, where | |
20b9102a | 346 | .BR MADV_HUGEPAGE -style |
38b08118 MK |
347 | behavior may not be enabled by default in the kernel. |
348 | On such systems, | |
349 | this flag can be used in order to selectively enable THP. | |
350 | Whenever | |
1ae6b2c7 | 351 | .B MADV_HUGEPAGE |
38b08118 MK |
352 | is used, it should always be in regions of memory with |
353 | an access pattern that the developer knows in advance won't risk | |
354 | increasing the memory footprint of the application when transparent | |
355 | hugepages are enabled. | |
356 | .IP | |
e8dd3ed2 | 357 | The |
1ae6b2c7 | 358 | .B MADV_HUGEPAGE |
e8dd3ed2 | 359 | and |
1ae6b2c7 | 360 | .B MADV_NOHUGEPAGE |
33a0ccb2 | 361 | operations are available only if the kernel was configured with |
8c3fb604 | 362 | .BR CONFIG_TRANSPARENT_HUGEPAGE . |
e8dd3ed2 DG |
363 | .TP |
364 | .BR MADV_NOHUGEPAGE " (since Linux 2.6.38)" | |
365 | Ensure that memory in the address range specified by | |
1ae6b2c7 | 366 | .I addr |
e8dd3ed2 | 367 | and |
1ae6b2c7 | 368 | .I length |
38b08118 | 369 | will not be backed by transparent hugepages. |
c639b314 JB |
370 | .TP |
371 | .BR MADV_DONTDUMP " (since Linux 3.4)" | |
498f9213 MK |
372 | .\" commit 909af768e88867016f427264ae39d27a57b6a8ed |
373 | .\" commit accb61fe7bb0f5c2a4102239e4981650f9048519 | |
c639b314 JB |
374 | Exclude from a core dump those pages in the range specified by |
375 | .I addr | |
376 | and | |
377 | .IR length . | |
378 | This is useful in applications that have large areas of memory | |
379 | that are known not to be useful in a core dump. | |
380 | The effect of | |
1ae6b2c7 | 381 | .B MADV_DONTDUMP |
c639b314 | 382 | takes precedence over the bit mask that is set via the |
750653a8 | 383 | .I /proc/[pid]/coredump_filter |
c639b314 JB |
384 | file (see |
385 | .BR core (5)). | |
386 | .TP | |
387 | .BR MADV_DODUMP " (since Linux 3.4)" | |
388 | Undo the effect of an earlier | |
389 | .BR MADV_DONTDUMP . | |
9ec13698 | 390 | .TP |
d432f10d MK |
391 | .BR MADV_FREE " (since Linux 4.5)" |
392 | The application no longer requires the pages in the range specified by | |
1ae6b2c7 | 393 | .I addr |
d432f10d MK |
394 | and |
395 | .IR length . | |
396 | The kernel can thus free these pages, | |
397 | but the freeing could be delayed until memory pressure occurs. | |
398 | For each of the pages that has been marked to be freed | |
399 | but has not yet been freed, | |
400 | the free operation will be canceled if the caller writes into the page. | |
401 | After a successful | |
402 | .B MADV_FREE | |
403 | operation, any stale data (i.e., dirty, unwritten pages) will be lost | |
404 | when the kernel frees the pages. | |
405 | However, subsequent writes to pages in the range will succeed | |
406 | and then the kernel cannot free those dirtied pages, | |
407 | so that the caller can always see the data just written. | |
408 | If there is no subsequent write, | |
409 | the kernel can free the pages at any time. | |
410 | Once pages in the range have been freed, the caller will | |
411 | see zero-fill-on-demand pages upon subsequent page references. | |
efeece04 | 412 | .IP |
d432f10d MK |
413 | The |
414 | .B MADV_FREE | |
415 | operation | |
416 | can be applied only to private anonymous pages (see | |
9ec13698 | 417 | .BR mmap (2)). |
07ca8b34 MK |
418 | In Linux before version 4.12, |
419 | .\" commit 93e06c7a645343d222c9a838834a51042eebbbf7 | |
420 | when freeing pages on a swapless system, | |
421 | the pages in the given range are freed instantly, | |
9ec13698 | 422 | regardless of memory pressure. |
c0c4f6c2 RR |
423 | .TP |
424 | .BR MADV_WIPEONFORK " (since Linux 4.14)" | |
425 | .\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19 | |
426 | Present the child process with zero-filled memory in this range after a | |
427 | .BR fork (2). | |
2c63b13e MK |
428 | This is useful in forking servers in order to ensure |
429 | that sensitive per-process data | |
430 | (for example, PRNG seeds, cryptographic secrets, and so on) | |
431 | is not handed to child processes. | |
c0c4f6c2 RR |
432 | .IP |
433 | The | |
434 | .B MADV_WIPEONFORK | |
2c63b13e | 435 | operation can be applied only to private anonymous pages (see |
c0c4f6c2 | 436 | .BR mmap (2)). |
dca5d444 MK |
437 | .IP |
438 | Within the child created by | |
439 | .BR fork (2), | |
440 | the | |
441 | .B MADV_WIPEONFORK | |
442 | setting remains in place on the specified address range. | |
443 | This setting is cleared during | |
444 | .BR execve (2). | |
c0c4f6c2 RR |
445 | .TP |
446 | .BR MADV_KEEPONFORK " (since Linux 4.14)" | |
447 | .\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19 | |
448 | Undo the effect of an earlier | |
449 | .BR MADV_WIPEONFORK . | |
c9c9ab2e MK |
450 | .TP |
451 | .BR MADV_COLD " (since Linux 5.4)" | |
452 | .\" commit 9c276cc65a58faf98be8e56962745ec99ab87636 | |
453 | Deactivate a given range of pages. | |
454 | This will make the pages a more probable | |
455 | reclaim target should there be memory pressure. | |
456 | This is a nondestructive operation. | |
457 | The advice might be ignored for some pages in the range when it is not | |
458 | applicable. | |
459 | .TP | |
460 | .BR MADV_PAGEOUT " (since Linux 5.4)" | |
461 | .\" commit 1a4e58cce84ee88129d5d49c064bd2852b481357 | |
462 | Reclaim a given range of pages. | |
463 | This is done to free up memory occupied by these pages. | |
464 | If a page is anonymous, it will be swapped out. | |
465 | If a page is file-backed and dirty, it will be written back to the backing | |
466 | storage. | |
467 | The advice might be ignored for some pages in the range when it is not | |
468 | applicable. | |
9f307c06 DH |
469 | .TP |
470 | .BR MADV_POPULATE_READ " (since Linux 5.14)" | |
471 | Populate (prefault) page tables readable, | |
472 | faulting in all pages in the range just as if manually reading from each page; | |
473 | however, | |
474 | avoid the actual memory access that would have been performed after handling | |
475 | the fault. | |
476 | .IP | |
477 | In contrast to | |
478 | .BR MAP_POPULATE , | |
479 | .B MADV_POPULATE_READ | |
480 | does not hide errors, | |
481 | can be applied to (parts of) existing mappings and will always populate | |
482 | (prefault) page tables readable. | |
483 | One example use case is prefaulting a file mapping, | |
484 | reading all file content from disk; | |
485 | however, | |
486 | pages won't be dirtied and consequently won't have to be written back to disk | |
487 | when evicting the pages from memory. | |
488 | .IP | |
489 | Depending on the underlying mapping, | |
490 | map the shared zeropage, | |
491 | preallocate memory or read the underlying file; | |
492 | files with holes might or might not preallocate blocks. | |
493 | If populating fails, | |
494 | a | |
495 | .B SIGBUS | |
496 | signal is not generated; instead, an error is returned. | |
497 | .IP | |
498 | If | |
499 | .B MADV_POPULATE_READ | |
500 | succeeds, | |
501 | all page tables have been populated (prefaulted) readable once. | |
502 | If | |
503 | .B MADV_POPULATE_READ | |
504 | fails, | |
505 | some page tables might have been populated. | |
506 | .IP | |
507 | .B MADV_POPULATE_READ | |
508 | cannot be applied to mappings without read permissions | |
509 | and special mappings, | |
510 | for example, | |
511 | mappings marked with kernel-internal flags such as | |
512 | .B VM_PFNMAP | |
513 | or | |
514 | .BR VM_IO , | |
515 | or secret memory regions created using | |
516 | .BR memfd_secret (2). | |
517 | .IP | |
518 | Note that with | |
519 | .BR MADV_POPULATE_READ , | |
520 | the process can be killed at any moment when the system runs out of memory. | |
521 | .TP | |
522 | .BR MADV_POPULATE_WRITE " (since Linux 5.14)" | |
523 | Populate (prefault) page tables writable, | |
524 | faulting in all pages in the range just as if manually writing to each | |
525 | page; | |
526 | however, | |
527 | avoid the actual memory access that would have been performed after handling | |
528 | the fault. | |
529 | .IP | |
530 | In contrast to | |
531 | .BR MAP_POPULATE , | |
532 | .BR MADV_POPULATE_WRITE " does not hide errors," | |
533 | can be applied to (parts of) existing mappings and will always populate | |
534 | (prefault) page tables writable. | |
535 | One example use case is preallocating memory, | |
536 | breaking any CoW (Copy on Write). | |
537 | .IP | |
538 | Depending on the underlying mapping, | |
539 | preallocate memory or read the underlying file; | |
540 | files with holes will preallocate blocks. | |
541 | If populating fails, | |
542 | a | |
543 | .B SIGBUS | |
544 | signal is not generated; instead, an error is returned. | |
545 | .IP | |
546 | If | |
547 | .B MADV_POPULATE_WRITE | |
548 | succeeds, | |
549 | all page tables have been populated (prefaulted) writable once. | |
550 | If | |
551 | .B MADV_POPULATE_WRITE | |
552 | fails, | |
553 | some page tables might have been populated. | |
554 | .IP | |
555 | .B MADV_POPULATE_WRITE | |
556 | cannot be applied to mappings without write permissions | |
557 | and special mappings, | |
558 | for example, | |
559 | mappings marked with kernel-internal flags such as | |
560 | .B VM_PFNMAP | |
561 | or | |
562 | .BR VM_IO , | |
563 | or secret memory regions created using | |
564 | .BR memfd_secret (2). | |
565 | .IP | |
566 | Note that with | |
567 | .BR MADV_POPULATE_WRITE , | |
568 | the process can be killed at any moment when the system runs out of memory. | |
47297adb | 569 | .SH RETURN VALUE |
95467f1d | 570 | On success, |
e511ffb6 | 571 | .BR madvise () |
c13182ef MK |
572 | returns zero. |
573 | On error, it returns \-1 and | |
fea681da | 574 | .I errno |
f6a4078b | 575 | is set to indicate the error. |
fea681da MK |
576 | .SH ERRORS |
577 | .TP | |
7208ad0a MK |
578 | .B EACCES |
579 | .I advice | |
580 | is | |
581 | .BR MADV_REMOVE , | |
582 | but the specified address range is not a shared writable mapping. | |
583 | .TP | |
fea681da MK |
584 | .B EAGAIN |
585 | A kernel resource was temporarily unavailable. | |
586 | .TP | |
587 | .B EBADF | |
588 | The map exists, but the area maps something that isn't a file. | |
589 | .TP | |
9f307c06 DH |
590 | .B EFAULT |
591 | .I advice | |
592 | is | |
593 | .B MADV_POPULATE_READ | |
594 | or | |
595 | .BR MADV_POPULATE_WRITE , | |
596 | and populating (prefaulting) page tables failed because a | |
597 | .B SIGBUS | |
598 | would have been generated on actual memory access and the reason is not a | |
599 | HW poisoned page | |
600 | (HW poisoned pages can, | |
601 | for example, | |
602 | be created using the | |
603 | .B MADV_HWPOISON | |
604 | flag described elsewhere in this page). | |
605 | .TP | |
fea681da | 606 | .B EINVAL |
ac95034e MK |
607 | .I addr |
608 | is not page-aligned or | |
c608a033 | 609 | .I length |
601f3bc6 | 610 | is negative. |
c608a033 | 611 | .\" .I length |
fea681da | 612 | .\" is zero, |
ac95034e MK |
613 | .TP |
614 | .B EINVAL | |
615 | .I advice | |
616 | is not valid. | |
617 | .TP | |
618 | .B EINVAL | |
4335648d | 619 | .I advice |
8604677b CTR |
620 | is |
621 | .B MADV_COLD | |
622 | or | |
623 | .B MADV_PAGEOUT | |
624 | and the specified address range includes locked, Huge TLB pages, or | |
625 | .B VM_PFNMAP | |
626 | pages. | |
627 | .TP | |
628 | .B EINVAL | |
629 | .I advice | |
4335648d MK |
630 | is |
631 | .B MADV_DONTNEED | |
632 | or | |
1ae6b2c7 | 633 | .B MADV_REMOVE |
36e5bc92 MK |
634 | and the specified address range includes locked, Huge TLB pages, or |
635 | .B VM_PFNMAP | |
636 | pages. | |
ac95034e MK |
637 | .TP |
638 | .B EINVAL | |
c13182ef | 639 | .I advice |
ac95034e | 640 | is |
1ae6b2c7 | 641 | .B MADV_MERGEABLE |
5baa8f09 | 642 | or |
ac95034e | 643 | .BR MADV_UNMERGEABLE , |
5baa8f09 MK |
644 | but the kernel was not configured with |
645 | .BR CONFIG_KSM . | |
fea681da | 646 | .TP |
c0c4f6c2 RR |
647 | .B EINVAL |
648 | .I advice | |
649 | is | |
1ae6b2c7 | 650 | .B MADV_FREE |
c0c4f6c2 | 651 | or |
1ae6b2c7 | 652 | .B MADV_WIPEONFORK |
c0c4f6c2 RR |
653 | but the specified address range includes file, Huge TLB, |
654 | .BR MAP_SHARED , | |
655 | or | |
1ae6b2c7 | 656 | .B VM_PFNMAP |
c0c4f6c2 RR |
657 | ranges. |
658 | .TP | |
9f307c06 DH |
659 | .B EINVAL |
660 | .I advice | |
661 | is | |
662 | .B MADV_POPULATE_READ | |
663 | or | |
664 | .BR MADV_POPULATE_WRITE , | |
665 | but the specified address range includes ranges with insufficient permissions | |
666 | or special mappings, | |
667 | for example, | |
668 | mappings marked with kernel-internal flags such as | |
669 | .B VM_IO | |
670 | or | |
671 | .BR VM_PFNMAP , | |
672 | or secret memory regions created using | |
673 | .BR memfd_secret (2). | |
674 | .TP | |
fea681da | 675 | .B EIO |
682edefb MK |
676 | (for |
677 | .BR MADV_WILLNEED ) | |
678 | Paging in this area would exceed the process's | |
fea681da MK |
679 | maximum resident set size. |
680 | .TP | |
681 | .B ENOMEM | |
682edefb MK |
682 | (for |
683 | .BR MADV_WILLNEED ) | |
684 | Not enough memory: paging in failed. | |
fea681da MK |
685 | .TP |
686 | .B ENOMEM | |
687 | Addresses in the specified range are not currently | |
688 | mapped, or are outside the address space of the process. | |
9c0b66eb | 689 | .TP |
9f307c06 DH |
690 | .B ENOMEM |
691 | .I advice | |
692 | is | |
693 | .B MADV_POPULATE_READ | |
694 | or | |
695 | .BR MADV_POPULATE_WRITE , | |
696 | and populating (prefaulting) page tables failed because there was not enough | |
697 | memory. | |
698 | .TP | |
9c0b66eb MK |
699 | .B EPERM |
700 | .I advice | |
701 | is | |
702 | .BR MADV_HWPOISON , | |
703 | but the caller does not have the | |
704 | .B CAP_SYS_ADMIN | |
705 | capability. | |
9f307c06 DH |
706 | .TP |
707 | .B EHWPOISON | |
708 | .I advice | |
709 | is | |
710 | .B MADV_POPULATE_READ | |
711 | or | |
712 | .BR MADV_POPULATE_WRITE , | |
713 | and populating (prefaulting) page tables failed because a HW poisoned page | |
714 | (HW poisoned pages can, | |
715 | for example, | |
716 | be created using the | |
717 | .B MADV_HWPOISON | |
718 | flag described elsewhere in this page) | |
719 | was encountered. | |
6e519900 MK |
720 | .SH VERSIONS |
721 | Since Linux 3.18, | |
722 | .\" commit d3ac21cacc24790eb45d735769f35753f5b56ceb | |
723 | support for this system call is optional, | |
724 | depending on the setting of the | |
725 | .B CONFIG_ADVISE_SYSCALLS | |
726 | configuration option. | |
3113c7f3 | 727 | .SH STANDARDS |
c73c7130 MK |
728 | .BR madvise () |
729 | is not specified by any standards. | |
730 | Versions of this system call, implementing a wide variety of | |
731 | .I advice | |
732 | values, exist on many other implementations. | |
733 | Other implementations typically implement at least the flags listed | |
734 | above under | |
95467f1d | 735 | .IR "Conventional advice values" , |
c73c7130 | 736 | albeit with some variation in semantics. |
efeece04 | 737 | .PP |
a1d5f77c MK |
738 | POSIX.1-2001 describes |
739 | .BR posix_madvise (3) | |
682edefb MK |
740 | with constants |
741 | .BR POSIX_MADV_NORMAL , | |
f78ed33a | 742 | .BR POSIX_MADV_RANDOM , |
b7bc9bfd MK |
743 | .BR POSIX_MADV_SEQUENTIAL , |
744 | .BR POSIX_MADV_WILLNEED , | |
745 | and | |
746 | .BR POSIX_MADV_DONTNEED , | |
95467f1d | 747 | with behavior close to the similarly named flags listed above. |
4fb31341 | 748 | .SH NOTES |
c634028a | 749 | .SS Linux notes |
fea681da | 750 | The Linux implementation requires that the address |
14f5ae6d | 751 | .I addr |
fea681da MK |
752 | be page-aligned, and allows |
753 | .I length | |
c13182ef MK |
754 | to be zero. |
755 | If there are some parts of the specified address range | |
fea681da | 756 | that are not mapped, the Linux version of |
e511ffb6 | 757 | .BR madvise () |
c13182ef | 758 | ignores them and applies the call to the rest (but returns |
fea681da MK |
759 | .B ENOMEM |
760 | from the system call, as it should). | |
889829be MK |
761 | .\" .SH HISTORY |
762 | .\" The | |
763 | .\" .BR madvise () | |
764 | .\" function first appeared in 4.4BSD. | |
47297adb | 765 | .SH SEE ALSO |
fea681da | 766 | .BR getrlimit (2), |
1ae6b2c7 | 767 | .BR memfd_secret (2), |
fea681da MK |
768 | .BR mincore (2), |
769 | .BR mmap (2), | |
770 | .BR mprotect (2), | |
771 | .BR msync (2), | |
c639b314 | 772 | .BR munmap (2), |
48cb32cd | 773 | .BR prctl (2), |
81ec67d8 | 774 | .BR process_madvise (2), |
3a4e05a1 | 775 | .BR posix_madvise (3), |
c639b314 | 776 | .BR core (5) |
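
The page states two behaviors that are easy to check from user space: addr must be page-aligned, and after a successful MADV_DONTNEED on a private anonymous mapping, later accesses see zero-fill-on-demand pages. The following is a minimal sketch of both, assuming a Linux system with glibc. The helper names dontneed_zero_fills() and unaligned_addr_errno() are hypothetical and exist only for this illustration; they are not part of any library.

```c
#define _DEFAULT_SOURCE         /* exposes madvise() and MAP_ANONYMOUS in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a private anonymous page reads back
 * as zeros after MADV_DONTNEED, 0 if old contents are still visible,
 * and -1 if a system call failed.
 */
int dontneed_zero_fills(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* mmap() returns a page-aligned address, which madvise() requires. */
    assert((uintptr_t)p % len == 0);

    memset(p, 0xAB, len);       /* dirty the page */

    if (madvise(p, len, MADV_DONTNEED) == -1) {
        munmap(p, len);
        return -1;
    }

    /* The next access succeeds and should see a zero-filled page. */
    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}

/*
 * Hypothetical helper: calls madvise() with an address that is one byte
 * past a page boundary and returns the resulting errno (0 if the call
 * unexpectedly succeeded, -1 if setup failed).
 */
int unaligned_addr_errno(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t len = 2 * (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    errno = 0;
    int rc = madvise(p + 1, (size_t)page, MADV_NORMAL);
    int err = (rc == -1) ? errno : 0;   /* save errno before munmap() */

    munmap(p, len);
    return err;
}
```

A caller could, for instance, print both return values from main(). On Linux the first helper is expected to return 1 and the second EINVAL, matching the DESCRIPTION and ERRORS sections above. Other implementations of madvise() may behave differently, since the call is not specified by POSIX.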