From: Greg Kroah-Hartman Date: Wed, 25 Jul 2012 15:50:33 +0000 (-0700) Subject: 3.0-stable patches X-Git-Tag: v3.4.7~9 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=5d92871b1223c5b0b4771c30833e0a7d476ba093;p=thirdparty%2Fkernel%2Fstable-queue.git 3.0-stable patches added patches: mm-compaction-allow-compaction-to-isolate-dirty-pages.patch mm-compaction-determine-if-dirty-pages-can-be-migrated-without-blocking-within-migratepage.patch mm-compaction-make-isolate_lru_page-filter-aware.patch mm-migration-clean-up-unmap_and_move.patch mm-zone_reclaim-make-isolate_lru_page-filter-aware.patch --- diff --git a/queue-3.0/mm-compaction-allow-compaction-to-isolate-dirty-pages.patch b/queue-3.0/mm-compaction-allow-compaction-to-isolate-dirty-pages.patch new file mode 100644 index 00000000000..371f394f69b --- /dev/null +++ b/queue-3.0/mm-compaction-allow-compaction-to-isolate-dirty-pages.patch @@ -0,0 +1,433 @@ +From a77ebd333cd810d7b680d544be88c875131c2bd3 Mon Sep 17 00:00:00 2001 +From: Mel Gorman +Date: Thu, 12 Jan 2012 17:19:22 -0800 +Subject: mm: compaction: allow compaction to isolate dirty pages + +From: Mel Gorman + +commit a77ebd333cd810d7b680d544be88c875131c2bd3 upstream. + +Stable note: Not tracked in Bugzilla. A fix aimed at preserving page aging + information by reducing LRU list churning had the side-effect of + reducing THP allocation success rates. This was part of a series + to restore the success rates while preserving the reclaim fix. + +Short summary: There are severe stalls when a USB stick using VFAT is +used with THP enabled that are reduced by this series. If you are +experiencing this problem, please test and report back and considering I +have seen complaints from openSUSE and Fedora users on this as well as a +few private mails, I'm guessing it's a widespread issue. This is a new +type of USB-related stall because it is due to synchronous compaction +writing where as in the past the big problem was dirty pages reaching +the end of the LRU and being written by reclaim. + +Am cc'ing Andrew this time and this series would replace +mm-do-not-stall-in-synchronous-compaction-for-thp-allocations.patch. +I'm also cc'ing Dave Jones as he might have merged that patch to Fedora +for wider testing and ideally it would be reverted and replaced by this +series. + +That said, the later patches could really do with some review. If this +series is not the answer then a new direction needs to be discussed +because as it is, the stalls are unacceptable as the results in this +leader show. + +For testers that try backporting this to 3.1, it won't work because +there is a non-obvious dependency on not writing back pages in direct +reclaim so you need those patches too. + +Changelog since V5 +o Rebase to 3.2-rc5 +o Tidy up the changelogs a bit + +Changelog since V4 +o Added reviewed-bys, credited Andrea properly for sync-light +o Allow dirty pages without mappings to be considered for migration +o Bound the number of pages freed for compaction +o Isolate PageReclaim pages on their own LRU list + +This is against 3.2-rc5 and follows on from discussions on "mm: Do +not stall in synchronous compaction for THP allocations" and "[RFC +PATCH 0/5] Reduce compaction-related stalls". Initially, the proposed +patch eliminated stalls due to compaction which sometimes resulted in +user-visible interactivity problems on browsers by simply never using +sync compaction. The downside was that THP success allocation rates +were lower because dirty pages were not being migrated as reported by +Andrea. 
His approach at fixing this was nacked on the grounds that +it reverted fixes from Rik merged that reduced the amount of pages +reclaimed as it severely impacted his workloads performance. + +This series attempts to reconcile the requirements of maximising THP +usage, without stalling in a user-visible fashion due to compaction +or cheating by reclaiming an excessive number of pages. + +Patch 1 partially reverts commit 39deaf85 to allow migration to isolate + dirty pages. This is because migration can move some dirty + pages without blocking. + +Patch 2 notes that the /proc/sys/vm/compact_memory handler is not using + synchronous compaction when it should be. This is unrelated + to the reported stalls but is worth fixing. + +Patch 3 checks if we isolated a compound page during lumpy scan and + account for it properly. For the most part, this affects + tracing so it's unrelated to the stalls but worth fixing. + +Patch 4 notes that it is possible to abort reclaim early for compaction + and return 0 to the page allocator potentially entering the + "may oom" path. This has not been observed in practice but + the rest of the series potentially makes it easier to happen. + +Patch 5 adds a sync parameter to the migratepage callback and gives + the callback responsibility for migrating the page without + blocking if sync==false. For example, fallback_migrate_page + will not call writepage if sync==false. This increases the + number of pages that can be handled by asynchronous compaction + thereby reducing stalls. + +Patch 6 restores filter-awareness to isolate_lru_page for migration. + In practice, it means that pages under writeback and pages + without a ->migratepage callback will not be isolated + for migration. + +Patch 7 avoids calling direct reclaim if compaction is deferred but + makes sure that compaction is only deferred if sync + compaction was used. + +Patch 8 introduces a sync-light migration mechanism that sync compaction + uses. The objective is to allow some stalls but to not call + ->writepage which can lead to significant user-visible stalls. + +Patch 9 notes that while we want to abort reclaim ASAP to allow + compation to go ahead that we leave a very small window of + opportunity for compaction to run. This patch allows more pages + to be freed by reclaim but bounds the number to a reasonable + level based on the high watermark on each zone. + +Patch 10 allows slabs to be shrunk even after compaction_ready() is + true for one zone. This is to avoid a problem whereby a single + small zone can abort reclaim even though no pages have been + reclaimed and no suitably large zone is in a usable state. + +Patch 11 fixes a problem with the rate of page scanning. As reclaim is + rarely stalling on pages under writeback it means that scan + rates are very high. This is particularly true for direct + reclaim which is not calling writepage. The vmstat figures + implied that much of this was busy work with PageReclaim pages + marked for immediate reclaim. This patch is a prototype that + moves these pages to their own LRU list. + +This has been tested and other than 2 USB keys getting trashed, +nothing horrible fell out. That said, I am a bit unhappy with the +rescue logic in patch 11 but did not find a better way around it. It +does significantly reduce scan rates and System CPU time indicating +it is the right direction to take. + +What is of critical importance is that stalls due to compaction +are massively reduced even though sync compaction was still +allowed. 
Testing from people complaining about stalls copying to USBs +with THP enabled are particularly welcome. + +The following tests all involve THP usage and USB keys in some +way. Each test follows this type of pattern + +1. Read from some fast fast storage, be it raw device or file. Each time + the copy finishes, start again until the test ends +2. Write a large file to a filesystem on a USB stick. Each time the copy + finishes, start again until the test ends +3. When memory is low, start an alloc process that creates a mapping + the size of physical memory to stress THP allocation. This is the + "real" part of the test and the part that is meant to trigger + stalls when THP is enabled. Copying continues in the background. +4. Record the CPU usage and time to execute of the alloc process +5. Record the number of THP allocs and fallbacks as well as the number of THP + pages in use a the end of the test just before alloc exited +6. Run the test 5 times to get an idea of variability +7. Between each run, sync is run and caches dropped and the test + waits until nr_dirty is a small number to avoid interference + or caching between iterations that would skew the figures. + +The individual tests were then + +writebackCPDeviceBasevfat + Disable THP, read from a raw device (sda), vfat on USB stick +writebackCPDeviceBaseext4 + Disable THP, read from a raw device (sda), ext4 on USB stick +writebackCPDevicevfat + THP enabled, read from a raw device (sda), vfat on USB stick +writebackCPDeviceext4 + THP enabled, read from a raw device (sda), ext4 on USB stick +writebackCPFilevfat + THP enabled, read from a file on fast storage and USB, both vfat +writebackCPFileext4 + THP enabled, read from a file on fast storage and USB, both ext4 + +The kernels tested were + +3.1 3.1 +vanilla 3.2-rc5 +freemore Patches 1-10 +immediate Patches 1-11 +andrea The 8 patches Andrea posted as a basis of comparison + +The results are very long unfortunately. I'll start with the case +where we are not using THP at all + +writebackCPDeviceBasevfat + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +System Time 1.28 ( 0.00%) 54.49 (-4143.46%) 48.63 (-3687.69%) 4.69 ( -265.11%) 51.88 (-3940.81%) ++/- 0.06 ( 0.00%) 2.45 (-4305.55%) 4.75 (-8430.57%) 7.46 (-13282.76%) 4.76 (-8440.70%) +User Time 0.09 ( 0.00%) 0.05 ( 40.91%) 0.06 ( 29.55%) 0.07 ( 15.91%) 0.06 ( 27.27%) ++/- 0.02 ( 0.00%) 0.01 ( 45.39%) 0.02 ( 25.07%) 0.00 ( 77.06%) 0.01 ( 52.24%) +Elapsed Time 110.27 ( 0.00%) 56.38 ( 48.87%) 49.95 ( 54.70%) 11.77 ( 89.33%) 53.43 ( 51.54%) ++/- 7.33 ( 0.00%) 3.77 ( 48.61%) 4.94 ( 32.63%) 6.71 ( 8.50%) 4.76 ( 35.03%) +THP Active 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) ++/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) +Fault Alloc 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) ++/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) +Fault Fallback 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) ++/- 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) + +The THP figures are obviously all 0 because THP was enabled. The +main thing to watch is the elapsed times and how they compare to +times when THP is enabled later. It's also important to note that +elapsed time is improved by this series as System CPu time is much +reduced. 
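
For reference, the sketch below is a minimal userspace approximation of the "alloc" stressor described in step 3 of the test pattern above: it maps an anonymous region the size of physical memory, hints MADV_HUGEPAGE, and touches every page so that page faults drive THP allocation while the USB copies run in the background. This is an illustration only, not the actual MM Tests program, and it will deliberately push the machine deep into reclaim.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	long pages  = sysconf(_SC_PHYS_PAGES);
	size_t len  = (size_t)pagesz * (size_t)pages;

	/* Anonymous mapping the size of physical memory */
	char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Mark the range as a THP candidate (fails harmlessly without THP) */
	if (madvise(map, len, MADV_HUGEPAGE))
		perror("madvise(MADV_HUGEPAGE)");

	/* Touch one byte per page so faults drive the allocations */
	for (size_t off = 0; off < len; off += (size_t)pagesz)
		map[off] = 1;

	munmap(map, len);
	return 0;
}

Built with a plain gcc invocation and started once the copies are running, a program of this shape is expected to be slow, to swap heavily or to be OOM-killed; that pressure is the point of the test.
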
+ +writebackCPDevicevfat + + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +System Time 1.22 ( 0.00%) 13.89 (-1040.72%) 46.40 (-3709.20%) 4.44 ( -264.37%) 47.37 (-3789.33%) ++/- 0.06 ( 0.00%) 22.82 (-37635.56%) 3.84 (-6249.44%) 6.48 (-10618.92%) 6.60 +(-10818.53%) +User Time 0.06 ( 0.00%) 0.06 ( -6.90%) 0.05 ( 17.24%) 0.05 ( 13.79%) 0.04 ( 31.03%) ++/- 0.01 ( 0.00%) 0.01 ( 33.33%) 0.01 ( 33.33%) 0.01 ( 39.14%) 0.01 ( 25.46%) +Elapsed Time 10445.54 ( 0.00%) 2249.92 ( 78.46%) 70.06 ( 99.33%) 16.59 ( 99.84%) 472.43 ( +95.48%) ++/- 643.98 ( 0.00%) 811.62 ( -26.03%) 10.02 ( 98.44%) 7.03 ( 98.91%) 59.99 ( 90.68%) +THP Active 15.60 ( 0.00%) 35.20 ( 225.64%) 65.00 ( 416.67%) 70.80 ( 453.85%) 62.20 ( 398.72%) ++/- 18.48 ( 0.00%) 51.29 ( 277.59%) 15.99 ( 86.52%) 37.91 ( 205.18%) 22.02 ( 119.18%) +Fault Alloc 121.80 ( 0.00%) 76.60 ( 62.89%) 155.40 ( 127.59%) 181.20 ( 148.77%) 286.60 ( 235.30%) ++/- 73.51 ( 0.00%) 61.11 ( 83.12%) 34.89 ( 47.46%) 31.88 ( 43.36%) 68.13 ( 92.68%) +Fault Fallback 881.20 ( 0.00%) 926.60 ( -5.15%) 847.60 ( 3.81%) 822.00 ( 6.72%) 716.60 ( 18.68%) ++/- 73.51 ( 0.00%) 61.26 ( 16.67%) 34.89 ( 52.54%) 31.65 ( 56.94%) 67.75 ( 7.84%) +MMTests Statistics: duration +User/Sys Time Running Test (seconds) 3540.88 1945.37 716.04 64.97 1937.03 +Total Elapsed Time (seconds) 52417.33 11425.90 501.02 230.95 2520.28 + +The first thing to note is the "Elapsed Time" for the vanilla kernels +of 2249 seconds versus 56 with THP disabled which might explain the +reports of USB stalls with THP enabled. Applying the patches brings +performance in line with THP-disabled performance while isolating +pages for immediate reclaim from the LRU cuts down System CPU time. + +The "Fault Alloc" success rate figures are also improved. The vanilla +kernel only managed to allocate 76.6 pages on average over the course +of 5 iterations where as applying the series allocated 181.20 on +average albeit it is well within variance. It's worth noting that +applies the series at least descreases the amount of variance which +implies an improvement. + +Andrea's series had a higher success rate for THP allocations but +at a severe cost to elapsed time which is still better than vanilla +but still much worse than disabling THP altogether. One can bring my +series close to Andrea's by removing this check + + /* + * If compaction is deferred for high-order allocations, it is because + * sync compaction recently failed. In this is the case and the caller + * has requested the system not be heavily disrupted, fail the + * allocation now instead of entering direct reclaim + */ + if (deferred_compaction && (gfp_mask & __GFP_NO_KSWAPD)) + goto nopage; + +I didn't include a patch that removed the above check because hurting +overall performance to improve the THP figure is not what the average +user wants. It's something to consider though if someone really wants +to maximise THP usage no matter what it does to the workload initially. + +This is summary of vmstat figures from the same test. 
+ + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +Page Ins 3257266139 1111844061 17263623 10901575 161423219 +Page Outs 81054922 30364312 3626530 3657687 8753730 +Swap Ins 3294 2851 6560 4964 4592 +Swap Outs 390073 528094 620197 790912 698285 +Direct pages scanned 1077581700 3024951463 1764930052 115140570 5901188831 +Kswapd pages scanned 34826043 7112868 2131265 1686942 1893966 +Kswapd pages reclaimed 28950067 4911036 1246044 966475 1497726 +Direct pages reclaimed 805148398 280167837 3623473 2215044 40809360 +Kswapd efficiency 83% 69% 58% 57% 79% +Kswapd velocity 664.399 622.521 4253.852 7304.360 751.490 +Direct efficiency 74% 9% 0% 1% 0% +Direct velocity 20557.737 264745.137 3522673.849 498551.938 2341481.435 +Percentage direct scans 96% 99% 99% 98% 99% +Page writes by reclaim 722646 529174 620319 791018 699198 +Page writes file 332573 1080 122 106 913 +Page writes anon 390073 528094 620197 790912 698285 +Page reclaim immediate 0 2552514720 1635858848 111281140 5478375032 +Page rescued immediate 0 0 0 87848 0 +Slabs scanned 23552 23552 9216 8192 9216 +Direct inode steals 231 0 0 0 0 +Kswapd inode steals 0 0 0 0 0 +Kswapd skipped wait 28076 786 0 61 6 +THP fault alloc 609 383 753 906 1433 +THP collapse alloc 12 6 0 0 6 +THP splits 536 211 456 593 1136 +THP fault fallback 4406 4633 4263 4110 3583 +THP collapse fail 120 127 0 0 4 +Compaction stalls 1810 728 623 779 3200 +Compaction success 196 53 60 80 123 +Compaction failures 1614 675 563 699 3077 +Compaction pages moved 193158 53545 243185 333457 226688 +Compaction move failure 9952 9396 16424 23676 45070 + +The main things to look at are + +1. Page In/out figures are much reduced by the series. + +2. Direct page scanning is incredibly high (264745.137 pages scanned + per second on the vanilla kernel) but isolating PageReclaim pages + on their own list reduces the number of pages scanned significantly. + +3. The fact that "Page rescued immediate" is a positive number implies + that we sometimes race removing pages from the LRU_IMMEDIATE list + that need to be put back on a normal LRU but it happens only for + 0.07% of the pages marked for immediate reclaim. 
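
The derived rows in the vmstat summary above (efficiency, velocity, percentage direct scans) are not raw counters. The reporting script is not part of this queue, so the arithmetic below is inferred from the numbers themselves; the variable names are mine and the values are taken from the 3.1.0-vanilla column.

#include <stdio.h>

int main(void)
{
	/* Raw counters from the 3.1.0-vanilla column above */
	double kswapd_scanned   = 34826043;
	double kswapd_reclaimed = 28950067;
	double direct_scanned   = 1077581700;
	double direct_reclaimed = 805148398;
	double elapsed_seconds  = 52417.33;	/* Total Elapsed Time */

	/* Efficiency: pages reclaimed as a (truncated) percentage of pages scanned */
	printf("Kswapd efficiency       %d%%\n",
	       (int)(100.0 * kswapd_reclaimed / kswapd_scanned));
	printf("Direct efficiency       %d%%\n",
	       (int)(100.0 * direct_reclaimed / direct_scanned));

	/* Velocity: pages scanned per second of test wall-clock time */
	printf("Kswapd velocity         %.3f\n", kswapd_scanned / elapsed_seconds);
	printf("Direct velocity         %.3f\n", direct_scanned / elapsed_seconds);

	printf("Percentage direct scans %d%%\n",
	       (int)(100.0 * direct_scanned /
		     (direct_scanned + kswapd_scanned)));
	return 0;
}

The percentages in the table look truncated rather than rounded, which is why integer truncation is used above; with these inputs the output matches the vanilla column (83%, 74%, 664.399, 20557.737, 96%).
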
+ +writebackCPDeviceext4 + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%) ++/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%) +User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%) ++/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%) +Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%) ++/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%) +THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%) ++/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%) +Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%) ++/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%) +Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%) ++/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%) +MMTests Statistics: duration +User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45 +Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26 + +Similar test but the USB stick is using ext4 instead of vfat. As +ext4 does not use writepage for migration, the large stalls due to +compaction when THP is enabled are not observed. Still, isolating +PageReclaim pages on their own list helped completion time largely +by reducing the number of pages scanned by direct reclaim although +time spend in congestion_wait could also be a factor. + +Again, Andrea's series had far higher success rates for THP allocation +at the cost of elapsed time. I didn't look too closely but a quick +look at the vmstat figures tells me kswapd reclaimed 8 times more pages +than the patch series and direct reclaim reclaimed roughly three times +as many pages. It follows that if memory is aggressively reclaimed, +there will be more available for THP. 
+ +writebackCPFilevfat + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +System Time 1.76 ( 0.00%) 29.10 (-1555.52%) 46.01 (-2517.18%) 4.79 ( -172.35%) 54.89 (-3022.53%) ++/- 0.14 ( 0.00%) 25.61 (-18185.17%) 2.15 (-1434.83%) 6.60 (-4610.03%) 9.75 +(-6863.76%) +User Time 0.05 ( 0.00%) 0.07 ( -45.83%) 0.05 ( -4.17%) 0.06 ( -29.17%) 0.06 ( -16.67%) ++/- 0.02 ( 0.00%) 0.02 ( 20.11%) 0.02 ( -3.14%) 0.01 ( 31.58%) 0.01 ( 47.41%) +Elapsed Time 22520.79 ( 0.00%) 1082.85 ( 95.19%) 73.30 ( 99.67%) 32.43 ( 99.86%) 291.84 ( 98.70%) ++/- 7277.23 ( 0.00%) 706.29 ( 90.29%) 19.05 ( 99.74%) 17.05 ( 99.77%) 125.55 ( 98.27%) +THP Active 83.80 ( 0.00%) 12.80 ( 15.27%) 15.60 ( 18.62%) 13.00 ( 15.51%) 0.80 ( 0.95%) ++/- 66.81 ( 0.00%) 20.19 ( 30.22%) 5.92 ( 8.86%) 15.06 ( 22.54%) 1.17 ( 1.75%) +Fault Alloc 171.00 ( 0.00%) 67.80 ( 39.65%) 97.40 ( 56.96%) 125.60 ( 73.45%) 133.00 ( 77.78%) ++/- 82.91 ( 0.00%) 30.69 ( 37.02%) 53.91 ( 65.02%) 55.05 ( 66.40%) 21.19 ( 25.56%) +Fault Fallback 832.00 ( 0.00%) 935.20 ( -12.40%) 906.00 ( -8.89%) 877.40 ( -5.46%) 870.20 ( -4.59%) ++/- 82.91 ( 0.00%) 30.69 ( 62.98%) 54.01 ( 34.86%) 55.05 ( 33.60%) 20.91 ( 74.78%) +MMTests Statistics: duration +User/Sys Time Running Test (seconds) 7229.81 928.42 704.52 80.68 1330.76 +Total Elapsed Time (seconds) 112849.04 5618.69 571.11 360.54 1664.28 + +In this case, the test is reading/writing only from filesystems but as +it's vfat, it's slow due to calling writepage during compaction. Little +to observe really - the time to complete the test goes way down +with the series applied and THP allocation success rates go up in +comparison to 3.2-rc5. The success rates are lower than 3.1.0 but +the elapsed time for that kernel is abysmal so it is not really a +sensible comparison. + +As before, Andrea's series allocates more THPs at the cost of overall +performance. + +writebackCPFileext4 + 3.1.0-vanilla rc5-vanilla freemore-v6r1 isolate-v6r1 andrea-v2r1 +System Time 1.51 ( 0.00%) 1.77 ( -17.66%) 1.46 ( 2.92%) 1.15 ( 23.77%) 1.89 ( -25.63%) ++/- 0.27 ( 0.00%) 0.67 ( -148.52%) 0.33 ( -22.76%) 0.30 ( -11.15%) 0.19 ( 30.16%) +User Time 0.03 ( 0.00%) 0.04 ( -37.50%) 0.05 ( -62.50%) 0.07 ( -112.50%) 0.04 ( -18.75%) ++/- 0.01 ( 0.00%) 0.02 ( -146.64%) 0.02 ( -97.91%) 0.02 ( -75.59%) 0.02 ( -63.30%) +Elapsed Time 124.93 ( 0.00%) 114.49 ( 8.36%) 96.77 ( 22.55%) 27.48 ( 78.00%) 205.70 ( -64.65%) ++/- 20.20 ( 0.00%) 74.39 ( -268.34%) 59.88 ( -196.48%) 7.72 ( 61.79%) 25.03 ( -23.95%) +THP Active 161.80 ( 0.00%) 83.60 ( 51.67%) 141.20 ( 87.27%) 84.60 ( 52.29%) 82.60 ( 51.05%) ++/- 71.95 ( 0.00%) 43.80 ( 60.88%) 26.91 ( 37.40%) 59.02 ( 82.03%) 52.13 ( 72.45%) +Fault Alloc 471.40 ( 0.00%) 228.60 ( 48.49%) 282.20 ( 59.86%) 225.20 ( 47.77%) 388.40 ( 82.39%) ++/- 88.07 ( 0.00%) 87.42 ( 99.26%) 73.79 ( 83.78%) 109.62 ( 124.47%) 82.62 ( 93.81%) +Fault Fallback 531.60 ( 0.00%) 774.60 ( -45.71%) 720.80 ( -35.59%) 777.80 ( -46.31%) 614.80 ( -15.65%) ++/- 88.07 ( 0.00%) 87.26 ( 0.92%) 73.79 ( 16.22%) 109.62 ( -24.47%) 82.29 ( 6.56%) +MMTests Statistics: duration +User/Sys Time Running Test (seconds) 50.22 33.76 30.65 24.14 128.45 +Total Elapsed Time (seconds) 1113.73 1132.19 1029.45 759.49 1707.26 + +Same type of story - elapsed times go down. In this case, allocation +success rates are roughtly the same. As before, Andrea's has higher +success rates but takes a lot longer. + +Overall the series does reduce latencies and while the tests are +inherency racy as alloc competes with the cp processes, the variability +was included. 
The THP allocation rates are not as high as they could +be but that is because we would have to be more aggressive about +reclaim and compaction impacting overall performance. + +This patch: + +Commit 39deaf85 ("mm: compaction: make isolate_lru_page() filter-aware") +noted that compaction does not migrate dirty or writeback pages and that +is was meaningless to pick the page and re-add it to the LRU list. + +What was missed during review is that asynchronous migration moves dirty +pages if their ->migratepage callback is migrate_page() because these can +be moved without blocking. This potentially impacted hugepage allocation +success rates by a factor depending on how many dirty pages are in the +system. + +This patch partially reverts 39deaf85 to allow migration to isolate dirty +pages again. This increases how much compaction disrupts the LRU but that +is addressed later in the series. + +Signed-off-by: Mel Gorman +Reviewed-by: Andrea Arcangeli +Reviewed-by: Rik van Riel +Reviewed-by: Minchan Kim +Cc: Dave Jones +Cc: Jan Kara +Cc: Andy Isaacson +Cc: Nai Xia +Cc: Johannes Weiner +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Greg Kroah-Hartman + +--- + mm/compaction.c | 3 --- + 1 file changed, 3 deletions(-) + +--- a/mm/compaction.c ++++ b/mm/compaction.c +@@ -371,9 +371,6 @@ static isolate_migrate_t isolate_migrate + continue; + } + +- if (!cc->sync) +- mode |= ISOLATE_CLEAN; +- + /* Try isolate the page */ + if (__isolate_lru_page(page, mode, 0) != 0) + continue; diff --git a/queue-3.0/mm-compaction-determine-if-dirty-pages-can-be-migrated-without-blocking-within-migratepage.patch b/queue-3.0/mm-compaction-determine-if-dirty-pages-can-be-migrated-without-blocking-within-migratepage.patch new file mode 100644 index 00000000000..fc58f25d6ee --- /dev/null +++ b/queue-3.0/mm-compaction-determine-if-dirty-pages-can-be-migrated-without-blocking-within-migratepage.patch @@ -0,0 +1,362 @@ +From b969c4ab9f182a6e1b2a0848be349f99714947b0 Mon Sep 17 00:00:00 2001 +From: Mel Gorman +Date: Thu, 12 Jan 2012 17:19:34 -0800 +Subject: mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage + +From: Mel Gorman + +commit b969c4ab9f182a6e1b2a0848be349f99714947b0 upstream. + +Stable note: Not tracked in Bugzilla. A fix aimed at preserving page + aging information by reducing LRU list churning had the side-effect + of reducing THP allocation success rates. This was part of a series + to restore the success rates while preserving the reclaim fix. + +Asynchronous compaction is used when allocating transparent hugepages to +avoid blocking for long periods of time. Due to reports of stalling, +there was a debate on disabling synchronous compaction but this severely +impacted allocation success rates. Part of the reason was that many dirty +pages are skipped in asynchronous compaction by the following check; + + if (PageDirty(page) && !sync && + mapping->a_ops->migratepage != migrate_page) + rc = -EBUSY; + +This skips over all mapping aops using buffer_migrate_page() even though +it is possible to migrate some of these pages without blocking. This +patch updates the ->migratepage callback with a "sync" parameter. It is +the responsibility of the callback to fail gracefully if migration would +block. 
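
As an illustration of the control flow this change introduces, the sketch below is a simplified userspace model, not kernel code, and the struct and function names are mine. It mirrors how move_to_new_page()/fallback_migrate_page() behave after the patch: the sync flag is handed to the ->migratepage callback when one exists, and the fallback path refuses dirty pages with -EBUSY instead of writing them out when sync is false.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct page_model {
	bool dirty;		/* models PageDirty()             */
	bool has_migratepage;	/* models a_ops->migratepage set? */
};

/* 0 means "migrated", -EBUSY means async migration refused to block */
static int move_to_new_page_model(const struct page_model *page, bool sync)
{
	if (page->has_migratepage)
		/* The callback itself must now avoid blocking when !sync */
		return 0;

	/* Fallback path: writing the page back would block, so refuse it */
	if (page->dirty) {
		if (!sync)
			return -EBUSY;
		/* sync case: writeout(mapping, page) would run here */
	}
	return 0;
}

int main(void)
{
	struct page_model dirty_no_cb = { .dirty = true, .has_migratepage = false };

	printf("async: %d\n", move_to_new_page_model(&dirty_no_cb, false));	/* -EBUSY */
	printf("sync:  %d\n", move_to_new_page_model(&dirty_no_cb, true));	/* 0      */
	return 0;
}
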
+ +Signed-off-by: Mel Gorman +Reviewed-by: Rik van Riel +Cc: Andrea Arcangeli +Cc: Minchan Kim +Cc: Dave Jones +Cc: Jan Kara +Cc: Andy Isaacson +Cc: Nai Xia +Cc: Johannes Weiner +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Mel Gorman +Signed-off-by: Greg Kroah-Hartman + +--- + fs/btrfs/disk-io.c | 4 - + fs/hugetlbfs/inode.c | 3 - + fs/nfs/internal.h | 2 + fs/nfs/write.c | 4 - + include/linux/fs.h | 9 ++- + include/linux/migrate.h | 2 + mm/migrate.c | 129 ++++++++++++++++++++++++++++++++++-------------- + 7 files changed, 106 insertions(+), 47 deletions(-) + +--- a/fs/btrfs/disk-io.c ++++ b/fs/btrfs/disk-io.c +@@ -801,7 +801,7 @@ static int btree_submit_bio_hook(struct + + #ifdef CONFIG_MIGRATION + static int btree_migratepage(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, bool sync) + { + /* + * we can't safely write a btree page from here, +@@ -816,7 +816,7 @@ static int btree_migratepage(struct addr + if (page_has_private(page) && + !try_to_release_page(page, GFP_KERNEL)) + return -EAGAIN; +- return migrate_page(mapping, newpage, page); ++ return migrate_page(mapping, newpage, page, sync); + } + #endif + +--- a/fs/hugetlbfs/inode.c ++++ b/fs/hugetlbfs/inode.c +@@ -568,7 +568,8 @@ static int hugetlbfs_set_page_dirty(stru + } + + static int hugetlbfs_migrate_page(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, ++ bool sync) + { + int rc; + +--- a/fs/nfs/internal.h ++++ b/fs/nfs/internal.h +@@ -315,7 +315,7 @@ void nfs_commit_release_pages(struct nfs + + #ifdef CONFIG_MIGRATION + extern int nfs_migrate_page(struct address_space *, +- struct page *, struct page *); ++ struct page *, struct page *, bool); + #else + #define nfs_migrate_page NULL + #endif +--- a/fs/nfs/write.c ++++ b/fs/nfs/write.c +@@ -1662,7 +1662,7 @@ out_error: + + #ifdef CONFIG_MIGRATION + int nfs_migrate_page(struct address_space *mapping, struct page *newpage, +- struct page *page) ++ struct page *page, bool sync) + { + /* + * If PagePrivate is set, then the page is currently associated with +@@ -1677,7 +1677,7 @@ int nfs_migrate_page(struct address_spac + + nfs_fscache_release_page(page, GFP_KERNEL); + +- return migrate_page(mapping, newpage, page); ++ return migrate_page(mapping, newpage, page, sync); + } + #endif + +--- a/include/linux/fs.h ++++ b/include/linux/fs.h +@@ -607,9 +607,12 @@ struct address_space_operations { + loff_t offset, unsigned long nr_segs); + int (*get_xip_mem)(struct address_space *, pgoff_t, int, + void **, unsigned long *); +- /* migrate the contents of a page to the specified target */ ++ /* ++ * migrate the contents of a page to the specified target. If sync ++ * is false, it must not block. 
++ */ + int (*migratepage) (struct address_space *, +- struct page *, struct page *); ++ struct page *, struct page *, bool); + int (*launder_page) (struct page *); + int (*is_partially_uptodate) (struct page *, read_descriptor_t *, + unsigned long); +@@ -2478,7 +2481,7 @@ extern int generic_check_addressable(uns + + #ifdef CONFIG_MIGRATION + extern int buffer_migrate_page(struct address_space *, +- struct page *, struct page *); ++ struct page *, struct page *, bool); + #else + #define buffer_migrate_page NULL + #endif +--- a/include/linux/migrate.h ++++ b/include/linux/migrate.h +@@ -11,7 +11,7 @@ typedef struct page *new_page_t(struct p + + extern void putback_lru_pages(struct list_head *l); + extern int migrate_page(struct address_space *, +- struct page *, struct page *); ++ struct page *, struct page *, bool); + extern int migrate_pages(struct list_head *l, new_page_t x, + unsigned long private, bool offlining, + bool sync); +--- a/mm/migrate.c ++++ b/mm/migrate.c +@@ -220,6 +220,55 @@ out: + pte_unmap_unlock(ptep, ptl); + } + ++#ifdef CONFIG_BLOCK ++/* Returns true if all buffers are successfully locked */ ++static bool buffer_migrate_lock_buffers(struct buffer_head *head, bool sync) ++{ ++ struct buffer_head *bh = head; ++ ++ /* Simple case, sync compaction */ ++ if (sync) { ++ do { ++ get_bh(bh); ++ lock_buffer(bh); ++ bh = bh->b_this_page; ++ ++ } while (bh != head); ++ ++ return true; ++ } ++ ++ /* async case, we cannot block on lock_buffer so use trylock_buffer */ ++ do { ++ get_bh(bh); ++ if (!trylock_buffer(bh)) { ++ /* ++ * We failed to lock the buffer and cannot stall in ++ * async migration. Release the taken locks ++ */ ++ struct buffer_head *failed_bh = bh; ++ put_bh(failed_bh); ++ bh = head; ++ while (bh != failed_bh) { ++ unlock_buffer(bh); ++ put_bh(bh); ++ bh = bh->b_this_page; ++ } ++ return false; ++ } ++ ++ bh = bh->b_this_page; ++ } while (bh != head); ++ return true; ++} ++#else ++static inline bool buffer_migrate_lock_buffers(struct buffer_head *head, ++ bool sync) ++{ ++ return true; ++} ++#endif /* CONFIG_BLOCK */ ++ + /* + * Replace the page in the mapping. + * +@@ -229,7 +278,8 @@ out: + * 3 for pages with a mapping and PagePrivate/PagePrivate2 set. + */ + static int migrate_page_move_mapping(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, ++ struct buffer_head *head, bool sync) + { + int expected_count; + void **pslot; +@@ -259,6 +309,19 @@ static int migrate_page_move_mapping(str + } + + /* ++ * In the async migration case of moving a page with buffers, lock the ++ * buffers using trylock before the mapping is moved. If the mapping ++ * was moved, we later failed to lock the buffers and could not move ++ * the mapping back due to an elevated page count, we would have to ++ * block waiting on other references to be dropped. ++ */ ++ if (!sync && head && !buffer_migrate_lock_buffers(head, sync)) { ++ page_unfreeze_refs(page, expected_count); ++ spin_unlock_irq(&mapping->tree_lock); ++ return -EAGAIN; ++ } ++ ++ /* + * Now we know that no one else is looking at the page. + */ + get_page(newpage); /* add cache reference */ +@@ -415,13 +478,13 @@ EXPORT_SYMBOL(fail_migrate_page); + * Pages are locked upon entry and exit. 
+ */ + int migrate_page(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, bool sync) + { + int rc; + + BUG_ON(PageWriteback(page)); /* Writeback must be complete */ + +- rc = migrate_page_move_mapping(mapping, newpage, page); ++ rc = migrate_page_move_mapping(mapping, newpage, page, NULL, sync); + + if (rc) + return rc; +@@ -438,28 +501,28 @@ EXPORT_SYMBOL(migrate_page); + * exist. + */ + int buffer_migrate_page(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, bool sync) + { + struct buffer_head *bh, *head; + int rc; + + if (!page_has_buffers(page)) +- return migrate_page(mapping, newpage, page); ++ return migrate_page(mapping, newpage, page, sync); + + head = page_buffers(page); + +- rc = migrate_page_move_mapping(mapping, newpage, page); ++ rc = migrate_page_move_mapping(mapping, newpage, page, head, sync); + + if (rc) + return rc; + +- bh = head; +- do { +- get_bh(bh); +- lock_buffer(bh); +- bh = bh->b_this_page; +- +- } while (bh != head); ++ /* ++ * In the async case, migrate_page_move_mapping locked the buffers ++ * with an IRQ-safe spinlock held. In the sync case, the buffers ++ * need to be locked now ++ */ ++ if (sync) ++ BUG_ON(!buffer_migrate_lock_buffers(head, sync)); + + ClearPagePrivate(page); + set_page_private(newpage, page_private(page)); +@@ -536,10 +599,13 @@ static int writeout(struct address_space + * Default handling if a filesystem does not provide a migration function. + */ + static int fallback_migrate_page(struct address_space *mapping, +- struct page *newpage, struct page *page) ++ struct page *newpage, struct page *page, bool sync) + { +- if (PageDirty(page)) ++ if (PageDirty(page)) { ++ if (!sync) ++ return -EBUSY; + return writeout(mapping, page); ++ } + + /* + * Buffers may be managed in a filesystem specific way. +@@ -549,7 +615,7 @@ static int fallback_migrate_page(struct + !try_to_release_page(page, GFP_KERNEL)) + return -EAGAIN; + +- return migrate_page(mapping, newpage, page); ++ return migrate_page(mapping, newpage, page, sync); + } + + /* +@@ -585,29 +651,18 @@ static int move_to_new_page(struct page + + mapping = page_mapping(page); + if (!mapping) +- rc = migrate_page(mapping, newpage, page); +- else { ++ rc = migrate_page(mapping, newpage, page, sync); ++ else if (mapping->a_ops->migratepage) + /* +- * Do not writeback pages if !sync and migratepage is +- * not pointing to migrate_page() which is nonblocking +- * (swapcache/tmpfs uses migratepage = migrate_page). ++ * Most pages have a mapping and most filesystems provide a ++ * migratepage callback. Anonymous pages are part of swap ++ * space which also has its own migratepage callback. This ++ * is the most common path for page migration. + */ +- if (PageDirty(page) && !sync && +- mapping->a_ops->migratepage != migrate_page) +- rc = -EBUSY; +- else if (mapping->a_ops->migratepage) +- /* +- * Most pages have a mapping and most filesystems +- * should provide a migration function. Anonymous +- * pages are part of swap space which also has its +- * own migration function. This is the most common +- * path for page migration. 
+- */ +- rc = mapping->a_ops->migratepage(mapping, +- newpage, page); +- else +- rc = fallback_migrate_page(mapping, newpage, page); +- } ++ rc = mapping->a_ops->migratepage(mapping, ++ newpage, page, sync); ++ else ++ rc = fallback_migrate_page(mapping, newpage, page, sync); + + if (rc) { + newpage->mapping = NULL; diff --git a/queue-3.0/mm-compaction-make-isolate_lru_page-filter-aware.patch b/queue-3.0/mm-compaction-make-isolate_lru_page-filter-aware.patch new file mode 100644 index 00000000000..d49aee3a30d --- /dev/null +++ b/queue-3.0/mm-compaction-make-isolate_lru_page-filter-aware.patch @@ -0,0 +1,89 @@ +From 39deaf8585152f1a35c1676d3d7dc6ae0fb65967 Mon Sep 17 00:00:00 2001 +From: Minchan Kim +Date: Mon, 31 Oct 2011 17:06:51 -0700 +Subject: mm: compaction: make isolate_lru_page() filter-aware + +From: Minchan Kim + +commit 39deaf8585152f1a35c1676d3d7dc6ae0fb65967 upstream. + +Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU + list leading to poor reclaim decisions which has a variable + performance impact. + +In async mode, compaction doesn't migrate dirty or writeback pages. So, +it's meaningless to pick the page and re-add it to lru list. + +Of course, when we isolate the page in compaction, the page might be dirty +or writeback but when we try to migrate the page, the page would be not +dirty, writeback. So it could be migrated. But it's very unlikely as +isolate and migration cycle is much faster than writeout. + +So, this patch helps cpu overhead and prevent unnecessary LRU churning. + +Signed-off-by: Minchan Kim +Acked-by: Johannes Weiner +Reviewed-by: KAMEZAWA Hiroyuki +Reviewed-by: KOSAKI Motohiro +Acked-by: Mel Gorman +Acked-by: Rik van Riel +Reviewed-by: Michal Hocko +Cc: Andrea Arcangeli +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Mel Gorman +Signed-off-by: Greg Kroah-Hartman + +--- + include/linux/mmzone.h | 2 ++ + mm/compaction.c | 7 +++++-- + mm/vmscan.c | 3 +++ + 3 files changed, 10 insertions(+), 2 deletions(-) + +--- a/include/linux/mmzone.h ++++ b/include/linux/mmzone.h +@@ -162,6 +162,8 @@ static inline int is_unevictable_lru(enu + #define ISOLATE_INACTIVE ((__force isolate_mode_t)0x1) + /* Isolate active pages */ + #define ISOLATE_ACTIVE ((__force isolate_mode_t)0x2) ++/* Isolate clean file */ ++#define ISOLATE_CLEAN ((__force isolate_mode_t)0x4) + + /* LRU Isolation modes. 
*/ + typedef unsigned __bitwise__ isolate_mode_t; +--- a/mm/compaction.c ++++ b/mm/compaction.c +@@ -261,6 +261,7 @@ static isolate_migrate_t isolate_migrate + unsigned long last_pageblock_nr = 0, pageblock_nr; + unsigned long nr_scanned = 0, nr_isolated = 0; + struct list_head *migratelist = &cc->migratepages; ++ isolate_mode_t mode = ISOLATE_ACTIVE|ISOLATE_INACTIVE; + + /* Do not scan outside zone boundaries */ + low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn); +@@ -370,9 +371,11 @@ static isolate_migrate_t isolate_migrate + continue; + } + ++ if (!cc->sync) ++ mode |= ISOLATE_CLEAN; ++ + /* Try isolate the page */ +- if (__isolate_lru_page(page, +- ISOLATE_ACTIVE|ISOLATE_INACTIVE, 0) != 0) ++ if (__isolate_lru_page(page, mode, 0) != 0) + continue; + + VM_BUG_ON(PageTransCompound(page)); +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -1045,6 +1045,9 @@ int __isolate_lru_page(struct page *page + + ret = -EBUSY; + ++ if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page))) ++ return ret; ++ + if (likely(get_page_unless_zero(page))) { + /* + * Be careful not to clear PageLRU until after we're diff --git a/queue-3.0/mm-migration-clean-up-unmap_and_move.patch b/queue-3.0/mm-migration-clean-up-unmap_and_move.patch new file mode 100644 index 00000000000..9f903707a39 --- /dev/null +++ b/queue-3.0/mm-migration-clean-up-unmap_and_move.patch @@ -0,0 +1,146 @@ +From 0dabec93de633a87adfbbe1d800a4c56cd19d73b Mon Sep 17 00:00:00 2001 +From: Minchan Kim +Date: Mon, 31 Oct 2011 17:06:57 -0700 +Subject: mm: migration: clean up unmap_and_move() + +From: Minchan Kim + +commit 0dabec93de633a87adfbbe1d800a4c56cd19d73b upstream. + +Stable note: Not tracked in Bugzilla. This patch makes later patches + easier to apply but has no other impact. + +unmap_and_move() is one a big messy function. Clean it up. + +Signed-off-by: Minchan Kim +Reviewed-by: KOSAKI Motohiro +Cc: Johannes Weiner +Cc: KAMEZAWA Hiroyuki +Cc: Mel Gorman +Cc: Rik van Riel +Cc: Michal Hocko +Cc: Andrea Arcangeli +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds + +--- + mm/migrate.c | 75 +++++++++++++++++++++++++++++++---------------------------- + 1 file changed, 40 insertions(+), 35 deletions(-) + +--- a/mm/migrate.c ++++ b/mm/migrate.c +@@ -621,38 +621,18 @@ static int move_to_new_page(struct page + return rc; + } + +-/* +- * Obtain the lock on page, remove all ptes and migrate the page +- * to the newly allocated page in newpage. +- */ +-static int unmap_and_move(new_page_t get_new_page, unsigned long private, +- struct page *page, int force, bool offlining, bool sync) ++static int __unmap_and_move(struct page *page, struct page *newpage, ++ int force, bool offlining, bool sync) + { +- int rc = 0; +- int *result = NULL; +- struct page *newpage = get_new_page(page, private, &result); ++ int rc = -EAGAIN; + int remap_swapcache = 1; + int charge = 0; + struct mem_cgroup *mem; + struct anon_vma *anon_vma = NULL; + +- if (!newpage) +- return -ENOMEM; +- +- if (page_count(page) == 1) { +- /* page was freed from under us. So we are done. */ +- goto move_newpage; +- } +- if (unlikely(PageTransHuge(page))) +- if (unlikely(split_huge_page(page))) +- goto move_newpage; +- +- /* prepare cgroup just returns 0 or -ENOMEM */ +- rc = -EAGAIN; +- + if (!trylock_page(page)) { + if (!force || !sync) +- goto move_newpage; ++ goto out; + + /* + * It's not safe for direct compaction to call lock_page. +@@ -668,7 +648,7 @@ static int unmap_and_move(new_page_t get + * altogether. 
+ */ + if (current->flags & PF_MEMALLOC) +- goto move_newpage; ++ goto out; + + lock_page(page); + } +@@ -785,27 +765,52 @@ uncharge: + mem_cgroup_end_migration(mem, page, newpage, rc == 0); + unlock: + unlock_page(page); ++out: ++ return rc; ++} ++ ++/* ++ * Obtain the lock on page, remove all ptes and migrate the page ++ * to the newly allocated page in newpage. ++ */ ++static int unmap_and_move(new_page_t get_new_page, unsigned long private, ++ struct page *page, int force, bool offlining, bool sync) ++{ ++ int rc = 0; ++ int *result = NULL; ++ struct page *newpage = get_new_page(page, private, &result); ++ ++ if (!newpage) ++ return -ENOMEM; + +-move_newpage: ++ if (page_count(page) == 1) { ++ /* page was freed from under us. So we are done. */ ++ goto out; ++ } ++ ++ if (unlikely(PageTransHuge(page))) ++ if (unlikely(split_huge_page(page))) ++ goto out; ++ ++ rc = __unmap_and_move(page, newpage, force, offlining, sync); ++out: + if (rc != -EAGAIN) { +- /* +- * A page that has been migrated has all references +- * removed and will be freed. A page that has not been +- * migrated will have kepts its references and be +- * restored. +- */ +- list_del(&page->lru); ++ /* ++ * A page that has been migrated has all references ++ * removed and will be freed. A page that has not been ++ * migrated will have kepts its references and be ++ * restored. ++ */ ++ list_del(&page->lru); + dec_zone_page_state(page, NR_ISOLATED_ANON + + page_is_file_cache(page)); + putback_lru_page(page); + } +- + /* + * Move the new page to the LRU. If migration was not successful + * then this will free the page. + */ + putback_lru_page(newpage); +- + if (result) { + if (rc) + *result = rc; diff --git a/queue-3.0/mm-zone_reclaim-make-isolate_lru_page-filter-aware.patch b/queue-3.0/mm-zone_reclaim-make-isolate_lru_page-filter-aware.patch new file mode 100644 index 00000000000..d0c2b37e5f3 --- /dev/null +++ b/queue-3.0/mm-zone_reclaim-make-isolate_lru_page-filter-aware.patch @@ -0,0 +1,104 @@ +From f80c0673610e36ae29d63e3297175e22f70dde5f Mon Sep 17 00:00:00 2001 +From: Minchan Kim +Date: Mon, 31 Oct 2011 17:06:55 -0700 +Subject: mm: zone_reclaim: make isolate_lru_page() filter-aware + +From: Minchan Kim + +commit f80c0673610e36ae29d63e3297175e22f70dde5f upstream. + +Stable note: Not tracked in Bugzilla. THP and compaction disrupt the LRU list + leading to poor reclaim decisions which has a variable + performance impact. + +In __zone_reclaim case, we don't want to shrink mapped page. Nonetheless, +we have isolated mapped page and re-add it into LRU's head. It's +unnecessary CPU overhead and makes LRU churning. + +Of course, when we isolate the page, the page might be mapped but when we +try to migrate the page, the page would be not mapped. So it could be +migrated. But race is rare and although it happens, it's no big deal. 
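
Taken together with the earlier ISOLATE_CLEAN patch, the effect on __isolate_lru_page() can be modelled as below. This is a userspace sketch of only the filter checks these patches add (the real function also validates the LRU, active/inactive state and file/anon match), and the model names are mine; the ISOLATE_* values are the ones defined in the diffs. ISOLATE_CLEAN rejects dirty or writeback pages and ISOLATE_UNMAPPED rejects mapped pages, so callers such as async compaction and zone_reclaim never pull unusable pages off the LRU in the first place.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int isolate_mode_t;

#define ISOLATE_INACTIVE ((isolate_mode_t)0x1)	/* Isolate inactive pages */
#define ISOLATE_ACTIVE	 ((isolate_mode_t)0x2)	/* Isolate active pages   */
#define ISOLATE_CLEAN	 ((isolate_mode_t)0x4)	/* Isolate clean file     */
#define ISOLATE_UNMAPPED ((isolate_mode_t)0x8)	/* Isolate unmapped file  */

struct page_model {
	bool dirty;	/* models PageDirty()     */
	bool writeback;	/* models PageWriteback() */
	bool mapped;	/* models page_mapped()   */
};

/* Returns 0 if the page may be isolated under this mode, -EBUSY otherwise */
static int isolate_filter_model(const struct page_model *page, isolate_mode_t mode)
{
	if ((mode & ISOLATE_CLEAN) && (page->dirty || page->writeback))
		return -EBUSY;
	if ((mode & ISOLATE_UNMAPPED) && page->mapped)
		return -EBUSY;
	return 0;
}

int main(void)
{
	struct page_model dirty_page  = { .dirty = true };
	struct page_model mapped_page = { .mapped = true };

	/* Async compaction: cc->sync == false adds ISOLATE_CLEAN */
	printf("%d\n", isolate_filter_model(&dirty_page,
			ISOLATE_ACTIVE | ISOLATE_INACTIVE | ISOLATE_CLEAN));	/* -EBUSY */

	/* Reclaim with may_unmap == 0 adds ISOLATE_UNMAPPED */
	printf("%d\n", isolate_filter_model(&mapped_page,
			ISOLATE_INACTIVE | ISOLATE_UNMAPPED));			/* -EBUSY */
	return 0;
}
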
+ +Signed-off-by: Minchan Kim +Acked-by: Johannes Weiner +Reviewed-by: KAMEZAWA Hiroyuki +Reviewed-by: KOSAKI Motohiro +Reviewed-by: Michal Hocko +Cc: Mel Gorman +Cc: Rik van Riel +Cc: Andrea Arcangeli +Signed-off-by: Andrew Morton +Signed-off-by: Linus Torvalds +Signed-off-by: Mel Gorman +Signed-off-by: Greg Kroah-Hartman + +--- + include/linux/mmzone.h | 2 ++ + mm/vmscan.c | 20 ++++++++++++++++++-- + 2 files changed, 20 insertions(+), 2 deletions(-) + +--- a/include/linux/mmzone.h ++++ b/include/linux/mmzone.h +@@ -164,6 +164,8 @@ static inline int is_unevictable_lru(enu + #define ISOLATE_ACTIVE ((__force isolate_mode_t)0x2) + /* Isolate clean file */ + #define ISOLATE_CLEAN ((__force isolate_mode_t)0x4) ++/* Isolate unmapped file */ ++#define ISOLATE_UNMAPPED ((__force isolate_mode_t)0x8) + + /* LRU Isolation modes. */ + typedef unsigned __bitwise__ isolate_mode_t; +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -1048,6 +1048,9 @@ int __isolate_lru_page(struct page *page + if ((mode & ISOLATE_CLEAN) && (PageDirty(page) || PageWriteback(page))) + return ret; + ++ if ((mode & ISOLATE_UNMAPPED) && page_mapped(page)) ++ return ret; ++ + if (likely(get_page_unless_zero(page))) { + /* + * Be careful not to clear PageLRU until after we're +@@ -1471,6 +1474,12 @@ shrink_inactive_list(unsigned long nr_to + reclaim_mode |= ISOLATE_ACTIVE; + + lru_add_drain(); ++ ++ if (!sc->may_unmap) ++ reclaim_mode |= ISOLATE_UNMAPPED; ++ if (!sc->may_writepage) ++ reclaim_mode |= ISOLATE_CLEAN; ++ + spin_lock_irq(&zone->lru_lock); + + if (scanning_global_lru(sc)) { +@@ -1588,19 +1597,26 @@ static void shrink_active_list(unsigned + struct page *page; + struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); + unsigned long nr_rotated = 0; ++ isolate_mode_t reclaim_mode = ISOLATE_ACTIVE; + + lru_add_drain(); ++ ++ if (!sc->may_unmap) ++ reclaim_mode |= ISOLATE_UNMAPPED; ++ if (!sc->may_writepage) ++ reclaim_mode |= ISOLATE_CLEAN; ++ + spin_lock_irq(&zone->lru_lock); + if (scanning_global_lru(sc)) { + nr_taken = isolate_pages_global(nr_pages, &l_hold, + &pgscanned, sc->order, +- ISOLATE_ACTIVE, zone, ++ reclaim_mode, zone, + 1, file); + zone->pages_scanned += pgscanned; + } else { + nr_taken = mem_cgroup_isolate_pages(nr_pages, &l_hold, + &pgscanned, sc->order, +- ISOLATE_ACTIVE, zone, ++ reclaim_mode, zone, + sc->mem_cgroup, 1, file); + /* + * mem_cgroup_isolate_pages() keeps track of diff --git a/queue-3.0/series b/queue-3.0/series index ca0e09acdd8..30fcb6914bb 100644 --- a/queue-3.0/series +++ b/queue-3.0/series @@ -16,3 +16,8 @@ vmscan-limit-direct-reclaim-for-higher-order-allocations.patch vmscan-abort-reclaim-compaction-if-compaction-can-proceed.patch mm-compaction-trivial-clean-up-in-acct_isolated.patch mm-change-isolate-mode-from-define-to-bitwise-type.patch +mm-compaction-make-isolate_lru_page-filter-aware.patch +mm-zone_reclaim-make-isolate_lru_page-filter-aware.patch +mm-migration-clean-up-unmap_and_move.patch +mm-compaction-allow-compaction-to-isolate-dirty-pages.patch +mm-compaction-determine-if-dirty-pages-can-be-migrated-without-blocking-within-migratepage.patch