--- /dev/null
+From c3b94f44fcb0725471ecebb701c077a0ed67bd07 Mon Sep 17 00:00:00 2001
+From: Hugh Dickins <hughd@google.com>
+Date: Tue, 31 Jul 2012 16:45:59 -0700
+Subject: memcg: further prevent OOM with too many dirty pages
+
+From: Hugh Dickins <hughd@google.com>
+
+commit c3b94f44fcb0725471ecebb701c077a0ed67bd07 upstream.
+
+The may_enter_fs test turns out to be too restrictive: though I saw no
+problem with it when testing on 3.5-rc6, it very soon OOMed when I tested
+on 3.5-rc6-mm1. I don't know what the difference there is, perhaps I just
+slightly changed the way I started off the testing: dd if=/dev/zero
+of=/mnt/temp bs=1M count=1024; rm -f /mnt/temp; sync repeatedly, in 20M
+memory.limit_in_bytes cgroup to ext4 on USB stick.
+
+ext4 (and gfs2 and xfs) turn out to allocate new pages for writing with
+AOP_FLAG_NOFS: that seems a little worrying, and it's unclear to me why
+the transaction needs to be started even before allocating pagecache
+memory. But it may not be worth worrying about these days: if direct
+reclaim avoids FS writeback, does __GFP_FS now mean anything?
+
+Anyway, we insisted on the may_enter_fs test to avoid hangs with the loop
+device; but since that also masks off __GFP_IO, we can test for __GFP_IO
+directly, ignoring may_enter_fs and __GFP_FS.
+
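+(For context, not part of this patch: the loop driver clears those bits on
+its backing file's mapping when the device is set up - paraphrased below
+from loop_set_fd() in drivers/block/loop.c - which is why a missing
+__GFP_IO bit reliably identifies reclaim done on behalf of a loop thread.)
+
+  /* Writes to the backing file must not recurse into I/O or the FS,
+   * or the loop thread could end up waiting on itself.
+   */
+  lo->old_gfp_mask = mapping_gfp_mask(mapping);
+  mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
+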
+But even so, the test still OOMs sometimes: when originally testing on
+3.5-rc6, it OOMed about one time in five or ten; when testing just now on
+3.5-rc6-mm1, it OOMed on the first iteration.
+
+This residual problem comes from an accumulation of pages under ordinary
+writeback, not marked PageReclaim, so rightly not causing the memcg check
+to wait on their writeback: these too can prevent shrink_page_list() from
+freeing any pages, so many times that memcg reclaim fails and OOMs.
+
+Deal with these in the same way as direct reclaim now deals with dirty FS
+pages: mark them PageReclaim. It is appropriate to rotate these to tail
+of list when writepage completes, but more importantly, the PageReclaim
+flag makes memcg reclaim wait on them if encountered again. Increment
+NR_VMSCAN_IMMEDIATE? That's arguable: I chose not.
+
+Setting PageReclaim here may occasionally race with end_page_writeback()
+clearing it: lru_deactivate_fn() already faced the same race, and
+correctly concluded that the window is small and the issue non-critical.
+
+With these changes, the test runs indefinitely without OOMing on ext4,
+ext3 and ext2: I'll move on to test with other filesystems later.
+
+Trivia: invert conditions for a clearer block without an else, and goto
+keep_locked to do the unlock_page.
+
+Signed-off-by: Hugh Dickins <hughd@google.com>
+Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
+Cc: Minchan Kim <minchan@kernel.org>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Ying Han <yinghan@google.com>
+Cc: Greg Thelen <gthelen@google.com>
+Cc: Hugh Dickins <hughd@google.com>
+Cc: Mel Gorman <mgorman@suse.de>
+Cc: Johannes Weiner <hannes@cmpxchg.org>
+Cc: Fengguang Wu <fengguang.wu@intel.com>
+Acked-by: Michal Hocko <mhocko@suse.cz>
+Cc: Dave Chinner <david@fromorbit.com>
+Cc: Theodore Ts'o <tytso@mit.edu>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/vmscan.c | 33 ++++++++++++++++++++++++---------
+ 1 file changed, 24 insertions(+), 9 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -723,23 +723,38 @@ static unsigned long shrink_page_list(st
+ /*
+ * memcg doesn't have any dirty pages throttling so we
+ * could easily OOM just because too many pages are in
+- * writeback from reclaim and there is nothing else to
+- * reclaim.
++ * writeback and there is nothing else to reclaim.
+ *
+- * Check may_enter_fs, certainly because a loop driver
++ * Check __GFP_IO, certainly because a loop driver
+ * thread might enter reclaim, and deadlock if it waits
+ * on a page for which it is needed to do the write
+ * (loop masks off __GFP_IO|__GFP_FS for this reason);
+ * but more thought would probably show more reasons.
++ *
++ * Don't require __GFP_FS, since we're not going into
++ * the FS, just waiting on its writeback completion.
++ * Worryingly, ext4 gfs2 and xfs allocate pages with
++ * grab_cache_page_write_begin(,,AOP_FLAG_NOFS), so
++ * testing may_enter_fs here is liable to OOM on them.
+ */
+- if (!global_reclaim(sc) && PageReclaim(page) &&
+- may_enter_fs)
+- wait_on_page_writeback(page);
+- else {
++ if (global_reclaim(sc) ||
++ !PageReclaim(page) || !(sc->gfp_mask & __GFP_IO)) {
++ /*
++ * This is slightly racy - end_page_writeback()
++ * might have just cleared PageReclaim, then
++ * setting PageReclaim here end up interpreted
++ * as PageReadahead - but that does not matter
++ * enough to care. What we do want is for this
++ * page to have PageReclaim set next time memcg
++ * reclaim reaches the tests above, so it will
++ * then wait_on_page_writeback() to avoid OOM;
++ * and it's also appropriate in global reclaim.
++ */
++ SetPageReclaim(page);
+ nr_writeback++;
+- unlock_page(page);
+- goto keep;
++ goto keep_locked;
+ }
++ wait_on_page_writeback(page);
+ }
+
+ references = page_check_references(page, sc);
--- /dev/null
+From e62e384e9da8d9a0c599795464a7e76fd490931c Mon Sep 17 00:00:00 2001
+From: Michal Hocko <mhocko@suse.cz>
+Date: Tue, 31 Jul 2012 16:45:55 -0700
+Subject: memcg: prevent OOM with too many dirty pages
+
+From: Michal Hocko <mhocko@suse.cz>
+
+commit e62e384e9da8d9a0c599795464a7e76fd490931c upstream.
+
+The current implementation of dirty page throttling is not memcg aware,
+which makes it easy to have memcg LRUs full of dirty pages. Without
+throttling, these LRUs can be scanned faster than the rate of writeback,
+leading to memcg OOM conditions when the hard limit is small.
+
+This patch fixes the problem by throttling the allocating process
+(possibly a writer) during the hard limit reclaim by waiting on
+PageReclaim pages. We are waiting only for PageReclaim pages because
+those are the pages that have made one full round over the LRU, which means
+that writeback is much slower than scanning.
+
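+(For reference, paraphrased from pageout() in mm/vmscan.c: reclaim tags a
+page with PageReclaim when it starts writeback on it - and when direct
+reclaim defers a dirty filesystem page to the flusher threads - so meeting
+a page that is both PageWriteback and PageReclaim means reclaim has already
+handled it once and has come all the way around the LRU again.)
+
+  /* Tag the page so end_page_writeback() rotates it to the tail of
+   * the inactive list as soon as the I/O completes.
+   */
+  SetPageReclaim(page);
+  res = mapping->a_ops->writepage(page, &wbc);
+  if (res == AOP_WRITEPAGE_ACTIVATE) {
+          ClearPageReclaim(page);
+          return PAGE_ACTIVATE;
+  }
+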
+The solution is far from ideal - the long-term solution is memcg-aware
+dirty throttling - but it is meant to be a band-aid until we have a real
+fix. We are seeing this happen during nightly backups which are placed
+into containers to prevent eviction of the real working set.
+
+The change affects only memcg reclaim and only when we encounter
+PageReclaim pages, which is a signal that reclaim is not keeping up
+with the writers, so somebody should be throttled. This could be
+potentially unfair because it could be somebody else from the group who
+gets throttled on behalf of the writer, but since writers need to allocate
+as well, and allocate at a higher rate, the probability that only innocent
+processes would be penalized is not that high.
+
+I have tested this change with a simple dd copying /dev/zero to tmpfs or
+ext3 running under a small memcg (1G copy under 5M, 60M, 300M and 2G
+containers), and dd got killed by the OOM killer every time. With the patch
+I could run the same dd under the 5M controller without any OOM. The
+issue is more visible with slower devices for output.
+
+* With the patch
+================
+* tmpfs size=2G
+---------------
+$ vim cgroup_cache_oom_test.sh
+$ ./cgroup_cache_oom_test.sh 5M
+using Limit 5M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 30.4049 s, 34.5 MB/s
+$ ./cgroup_cache_oom_test.sh 60M
+using Limit 60M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 31.4561 s, 33.3 MB/s
+$ ./cgroup_cache_oom_test.sh 300M
+using Limit 300M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 20.4618 s, 51.2 MB/s
+$ ./cgroup_cache_oom_test.sh 2G
+using Limit 2G for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 1.42172 s, 738 MB/s
+
+* ext3
+------
+$ ./cgroup_cache_oom_test.sh 5M
+using Limit 5M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 27.9547 s, 37.5 MB/s
+$ ./cgroup_cache_oom_test.sh 60M
+using Limit 60M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 30.3221 s, 34.6 MB/s
+$ ./cgroup_cache_oom_test.sh 300M
+using Limit 300M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 24.5764 s, 42.7 MB/s
+$ ./cgroup_cache_oom_test.sh 2G
+using Limit 2G for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 3.35828 s, 312 MB/s
+
+* Without the patch
+===================
+* tmpfs size=2G
+---------------
+$ ./cgroup_cache_oom_test.sh 5M
+using Limit 5M for group
+./cgroup_cache_oom_test.sh: line 46: 4668 Killed dd if=/dev/zero of=$OUT/zero bs=1M count=$count
+$ ./cgroup_cache_oom_test.sh 60M
+using Limit 60M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 25.4989 s, 41.1 MB/s
+$ ./cgroup_cache_oom_test.sh 300M
+using Limit 300M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 24.3928 s, 43.0 MB/s
+$ ./cgroup_cache_oom_test.sh 2G
+using Limit 2G for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 1.49797 s, 700 MB/s
+
+* ext3
+------
+$ ./cgroup_cache_oom_test.sh 5M
+using Limit 5M for group
+./cgroup_cache_oom_test.sh: line 46: 4689 Killed dd if=/dev/zero of=$OUT/zero bs=1M count=$count
+$ ./cgroup_cache_oom_test.sh 60M
+using Limit 60M for group
+./cgroup_cache_oom_test.sh: line 46: 4692 Killed dd if=/dev/zero of=$OUT/zero bs=1M count=$count
+$ ./cgroup_cache_oom_test.sh 300M
+using Limit 300M for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 20.248 s, 51.8 MB/s
+$ ./cgroup_cache_oom_test.sh 2G
+using Limit 2G for group
+1000+0 records in
+1000+0 records out
+1048576000 bytes (1.0 GB) copied, 2.85201 s, 368 MB/s
+
+[akpm@linux-foundation.org: tweak changelog, reordered the test to optimize for CONFIG_CGROUP_MEM_RES_CTLR=n]
+[hughd@google.com: fix deadlock with loop driver]
+Reviewed-by: Mel Gorman <mgorman@suse.de>
+Acked-by: Johannes Weiner <hannes@cmpxchg.org>
+Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
+Signed-off-by: Michal Hocko <mhocko@suse.cz>
+Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
+Cc: Minchan Kim <minchan@kernel.org>
+Cc: Rik van Riel <riel@redhat.com>
+Cc: Ying Han <yinghan@google.com>
+Cc: Greg Thelen <gthelen@google.com>
+Cc: Hugh Dickins <hughd@google.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/vmscan.c | 23 ++++++++++++++++++++---
+ 1 file changed, 20 insertions(+), 3 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -720,9 +720,26 @@ static unsigned long shrink_page_list(st
+ (PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
+
+ if (PageWriteback(page)) {
+- nr_writeback++;
+- unlock_page(page);
+- goto keep;
++ /*
++ * memcg doesn't have any dirty pages throttling so we
++ * could easily OOM just because too many pages are in
++ * writeback from reclaim and there is nothing else to
++ * reclaim.
++ *
++ * Check may_enter_fs, certainly because a loop driver
++ * thread might enter reclaim, and deadlock if it waits
++ * on a page for which it is needed to do the write
++ * (loop masks off __GFP_IO|__GFP_FS for this reason);
++ * but more thought would probably show more reasons.
++ */
++ if (!global_reclaim(sc) && PageReclaim(page) &&
++ may_enter_fs)
++ wait_on_page_writeback(page);
++ else {
++ nr_writeback++;
++ unlock_page(page);
++ goto keep;
++ }
+ }
+
+ references = page_check_references(page, sc);
--- /dev/null
+From dc32f63453f56d07a1073a697dcd843dd3098c09 Mon Sep 17 00:00:00 2001
+From: Joonsoo Kim <js1304@gmail.com>
+Date: Mon, 30 Jul 2012 14:39:04 -0700
+Subject: mm: fix wrong argument of migrate_huge_pages() in soft_offline_huge_page()
+
+From: Joonsoo Kim <js1304@gmail.com>
+
+commit dc32f63453f56d07a1073a697dcd843dd3098c09 upstream.
+
+Commit a6bc32b89922 ("mm: compaction: introduce sync-light migration for
+use by compaction") changed the declaration of migrate_pages() and
+migrate_huge_pages().
+
+But it missed changing the argument of migrate_huge_pages() in
+soft_offline_huge_page(). In this case, we should call
+migrate_huge_pages() with MIGRATE_SYNC.
+
+Additionally, there is a mismatch between the type of the argument and
+the function declaration for migrate_pages().
+
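+(For reference, after commit a6bc32b89922 the declarations look roughly
+like this, paraphrased from include/linux/migrate.h and migrate_mode.h;
+with the old call, the literal "true" in the last position now reads as
+MIGRATE_SYNC_LIGHT rather than the intended MIGRATE_SYNC.)
+
+  enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC };
+
+  extern int migrate_pages(struct list_head *l, new_page_t x,
+                          unsigned long private, bool offlining,
+                          enum migrate_mode mode);
+  extern int migrate_huge_pages(struct list_head *l, new_page_t x,
+                          unsigned long private, bool offlining,
+                          enum migrate_mode mode);
+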
+Signed-off-by: Joonsoo Kim <js1304@gmail.com>
+Cc: Christoph Lameter <cl@linux.com>
+Cc: Mel Gorman <mgorman@suse.de>
+Acked-by: David Rientjes <rientjes@google.com>
+Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ mm/memory-failure.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+--- a/mm/memory-failure.c
++++ b/mm/memory-failure.c
+@@ -1431,8 +1431,8 @@ static int soft_offline_huge_page(struct
+ /* Keep page count to indicate a given hugepage is isolated. */
+
+ list_add(&hpage->lru, &pagelist);
+- ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
+- true);
++ ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, false,
++ MIGRATE_SYNC);
+ if (ret) {
+ struct page *page1, *page2;
+ list_for_each_entry_safe(page1, page2, &pagelist, lru)
+@@ -1561,7 +1561,7 @@ int soft_offline_page(struct page *page,
+ page_is_file_cache(page));
+ list_add(&page->lru, &pagelist);
+ ret = migrate_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL,
+- 0, MIGRATE_SYNC);
++ false, MIGRATE_SYNC);
+ if (ret) {
+ putback_lru_pages(&pagelist);
+ pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
--- /dev/null
+From 6c4088ac3a4d82779903433bcd5f048c58fb1aca Mon Sep 17 00:00:00 2001
+From: Greg Pearson <greg.pearson@hp.com>
+Date: Mon, 30 Jul 2012 14:39:05 -0700
+Subject: pcdp: use early_ioremap/early_iounmap to access pcdp table
+
+From: Greg Pearson <greg.pearson@hp.com>
+
+commit 6c4088ac3a4d82779903433bcd5f048c58fb1aca upstream.
+
+efi_setup_pcdp_console() is called during boot to parse the HCDP/PCDP
+EFI system table and setup an early console for printk output. The
+routine uses ioremap/iounmap to setup access to the HCDP/PCDP table
+information.
+
+The call to ioremap is happening early in the boot process which leads
+to a panic on x86_64 systems:
+
+ panic+0x01ca
+ do_exit+0x043c
+ oops_end+0x00a7
+ no_context+0x0119
+ __bad_area_nosemaphore+0x0138
+ bad_area_nosemaphore+0x000e
+ do_page_fault+0x0321
+ page_fault+0x0020
+ reserve_memtype+0x02a1
+ __ioremap_caller+0x0123
+ ioremap_nocache+0x0012
+ efi_setup_pcdp_console+0x002b
+ setup_arch+0x03a9
+ start_kernel+0x00d4
+ x86_64_start_reservations+0x012c
+ x86_64_start_kernel+0x00fe
+
+This replaces the calls to ioremap/iounmap in efi_setup_pcdp_console()
+with calls to early_ioremap/early_iounmap which can be called during
+early boot.
+
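+(For reference, the early variants look roughly like this; exact parameter
+types vary by architecture. Note that, unlike iounmap(), early_iounmap()
+must be told the size of the mapping again, hence the 4096 below.)
+
+  void __iomem *early_ioremap(resource_size_t phys_addr, unsigned long size);
+  void early_iounmap(void __iomem *addr, unsigned long size);
+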
+This patch was tested on an x86_64 prototype system which uses the
+HCDP/PCDP table for early console setup.
+
+Signed-off-by: Greg Pearson <greg.pearson@hp.com>
+Acked-by: Khalid Aziz <khalid.aziz@hp.com>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+
+---
+ drivers/firmware/pcdp.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/firmware/pcdp.c
++++ b/drivers/firmware/pcdp.c
+@@ -95,7 +95,7 @@ efi_setup_pcdp_console(char *cmdline)
+ if (efi.hcdp == EFI_INVALID_TABLE_ADDR)
+ return -ENODEV;
+
+- pcdp = ioremap(efi.hcdp, 4096);
++ pcdp = early_ioremap(efi.hcdp, 4096);
+ printk(KERN_INFO "PCDP: v%d at 0x%lx\n", pcdp->rev, efi.hcdp);
+
+ if (strstr(cmdline, "console=hcdp")) {
+@@ -131,6 +131,6 @@ efi_setup_pcdp_console(char *cmdline)
+ }
+
+ out:
+- iounmap(pcdp);
++ early_iounmap(pcdp, 4096);
+ return rc;
+ }
media-ene_ir-fix-driver-initialisation.patch
media-m5mols-correct-reported-iso-values.patch
media-videobuf-dma-contig-restore-buffer-mapping-for-uncached-bufers.patch
+pcdp-use-early_ioremap-early_iounmap-to-access-pcdp-table.patch
+memcg-prevent-oom-with-too-many-dirty-pages.patch
+memcg-further-prevent-oom-with-too-many-dirty-pages.patch
+mm-fix-wrong-argument-of-migrate_huge_pages-in-soft_offline_huge_page.patch