From: Greg Kroah-Hartman
Date: Tue, 12 Nov 2019 05:41:11 +0000 (+0100)
Subject: drop patch I could not backport properly :(
X-Git-Tag: v4.4.201~11
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=aeeb7aab89508cee3c45320cd7bdf680ab604a8a;p=thirdparty%2Fkernel%2Fstable-queue.git

drop patch I could not backport properly :(
---

diff --git a/queue-4.14/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch b/queue-4.14/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
deleted file mode 100644
index bd1e3caea8c..00000000000
--- a/queue-4.14/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 3e8fc0075e24338b1117cdff6a79477427b8dbed Mon Sep 17 00:00:00 2001
-From: Mel Gorman
-Date: Tue, 5 Nov 2019 21:16:27 -0800
-Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
-
-From: Mel Gorman
-
-commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
-
-Deferred memory initialisation updates zone->managed_pages during the
-initialisation phase but before that finishes, the per-cpu page
-allocator (pcpu) calculates the number of pages allocated/freed in
-batches as well as the maximum number of pages allowed on a per-cpu
-list. As zone->managed_pages is not up to date yet, the pcpu
-initialisation calculates inappropriately low batch and high values.
-
-This increases zone lock contention quite severely in some cases with
-the degree of severity depending on how many CPUs share a local zone and
-the size of the zone. A private report indicated that kernel build
-times were excessive with extremely high system CPU usage. A perf
-profile indicated that a large chunk of time was lost on zone->lock
-contention.
-
-This patch recalculates the pcpu batch and high values after deferred
-initialisation completes for every populated zone in the system. It was
-tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
-workload -- allmodconfig and all available CPUs.
-
-mmtests configuration: config-workload-kernbench-max Configuration was
-modified to build on a fresh XFS partition.
-
-kernbench
-                                5.4.0-rc3              5.4.0-rc3
-                                  vanilla           resetpcpu-v2
-Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
-Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
-Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
-Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
-Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
-Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)
-
-                   5.4.0-rc3    5.4.0-rc3
-                     vanilla resetpcpu-v2
-Duration User       39766.24     49221.79
-Duration System     44298.10     13361.67
-Duration Elapsed      519.11       388.87
-
-The patch reduces system CPU usage by 69.86% and total build time by
-26.65%. The variance of system CPU usage is also much reduced.
-
-Before, this was the breakdown of batch and high values over all zones
-was:
-
-    256 batch: 1
-    256 batch: 63
-    512 batch: 7
-    256 high: 0
-    256 high: 378
-    512 high: 42
-
-512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
-the patch:
-
-    256 batch: 1
-    768 batch: 63
-    256 high: 0
-    768 high: 378
-
-[mgorman@techsingularity.net: fix merge/linkage snafu]
-  Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
-Signed-off-by: Mel Gorman
-Acked-by: Michal Hocko
-Acked-by: Vlastimil Babka
-Acked-by: David Hildenbrand
-Cc: Matt Fleming
-Cc: Thomas Gleixner
-Cc: Borislav Petkov
-Cc: Qian Cai
-Cc: [4.1+]
-Signed-off-by: Andrew Morton
-Signed-off-by: Linus Torvalds
-Signed-off-by: Greg Kroah-Hartman
-
----
- mm/page_alloc.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
---- a/mm/page_alloc.c
-+++ b/mm/page_alloc.c
-@@ -1998,6 +1998,14 @@ static void steal_suitable_fallback(stru
- 	old_block_type = get_pageblock_migratetype(page);
-
- 	/*
-+	 * The number of managed pages has changed due to the initialisation
-+	 * so the pcpu batch and high limits needs to be updated or the limits
-+	 * will be artificially small.
-+	 */
-+	for_each_populated_zone(zone)
-+		zone_pcp_update(zone);
-+
-+	/*
- 	 * This can happen due to races and we want to prevent broken
- 	 * highatomic accounting.
- 	 */
-@@ -7659,7 +7667,6 @@ void free_contig_range(unsigned long pfn
- }
- #endif
-
--#ifdef CONFIG_MEMORY_HOTPLUG
- /*
-  * The zone indicated has a new number of managed_pages; batch sizes and percpu
-  * page high values need to be recalulated.
-@@ -7673,7 +7680,6 @@ void __meminit zone_pcp_update(struct zo
- 			per_cpu_ptr(zone->pageset, cpu));
- 	mutex_unlock(&pcp_batch_high_lock);
- }
--#endif
-
- void zone_pcp_reset(struct zone *zone)
- {
diff --git a/queue-4.14/series b/queue-4.14/series
index a348a2d96e8..3b6f20587d1 100644
--- a/queue-4.14/series
+++ b/queue-4.14/series
@@ -12,7 +12,6 @@ qede-fix-null-pointer-deref-in-__qede_remove.patch
 alsa-timer-fix-incorrectly-assigned-timer-instance.patch
 alsa-bebob-fix-to-detect-configured-source-of-sampling-clock-for-focusrite-saffire-pro-i-o-series.patch
 alsa-hda-ca0132-fix-possible-workqueue-stall.patch
-mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
 mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap.patch
 mm-vmstat-hide-proc-pagetypeinfo-from-normal-users.patch
 dump_stack-avoid-the-livelock-of-the-dump_lock.patch
diff --git a/queue-4.4/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch b/queue-4.4/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
deleted file mode 100644
index 7d72802e1ab..00000000000
--- a/queue-4.4/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 3e8fc0075e24338b1117cdff6a79477427b8dbed Mon Sep 17 00:00:00 2001
-From: Mel Gorman
-Date: Tue, 5 Nov 2019 21:16:27 -0800
-Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
-
-From: Mel Gorman
-
-commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
-
-Deferred memory initialisation updates zone->managed_pages during the
-initialisation phase but before that finishes, the per-cpu page
-allocator (pcpu) calculates the number of pages allocated/freed in
-batches as well as the maximum number of pages allowed on a per-cpu
-list. As zone->managed_pages is not up to date yet, the pcpu
-initialisation calculates inappropriately low batch and high values.
-
-This increases zone lock contention quite severely in some cases with
-the degree of severity depending on how many CPUs share a local zone and
-the size of the zone. A private report indicated that kernel build
-times were excessive with extremely high system CPU usage. A perf
-profile indicated that a large chunk of time was lost on zone->lock
-contention.
-
-This patch recalculates the pcpu batch and high values after deferred
-initialisation completes for every populated zone in the system. It was
-tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
-workload -- allmodconfig and all available CPUs.
-
-mmtests configuration: config-workload-kernbench-max Configuration was
-modified to build on a fresh XFS partition.
-
-kernbench
-                                5.4.0-rc3              5.4.0-rc3
-                                  vanilla           resetpcpu-v2
-Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
-Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
-Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
-Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
-Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
-Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)
-
-                   5.4.0-rc3    5.4.0-rc3
-                     vanilla resetpcpu-v2
-Duration User       39766.24     49221.79
-Duration System     44298.10     13361.67
-Duration Elapsed      519.11       388.87
-
-The patch reduces system CPU usage by 69.86% and total build time by
-26.65%. The variance of system CPU usage is also much reduced.
-
-Before, this was the breakdown of batch and high values over all zones
-was:
-
-    256 batch: 1
-    256 batch: 63
-    512 batch: 7
-    256 high: 0
-    256 high: 378
-    512 high: 42
-
-512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
-the patch:
-
-    256 batch: 1
-    768 batch: 63
-    256 high: 0
-    768 high: 378
-
-[mgorman@techsingularity.net: fix merge/linkage snafu]
-  Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
-Signed-off-by: Mel Gorman
-Acked-by: Michal Hocko
-Acked-by: Vlastimil Babka
-Acked-by: David Hildenbrand
-Cc: Matt Fleming
-Cc: Thomas Gleixner
-Cc: Borislav Petkov
-Cc: Qian Cai
-Cc: [4.1+]
-Signed-off-by: Andrew Morton
-Signed-off-by: Linus Torvalds
-Signed-off-by: Greg Kroah-Hartman
-
----
- mm/page_alloc.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
---- a/mm/page_alloc.c
-+++ b/mm/page_alloc.c
-@@ -2010,6 +2010,14 @@ void drain_all_pages(struct zone *zone)
- 	int cpu;
-
- 	/*
-+	 * The number of managed pages has changed due to the initialisation
-+	 * so the pcpu batch and high limits needs to be updated or the limits
-+	 * will be artificially small.
-+	 */
-+	for_each_populated_zone(zone)
-+		zone_pcp_update(zone);
-+
-+	/*
- 	 * Allocate in the BSS so we wont require allocation in
- 	 * direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
- 	 */
-@@ -6868,7 +6876,6 @@ void free_contig_range(unsigned long pfn
- }
- #endif
-
--#ifdef CONFIG_MEMORY_HOTPLUG
- /*
-  * The zone indicated has a new number of managed_pages; batch sizes and percpu
-  * page high values need to be recalulated.
-@@ -6882,7 +6889,6 @@ void __meminit zone_pcp_update(struct zo
- 			per_cpu_ptr(zone->pageset, cpu));
- 	mutex_unlock(&pcp_batch_high_lock);
- }
--#endif
-
- void zone_pcp_reset(struct zone *zone)
- {
diff --git a/queue-4.4/series b/queue-4.4/series
index f3d6b05c285..5248c74cf2c 100644
--- a/queue-4.4/series
+++ b/queue-4.4/series
@@ -6,7 +6,6 @@ qede-fix-null-pointer-deref-in-__qede_remove.patch
 nfc-netlink-fix-double-device-reference-drop.patch
 alsa-bebob-fix-to-detect-configured-source-of-sampling-clock-for-focusrite-saffire-pro-i-o-series.patch
 alsa-hda-ca0132-fix-possible-workqueue-stall.patch
-mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
 mm-vmstat-hide-proc-pagetypeinfo-from-normal-users.patch
 dump_stack-avoid-the-livelock-of-the-dump_lock.patch
 perf-tools-fix-time-sorting.patch
diff --git a/queue-4.9/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch b/queue-4.9/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
deleted file mode 100644
index e594c5c8f95..00000000000
--- a/queue-4.9/mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 3e8fc0075e24338b1117cdff6a79477427b8dbed Mon Sep 17 00:00:00 2001
-From: Mel Gorman
-Date: Tue, 5 Nov 2019 21:16:27 -0800
-Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
-
-From: Mel Gorman
-
-commit 3e8fc0075e24338b1117cdff6a79477427b8dbed upstream.
-
-Deferred memory initialisation updates zone->managed_pages during the
-initialisation phase but before that finishes, the per-cpu page
-allocator (pcpu) calculates the number of pages allocated/freed in
-batches as well as the maximum number of pages allowed on a per-cpu
-list. As zone->managed_pages is not up to date yet, the pcpu
-initialisation calculates inappropriately low batch and high values.
-
-This increases zone lock contention quite severely in some cases with
-the degree of severity depending on how many CPUs share a local zone and
-the size of the zone. A private report indicated that kernel build
-times were excessive with extremely high system CPU usage. A perf
-profile indicated that a large chunk of time was lost on zone->lock
-contention.
-
-This patch recalculates the pcpu batch and high values after deferred
-initialisation completes for every populated zone in the system. It was
-tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
-workload -- allmodconfig and all available CPUs.
-
-mmtests configuration: config-workload-kernbench-max Configuration was
-modified to build on a fresh XFS partition.
-
-kernbench
-                                5.4.0-rc3              5.4.0-rc3
-                                  vanilla           resetpcpu-v2
-Amean     user-256    13249.50 (   0.00%)    16401.31 * -23.79%*
-Amean     syst-256    14760.30 (   0.00%)     4448.39 *  69.86%*
-Amean     elsp-256      162.42 (   0.00%)      119.13 *  26.65%*
-Stddev    user-256       42.97 (   0.00%)       19.15 (  55.43%)
-Stddev    syst-256      336.87 (   0.00%)        6.71 (  98.01%)
-Stddev    elsp-256        2.46 (   0.00%)        0.39 (  84.03%)
-
-                   5.4.0-rc3    5.4.0-rc3
-                     vanilla resetpcpu-v2
-Duration User       39766.24     49221.79
-Duration System     44298.10     13361.67
-Duration Elapsed      519.11       388.87
-
-The patch reduces system CPU usage by 69.86% and total build time by
-26.65%. The variance of system CPU usage is also much reduced.
-
-Before, this was the breakdown of batch and high values over all zones
-was:
-
-    256 batch: 1
-    256 batch: 63
-    512 batch: 7
-    256 high: 0
-    256 high: 378
-    512 high: 42
-
-512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
-the patch:
-
-    256 batch: 1
-    768 batch: 63
-    256 high: 0
-    768 high: 378
-
-[mgorman@techsingularity.net: fix merge/linkage snafu]
-  Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.netLink: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
-Signed-off-by: Mel Gorman
-Acked-by: Michal Hocko
-Acked-by: Vlastimil Babka
-Acked-by: David Hildenbrand
-Cc: Matt Fleming
-Cc: Thomas Gleixner
-Cc: Borislav Petkov
-Cc: Qian Cai
-Cc: [4.1+]
-Signed-off-by: Andrew Morton
-Signed-off-by: Linus Torvalds
-Signed-off-by: Greg Kroah-Hartman
-
----
- mm/page_alloc.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
---- a/mm/page_alloc.c
-+++ b/mm/page_alloc.c
-@@ -2051,6 +2051,14 @@ static void reserve_highatomic_pageblock
- 	unsigned long max_managed, flags;
-
- 	/*
-+	 * The number of managed pages has changed due to the initialisation
-+	 * so the pcpu batch and high limits needs to be updated or the limits
-+	 * will be artificially small.
-+	 */
-+	for_each_populated_zone(zone)
-+		zone_pcp_update(zone);
-+
-+	/*
- 	 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
- 	 * Check is race-prone but harmless.
- 	 */
-@@ -7385,7 +7393,6 @@ void free_contig_range(unsigned long pfn
- }
- #endif
-
--#ifdef CONFIG_MEMORY_HOTPLUG
- /*
-  * The zone indicated has a new number of managed_pages; batch sizes and percpu
-  * page high values need to be recalulated.
-@@ -7399,7 +7406,6 @@ void __meminit zone_pcp_update(struct zo
- 			per_cpu_ptr(zone->pageset, cpu));
- 	mutex_unlock(&pcp_batch_high_lock);
- }
--#endif
-
- void zone_pcp_reset(struct zone *zone)
- {
diff --git a/queue-4.9/series b/queue-4.9/series
index e1d83ab94b8..6b5b3516a3e 100644
--- a/queue-4.9/series
+++ b/queue-4.9/series
@@ -9,7 +9,6 @@ qede-fix-null-pointer-deref-in-__qede_remove.patch
 alsa-timer-fix-incorrectly-assigned-timer-instance.patch
 alsa-bebob-fix-to-detect-configured-source-of-sampling-clock-for-focusrite-saffire-pro-i-o-series.patch
 alsa-hda-ca0132-fix-possible-workqueue-stall.patch
-mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
 mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap.patch
 mm-vmstat-hide-proc-pagetypeinfo-from-normal-users.patch
 dump_stack-avoid-the-livelock-of-the-dump_lock.patch