patches for 4.19

author Sasha Levin <sashal@kernel.org>

Tue, 26 Feb 2019 00:49:56 +0000 (19:49 -0500)

committer Sasha Levin <sashal@kernel.org>

Tue, 26 Feb 2019 00:49:56 +0000 (19:49 -0500)
diff --git a/queue-4.19/genirq-matrix-improve-target-cpu-selection-for-manag.patch b/queue-4.19/genirq-matrix-improve-target-cpu-selection-for-manag.patch
new file mode 100644
index 0000000..b63da61
--- /dev/null
+++ b/queue-4.19/genirq-matrix-improve-target-cpu-selection-for-manag.patch
@@ -0,0 +1,216 @@
+From c7ca3df05628b3d8f8a33e2f69b1b0bd8411f0c5 Mon Sep 17 00:00:00 2001
+From: Long Li <longli@microsoft.com>
+Date: Tue, 6 Nov 2018 04:00:00 +0000
+Subject: genirq/matrix: Improve target CPU selection for managed interrupts.
+
+[ Upstream commit e8da8794a7fd9eef1ec9a07f0d4897c68581c72b ]
+
+On large systems with multiple devices of the same class (e.g. NVMe disks,
+using managed interrupts), the kernel can affinitize these interrupts to a
+small subset of CPUs instead of spreading them out evenly.
+
+irq_matrix_alloc_managed() tries to select the CPU in the supplied cpumask
+of possible target CPUs which has the lowest number of interrupt vectors
+allocated.
+
+This is done by searching the CPU with the highest number of available
+vectors. While this is correct for non-managed CPUs it can select the wrong
+CPU for managed interrupts. Under certain constellations this results in
+affinitizing the managed interrupts of several devices to a single CPU in
+a set.
+
+The book keeping of available vectors works the following way:
+
+ 1) Non-managed interrupts:
+
+    available is decremented when the interrupt is actually requested by
+    the device driver and a vector is assigned. It's incremented when the
+    interrupt and the vector are freed.
+
+ 2) Managed interrupts:
+
+    Managed interrupts guarantee vector reservation when the MSI/MSI-X
+    functionality of a device is enabled, which is achieved by reserving
+    vectors in the bitmaps of the possible target CPUs. This reservation
+    decrements the available count on each possible target CPU.
+
+    When the interrupt is requested by the device driver then a vector is
+    allocated from the reserved region. The operation is reversed when the
+    interrupt is freed by the device driver. Neither of these operations
+    affects the available count.
+
+    The reservation persists up to the point where the MSI/MSI-X
+    functionality is disabled and only this operation increments the
+    available count again.
+
+For non-managed interrupts the available count is the correct selection
+criterion because the guaranteed reservations need to be taken into
+account. Using the allocated counter could lead to a failing allocation in
+the following situation (total vector space of 10 assumed):
+
+                    CPU0   CPU1
+ available:            2      0
+ allocated:            5      3   <--- CPU1 is selected, but available space = 0
+ managed reserved:     3      7
+
+ while available yields the correct result.
+
+For managed interrupts the available count is not the appropriate
+selection criterion because as explained above the available count is not
+affected by the actual vector allocation.
+
+The following example illustrates that. Total vector space of 10
+assumed. The starting point is:
+
+                    CPU0   CPU1
+ available:            5      4
+ allocated:            2      3
+ managed reserved:     3      3
+
+ Allocating vectors for three non-managed interrupts will result in
+ affinitizing the first two to CPU0 and the third one to CPU1 because the
+ available count is adjusted with each allocation:
+
+                    CPU0   CPU1
+ available:            5      4    <- Select CPU0 for 1st allocation
+ --> allocated:        3      3
+
+ available:            4      4    <- Select CPU0 for 2nd allocation
+ --> allocated:        4      3
+
+ available:            3      4    <- Select CPU1 for 3rd allocation
+ --> allocated:        4      4
+
+ But the allocation of three managed interrupts starting from the same
+ point will affinitize all of them to CPU0 because the available count is
+ not affected by the allocation (see above). So the end result is:
+
+                    CPU0   CPU1
+ available:            5      4
+ allocated:            5      3
+
+Introduce a "managed_allocated" field in struct cpumap to track the vector
+allocation for managed interrupts separately. Use this information to
+select the target CPU when a vector is allocated for a managed interrupt,
+which results in more evenly distributed vector assignments. The above
+example results in the following allocations:
+
+                    CPU0   CPU1
+ managed_allocated:    0      0    <- Select CPU0 for 1st allocation
+ --> allocated:        3      3
+
+ managed_allocated:    1      0    <- Select CPU1 for 2nd allocation
+ --> allocated:        3      4
+
+ managed_allocated:    1      1    <- Select CPU0 for 3rd allocation
+ --> allocated:        4      4
+
+The allocation of non-managed interrupts is not affected by this change and
+is still evaluating the available count.
+
+The overall distribution of interrupt vectors for both types of interrupts
+might still not be perfectly even depending on the number of non-managed
+and managed interrupts in a system, but due to the reservation guarantee
+for managed interrupts this cannot be avoided.
+
+Expose the new field in debugfs as well.
+
+[ tglx: Clarified the background of the problem in the changelog and
+       described it independent of NVME ]
+
+Signed-off-by: Long Li <longli@microsoft.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: Michael Kelley <mikelley@microsoft.com>
+Link: https://lkml.kernel.org/r/20181106040000.27316-1-longli@linuxonhyperv.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/irq/matrix.c | 34 ++++++++++++++++++++++++++++++----
+ 1 file changed, 30 insertions(+), 4 deletions(-)
+
+diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
+index 6e6d467f3dec5..92337703ca9fd 100644
+--- a/kernel/irq/matrix.c
++++ b/kernel/irq/matrix.c
+@@ -14,6 +14,7 @@ struct cpumap {
+       unsigned int            available;
+       unsigned int            allocated;
+       unsigned int            managed;
++      unsigned int            managed_allocated;
+       bool                    initialized;
+       bool                    online;
+       unsigned long           alloc_map[IRQ_MATRIX_SIZE];
+@@ -145,6 +146,27 @@ static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
+       return best_cpu;
+ }
+ 
++/* Find the best CPU which has the lowest number of managed IRQs allocated */
++static unsigned int matrix_find_best_cpu_managed(struct irq_matrix *m,
++                                              const struct cpumask *msk)
++{
++      unsigned int cpu, best_cpu, allocated = UINT_MAX;
++      struct cpumap *cm;
++
++      best_cpu = UINT_MAX;
++
++      for_each_cpu(cpu, msk) {
++              cm = per_cpu_ptr(m->maps, cpu);
++
++              if (!cm->online || cm->managed_allocated > allocated)
++                      continue;
++
++              best_cpu = cpu;
++              allocated = cm->managed_allocated;
++      }
++      return best_cpu;
++}
++
+ /**
+  * irq_matrix_assign_system - Assign system wide entry in the matrix
+  * @m:                Matrix pointer
+@@ -269,7 +291,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+       if (cpumask_empty(msk))
+               return -EINVAL;
+ 
+-      cpu = matrix_find_best_cpu(m, msk);
++      cpu = matrix_find_best_cpu_managed(m, msk);
+       if (cpu == UINT_MAX)
+               return -ENOSPC;
+ 
+@@ -282,6 +304,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
+               return -ENOSPC;
+       set_bit(bit, cm->alloc_map);
+       cm->allocated++;
++      cm->managed_allocated++;
+       m->total_allocated++;
+       *mapped_cpu = cpu;
+       trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
+@@ -395,6 +418,8 @@ void irq_matrix_free(struct irq_matrix *m, unsigned int cpu,
+ 
+       clear_bit(bit, cm->alloc_map);
+       cm->allocated--;
++      if(managed)
++              cm->managed_allocated--;
+ 
+       if (cm->online)
+               m->total_allocated--;
+@@ -464,13 +489,14 @@ void irq_matrix_debug_show(struct seq_file *sf, struct irq_matrix *m, int ind)
+       seq_printf(sf, "Total allocated:  %6u\n", m->total_allocated);
+       seq_printf(sf, "System: %u: %*pbl\n", nsys, m->matrix_bits,
+                  m->system_map);
+-      seq_printf(sf, "%*s| CPU | avl | man | act | vectors\n", ind, " ");
++      seq_printf(sf, "%*s| CPU | avl | man | mac | act | vectors\n", ind, " ");
+       cpus_read_lock();
+       for_each_online_cpu(cpu) {
+               struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
+ 
+-              seq_printf(sf, "%*s %4d  %4u  %4u  %4u  %*pbl\n", ind, " ",
+-                         cpu, cm->available, cm->managed, cm->allocated,
++              seq_printf(sf, "%*s %4d  %4u  %4u  %4u %4u  %*pbl\n", ind, " ",
++                         cpu, cm->available, cm->managed,
++                         cm->managed_allocated, cm->allocated,
+                          m->matrix_bits, cm->alloc_map);
+       }
+       cpus_read_unlock();
+-- 
+2.19.1
+
diff --git a/queue-4.19/irq-matrix-split-out-the-cpu-selection-code-into-a-h.patch b/queue-4.19/irq-matrix-split-out-the-cpu-selection-code-into-a-h.patch
new file mode 100644
index 0000000..3f3a503
--- /dev/null
+++ b/queue-4.19/irq-matrix-split-out-the-cpu-selection-code-into-a-h.patch
@@ -0,0 +1,113 @@
+From 504302562cb34ba1a9b73753f9735da29d8f5ef2 Mon Sep 17 00:00:00 2001
+From: Dou Liyang <douly.fnst@cn.fujitsu.com>
+Date: Sun, 9 Sep 2018 01:58:37 +0800
+Subject: irq/matrix: Split out the CPU selection code into a helper
+
+[ Upstream commit 8ffe4e61c06a48324cfd97f1199bb9838acce2f2 ]
+
+Linux finds the CPU which has the lowest vector allocation count to spread
+out the non managed interrupts across the possible target CPUs, but does
+not do so for managed interrupts.
+
+Split out the CPU selection code into a helper function for reuse. No
+functional change.
+
+Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: hpa@zytor.com
+Link: https://lkml.kernel.org/r/20180908175838.14450-1-dou_liyang@163.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ kernel/irq/matrix.c | 65 ++++++++++++++++++++++++++-------------------
+ 1 file changed, 38 insertions(+), 27 deletions(-)
+
+diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
+index 5092494bf2614..67768bbe736ed 100644
+--- a/kernel/irq/matrix.c
++++ b/kernel/irq/matrix.c
+@@ -124,6 +124,27 @@ static unsigned int matrix_alloc_area(struct irq_matrix *m, struct cpumap *cm,
+       return area;
+ }
+ 
++/* Find the best CPU which has the lowest vector allocation count */
++static unsigned int matrix_find_best_cpu(struct irq_matrix *m,
++                                      const struct cpumask *msk)
++{
++      unsigned int cpu, best_cpu, maxavl = 0;
++      struct cpumap *cm;
++
++      best_cpu = UINT_MAX;
++
++      for_each_cpu(cpu, msk) {
++              cm = per_cpu_ptr(m->maps, cpu);
++
++              if (!cm->online || cm->available <= maxavl)
++                      continue;
++
++              best_cpu = cpu;
++              maxavl = cm->available;
++      }
++      return best_cpu;
++}
++
+ /**
+  * irq_matrix_assign_system - Assign system wide entry in the matrix
+  * @m:                Matrix pointer
+@@ -322,37 +343,27 @@ void irq_matrix_remove_reserved(struct irq_matrix *m)
+ int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
+                    bool reserved, unsigned int *mapped_cpu)
+ {
+-      unsigned int cpu, best_cpu, maxavl = 0;
++      unsigned int cpu, bit;
+       struct cpumap *cm;
+-      unsigned int bit;
+ 
+-      best_cpu = UINT_MAX;
+-      for_each_cpu(cpu, msk) {
+-              cm = per_cpu_ptr(m->maps, cpu);
+-
+-              if (!cm->online || cm->available <= maxavl)
+-                      continue;
++      cpu = matrix_find_best_cpu(m, msk);
++      if (cpu == UINT_MAX)
++              return -ENOSPC;
+ 
+-              best_cpu = cpu;
+-              maxavl = cm->available;
+-      }
++      cm = per_cpu_ptr(m->maps, cpu);
++      bit = matrix_alloc_area(m, cm, 1, false);
++      if (bit >= m->alloc_end)
++              return -ENOSPC;
++      cm->allocated++;
++      cm->available--;
++      m->total_allocated++;
++      m->global_available--;
++      if (reserved)
++              m->global_reserved--;
++      *mapped_cpu = cpu;
++      trace_irq_matrix_alloc(bit, cpu, m, cm);
++      return bit;
+ 
+-      if (maxavl) {
+-              cm = per_cpu_ptr(m->maps, best_cpu);
+-              bit = matrix_alloc_area(m, cm, 1, false);
+-              if (bit < m->alloc_end) {
+-                      cm->allocated++;
+-                      cm->available--;
+-                      m->total_allocated++;
+-                      m->global_available--;
+-                      if (reserved)
+-                              m->global_reserved--;
+-                      *mapped_cpu = best_cpu;
+-                      trace_irq_matrix_alloc(bit, best_cpu, m, cm);
+-                      return bit;
+-              }
+-      }
+-      return -ENOSPC;
+ }
+ 
+ /**
+-- 
+2.19.1
+
diff --git a/queue-4.19/irq-matrix-spread-managed-interrupts-on-allocation.patch b/queue-4.19/irq-matrix-spread-managed-interrupts-on-allocation.patch
new file mode 100644
index 0000000..82e82bc
--- /dev/null
+++ b/queue-4.19/irq-matrix-spread-managed-interrupts-on-allocation.patch
@@ -0,0 +1,115 @@
+From ead271d20be11196f16560c385cc132a5a4f1a8a Mon Sep 17 00:00:00 2001
+From: Dou Liyang <douly.fnst@cn.fujitsu.com>
+Date: Sun, 9 Sep 2018 01:58:38 +0800
+Subject: irq/matrix: Spread managed interrupts on allocation
+
+[ Upstream commit 76f99ae5b54d48430d1f0c5512a84da0ff9761e0 ]
+
+Linux spreads out the non managed interrupt across the possible target CPUs
+to avoid vector space exhaustion.
+
+Managed interrupts are treated differently, as for them the vectors are
+reserved (with guarantee) when the interrupt descriptors are initialized.
+
+When the interrupt is requested a real vector is assigned. The assignment
+logic uses the first CPU in the affinity mask for assignment. If the
+interrupt has more than one CPU in the affinity mask, which happens when a
+multi queue device has fewer queues than CPUs, then doing the same search as
+for non managed interrupts makes sense as it puts the interrupt on the
+least interrupt plagued CPU. For single CPU affine vectors that's obviously
+a NOOP.
+
+Restructure the matrix allocation code so it does the 'best CPU' search, add
+the sanity check for an empty affinity mask and adapt the call site in the
+x86 vector management code.
+
+[ tglx: Added the empty mask check to the core and improved change log ]
+
+Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Cc: hpa@zytor.com
+Link: https://lkml.kernel.org/r/20180908175838.14450-2-dou_liyang@163.com
+Signed-off-by: Sasha Levin <sashal@kernel.org>
+---
+ arch/x86/kernel/apic/vector.c |  9 ++++-----
+ include/linux/irq.h           |  3 ++-
+ kernel/irq/matrix.c           | 17 ++++++++++++++---
+ 3 files changed, 20 insertions(+), 9 deletions(-)
+
+diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
+index 7654febd51027..652e7ffa9b9de 100644
+--- a/arch/x86/kernel/apic/vector.c
++++ b/arch/x86/kernel/apic/vector.c
+@@ -313,14 +313,13 @@ assign_managed_vector(struct irq_data *irqd, const struct cpumask *dest)
+       struct apic_chip_data *apicd = apic_chip_data(irqd);
+       int vector, cpu;
+ 
+-      cpumask_and(vector_searchmask, vector_searchmask, affmsk);
+-      cpu = cpumask_first(vector_searchmask);
+-      if (cpu >= nr_cpu_ids)
+-              return -EINVAL;
++      cpumask_and(vector_searchmask, dest, affmsk);
++
+       /* set_affinity might call here for nothing */
+       if (apicd->vector && cpumask_test_cpu(apicd->cpu, vector_searchmask))
+               return 0;
+-      vector = irq_matrix_alloc_managed(vector_matrix, cpu);
++      vector = irq_matrix_alloc_managed(vector_matrix, vector_searchmask,
++                                        &cpu);
+       trace_vector_alloc_managed(irqd->irq, vector, vector);
+       if (vector < 0)
+               return vector;
+diff --git a/include/linux/irq.h b/include/linux/irq.h
+index 201de12a99571..c9bffda04a450 100644
+--- a/include/linux/irq.h
++++ b/include/linux/irq.h
+@@ -1151,7 +1151,8 @@ void irq_matrix_offline(struct irq_matrix *m);
+ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit, bool replace);
+ int irq_matrix_reserve_managed(struct irq_matrix *m, const struct cpumask *msk);
+ void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk);
+-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu);
++int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
++                              unsigned int *mapped_cpu);
+ void irq_matrix_reserve(struct irq_matrix *m);
+ void irq_matrix_remove_reserved(struct irq_matrix *m);
+ int irq_matrix_alloc(struct irq_matrix *m, const struct cpumask *msk,
+diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
+index 67768bbe736ed..6e6d467f3dec5 100644
+--- a/kernel/irq/matrix.c
++++ b/kernel/irq/matrix.c
+@@ -260,11 +260,21 @@ void irq_matrix_remove_managed(struct irq_matrix *m, const struct cpumask *msk)
+  * @m:                Matrix pointer
+  * @cpu:      On which CPU the interrupt should be allocated
+  */
+-int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
++int irq_matrix_alloc_managed(struct irq_matrix *m, const struct cpumask *msk,
++                           unsigned int *mapped_cpu)
+ {
+-      struct cpumap *cm = per_cpu_ptr(m->maps, cpu);
+-      unsigned int bit, end = m->alloc_end;
++      unsigned int bit, cpu, end = m->alloc_end;
++      struct cpumap *cm;
++
++      if (cpumask_empty(msk))
++              return -EINVAL;
+ 
++      cpu = matrix_find_best_cpu(m, msk);
++      if (cpu == UINT_MAX)
++              return -ENOSPC;
++
++      cm = per_cpu_ptr(m->maps, cpu);
++      end = m->alloc_end;
+       /* Get managed bit which are not allocated */
+       bitmap_andnot(m->scratch_map, cm->managed_map, cm->alloc_map, end);
+       bit = find_first_bit(m->scratch_map, end);
+@@ -273,6 +283,7 @@ int irq_matrix_alloc_managed(struct irq_matrix *m, unsigned int cpu)
+       set_bit(bit, cm->alloc_map);
+       cm->allocated++;
+       m->total_allocated++;
++      *mapped_cpu = cpu;
+       trace_irq_matrix_alloc_managed(bit, cpu, m, cm);
+       return bit;
+ }
+-- 
+2.19.1
+
diff --git a/queue-4.19/series b/queue-4.19/series
index e6455f04afbe6e414a0ea5d8c064771718c1be8b..856fd25ed1bbb4d43ad29b55927e9c0b7b9033eb 100644
--- a/queue-4.19/series
+++ b/queue-4.19/series
@@ -150,3 +150,6 @@ netfilter-nfnetlink_osf-add-missing-fmatch-check.patch
  netfilter-ipt_clusterip-fix-sleep-in-atomic-bug-in-clusterip_config_entry_put.patch
  udlfb-handle-unplug-properly.patch
  pinctrl-max77620-use-define-directive-for-max77620_pinconf_param-values.patch
+irq-matrix-split-out-the-cpu-selection-code-into-a-h.patch
+irq-matrix-spread-managed-interrupts-on-allocation.patch
+genirq-matrix-improve-target-cpu-selection-for-manag.patch
queue-4.19/genirq-matrix-improve-target-cpu-selection-for-manag.patch	[new file with mode: 0644]
queue-4.19/irq-matrix-split-out-the-cpu-selection-code-into-a-h.patch	[new file with mode: 0644]
queue-4.19/irq-matrix-spread-managed-interrupts-on-allocation.patch	[new file with mode: 0644]
queue-4.19/series
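
Review note: the changelog of the genirq/matrix patch argues that selecting by the available count sends every managed allocation to the same CPU, while selecting by managed_allocated spreads them. The following is a minimal userspace sketch of that argument, not kernel code. struct cpumap_sketch, NR_CPUS, alloc_managed() and simulate() are hypothetical names introduced here for illustration; the two selection loops are copied from the comparisons shown in the patches above, with the cpumask walk replaced by a plain array loop.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_CPUS 2	/* hypothetical: two CPUs, as in the changelog example */

/* Simplified stand-in for struct cpumap: only the counters that matter here. */
struct cpumap_sketch {
	unsigned int available;
	unsigned int allocated;
	unsigned int managed_allocated;
	bool online;
};

/* Same comparison as matrix_find_best_cpu(): highest 'available' wins. */
static unsigned int find_best_cpu(const struct cpumap_sketch *maps)
{
	unsigned int cpu, best_cpu = UINT_MAX, maxavl = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!maps[cpu].online || maps[cpu].available <= maxavl)
			continue;
		best_cpu = cpu;
		maxavl = maps[cpu].available;
	}
	return best_cpu;
}

/*
 * Same comparison as matrix_find_best_cpu_managed(): lowest
 * 'managed_allocated' wins. Because the test is '>', a tie moves
 * best_cpu to the CPU visited last.
 */
static unsigned int find_best_cpu_managed(const struct cpumap_sketch *maps)
{
	unsigned int cpu, best_cpu = UINT_MAX, allocated = UINT_MAX;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!maps[cpu].online || maps[cpu].managed_allocated > allocated)
			continue;
		best_cpu = cpu;
		allocated = maps[cpu].managed_allocated;
	}
	return best_cpu;
}

/*
 * A managed allocation takes a vector from the reserved region, so
 * 'available' is deliberately left untouched.
 */
static void alloc_managed(struct cpumap_sketch *maps, bool by_managed_count)
{
	unsigned int cpu = by_managed_count ? find_best_cpu_managed(maps)
					    : find_best_cpu(maps);

	if (cpu == UINT_MAX)
		return;
	maps[cpu].allocated++;
	maps[cpu].managed_allocated++;
}

/* Three managed allocations from the changelog's starting point. */
void simulate(struct cpumap_sketch *maps, bool by_managed_count)
{
	int i;

	maps[0] = (struct cpumap_sketch){ .available = 5, .allocated = 2, .online = true };
	maps[1] = (struct cpumap_sketch){ .available = 4, .allocated = 3, .online = true };

	for (i = 0; i < 3; i++)
		alloc_managed(maps, by_managed_count);
}
```

With by_managed_count set to false, all three allocations land on CPU0 (allocated 5/3, managed_allocated 3/0), matching the changelog's "end result" table. With it set to true, the managed allocations are split 1/2 across the two CPUs. In this sketch the per-step order is CPU1, CPU0, CPU1 rather than the CPU0, CPU1, CPU0 shown in the changelog table, because the '>' comparison resolves ties in favour of the last CPU visited; whether that matters in practice depends on the mask iteration order, which the sketch does not model.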