--- /dev/null
+From 8a540e990d7da36813cb71a4a422712bfba448a4 Mon Sep 17 00:00:00 2001
+From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
+Date: Sat, 7 Oct 2023 01:14:21 -0400
+Subject: btrfs: fix stripe length calculation for non-zoned data chunk allocation
+
+From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
+
+commit 8a540e990d7da36813cb71a4a422712bfba448a4 upstream.
+
+Commit f6fca3917b4d "btrfs: store chunk size in space-info struct"
+broke data chunk allocations on non-zoned multi-device filesystems when
+using default chunk_size. Commit 5da431b71d4b "btrfs: fix the max chunk
+size and stripe length calculation" partially fixed that, and this patch
+completes the fix for that case.
+
+After commit f6fca3917b4d and 5da431b71d4b, the sequence of events for
+a data chunk allocation on a non-zoned filesystem is:
+
+ 1. btrfs_create_chunk calls init_alloc_chunk_ctl, which copies
+ space_info->chunk_size (default 10 GiB) to ctl->max_stripe_size
+ unmodified. Before f6fca3917b4d, the ctl->max_stripe_size value
+ was 1 GiB for non-zoned data chunks and not configurable.
+
+ 2. btrfs_create_chunk calls gather_device_info which consumes
+ and produces more fields of chunk_ctl.
+
+ 3. gather_device_info multiplies ctl->max_stripe_size by
+ ctl->dev_stripes (which is 1 in all cases except dup)
+ and calls find_free_dev_extent with that number as num_bytes.
+
+ 4. find_free_dev_extent locates the first dev_extent hole on
+ a device which is at least as large as num_bytes. With default
+ max_chunk_size from f6fca3917b4d, it finds the first hole which is
+ longer than 10 GiB, or the largest hole if that hole is shorter
+ than 10 GiB. This is different from the pre-f6fca3917b4d
+ behavior, where num_bytes is 1 GiB, and find_free_dev_extent
+ may choose a different hole.
+
+ 5. gather_device_info repeats step 4 with all devices to find
+ the first or largest dev_extent hole that can be allocated on
+ each device.
+
+ 6. gather_device_info sorts the device list by the hole size
+ on each device, using total unallocated space on each device to
+ break ties, then returns to btrfs_create_chunk with the list.
+
+ 7. btrfs_create_chunk calls decide_stripe_size_regular.
+
+ 8. decide_stripe_size_regular finds the largest stripe_len that
+ fits across the first nr_devs device dev_extent holes that were
+ found by gather_device_info (and satisfies other constraints
+ on stripe_len that are not relevant here).
+
+ 9. decide_stripe_size_regular caps the length of the stripe it
+ computed at 1 GiB. This cap appeared in 5da431b71d4b to correct
+ one of the other regressions introduced in f6fca3917b4d.
+
+ 10. btrfs_create_chunk creates a new chunk with the above
+ computed size and number of devices.
+
+At step 4, gather_device_info() has found a location where a stripe up to
+10 GiB in length could be allocated on several devices, and selected
+which devices should have a dev_extent allocated on them, but at step
+9, only 1 GiB of the space that was found on each device can be used.
+This mismatch causes new suboptimal chunk allocation cases that did not
+occur in pre-f6fca3917b4d kernels.
+
+Consider a filesystem using raid1 profile with 3 devices. After some
+balances, device 1 has 10x 1 GiB unallocated space, while devices 2
+and 3 have 1x 10 GiB unallocated space, i.e. the same total amount of
+space, but distributed across different numbers of dev_extent holes.
+For visualization, let's ignore all the chunks that were allocated before
+this point, and focus on the remaining holes:
+
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10x 1 GiB unallocated)
+ Device 2: [__________] (10 GiB contig unallocated)
+ Device 3: [__________] (10 GiB contig unallocated)
+
+Before f6fca3917b4d, the allocator would fill these optimally by
+allocating chunks with dev_extents on devices 1 and 2 ([12]), 1 and 3
+([13]), or 2 and 3 ([23]):
+
+ [after 0 chunk allocations]
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
+ Device 2: [__________] (10 GiB)
+ Device 3: [__________] (10 GiB)
+
+ [after 1 chunk allocation]
+ Device 1: [12] [_] [_] [_] [_] [_] [_] [_] [_] [_]
+ Device 2: [12] [_________] (9 GiB)
+ Device 3: [__________] (10 GiB)
+
+ [after 2 chunk allocations]
+ Device 1: [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
+ Device 2: [12] [_________] (9 GiB)
+ Device 3: [13] [_________] (9 GiB)
+
+ [after 3 chunk allocations]
+ Device 1: [12] [13] [12] [_] [_] [_] [_] [_] [_] [_] (7 GiB)
+ Device 2: [12] [12] [________] (8 GiB)
+ Device 3: [13] [_________] (9 GiB)
+
+ [...]
+
+ [after 12 chunk allocations]
+ Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [_] [_] (2 GiB)
+ Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [__] (2 GiB)
+ Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)
+
+ [after 13 chunk allocations]
+ Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [_] (1 GiB)
+ Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
+ Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [__] (2 GiB)
+
+ [after 14 chunk allocations]
+ Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
+ Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [_] (1 GiB)
+ Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [13] [_] (1 GiB)
+
+ [after 15 chunk allocations]
+ Device 1: [12] [13] [12] [13] [12] [13] [12] [13] [12] [13] (full)
+ Device 2: [12] [12] [23] [23] [12] [12] [23] [23] [12] [23] (full)
+ Device 3: [13] [13] [23] [23] [13] [23] [13] [23] [13] [23] (full)
+
+This allocates all of the space with no waste. The sorting function used
+by gather_device_info considers free space holes above 1 GiB in length
+to be equal to 1 GiB, so once find_free_dev_extent locates a sufficiently
+long hole on each device, all the holes appear equal in the sort, and the
+comparison falls back to sorting devices by total free space. This keeps
+usable space on each device equal so they can all be filled completely.
+
+After f6fca3917b4d, the allocator prefers the devices with larger holes
+over the devices with more free space, so it makes bad allocation choices:
+
+ [after 1 chunk allocation]
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
+ Device 2: [23] [_________] (9 GiB)
+ Device 3: [23] [_________] (9 GiB)
+
+ [after 2 chunk allocations]
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
+ Device 2: [23] [23] [________] (8 GiB)
+ Device 3: [23] [23] [________] (8 GiB)
+
+ [after 3 chunk allocations]
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
+ Device 2: [23] [23] [23] [_______] (7 GiB)
+ Device 3: [23] [23] [23] [_______] (7 GiB)
+
+ [...]
+
+ [after 9 chunk allocations]
+ Device 1: [_] [_] [_] [_] [_] [_] [_] [_] [_] [_] (10 GiB)
+ Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
+ Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
+
+ [after 10 chunk allocations]
+ Device 1: [12] [_] [_] [_] [_] [_] [_] [_] [_] [_] (9 GiB)
+ Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
+ Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [_] (1 GiB)
+
+ [after 11 chunk allocations]
+ Device 1: [12] [13] [_] [_] [_] [_] [_] [_] [_] [_] (8 GiB)
+ Device 2: [23] [23] [23] [23] [23] [23] [23] [23] [12] (full)
+ Device 3: [23] [23] [23] [23] [23] [23] [23] [23] [13] (full)
+
+No further allocations are possible, with 8 GiB wasted (4 GiB of data
+space). The sort in gather_device_info now considers free space in
+holes longer than 1 GiB to be distinct, so it will prefer devices 2 and
+3 over device 1 until all but 1 GiB is allocated on devices 2 and 3.
+At that point, with only 1 GiB unallocated on every device, the largest
+hole length on each device is equal at 1 GiB, so the sort finally moves
+to ordering the devices with the most free space, but by this time it
+is too late to make use of the free space on device 1.
+
+Note that it's possible to contrive a case where the pre-f6fca3917b4d
+allocator fails the same way, but these cases generally have extensive
+dev_extent fragmentation as a precondition (e.g. many holes of 768M
+in length on one device, and few holes 1 GiB in length on the others).
+With the regression in f6fca3917b4d, bad chunk allocation can occur even
+under optimal conditions, when all dev_extent holes are exact multiples
+of stripe_len in length, as in the example above.
+
+Also note that post-f6fca3917b4d kernels do treat dev_extent holes
+larger than 10 GiB as equal, so the bad behavior won't show up on a
+freshly formatted filesystem; however, as the filesystem ages and fills
+up, and holes ranging from 1 GiB to 10 GiB in size appear, the problem
+can show up as a failure to balance after adding or removing devices,
+or an unexpected shortfall in available space due to unequal allocation.
+
+To fix the regression and make data chunk allocation work
+again, set ctl->max_stripe_size back to the original SZ_1G, or
+space_info->chunk_size if that's smaller (the latter can happen if the
+user set space_info->chunk_size to less than 1 GiB via sysfs, or it's
+a 32 MiB system chunk with a hardcoded chunk_size and stripe size).
+
+While researching the background of the earlier commits, I found that an
+identical fix was already proposed at:
+
+ https://lore.kernel.org/linux-btrfs/de83ac46-a4a3-88d3-85ce-255b7abc5249@gmx.com/
+
+The previous review missed one detail: ctl->max_stripe_size is used
+before decide_stripe_size_regular() is called, when it is too late for
+the changes in that function to have any effect. ctl->max_stripe_size is
+not used directly by decide_stripe_size_regular(), but the parameter
+does heavily influence the per-device free space data presented to
+the function.
+
+Fixes: f6fca3917b4d ("btrfs: store chunk size in space-info struct")
+CC: stable@vger.kernel.org # 6.1+
+Link: https://lore.kernel.org/linux-btrfs/20231007051421.19657-1-ce3g8jdj@umail.furryterror.org/
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/btrfs/volumes.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/fs/btrfs/volumes.c
++++ b/fs/btrfs/volumes.c
+@@ -5139,7 +5139,7 @@ static void init_alloc_chunk_ctl_policy_
+ ASSERT(space_info);
+
+ ctl->max_chunk_size = READ_ONCE(space_info->chunk_size);
+- ctl->max_stripe_size = ctl->max_chunk_size;
++ ctl->max_stripe_size = min_t(u64, ctl->max_chunk_size, SZ_1G);
+
+ if (ctl->type & BTRFS_BLOCK_GROUP_SYSTEM)
+ ctl->devs_max = min_t(int, ctl->devs_max, BTRFS_MAX_DEVS_SYS_CHUNK);
--- /dev/null
+From 0288c3e709e5fabd51e84715c5c798a02f43061a Mon Sep 17 00:00:00 2001
+From: Jesse Brandeburg <jesse.brandeburg@intel.com>
+Date: Wed, 11 Oct 2023 16:33:33 -0700
+Subject: ice: reset first in crash dump kernels
+
+From: Jesse Brandeburg <jesse.brandeburg@intel.com>
+
+commit 0288c3e709e5fabd51e84715c5c798a02f43061a upstream.
+
+When the system boots into the crash dump kernel after a panic, the ice
+networking device may still have pending transactions that can cause errors
+or machine checks when the device is re-enabled. This can prevent the crash
+dump kernel from loading the driver or collecting the crash data.
+
+To avoid this issue, perform a function level reset (FLR) on the ice device
+via PCIe config space before enabling it on the crash kernel. This will
+clear any outstanding transactions and stop all queues and interrupts.
+Restore the config space after the FLR, as testing showed that the
+driver otherwise wouldn't load successfully.
+
+The following sequence causes the original issue:
+- Load the ice driver with modprobe ice
+- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
+- Trigger a crash with echo c > /proc/sysrq-trigger
+- Load the ice driver again (or let it load automatically) with modprobe ice
+- The system crashes again during pcim_enable_device()
+
+Fixes: 837f08fdecbe ("ice: Add basic driver framework for Intel(R) E800 Series")
+Reported-by: Vishal Agrawal <vagrawal@redhat.com>
+Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
+Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
+Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
+Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
+Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/ethernet/intel/ice/ice_main.c | 15 +++++++++++++++
+ 1 file changed, 15 insertions(+)
+
+--- a/drivers/net/ethernet/intel/ice/ice_main.c
++++ b/drivers/net/ethernet/intel/ice/ice_main.c
+@@ -6,6 +6,7 @@
+ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+ #include <generated/utsrelease.h>
++#include <linux/crash_dump.h>
+ #include "ice.h"
+ #include "ice_base.h"
+ #include "ice_lib.h"
+@@ -4681,6 +4682,20 @@ ice_probe(struct pci_dev *pdev, const st
+ return -EINVAL;
+ }
+
++ /* when under a kdump kernel initiate a reset before enabling the
++ * device in order to clear out any pending DMA transactions. These
++ * transactions can cause some systems to machine check when doing
++ * the pcim_enable_device() below.
++ */
++ if (is_kdump_kernel()) {
++ pci_save_state(pdev);
++ pci_clear_master(pdev);
++ err = pcie_flr(pdev);
++ if (err)
++ return err;
++ pci_restore_state(pdev);
++ }
++
+ /* this driver uses devres, see
+ * Documentation/driver-api/driver-model/devres.rst
+ */
--- /dev/null
+From a16eb25b09c02a54c1c1b449d4b6cfa2cf3f013a Mon Sep 17 00:00:00 2001
+From: Jim Mattson <jmattson@google.com>
+Date: Mon, 25 Sep 2023 17:34:47 +0000
+Subject: KVM: x86: Mask LVTPC when handling a PMI
+
+From: Jim Mattson <jmattson@google.com>
+
+commit a16eb25b09c02a54c1c1b449d4b6cfa2cf3f013a upstream.
+
+Per the SDM, "When the local APIC handles a performance-monitoring
+counters interrupt, it automatically sets the mask flag in the LVT
+performance counter register." Add this behavior to KVM's local APIC
+emulation.
+
+Failure to mask the LVTPC entry results in spurious PMIs, e.g. when
+running Linux as a guest, PMI handlers that do a "late_ack" spew a large
+number of "dazed and confused" spurious NMI warnings.
+
+Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
+Cc: stable@vger.kernel.org
+Signed-off-by: Jim Mattson <jmattson@google.com>
+Tested-by: Mingwei Zhang <mizhang@google.com>
+Signed-off-by: Mingwei Zhang <mizhang@google.com>
+Link: https://lore.kernel.org/r/20230925173448.3518223-3-mizhang@google.com
+[sean: massage changelog, correct Fixes]
+Signed-off-by: Sean Christopherson <seanjc@google.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/x86/kvm/lapic.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -2535,13 +2535,17 @@ int kvm_apic_local_deliver(struct kvm_la
+ {
+ u32 reg = kvm_lapic_get_reg(apic, lvt_type);
+ int vector, mode, trig_mode;
++ int r;
+
+ if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
+ vector = reg & APIC_VECTOR_MASK;
+ mode = reg & APIC_MODE_MASK;
+ trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+- return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
+- NULL);
++
++ r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
++ if (r && lvt_type == APIC_LVTPC)
++ kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
++ return r;
+ }
+ return 0;
+ }