--- /dev/null
+From 3c0696076aad60a2f04c019761921954579e1b0e Mon Sep 17 00:00:00 2001
+From: James Houghton <jthoughton@google.com>
+Date: Mon, 4 Dec 2023 17:26:46 +0000
+Subject: arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
+
+From: James Houghton <jthoughton@google.com>
+
+commit 3c0696076aad60a2f04c019761921954579e1b0e upstream.
+
+It is currently possible for a userspace application to enter an
+infinite page fault loop when using HugeTLB pages implemented with
+contiguous PTEs when HAFDBS is not available. This happens because:
+
+1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
+ (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).
+
+2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
+ the memory access on a system without HAFDBS, we will get a page
+ fault.
+
+3. HugeTLB will check if it needs to update the dirty bits on the PTE.
+ For contiguous PTEs, it will check to see if the pgprot bits need
+ updating. In this case, HugeTLB wants to write a sequence of
+   sw-dirty, hw-dirty PTEs, but it finds that the PTEs it is about
+   to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
+   so it thinks no update is necessary.
+
+We can get the kernel to write a sw-dirty, hw-clean PTE with the
+following steps (showing the relevant VMA flags and pgprot bits):
+
+i. Create a valid, writable contiguous PTE.
+ VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
+ VMA pgprot bits: PTE_RDONLY | PTE_WRITE
+ PTE pgprot bits: PTE_DIRTY | PTE_WRITE
+
+ii. mprotect the VMA to PROT_NONE.
+ VMA vmflags: VM_SHARED
+ VMA pgprot bits: PTE_RDONLY
+ PTE pgprot bits: PTE_DIRTY | PTE_RDONLY
+
+iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
+ VMA vmflags: VM_SHARED | VM_READ | VM_WRITE
+ VMA pgprot bits: PTE_RDONLY | PTE_WRITE
+ PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY
+
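+The sequence is easy to drive from userspace. The following is an
+illustrative repro sketch (my own reconstruction of steps i-iii, not
+taken from the original report), assuming `fd` is a hugetlbfs file
+descriptor, `len` is a multiple of the huge page size, and the CPU
+lacks HAFDBS; error handling is omitted:
+
+  #include <string.h>
+  #include <sys/mman.h>
+
+  static void repro(int fd, size_t len)
+  {
+          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
+                         MAP_SHARED, fd, 0);      /* i. contiguous PTEs */
+
+          memset(p, 1, len);                      /* dirty, writable PTEs */
+          mprotect(p, len, PROT_NONE);            /* ii. sw-dirty, hw-clean */
+          mprotect(p, len, PROT_READ | PROT_WRITE); /* iii. */
+          memset(p, 2, len);    /* write faults forever without this fix */
+  }
+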
+Make it impossible to create a writable sw-dirty, hw-clean PTE with
+pte_modify(). Such a PTE should be impossible to create, and there may
+be places that assume that pte_dirty() implies pte_hw_dirty().
+
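+For reference, the dirty-bit predicates involved look roughly like this
+(simplified from arch/arm64/include/asm/pgtable.h; the exact definitions
+vary by kernel version):
+
+  /* hw-dirty: writable with PTE_RDONLY clear; sw-dirty: PTE_DIRTY set */
+  static inline bool pte_hw_dirty(pte_t pte)
+  {
+          return pte_write(pte) && !(pte_val(pte) & PTE_RDONLY);
+  }
+
+  static inline bool pte_sw_dirty(pte_t pte)
+  {
+          return !!(pte_val(pte) & PTE_DIRTY);
+  }
+
+  static inline bool pte_dirty(pte_t pte)
+  {
+          return pte_sw_dirty(pte) || pte_hw_dirty(pte);
+  }
+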
+Signed-off-by: James Houghton <jthoughton@google.com>
+Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags")
+Cc: <stable@vger.kernel.org>
+Acked-by: Will Deacon <will@kernel.org>
+Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
+Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com
+Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/arm64/include/asm/pgtable.h | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+--- a/arch/arm64/include/asm/pgtable.h
++++ b/arch/arm64/include/asm/pgtable.h
+@@ -826,6 +826,12 @@ static inline pte_t pte_modify(pte_t pte
+ pte = set_pte_bit(pte, __pgprot(PTE_DIRTY));
+
+ pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask);
++ /*
++ * If we end up clearing hw dirtiness for a sw-dirty PTE, set hardware
++ * dirtiness again.
++ */
++ if (pte_sw_dirty(pte))
++ pte = pte_mkdirty(pte);
+ return pte;
+ }
+
--- /dev/null
+From a86805504b88f636a6458520d85afdf0634e3c6b Mon Sep 17 00:00:00 2001
+From: Boris Burkov <boris@bur.io>
+Date: Fri, 1 Dec 2023 13:00:12 -0800
+Subject: btrfs: don't clear qgroup reserved bit in release_folio
+
+From: Boris Burkov <boris@bur.io>
+
+commit a86805504b88f636a6458520d85afdf0634e3c6b upstream.
+
+The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file
+against duplicate reservations. That is, two writes to that range in one
+transaction shouldn't create two reservations, as the reservation will
+only be freed once when the write finally goes down. Therefore, it is
+never OK to clear that bit without freeing the associated qgroup
+reserve. At this point, we don't want to be freeing the reserve, so mask
+off the bit.
+
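+To illustrate the invariant (hypothetical flow, not from the report):
+
+  write A to [0, 4K)   -> sets EXTENT_QGROUP_RESERVED, reserves 4K
+  write B to [0, 4K)   -> bit already set, no second reservation
+  release_folio clears the bit without freeing   <- the bug
+  write C to [0, 4K)   -> bit clear again, reserves another 4K
+
+The single free when the write finally goes down now leaks one 4K of
+qgroup reservation.
+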
+CC: stable@vger.kernel.org # 5.15+
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Boris Burkov <boris@bur.io>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/btrfs/extent_io.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -2303,7 +2303,8 @@ static int try_release_extent_state(stru
+ ret = 0;
+ } else {
+ u32 clear_bits = ~(EXTENT_LOCKED | EXTENT_NODATASUM |
+- EXTENT_DELALLOC_NEW | EXTENT_CTLBITS);
++ EXTENT_DELALLOC_NEW | EXTENT_CTLBITS |
++ EXTENT_QGROUP_RESERVED);
+
+ /*
+ * At this point we can safely clear everything except the
--- /dev/null
+From 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 Mon Sep 17 00:00:00 2001
+From: Boris Burkov <boris@bur.io>
+Date: Fri, 1 Dec 2023 13:00:10 -0800
+Subject: btrfs: fix qgroup_free_reserved_data int overflow
+
+From: Boris Burkov <boris@bur.io>
+
+commit 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 upstream.
+
+The reserved data counter and the input parameter are both u64, but we
+inadvertently accumulate them in an int. Overflowing that int results in
+freeing the wrong amount of data and breaking reserve accounting.
+
+Unfortunately, this overflow rot spreads from there, as the qgroup
+release/free functions rely on returning an int to take advantage of
+negative values for error codes.
+
+Therefore, the full fix is to return the "released" or "freed" amount
+via a u64 out argument and to return 0 or a negative error code via the
+return value.
+
+Most of the call sites simply ignore the return value, though some
+of them handle the error and count the returned bytes. Change all of
+them accordingly.
+
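+The truncation is easy to demonstrate in isolation (standalone sketch,
+not kernel code; output shown is for the usual 32-bit int ABI):
+
+  #include <stdio.h>
+
+  int main(void)
+  {
+          unsigned long long len = 3ULL << 30;  /* 3 GiB reservation */
+          int freed = 0;                        /* old accumulator type */
+
+          freed += len;            /* wraps: 3 GiB doesn't fit in int */
+          printf("%d\n", freed);   /* prints -1073741824, not 3 GiB */
+          return 0;
+  }
+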
+CC: stable@vger.kernel.org # 6.1+
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Boris Burkov <boris@bur.io>
+Reviewed-by: David Sterba <dsterba@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/btrfs/delalloc-space.c | 2 +-
+ fs/btrfs/file.c | 2 +-
+ fs/btrfs/inode.c | 16 ++++++++--------
+ fs/btrfs/ordered-data.c | 7 ++++---
+ fs/btrfs/qgroup.c | 25 +++++++++++++++----------
+ fs/btrfs/qgroup.h | 4 ++--
+ 6 files changed, 31 insertions(+), 25 deletions(-)
+
+--- a/fs/btrfs/delalloc-space.c
++++ b/fs/btrfs/delalloc-space.c
+@@ -199,7 +199,7 @@ void btrfs_free_reserved_data_space(stru
+ start = round_down(start, fs_info->sectorsize);
+
+ btrfs_free_reserved_data_space_noquota(fs_info, len);
+- btrfs_qgroup_free_data(inode, reserved, start, len);
++ btrfs_qgroup_free_data(inode, reserved, start, len, NULL);
+ }
+
+ /*
+--- a/fs/btrfs/file.c
++++ b/fs/btrfs/file.c
+@@ -3187,7 +3187,7 @@ static long btrfs_fallocate(struct file
+ qgroup_reserved -= range->len;
+ } else if (qgroup_reserved > 0) {
+ btrfs_qgroup_free_data(BTRFS_I(inode), data_reserved,
+- range->start, range->len);
++ range->start, range->len, NULL);
+ qgroup_reserved -= range->len;
+ }
+ list_del(&range->list);
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -687,7 +687,7 @@ out:
+ * And at reserve time, it's always aligned to page size, so
+ * just free one page here.
+ */
+- btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE);
++ btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE, NULL);
+ btrfs_free_path(path);
+ btrfs_end_transaction(trans);
+ return ret;
+@@ -5129,7 +5129,7 @@ static void evict_inode_truncate_pages(s
+ */
+ if (state_flags & EXTENT_DELALLOC)
+ btrfs_qgroup_free_data(BTRFS_I(inode), NULL, start,
+- end - start + 1);
++ end - start + 1, NULL);
+
+ clear_extent_bit(io_tree, start, end,
+ EXTENT_CLEAR_ALL_BITS | EXTENT_DO_ACCOUNTING,
+@@ -8051,7 +8051,7 @@ next:
+ * reserved data space.
+ * Since the IO will never happen for this page.
+ */
+- btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur);
++ btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur, NULL);
+ if (!inode_evicting) {
+ clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED |
+ EXTENT_DELALLOC | EXTENT_UPTODATE |
+@@ -9481,7 +9481,7 @@ static struct btrfs_trans_handle *insert
+ struct btrfs_path *path;
+ u64 start = ins->objectid;
+ u64 len = ins->offset;
+- int qgroup_released;
++ u64 qgroup_released = 0;
+ int ret;
+
+ memset(&stack_fi, 0, sizeof(stack_fi));
+@@ -9494,9 +9494,9 @@ static struct btrfs_trans_handle *insert
+ btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE);
+ /* Encryption and other encoding is reserved and all 0 */
+
+- qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len);
+- if (qgroup_released < 0)
+- return ERR_PTR(qgroup_released);
++ ret = btrfs_qgroup_release_data(inode, file_offset, len, &qgroup_released);
++ if (ret < 0)
++ return ERR_PTR(ret);
+
+ if (trans) {
+ ret = insert_reserved_file_extent(trans, inode,
+@@ -10391,7 +10391,7 @@ out_delalloc_release:
+ btrfs_delalloc_release_metadata(inode, disk_num_bytes, ret < 0);
+ out_qgroup_free_data:
+ if (ret < 0)
+- btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes);
++ btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes, NULL);
+ out_free_data_space:
+ /*
+ * If btrfs_reserve_extent() succeeded, then we already decremented
+--- a/fs/btrfs/ordered-data.c
++++ b/fs/btrfs/ordered-data.c
+@@ -153,11 +153,12 @@ static struct btrfs_ordered_extent *allo
+ {
+ struct btrfs_ordered_extent *entry;
+ int ret;
++ u64 qgroup_rsv = 0;
+
+ if (flags &
+ ((1 << BTRFS_ORDERED_NOCOW) | (1 << BTRFS_ORDERED_PREALLOC))) {
+ /* For nocow write, we can release the qgroup rsv right now */
+- ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes);
++ ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes, &qgroup_rsv);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ } else {
+@@ -165,7 +166,7 @@ static struct btrfs_ordered_extent *allo
+ * The ordered extent has reserved qgroup space, release now
+ * and pass the reserved number for qgroup_record to free.
+ */
+- ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes);
++ ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes, &qgroup_rsv);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ }
+@@ -183,7 +184,7 @@ static struct btrfs_ordered_extent *allo
+ entry->inode = igrab(&inode->vfs_inode);
+ entry->compress_type = compress_type;
+ entry->truncated_len = (u64)-1;
+- entry->qgroup_rsv = ret;
++ entry->qgroup_rsv = qgroup_rsv;
+ entry->flags = flags;
+ refcount_set(&entry->refs, 1);
+ init_waitqueue_head(&entry->wait);
+--- a/fs/btrfs/qgroup.c
++++ b/fs/btrfs/qgroup.c
+@@ -3855,13 +3855,14 @@ int btrfs_qgroup_reserve_data(struct btr
+
+ /* Free ranges specified by @reserved, normally in error path */
+ static int qgroup_free_reserved_data(struct btrfs_inode *inode,
+- struct extent_changeset *reserved, u64 start, u64 len)
++ struct extent_changeset *reserved,
++ u64 start, u64 len, u64 *freed_ret)
+ {
+ struct btrfs_root *root = inode->root;
+ struct ulist_node *unode;
+ struct ulist_iterator uiter;
+ struct extent_changeset changeset;
+- int freed = 0;
++ u64 freed = 0;
+ int ret;
+
+ extent_changeset_init(&changeset);
+@@ -3902,7 +3903,9 @@ static int qgroup_free_reserved_data(str
+ }
+ btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid, freed,
+ BTRFS_QGROUP_RSV_DATA);
+- ret = freed;
++ if (freed_ret)
++ *freed_ret = freed;
++ ret = 0;
+ out:
+ extent_changeset_release(&changeset);
+ return ret;
+@@ -3910,7 +3913,7 @@ out:
+
+ static int __btrfs_qgroup_release_data(struct btrfs_inode *inode,
+ struct extent_changeset *reserved, u64 start, u64 len,
+- int free)
++ u64 *released, int free)
+ {
+ struct extent_changeset changeset;
+ int trace_op = QGROUP_RELEASE;
+@@ -3922,7 +3925,7 @@ static int __btrfs_qgroup_release_data(s
+ /* In release case, we shouldn't have @reserved */
+ WARN_ON(!free && reserved);
+ if (free && reserved)
+- return qgroup_free_reserved_data(inode, reserved, start, len);
++ return qgroup_free_reserved_data(inode, reserved, start, len, released);
+ extent_changeset_init(&changeset);
+ ret = clear_record_extent_bits(&inode->io_tree, start, start + len -1,
+ EXTENT_QGROUP_RESERVED, &changeset);
+@@ -3937,7 +3940,8 @@ static int __btrfs_qgroup_release_data(s
+ btrfs_qgroup_free_refroot(inode->root->fs_info,
+ inode->root->root_key.objectid,
+ changeset.bytes_changed, BTRFS_QGROUP_RSV_DATA);
+- ret = changeset.bytes_changed;
++ if (released)
++ *released = changeset.bytes_changed;
+ out:
+ extent_changeset_release(&changeset);
+ return ret;
+@@ -3956,9 +3960,10 @@ out:
+ * NOTE: This function may sleep for memory allocation.
+ */
+ int btrfs_qgroup_free_data(struct btrfs_inode *inode,
+- struct extent_changeset *reserved, u64 start, u64 len)
++ struct extent_changeset *reserved,
++ u64 start, u64 len, u64 *freed)
+ {
+- return __btrfs_qgroup_release_data(inode, reserved, start, len, 1);
++ return __btrfs_qgroup_release_data(inode, reserved, start, len, freed, 1);
+ }
+
+ /*
+@@ -3976,9 +3981,9 @@ int btrfs_qgroup_free_data(struct btrfs_
+ *
+ * NOTE: This function may sleep for memory allocation.
+ */
+-int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len)
++int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released)
+ {
+- return __btrfs_qgroup_release_data(inode, NULL, start, len, 0);
++ return __btrfs_qgroup_release_data(inode, NULL, start, len, released, 0);
+ }
+
+ static void add_root_meta_rsv(struct btrfs_root *root, int num_bytes,
+--- a/fs/btrfs/qgroup.h
++++ b/fs/btrfs/qgroup.h
+@@ -363,10 +363,10 @@ int btrfs_verify_qgroup_counts(struct bt
+ /* New io_tree based accurate qgroup reserve API */
+ int btrfs_qgroup_reserve_data(struct btrfs_inode *inode,
+ struct extent_changeset **reserved, u64 start, u64 len);
+-int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len);
++int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released);
+ int btrfs_qgroup_free_data(struct btrfs_inode *inode,
+ struct extent_changeset *reserved, u64 start,
+- u64 len);
++ u64 len, u64 *freed);
+ int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
+ enum btrfs_qgroup_rsv_type type, bool enforce);
+ int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes,
--- /dev/null
+From f63e1164b90b385cd832ff0fdfcfa76c3cc15436 Mon Sep 17 00:00:00 2001
+From: Boris Burkov <boris@bur.io>
+Date: Fri, 1 Dec 2023 13:00:09 -0800
+Subject: btrfs: free qgroup reserve when ORDERED_IOERR is set
+
+From: Boris Burkov <boris@bur.io>
+
+commit f63e1164b90b385cd832ff0fdfcfa76c3cc15436 upstream.
+
+An ordered extent completing is a critical moment in qgroup reserve
+handling, as the ownership of the reservation is handed off from the
+ordered extent to the delayed ref. In the happy path we release (unlock)
+but do not free (decrement counter) the reservation, and the delayed ref
+drives the free. However, on an error, we don't create a delayed ref,
+since there is no ref to add. Therefore, free on the error path.
+
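+In qgroup terms (illustrative summary of the two paths): release, via
+btrfs_qgroup_release_data(), drops the EXTENT_QGROUP_RESERVED "lock" but
+keeps the bytes counted so the delayed ref can free them later, while
+free, via btrfs_qgroup_free_data(), drops both the lock and the counted
+bytes at once.
+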
+CC: stable@vger.kernel.org # 6.1+
+Reviewed-by: Qu Wenruo <wqu@suse.com>
+Signed-off-by: Boris Burkov <boris@bur.io>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/btrfs/ordered-data.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/fs/btrfs/ordered-data.c
++++ b/fs/btrfs/ordered-data.c
+@@ -603,7 +603,9 @@ void btrfs_remove_ordered_extent(struct
+ release = entry->disk_num_bytes;
+ else
+ release = entry->num_bytes;
+- btrfs_delalloc_release_metadata(btrfs_inode, release, false);
++ btrfs_delalloc_release_metadata(btrfs_inode, release,
++ test_bit(BTRFS_ORDERED_IOERR,
++ &entry->flags));
+ }
+
+ percpu_counter_add_batch(&fs_info->ordered_bytes, -entry->num_bytes,
--- /dev/null
+From 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 Mon Sep 17 00:00:00 2001
+From: Frank Li <Frank.Li@nxp.com>
+Date: Mon, 27 Nov 2023 16:43:25 -0500
+Subject: dmaengine: fsl-edma: fix DMA channel leak in eDMAv4
+
+From: Frank Li <Frank.Li@nxp.com>
+
+commit 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 upstream.
+
+The allocated channel count consistently increases due to a missing source
+ID (srcid) cleanup in the fsl_edma_free_chan_resources() function on imx93
+eDMAv4.
+
+Reset 'srcid' in fsl_edma_free_chan_resources().
+
+Cc: stable@vger.kernel.org
+Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support")
+Signed-off-by: Frank Li <Frank.Li@nxp.com>
+Link: https://lore.kernel.org/r/20231127214325.2477247-1-Frank.Li@nxp.com
+Signed-off-by: Vinod Koul <vkoul@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/dma/fsl-edma-common.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
+index 6a3abe5b1790..b53f46245c37 100644
+--- a/drivers/dma/fsl-edma-common.c
++++ b/drivers/dma/fsl-edma-common.c
+@@ -828,6 +828,7 @@ void fsl_edma_free_chan_resources(struct dma_chan *chan)
+ dma_pool_destroy(fsl_chan->tcd_pool);
+ fsl_chan->tcd_pool = NULL;
+ fsl_chan->is_sw = false;
++ fsl_chan->srcid = 0;
+ }
+
+ void fsl_edma_cleanup_vchan(struct dma_device *dmadev)
+--
+2.43.0
+
--- /dev/null
+From 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 Mon Sep 17 00:00:00 2001
+From: Amelie Delaunay <amelie.delaunay@foss.st.com>
+Date: Mon, 6 Nov 2023 14:48:32 +0100
+Subject: dmaengine: stm32-dma: avoid bitfield overflow assertion
+
+From: Amelie Delaunay <amelie.delaunay@foss.st.com>
+
+commit 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 upstream.
+
+stm32_dma_get_burst() returns a negative error for invalid input, which
+gets turned into a large u32 value in stm32_dma_prep_dma_memcpy() that
+in turn triggers an assertion because it does not fit into a two-bit field:
+drivers/dma/stm32-dma.c: In function 'stm32_dma_prep_dma_memcpy':
+include/linux/compiler_types.h:354:38: error: call to '__compiletime_assert_282' declared with attribute error: FIELD_PREP: value too large for the field
+ _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
+ ^
+ include/linux/compiler_types.h:335:4: note: in definition of macro '__compiletime_assert'
+ prefix ## suffix(); \
+ ^~~~~~
+ include/linux/compiler_types.h:354:2: note: in expansion of macro '_compiletime_assert'
+ _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
+ ^~~~~~~~~~~~~~~~~~~
+ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
+ #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
+ ^~~~~~~~~~~~~~~~~~
+ include/linux/bitfield.h:68:3: note: in expansion of macro 'BUILD_BUG_ON_MSG'
+ BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ? \
+ ^~~~~~~~~~~~~~~~
+ include/linux/bitfield.h:114:3: note: in expansion of macro '__BF_FIELD_CHECK'
+ __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
+ ^~~~~~~~~~~~~~~~
+ drivers/dma/stm32-dma.c:1237:4: note: in expansion of macro 'FIELD_PREP'
+ FIELD_PREP(STM32_DMA_SCR_PBURST_MASK, dma_burst) |
+ ^~~~~~~~~~
+
+As an easy workaround, assume the error can happen and handle it by
+failing stm32_dma_prep_dma_memcpy() before the assertion is reached.
+This replicates what is done in stm32_dma_set_xfer_param(), where
+stm32_dma_get_burst() is also used.
+
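+The signedness problem can be seen in isolation (illustrative sketch,
+not the driver code):
+
+  #include <stdint.h>
+  #include <stdio.h>
+
+  int main(void)
+  {
+          int err = -22;             /* -EINVAL from stm32_dma_get_burst() */
+          uint32_t dma_burst = err;  /* old u32 variable: 0xffffffea */
+
+          /* STM32_DMA_SCR_PBURST_MASK is a 2-bit field, so anything
+           * above 3 is invalid; when the value is compile-time known,
+           * FIELD_PREP() turns this into the assertion quoted above. */
+          printf("0x%08x\n", (unsigned int)dma_burst);
+          return 0;
+  }
+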
+Fixes: 1c32d6c37cc2 ("dmaengine: stm32-dma: use bitfield helpers")
+Fixes: a2b6103b7a8a ("dmaengine: stm32-dma: Improve memory burst management")
+Signed-off-by: Arnd Bergmann <arnd@arndb.de>
+Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
+Cc: stable@vger.kernel.org
+Reported-by: kernel test robot <lkp@intel.com>
+Closes: https://lore.kernel.org/oe-kbuild-all/202311060135.Q9eMnpCL-lkp@intel.com/
+Link: https://lore.kernel.org/r/20231106134832.1470305-1-amelie.delaunay@foss.st.com
+Signed-off-by: Vinod Koul <vkoul@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/dma/stm32-dma.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+--- a/drivers/dma/stm32-dma.c
++++ b/drivers/dma/stm32-dma.c
+@@ -1249,8 +1249,8 @@ static struct dma_async_tx_descriptor *s
+ enum dma_slave_buswidth max_width;
+ struct stm32_dma_desc *desc;
+ size_t xfer_count, offset;
+- u32 num_sgs, best_burst, dma_burst, threshold;
+- int i;
++ u32 num_sgs, best_burst, threshold;
++ int dma_burst, i;
+
+ num_sgs = DIV_ROUND_UP(len, STM32_DMA_ALIGNED_MAX_DATA_ITEMS);
+ desc = kzalloc(struct_size(desc, sg_req, num_sgs), GFP_NOWAIT);
+@@ -1268,6 +1268,10 @@ static struct dma_async_tx_descriptor *s
+ best_burst = stm32_dma_get_best_burst(len, STM32_DMA_MAX_BURST,
+ threshold, max_width);
+ dma_burst = stm32_dma_get_burst(chan, best_burst);
++ if (dma_burst < 0) {
++ kfree(desc);
++ return NULL;
++ }
+
+ stm32_dma_clear_reg(&desc->sg_req[i].chan_reg);
+ desc->sg_req[i].chan_reg.dma_scr =
--- /dev/null
+From e7ab758741672acb21c5d841a9f0309d30e48a06 Mon Sep 17 00:00:00 2001
+From: Mario Limonciello <mario.limonciello@amd.com>
+Date: Mon, 19 Jun 2023 15:04:24 -0500
+Subject: drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
+
+From: Mario Limonciello <mario.limonciello@amd.com>
+
+commit e7ab758741672acb21c5d841a9f0309d30e48a06 upstream.
+
+When screen brightness is rapidly changed and PSR-SU is enabled, the
+display hangs on panels with this TCON, even on the latest DCN 3.1.4
+microcode (0x8002a81 at this time).
+
+This was disabled previously as commit 072030b17830 ("drm/amd: Disable
+PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert
+"drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for
+a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum
+requirement for using PSR-SU on Phoenix")).
+
+As hangs are still happening specifically with this TCON, disable PSR-SU
+again for it until it can be root caused.
+
+Cc: stable@vger.kernel.org
+Cc: aaron.ma@canonical.com
+Cc: binli@gnome.org
+Cc: Marc Rossi <Marc.Rossi@amd.com>
+Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131
+Acked-by: Alex Deucher <alexander.deucher@amd.com>
+Reviewed-by: Harry Wentland <harry.wentland@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/amd/display/modules/power/power_helpers.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
++++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c
+@@ -839,6 +839,8 @@ bool is_psr_su_specific_panel(struct dc_
+ ((dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x08) ||
+ (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x07)))
+ isPSRSUSupported = false;
++ else if (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x03)
++ isPSRSUSupported = false;
+ else if (dpcd_caps->psr_info.force_psrsu_cap == 0x1)
+ isPSRSUSupported = true;
+ }
--- /dev/null
+From b96ab339ee50470d13a1faa6ad94d2218a7cd49f Mon Sep 17 00:00:00 2001
+From: Mario Limonciello <mario.limonciello@amd.com>
+Date: Wed, 6 Dec 2023 12:08:26 -0600
+Subject: drm/amd/display: Restore guard against default backlight value < 1 nit
+
+From: Mario Limonciello <mario.limonciello@amd.com>
+
+commit b96ab339ee50470d13a1faa6ad94d2218a7cd49f upstream.
+
+Mark reports that brightness is not restored after Xorg dpms screen blank.
+
+This behavior was introduced by commit d9e865826c20 ("drm/amd/display:
+Simplify brightness initialization"), which dropped the cached backlight
+value in display code but also removed the code handling the case where
+the default value read back was less than 1 nit.
+
+Restore this code so that the backlight brightness is restored to the
+correct default value in this circumstance.
+
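+(The values here appear to be in millinits, so 1000 corresponds to 1 nit,
+5000000 to 5000 nits, and the 150000 fallback to 150 nits.)
+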
+Reported-by: Mark Herbert <mark.herbert42@gmail.com>
+Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031
+Cc: stable@vger.kernel.org
+Cc: Camille Cho <camille.cho@amd.com>
+Cc: Krunoslav Kovac <krunoslav.kovac@amd.com>
+Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
+Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization")
+Acked-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
++++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
+@@ -280,8 +280,8 @@ bool set_default_brightness_aux(struct d
+ if (link && link->dpcd_sink_ext_caps.bits.oled == 1) {
+ if (!read_default_bl_aux(link, &default_backlight))
+ default_backlight = 150000;
+- // if > 5000, it might be wrong readback
+- if (default_backlight > 5000000)
++ // if < 1 nits or > 5000, it might be wrong readback
++ if (default_backlight < 1000 || default_backlight > 5000000)
+ default_backlight = 150000;
+
+ return edp_set_backlight_level_nits(link, true,
--- /dev/null
+From ceb9a321e7639700844aa3bf234a4e0884f13b77 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koenig@amd.com>
+Date: Fri, 8 Dec 2023 13:43:09 +0100
+Subject: drm/amdgpu: fix tear down order in amdgpu_vm_pt_free
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Christian König <christian.koenig@amd.com>
+
+commit ceb9a321e7639700844aa3bf234a4e0884f13b77 upstream.
+
+When freeing a PD/PT with shadows, it can happen that the shadow
+destruction races with detaching the PD/PT from the VM, causing a NULL
+pointer dereference in the invalidation code.
+
+Fix this by detaching the PD/PT from the VM first and then
+freeing the shadow instead.
+
+Signed-off-by: Christian König <christian.koenig@amd.com>
+Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867
+Cc: <stable@vger.kernel.org>
+Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+@@ -642,13 +642,14 @@ static void amdgpu_vm_pt_free(struct amd
+
+ if (!entry->bo)
+ return;
++
++ entry->bo->vm_bo = NULL;
+ shadow = amdgpu_bo_shadowed(entry->bo);
+ if (shadow) {
+ ttm_bo_set_bulk_move(&shadow->tbo, NULL);
+ amdgpu_bo_unref(&shadow);
+ }
+ ttm_bo_set_bulk_move(&entry->bo->tbo, NULL);
+- entry->bo->vm_bo = NULL;
+
+ spin_lock(&entry->vm->status_lock);
+ list_del(&entry->vm_status);
--- /dev/null
+From ab4750332dbe535243def5dcebc24ca00c1f98ac Mon Sep 17 00:00:00 2001
+From: Alex Deucher <alexander.deucher@amd.com>
+Date: Thu, 7 Dec 2023 10:14:41 -0500
+Subject: drm/amdgpu/sdma5.2: add begin/end_use ring callbacks
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Alex Deucher <alexander.deucher@amd.com>
+
+commit ab4750332dbe535243def5dcebc24ca00c1f98ac upstream.
+
+Add begin/end_use ring callbacks to disallow GFXOFF when
+SDMA work is submitted and allow it again afterward.
+
+This should avoid corner cases where GFXOFF is erroneously
+entered when SDMA is still active. For now just allow/disallow
+GFXOFF in the begin and end helpers until we root cause the
+issue. This should not impact power as SDMA usage is pretty
+minimal and GFXOFF should not be active when SDMA is active
+anyway; this just makes it explicit.
+
+v2: move everything into sdma5.2 code. No reason for this
+to be generic at this point.
+v3: Add comments in new code
+
+Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
+Tested-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
+Reviewed-by: Christian König <christian.koenig@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Cc: stable@vger.kernel.org # 5.15+
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++++++++++++++++++++++++++++
+ 1 file changed, 28 insertions(+)
+
+--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
++++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c
+@@ -1651,6 +1651,32 @@ static void sdma_v5_2_get_clockgating_st
+ *flags |= AMD_CG_SUPPORT_SDMA_LS;
+ }
+
++static void sdma_v5_2_ring_begin_use(struct amdgpu_ring *ring)
++{
++ struct amdgpu_device *adev = ring->adev;
++
++ /* SDMA 5.2.3 (RMB) FW doesn't seem to properly
++ * disallow GFXOFF in some cases leading to
++ * hangs in SDMA. Disallow GFXOFF while SDMA is active.
++ * We can probably just limit this to 5.2.3,
++ * but it shouldn't hurt for other parts since
++ * this GFXOFF will be disallowed anyway when SDMA is
++ * active, this just makes it explicit.
++ */
++ amdgpu_gfx_off_ctrl(adev, false);
++}
++
++static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring)
++{
++ struct amdgpu_device *adev = ring->adev;
++
++ /* SDMA 5.2.3 (RMB) FW doesn't seem to properly
++ * disallow GFXOFF in some cases leading to
++ * hangs in SDMA. Allow GFXOFF when SDMA is complete.
++ */
++ amdgpu_gfx_off_ctrl(adev, true);
++}
++
+ const struct amd_ip_funcs sdma_v5_2_ip_funcs = {
+ .name = "sdma_v5_2",
+ .early_init = sdma_v5_2_early_init,
+@@ -1698,6 +1724,8 @@ static const struct amdgpu_ring_funcs sd
+ .test_ib = sdma_v5_2_ring_test_ib,
+ .insert_nop = sdma_v5_2_ring_insert_nop,
+ .pad_ib = sdma_v5_2_ring_pad_ib,
++ .begin_use = sdma_v5_2_ring_begin_use,
++ .end_use = sdma_v5_2_ring_end_use,
+ .emit_wreg = sdma_v5_2_ring_emit_wreg,
+ .emit_reg_wait = sdma_v5_2_ring_emit_reg_wait,
+ .emit_reg_write_reg_wait = sdma_v5_2_ring_emit_reg_write_reg_wait,
--- /dev/null
+From 759f14e20891de72e676d9d738eb2c573aa15f52 Mon Sep 17 00:00:00 2001
+From: Jani Nikula <jani.nikula@intel.com>
+Date: Thu, 7 Dec 2023 11:38:21 +0200
+Subject: drm/edid: also call add modes in EDID connector update fallback
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Jani Nikula <jani.nikula@intel.com>
+
+commit 759f14e20891de72e676d9d738eb2c573aa15f52 upstream.
+
+When the separate add modes call was added back in commit c533b5167c7e
+("drm/edid: add separate drm_edid_connector_add_modes()"), it failed to
+address drm_edid_override_connector_update(). Also call add modes there.
+
+Reported-by: bbaa <bbaa@bbaa.fun>
+Closes: https://lore.kernel.org/r/930E9B4C7D91FDFF+29b34d89-8658-4910-966a-c772f320ea03@bbaa.fun
+Fixes: c533b5167c7e ("drm/edid: add separate drm_edid_connector_add_modes()")
+Cc: <stable@vger.kernel.org> # v6.3+
+Signed-off-by: Jani Nikula <jani.nikula@intel.com>
+Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
+Link: https://patchwork.freedesktop.org/patch/msgid/20231207093821.2654267-1-jani.nikula@intel.com
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/drm_edid.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/drm_edid.c
++++ b/drivers/gpu/drm/drm_edid.c
+@@ -2308,7 +2308,8 @@ int drm_edid_override_connector_update(s
+
+ override = drm_edid_override_get(connector);
+ if (override) {
+- num_modes = drm_edid_connector_update(connector, override);
++ if (drm_edid_connector_update(connector, override) == 0)
++ num_modes = drm_edid_connector_add_modes(connector);
+
+ drm_edid_free(override);
+
--- /dev/null
+From 324b70e997aab0a7deab8cb90711faccda4e98c8 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
+Date: Mon, 4 Dec 2023 22:24:43 +0200
+Subject: drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Ville Syrjälä <ville.syrjala@linux.intel.com>
+
+commit 324b70e997aab0a7deab8cb90711faccda4e98c8 upstream.
+
+plane_view_scanout_stride() currently assumes that we had to pad the
+mapping stride with dummy pages in order to align it. But that is not
+the case if the original fb stride exceeds the aligned stride used
+to populate the remapped view, which is calculated from the user
+specified framebuffer width rather than the user specified framebuffer
+stride.
+
+Ignore the original fb stride in this case and just stick to the POT
+aligned stride. Getting this wrong will cause the plane to fetch the
+wrong data, and can lead to fault errors if the page tables at the
+bogus location aren't even populated.
+
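+As a concrete example (made-up numbers): if the user specified fb stride
+is 12 tiles but the fb width only needs 5 tiles, the remapped view is
+populated with a POT-aligned 8 tile stride. The old code would still
+program the 12 tile stride for scanout, walking past the 8 tiles that
+were actually mapped; with this fix the plane uses the 8 tile POT
+stride instead.
+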
+TODO: figure out if this is OK for CCS, or if we should instead increase
+the width of the view to cover the entire user specified fb stride
+instead...
+
+Cc: Imre Deak <imre.deak@intel.com>
+Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
+Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
+Link: https://patchwork.freedesktop.org/patch/msgid/20231204202443.31247-1-ville.syrjala@linux.intel.com
+Reviewed-by: Imre Deak <imre.deak@intel.com>
+Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
+(cherry picked from commit 01a39f1c4f1220a4e6a25729fae87ff5794cbc52)
+Cc: stable@vger.kernel.org
+Signed-off-by: Jani Nikula <jani.nikula@intel.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/i915/display/intel_fb.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/i915/display/intel_fb.c
++++ b/drivers/gpu/drm/i915/display/intel_fb.c
+@@ -1370,7 +1370,8 @@ plane_view_scanout_stride(const struct i
+ struct drm_i915_private *i915 = to_i915(fb->base.dev);
+ unsigned int stride_tiles;
+
+- if (IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14)
++ if ((IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14) &&
++ src_stride_tiles < dst_stride_tiles)
+ stride_tiles = src_stride_tiles;
+ else
+ stride_tiles = dst_stride_tiles;
--- /dev/null
+From c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
+Date: Thu, 7 Dec 2023 21:34:34 +0200
+Subject: drm/i915: Fix intel_atomic_setup_scalers() plane_state handling
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Ville Syrjälä <ville.syrjala@linux.intel.com>
+
+commit c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 upstream.
+
+Since the plane_state variable is declared outside the scaler_users
+loop in intel_atomic_setup_scalers(), and it's never reset back to
+NULL inside the loop we may end up calling intel_atomic_setup_scaler()
+with a non-NULL plane state for the pipe scaling case. That is bad
+because intel_atomic_setup_scaler() determines whether we are doing
+plane scaling or pipe scaling based on plane_state!=NULL. The end
+result is that we may miscalculate the scaler mode for pipe scaling.
+
+The hardware becomes somewhat upset if we end up in this situation
+when scanning out a planar format on a SDR plane. We end up
+programming the pipe scaler into planar mode as well, and the
+result is a screenfull of garbage.
+
+Fix the situation by making sure we pass the correct plane_state==NULL
+when calculating the scaler mode for pipe scaling.
+
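+The underlying bug is the classic stale-loop-variable pattern (schematic
+sketch only; user_is_plane(), lookup_plane_state() and setup_one_scaler()
+are hypothetical stand-ins for the driver internals):
+
+  struct intel_plane_state *plane_state = NULL;  /* declared once */
+  int i;
+
+  for (i = 0; i < num_scaler_users; i++) {
+          if (user_is_plane(i))
+                  plane_state = lookup_plane_state(i);
+          /*
+           * plane_state is never reset to NULL, so a pipe-scaling
+           * iteration that follows a plane iteration passes a stale
+           * non-NULL pointer and is misclassified as plane scaling.
+           */
+          setup_one_scaler(i, plane_state);
+  }
+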
+Cc: stable@vger.kernel.org
+Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
+Link: https://patchwork.freedesktop.org/patch/msgid/20231207193441.20206-2-ville.syrjala@linux.intel.com
+Reviewed-by: Jani Nikula <jani.nikula@intel.com>
+(cherry picked from commit e81144106e21271c619f0c722a09e27ccb8c043d)
+Signed-off-by: Jani Nikula <jani.nikula@intel.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/i915/display/skl_scaler.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/i915/display/skl_scaler.c
++++ b/drivers/gpu/drm/i915/display/skl_scaler.c
+@@ -504,7 +504,6 @@ int intel_atomic_setup_scalers(struct dr
+ {
+ struct drm_plane *plane = NULL;
+ struct intel_plane *intel_plane;
+- struct intel_plane_state *plane_state = NULL;
+ struct intel_crtc_scaler_state *scaler_state =
+ &crtc_state->scaler_state;
+ struct drm_atomic_state *drm_state = crtc_state->uapi.state;
+@@ -536,6 +535,7 @@ int intel_atomic_setup_scalers(struct dr
+
+ /* walkthrough scaler_users bits and start assigning scalers */
+ for (i = 0; i < sizeof(scaler_state->scaler_users) * 8; i++) {
++ struct intel_plane_state *plane_state = NULL;
+ int *scaler_id;
+ const char *name;
+ int idx, ret;
--- /dev/null
+From 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
+Date: Tue, 5 Dec 2023 20:03:08 +0200
+Subject: drm/i915: Fix remapped stride with CCS on ADL+
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Ville Syrjälä <ville.syrjala@linux.intel.com>
+
+commit 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 upstream.
+
+On ADL+ the hardware automagically calculates the CCS AUX surface
+stride from the main surface stride, so when remapping we can't
+really play a lot of tricks with the main surface stride, or else
+the AUX surface stride would get miscalculated and no longer
+match the actual data layout in memory.
+
+Supposedly we could remap in 256 main surface tile units
+(AUX page(4096)/cacheline(64)*4 (4x1 main surface tiles per
+AUX cacheline)=256 main surface tiles), but the extra complexity
+is probably not worth the hassle.
+
+So let's just make sure our mapping stride is calculated from
+the full framebuffer stride (instead of the framebuffer width).
+This way the stride we program into PLANE_STRIDE will be the
+original framebuffer stride, and thus there will be no change
+to the AUX stride/layout.
+
+Cc: stable@vger.kernel.org
+Cc: Imre Deak <imre.deak@intel.com>
+Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
+Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
+Link: https://patchwork.freedesktop.org/patch/msgid/20231205180308.7505-1-ville.syrjala@linux.intel.com
+Reviewed-by: Imre Deak <imre.deak@intel.com>
+(cherry picked from commit 2c12eb36f849256f5eb00ffaee9bf99396fd3814)
+Signed-off-by: Jani Nikula <jani.nikula@intel.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/i915/display/intel_fb.c | 16 ++++++++++++++--
+ 1 file changed, 14 insertions(+), 2 deletions(-)
+
+--- a/drivers/gpu/drm/i915/display/intel_fb.c
++++ b/drivers/gpu/drm/i915/display/intel_fb.c
+@@ -1498,8 +1498,20 @@ static u32 calc_plane_remap_info(const s
+
+ size += remap_info->size;
+ } else {
+- unsigned int dst_stride = plane_view_dst_stride_tiles(fb, color_plane,
+- remap_info->width);
++ unsigned int dst_stride;
++
++ /*
++ * The hardware automagically calculates the CCS AUX surface
++ * stride from the main surface stride so can't really remap a
++ * smaller subset (unless we'd remap in whole AUX page units).
++ */
++ if (intel_fb_needs_pot_stride_remap(fb) &&
++ intel_fb_is_ccs_modifier(fb->base.modifier))
++ dst_stride = remap_info->src_stride;
++ else
++ dst_stride = remap_info->width;
++
++ dst_stride = plane_view_dst_stride_tiles(fb, color_plane, dst_stride);
+
+ assign_chk_ovf(i915, remap_info->dst_stride, dst_stride);
+ color_plane_info->mapping_stride = dst_stride *
--- /dev/null
+From b6961d187fcd138981b8707dac87b9fcdbfe75d1 Mon Sep 17 00:00:00 2001
+From: Stuart Lee <stuart.lee@mediatek.com>
+Date: Fri, 10 Nov 2023 09:29:14 +0800
+Subject: drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
+
+From: Stuart Lee <stuart.lee@mediatek.com>
+
+commit b6961d187fcd138981b8707dac87b9fcdbfe75d1 upstream.
+
+Add error handling to check NULL input in
+mtk_drm_crtc_dma_dev_get function.
+
+When the display path is not configured correctly, no CRTC is
+established. The caller of mtk_drm_crtc_dma_dev_get() may then pass
+the input parameter *crtc as NULL, which may cause a crash when
+we try to get the container of a NULL pointer.
+
+Fixes: cb1d6bcca542 ("drm/mediatek: Add dma dev get function")
+Signed-off-by: Stuart Lee <stuart.lee@mediatek.com>
+Cc: stable@vger.kernel.org
+Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
+Tested-by: Macpaul Lin <macpaul.lin@mediatek.com>
+Link: https://patchwork.kernel.org/project/dri-devel/patch/20231110012914.14884-2-stuart.lee@mediatek.com/
+Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 9 ++++++++-
+ 1 file changed, 8 insertions(+), 1 deletion(-)
+
+--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
++++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c
+@@ -885,7 +885,14 @@ static int mtk_drm_crtc_init_comp_planes
+
+ struct device *mtk_drm_crtc_dma_dev_get(struct drm_crtc *crtc)
+ {
+- struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc);
++ struct mtk_drm_crtc *mtk_crtc = NULL;
++
++ if (!crtc)
++ return NULL;
++
++ mtk_crtc = to_mtk_crtc(crtc);
++ if (!mtk_crtc)
++ return NULL;
+
+ return mtk_crtc->dma_dev;
+ }
--- /dev/null
+From c41bd2514184d75db087fe4c1221237fb7922875 Mon Sep 17 00:00:00 2001
+From: Ignat Korchagin <ignat@cloudflare.com>
+Date: Wed, 29 Nov 2023 22:04:09 +0000
+Subject: kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
+
+From: Ignat Korchagin <ignat@cloudflare.com>
+
+commit c41bd2514184d75db087fe4c1221237fb7922875 upstream.
+
+In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for
+CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP
+required CONFIG_KEXEC.
+
+However, it was not enough at least for arm64 platforms. While further
+testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP
+is unavailable in menuconfig. This is because CONFIG_CRASH_DUMP still
+depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit
+91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on
+arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in
+turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y, neither of
+which is set in our config.
+
+Given that we already established that CONFIG_KEXEC (which is a switch for
+the kexec system call itself) is not required for CONFIG_CRASH_DUMP, drop
+the CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds
+just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and
+CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select
+of KEXEC for CRASH_DUMP") and this patch are applied, given that the
+necessary shared bits are included via the CONFIG_KEXEC_CORE dependency.
+
+[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n]
+ Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv
+[bhe@redhat.com: riscv, kexec: fix dependency of two items]
+ Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv
+Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com
+Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec")
+Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
+Signed-off-by: Baoquan He <bhe@redhat.com>
+Acked-by: Baoquan He <bhe@redhat.com>
+Cc: Alexander Gordeev <agordeev@linux.ibm.com>
+Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
+Cc: <stable@vger.kernel.org> # 6.6+
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ arch/riscv/Kconfig | 4 ++--
+ arch/riscv/kernel/crash_core.c | 4 +++-
+ kernel/Kconfig.kexec | 1 -
+ 3 files changed, 5 insertions(+), 4 deletions(-)
+
+--- a/arch/riscv/Kconfig
++++ b/arch/riscv/Kconfig
+@@ -669,7 +669,7 @@ config RISCV_BOOT_SPINWAIT
+ If unsure what to do here, say N.
+
+ config ARCH_SUPPORTS_KEXEC
+- def_bool MMU
++ def_bool y
+
+ config ARCH_SELECTS_KEXEC
+ def_bool y
+@@ -677,7 +677,7 @@ config ARCH_SELECTS_KEXEC
+ select HOTPLUG_CPU if SMP
+
+ config ARCH_SUPPORTS_KEXEC_FILE
+- def_bool 64BIT && MMU
++ def_bool 64BIT
+
+ config ARCH_SELECTS_KEXEC_FILE
+ def_bool y
+--- a/arch/riscv/kernel/crash_core.c
++++ b/arch/riscv/kernel/crash_core.c
+@@ -5,18 +5,20 @@
+
+ void arch_crash_save_vmcoreinfo(void)
+ {
+- VMCOREINFO_NUMBER(VA_BITS);
+ VMCOREINFO_NUMBER(phys_ram_base);
+
+ vmcoreinfo_append_str("NUMBER(PAGE_OFFSET)=0x%lx\n", PAGE_OFFSET);
+ vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
+ vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
++#ifdef CONFIG_MMU
++ VMCOREINFO_NUMBER(VA_BITS);
+ vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
+ vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
+ #ifdef CONFIG_64BIT
+ vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
+ vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
+ #endif
++#endif
+ vmcoreinfo_append_str("NUMBER(KERNEL_LINK_ADDR)=0x%lx\n", KERNEL_LINK_ADDR);
+ vmcoreinfo_append_str("NUMBER(va_kernel_pa_offset)=0x%lx\n",
+ kernel_map.va_kernel_pa_offset);
+--- a/kernel/Kconfig.kexec
++++ b/kernel/Kconfig.kexec
+@@ -94,7 +94,6 @@ config KEXEC_JUMP
+ config CRASH_DUMP
+ bool "kernel crash dumps"
+ depends on ARCH_SUPPORTS_CRASH_DUMP
+- depends on ARCH_SUPPORTS_KEXEC
+ select CRASH_CORE
+ select KEXEC_CORE
+ help
--- /dev/null
+From 081488051d28d32569ebb7c7a23572778b2e7d57 Mon Sep 17 00:00:00 2001
+From: Yu Zhao <yuzhao@google.com>
+Date: Thu, 7 Dec 2023 23:14:04 -0700
+Subject: mm/mglru: fix underprotected page cache
+
+From: Yu Zhao <yuzhao@google.com>
+
+commit 081488051d28d32569ebb7c7a23572778b2e7d57 upstream.
+
+Unmapped folios accessed through file descriptors can be underprotected.
+Those folios are added to the oldest generation based on:
+
+1. The fact that they are less costly to reclaim (no need to walk the
+ rmap and flush the TLB) and have less impact on performance (don't
+ cause major PFs and can be non-blocking if needed again).
+2. The observation that they are likely to be single-use. E.g., for
+ client use cases like Android, its apps parse configuration files
+ and store the data in heap (anon); for server use cases like MySQL,
+ it reads from InnoDB files and holds the cached data for tables in
+ buffer pools (anon).
+
+However, the oldest generation can be very short lived, and if so, it
+doesn't provide the PID controller with enough time to respond to a surge
+of refaults. (Note that the PID controller uses weighted refaults and
+those from evicted generations only take a half of the whole weight.) In
+other words, for a short lived generation, the moving average smooths out
+the spike quickly.
+
+To fix the problem:
+1. For folios that are already on LRU, if they can be beyond the
+ tracking range of tiers, i.e., five accesses through file
+ descriptors, move them to the second oldest generation to give them
+ more time to age. (Note that tiers are used by the PID controller
+ to statistically determine whether folios accessed multiple times
+ through file descriptors are worth protecting.)
+2. When adding unmapped folios to LRU, adjust the placement of them so
+ that they are not too close to the tail. The effect of this is
+ similar to the above.
+
+On Android, launching 55 apps sequentially:
+ Before After Change
+ workingset_refault_anon 25641024 25598972 0%
+ workingset_refault_file 115016834 106178438 -8%
+
+Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com
+Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
+Signed-off-by: Yu Zhao <yuzhao@google.com>
+Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
+Tested-by: Kalesh Singh <kaleshsingh@google.com>
+Cc: T.J. Mercier <tjmercier@google.com>
+Cc: Kairui Song <ryncsn@gmail.com>
+Cc: Hillf Danton <hdanton@sina.com>
+Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/mm_inline.h | 23 ++++++++++++++---------
+ mm/vmscan.c | 2 +-
+ mm/workingset.c | 6 +++---
+ 3 files changed, 18 insertions(+), 13 deletions(-)
+
+--- a/include/linux/mm_inline.h
++++ b/include/linux/mm_inline.h
+@@ -231,22 +231,27 @@ static inline bool lru_gen_add_folio(str
+ if (folio_test_unevictable(folio) || !lrugen->enabled)
+ return false;
+ /*
+- * There are three common cases for this page:
+- * 1. If it's hot, e.g., freshly faulted in or previously hot and
+- * migrated, add it to the youngest generation.
+- * 2. If it's cold but can't be evicted immediately, i.e., an anon page
+- * not in swapcache or a dirty page pending writeback, add it to the
+- * second oldest generation.
+- * 3. Everything else (clean, cold) is added to the oldest generation.
++ * There are four common cases for this page:
++ * 1. If it's hot, i.e., freshly faulted in, add it to the youngest
++ * generation, and it's protected over the rest below.
++ * 2. If it can't be evicted immediately, i.e., a dirty page pending
++ * writeback, add it to the second youngest generation.
++ * 3. If it should be evicted first, e.g., cold and clean from
++ * folio_rotate_reclaimable(), add it to the oldest generation.
++ * 4. Everything else falls between 2 & 3 above and is added to the
++ * second oldest generation if it's considered inactive, or the
++ * oldest generation otherwise. See lru_gen_is_active().
+ */
+ if (folio_test_active(folio))
+ seq = lrugen->max_seq;
+ else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
+ (folio_test_reclaim(folio) &&
+ (folio_test_dirty(folio) || folio_test_writeback(folio))))
+- seq = lrugen->min_seq[type] + 1;
+- else
++ seq = lrugen->max_seq - 1;
++ else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq)
+ seq = lrugen->min_seq[type];
++ else
++ seq = lrugen->min_seq[type] + 1;
+
+ gen = lru_gen_from_seq(seq);
+ flags = (gen + 1UL) << LRU_GEN_PGOFF;
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -4933,7 +4933,7 @@ static bool sort_folio(struct lruvec *lr
+ }
+
+ /* protected */
+- if (tier > tier_idx) {
++ if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
+ int hist = lru_hist_from_seq(lrugen->min_seq[type]);
+
+ gen = folio_inc_gen(lruvec, folio, false);
+--- a/mm/workingset.c
++++ b/mm/workingset.c
+@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio
+ * 1. For pages accessed through page tables, hotter pages pushed out
+ * hot pages which refaulted immediately.
+ * 2. For pages accessed multiple times through file descriptors,
+- * numbers of accesses might have been out of the range.
++ * they would have been protected by sort_folio().
+ */
+- if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) {
+- folio_set_workingset(folio);
++ if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) {
++ set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset));
+ mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta);
+ }
+ unlock:
--- /dev/null
+From 4376807bf2d5371c3e00080c972be568c3f8a7d1 Mon Sep 17 00:00:00 2001
+From: Yu Zhao <yuzhao@google.com>
+Date: Thu, 7 Dec 2023 23:14:07 -0700
+Subject: mm/mglru: reclaim offlined memcgs harder
+
+From: Yu Zhao <yuzhao@google.com>
+
+commit 4376807bf2d5371c3e00080c972be568c3f8a7d1 upstream.
+
+In the effort to reduce zombie memcgs [1], it was discovered that the
+memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically,
+instead of rotating them to the tail of the current generation
+(MEMCG_LRU_TAIL) for a second attempt, it moves them to the next
+generation (MEMCG_LRU_YOUNG) after the first attempt.
+
+Not applying enough pressure on offlined memcgs can cause them to build
+up, and this can be particularly harmful to memory-constrained systems.
+
+On Pixel 8 Pro, launching apps for 50 cycles:
+ Before After Change
+ Zombie memcgs 45 35 -22%
+
+[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/
+
+Link: https://lkml.kernel.org/r/20231208061407.2125867-4-yuzhao@google.com
+Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
+Signed-off-by: Yu Zhao <yuzhao@google.com>
+Reported-by: T.J. Mercier <tjmercier@google.com>
+Tested-by: T.J. Mercier <tjmercier@google.com>
+Cc: Charan Teja Kalla <quic_charante@quicinc.com>
+Cc: Hillf Danton <hdanton@sina.com>
+Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
+Cc: Kairui Song <ryncsn@gmail.com>
+Cc: Kalesh Singh <kaleshsingh@google.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/mmzone.h | 8 ++++----
+ mm/vmscan.c | 24 ++++++++++++++++--------
+ 2 files changed, 20 insertions(+), 12 deletions(-)
+
+--- a/include/linux/mmzone.h
++++ b/include/linux/mmzone.h
+@@ -519,10 +519,10 @@ void lru_gen_look_around(struct page_vma
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+ * 2. The first attempt to reclaim a memcg below low, which triggers
+ * MEMCG_LRU_TAIL;
+- * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
+- * which triggers MEMCG_LRU_TAIL;
+- * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
+- * which triggers MEMCG_LRU_YOUNG;
++ * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
++ * threshold, which triggers MEMCG_LRU_TAIL;
++ * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
++ * threshold, which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
+ * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -5291,7 +5291,12 @@ static bool should_run_aging(struct lruv
+ }
+
+ /* try to scrape all its memory if this memcg was deleted */
+- *nr_to_scan = mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
++ if (!mem_cgroup_online(memcg)) {
++ *nr_to_scan = total;
++ return false;
++ }
++
++ *nr_to_scan = total >> sc->priority;
+
+ /*
+ * The aging tries to be lazy to reduce the overhead, while the eviction
+@@ -5412,14 +5417,9 @@ static int shrink_one(struct lruvec *lru
+ bool success;
+ unsigned long scanned = sc->nr_scanned;
+ unsigned long reclaimed = sc->nr_reclaimed;
+- int seg = lru_gen_memcg_seg(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+- /* see the comment on MEMCG_NR_GENS */
+- if (!lruvec_is_sizable(lruvec, sc))
+- return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+-
+ mem_cgroup_calculate_protection(NULL, memcg);
+
+ if (mem_cgroup_below_min(NULL, memcg))
+@@ -5427,7 +5427,7 @@ static int shrink_one(struct lruvec *lru
+
+ if (mem_cgroup_below_low(NULL, memcg)) {
+ /* see the comment on MEMCG_NR_GENS */
+- if (seg != MEMCG_LRU_TAIL)
++ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL)
+ return MEMCG_LRU_TAIL;
+
+ memcg_memory_event(memcg, MEMCG_LOW);
+@@ -5443,7 +5443,15 @@ static int shrink_one(struct lruvec *lru
+
+ flush_reclaim_state(sc);
+
+- return success ? MEMCG_LRU_YOUNG : 0;
++ if (success && mem_cgroup_online(memcg))
++ return MEMCG_LRU_YOUNG;
++
++ if (!success && lruvec_is_sizable(lruvec, sc))
++ return 0;
++
++ /* one retry if offlined or too small */
++ return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ?
++ MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+ }
+
+ #ifdef CONFIG_MEMCG
--- /dev/null
+From 8aa420617918d12d1f5d55030a503c9418e73c2c Mon Sep 17 00:00:00 2001
+From: Yu Zhao <yuzhao@google.com>
+Date: Thu, 7 Dec 2023 23:14:06 -0700
+Subject: mm/mglru: respect min_ttl_ms with memcgs
+
+From: Yu Zhao <yuzhao@google.com>
+
+commit 8aa420617918d12d1f5d55030a503c9418e73c2c upstream.
+
+While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru:
+try to stop at high watermarks"), it was discovered that the memcg LRU can
+breach the thrashing protection imposed by min_ttl_ms.
+
+Before the memcg LRU:
+ kswapd()
+ shrink_node_memcgs()
+ mem_cgroup_iter()
+ inc_max_seq() // always hit a different memcg
+ lru_gen_age_node()
+ mem_cgroup_iter()
+ check the timestamp of the oldest generation
+
+After the memcg LRU:
+ kswapd()
+ shrink_many()
+ restart:
+ iterate the memcg LRU:
+ inc_max_seq() // occasionally hit the same memcg
+ if raced with lru_gen_rotate_memcg():
+ goto restart
+ lru_gen_age_node()
+ mem_cgroup_iter()
+ check the timestamp of the oldest generation
+
+Specifically, when the restart happens in shrink_many(), it needs to stick
+with the (memcg LRU) generation it began with. In other words, it should
+neither re-read memcg_lru->seq nor age an lruvec of a different
+generation. Otherwise it can hit the same memcg multiple times without
+giving lru_gen_age_node() a chance to check the timestamp of that memcg's
+oldest generation (against min_ttl_ms).
+
+[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
+
+Link: https://lkml.kernel.org/r/20231208061407.2125867-3-yuzhao@google.com
+Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
+Signed-off-by: Yu Zhao <yuzhao@google.com>
+Tested-by: T.J. Mercier <tjmercier@google.com>
+Cc: Charan Teja Kalla <quic_charante@quicinc.com>
+Cc: Hillf Danton <hdanton@sina.com>
+Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
+Cc: Kairui Song <ryncsn@gmail.com>
+Cc: Kalesh Singh <kaleshsingh@google.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ include/linux/mmzone.h | 30 +++++++++++++++++-------------
+ mm/vmscan.c | 30 ++++++++++++++++--------------
+ 2 files changed, 33 insertions(+), 27 deletions(-)
+
+--- a/include/linux/mmzone.h
++++ b/include/linux/mmzone.h
+@@ -505,33 +505,37 @@ void lru_gen_look_around(struct page_vma
+ * the old generation, is incremented when all its bins become empty.
+ *
+ * There are four operations:
+- * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its
++ * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
+ * current generation (old or young) and updates its "seg" to "head";
+- * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its
++ * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
+ * current generation (old or young) and updates its "seg" to "tail";
+- * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old
++ * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
+ * generation, updates its "gen" to "old" and resets its "seg" to "default";
+- * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the
++ * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
+ * young generation, updates its "gen" to "young" and resets its "seg" to
+ * "default".
+ *
+ * The events that trigger the above operations are:
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+- * 2. The first attempt to reclaim an memcg below low, which triggers
++ * 2. The first attempt to reclaim a memcg below low, which triggers
+ * MEMCG_LRU_TAIL;
+- * 3. The first attempt to reclaim an memcg below reclaimable size threshold,
++ * 3. The first attempt to reclaim a memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_TAIL;
+- * 4. The second attempt to reclaim an memcg below reclaimable size threshold,
++ * 4. The second attempt to reclaim a memcg below reclaimable size threshold,
+ * which triggers MEMCG_LRU_YOUNG;
+- * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG;
++ * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
+- * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
++ * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
+ *
+- * Note that memcg LRU only applies to global reclaim, and the round-robin
+- * incrementing of their max_seq counters ensures the eventual fairness to all
+- * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
++ * Notes:
++ * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
++ * of their max_seq counters ensures the eventual fairness to all eligible
++ * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
++ * 2. There are only two valid generations: old (seq) and young (seq+1).
++ * MEMCG_NR_GENS is set to three so that when reading the generation counter
++ * locklessly, a stale value (seq-1) does not wraparound to young.
+ */
+-#define MEMCG_NR_GENS 2
++#define MEMCG_NR_GENS 3
+ #define MEMCG_NR_BINS 8
+
+ struct lru_gen_memcg {
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -4790,6 +4790,9 @@ static void lru_gen_rotate_memcg(struct
+ else
+ VM_WARN_ON_ONCE(true);
+
++ WRITE_ONCE(lruvec->lrugen.seg, seg);
++ WRITE_ONCE(lruvec->lrugen.gen, new);
++
+ hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+ if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
+@@ -4800,9 +4803,6 @@ static void lru_gen_rotate_memcg(struct
+ pgdat->memcg_lru.nr_memcgs[old]--;
+ pgdat->memcg_lru.nr_memcgs[new]++;
+
+- lruvec->lrugen.gen = new;
+- WRITE_ONCE(lruvec->lrugen.seg, seg);
+-
+ if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
+ WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+@@ -4825,11 +4825,11 @@ void lru_gen_online_memcg(struct mem_cgr
+
+ gen = get_memcg_gen(pgdat->memcg_lru.seq);
+
++ lruvec->lrugen.gen = gen;
++
+ hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
+ pgdat->memcg_lru.nr_memcgs[gen]++;
+
+- lruvec->lrugen.gen = gen;
+-
+ spin_unlock_irq(&pgdat->memcg_lru.lock);
+ }
+ }
+@@ -5328,7 +5328,7 @@ static long get_nr_to_scan(struct lruvec
+ DEFINE_MAX_SEQ(lruvec);
+
+ if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
+- return 0;
++ return -1;
+
+ if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
+ return nr_to_scan;
+@@ -5403,7 +5403,7 @@ static bool try_to_shrink_lruvec(struct
+ cond_resched();
+ }
+
+- /* whether try_to_inc_max_seq() was successful */
++ /* whether this lruvec should be rotated */
+ return nr_to_scan < 0;
+ }
+
+@@ -5457,13 +5457,13 @@ static void shrink_many(struct pglist_da
+ struct lruvec *lruvec;
+ struct lru_gen_folio *lrugen;
+ struct mem_cgroup *memcg;
+- const struct hlist_nulls_node *pos;
++ struct hlist_nulls_node *pos;
+
++ gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+ bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
+ restart:
+ op = 0;
+ memcg = NULL;
+- gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+
+ rcu_read_lock();
+
+@@ -5474,6 +5474,10 @@ restart:
+ }
+
+ mem_cgroup_put(memcg);
++ memcg = NULL;
++
++ if (gen != READ_ONCE(lrugen->gen))
++ continue;
+
+ lruvec = container_of(lrugen, struct lruvec, lrugen);
+ memcg = lruvec_memcg(lruvec);
+@@ -5558,16 +5562,14 @@ static void set_initial_priority(struct
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+- * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
+- * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the
+- * estimated reclaimed_to_scanned_ratio = inactive / total.
++ * Determine the initial priority based on
++ * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
++ * where reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (get_swappiness(lruvec, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+- reclaimable /= MEMCG_NR_GENS;
+-
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
--- /dev/null
+From 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 Mon Sep 17 00:00:00 2001
+From: Yu Zhao <yuzhao@google.com>
+Date: Thu, 7 Dec 2023 23:14:05 -0700
+Subject: mm/mglru: try to stop at high watermarks
+
+From: Yu Zhao <yuzhao@google.com>
+
+commit 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 upstream.
+
+The initial MGLRU patchset didn't include the memcg LRU support, and it
+relied on should_abort_scan(), added by commit f76c83378851 ("mm:
+multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid
+overshooting their aggregate reclaim target by too much".
+
+Later on when the memcg LRU was added, should_abort_scan() was deemed
+unnecessary, and the test results [1] showed no side effects after it was
+removed by commit a579086c99ed ("mm: multi-gen LRU: remove eviction
+fairness safeguard").
+
+However, that test used memory.reclaim, which sets nr_to_reclaim to
+SWAP_CLUSTER_MAX. So it can overshoot only by SWAP_CLUSTER_MAX-1 pages,
+i.e., from nr_reclaimed=nr_to_reclaim-1 to
+nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1. Compared with the batch
+size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny. Therefore
+that test isn't able to reproduce the worst case scenario, i.e., kswapd
+overshooting GBs on large systems and "consuming 100% CPU" (see the Closes
+tag).
+
+Bring back a simplified version of should_abort_scan() on top of the memcg
+LRU, so that kswapd stops when all eligible zones are above their
+respective high watermarks plus a small delta to lower the chance of
+KSWAPD_HIGH_WMARK_HIT_QUICKLY. Note that this only applies to order-0
+reclaim, meaning compaction-induced reclaim can still run wild (which is a
+different problem).
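+
+Concretely (a sketch of the check added below), kswapd only aborts once
+every eligible zone clears its high (or promo) watermark plus a
+MIN_LRU_BATCH delta:
+
+	mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
+	       WMARK_PROMO : WMARK_HIGH;
+
+	for (i = 0; i <= sc->reclaim_idx; i++) {
+		struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
+		unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH;
+
+		if (managed_zone(zone) &&
+		    !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0))
+			return false;	/* keep scanning */
+	}
+	return true;	/* all eligible zones are safe; kswapd can stop */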
+
+On Android, launching 55 apps sequentially:
+ Before After Change
+ pgpgin 838377172 802955040 -4%
+ pgpgout 38037080 34336300 -10%
+
+[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/
+
+Link: https://lkml.kernel.org/r/20231208061407.2125867-2-yuzhao@google.com
+Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
+Signed-off-by: Yu Zhao <yuzhao@google.com>
+Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
+Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
+Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
+Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
+Tested-by: Kalesh Singh <kaleshsingh@google.com>
+Cc: Hillf Danton <hdanton@sina.com>
+Cc: Kairui Song <ryncsn@gmail.com>
+Cc: T.J. Mercier <tjmercier@google.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/vmscan.c | 36 ++++++++++++++++++++++++++++--------
+ 1 file changed, 28 insertions(+), 8 deletions(-)
+
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -5341,20 +5341,41 @@ static long get_nr_to_scan(struct lruvec
+ return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
+ }
+
+-static unsigned long get_nr_to_reclaim(struct scan_control *sc)
++static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
+ {
++ int i;
++ enum zone_watermarks mark;
++
+ /* don't abort memcg reclaim to ensure fairness */
+ if (!root_reclaim(sc))
+- return -1;
++ return false;
++
++ if (sc->nr_reclaimed >= max(sc->nr_to_reclaim, compact_gap(sc->order)))
++ return true;
++
++ /* check the order to exclude compaction-induced reclaim */
++ if (!current_is_kswapd() || sc->order)
++ return false;
++
++ mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ?
++ WMARK_PROMO : WMARK_HIGH;
++
++ for (i = 0; i <= sc->reclaim_idx; i++) {
++ struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i;
++ unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH;
++
++ if (managed_zone(zone) && !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0))
++ return false;
++ }
+
+- return max(sc->nr_to_reclaim, compact_gap(sc->order));
++ /* kswapd should abort if all eligible zones are safe */
++ return true;
+ }
+
+ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
+ {
+ long nr_to_scan;
+ unsigned long scanned = 0;
+- unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+ int swappiness = get_swappiness(lruvec, sc);
+
+ /* clean file folios are more likely to exist */
+@@ -5376,7 +5397,7 @@ static bool try_to_shrink_lruvec(struct
+ if (scanned >= nr_to_scan)
+ break;
+
+- if (sc->nr_reclaimed >= nr_to_reclaim)
++ if (should_abort_scan(lruvec, sc))
+ break;
+
+ cond_resched();
+@@ -5437,7 +5458,6 @@ static void shrink_many(struct pglist_da
+ struct lru_gen_folio *lrugen;
+ struct mem_cgroup *memcg;
+ const struct hlist_nulls_node *pos;
+- unsigned long nr_to_reclaim = get_nr_to_reclaim(sc);
+
+ bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
+ restart:
+@@ -5470,7 +5490,7 @@ restart:
+
+ rcu_read_lock();
+
+- if (sc->nr_reclaimed >= nr_to_reclaim)
++ if (should_abort_scan(lruvec, sc))
+ break;
+ }
+
+@@ -5481,7 +5501,7 @@ restart:
+
+ mem_cgroup_put(memcg);
+
+- if (sc->nr_reclaimed >= nr_to_reclaim)
++ if (!is_a_nulls(pos))
+ return;
+
+ /* restart if raced with lru_gen_rotate_memcg() */
--- /dev/null
+From 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 Mon Sep 17 00:00:00 2001
+From: David Stevens <stevensd@chromium.org>
+Date: Tue, 18 Apr 2023 17:40:31 +0900
+Subject: mm/shmem: fix race in shmem_undo_range w/THP
+
+From: David Stevens <stevensd@chromium.org>
+
+commit 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 upstream.
+
+Split folios during the second loop of shmem_undo_range. It's not
+sufficient to only split folios when dealing with partial pages, since
+it's possible for a THP to be faulted in after that point. Calling
+truncate_inode_folio in that situation can result in throwing away data
+outside of the range being targeted.
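+
+The race, roughly (a hypothetical interleaving):
+
+	shmem_undo_range()                 another task
+	  first loop: splits partial THPs
+	                                   faults in a THP overlapping
+	                                   the end of the range
+	  second loop: finds the new THP
+	    truncate_inode_folio()
+	      drops the whole THP, including data outside [lstart, lend]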
+
+[akpm@linux-foundation.org: tidy up comment layout]
+Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com
+Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
+Signed-off-by: David Stevens <stevensd@chromium.org>
+Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
+Cc: Suleiman Souhlal <suleiman@google.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ mm/shmem.c | 19 ++++++++++++++++++-
+ 1 file changed, 18 insertions(+), 1 deletion(-)
+
+--- a/mm/shmem.c
++++ b/mm/shmem.c
+@@ -1098,7 +1098,24 @@ whole_folios:
+ }
+ VM_BUG_ON_FOLIO(folio_test_writeback(folio),
+ folio);
+- truncate_inode_folio(mapping, folio);
++
++ if (!folio_test_large(folio)) {
++ truncate_inode_folio(mapping, folio);
++ } else if (truncate_inode_partial_folio(folio, lstart, lend)) {
++ /*
++ * If we split a page, reset the loop so
++ * that we pick up the new sub pages.
++ * Otherwise the THP was entirely
++ * dropped or the target range was
++ * zeroed, so just continue the loop as
++ * is.
++ */
++ if (!folio_test_large(folio)) {
++ folio_unlock(folio);
++ index = start;
++ break;
++ }
++ }
+ }
+ folio_unlock(folio);
+ }
--- /dev/null
+From 43e8832fed08438e2a27afed9bac21acd0ceffe5 Mon Sep 17 00:00:00 2001
+From: John Hubbard <jhubbard@nvidia.com>
+Date: Fri, 8 Dec 2023 18:01:44 -0800
+Subject: Revert "selftests: error out if kernel header files are not yet built"
+
+From: John Hubbard <jhubbard@nvidia.com>
+
+commit 43e8832fed08438e2a27afed9bac21acd0ceffe5 upstream.
+
+This reverts commit 9fc96c7c19df ("selftests: error out if kernel header
+files are not yet built").
+
+It turns out that requiring the kernel headers to be built as a
+prerequisite to building selftests does not work in many cases. For
+example, Peter Zijlstra writes:
+
+"My biggest beef with the whole thing is that I simply do not want to use
+'make headers', it doesn't work for me.
+
+I have a ton of output directories and I don't care to build tools into
+the output dirs, in fact some of them flat out refuse to work that way
+(bpf comes to mind)." [1]
+
+Therefore, stop erroring out on the selftests build. Additional patches
+will be required in order to change over to not requiring the kernel
+headers.
+
+[1] https://lore.kernel.org/20231208221007.GO28727@noisy.programming.kicks-ass.net
+
+Link: https://lkml.kernel.org/r/20231209020144.244759-1-jhubbard@nvidia.com
+Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built")
+Signed-off-by: John Hubbard <jhubbard@nvidia.com>
+Cc: Anders Roxell <anders.roxell@linaro.org>
+Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
+Cc: David Hildenbrand <david@redhat.com>
+Cc: Peter Xu <peterx@redhat.com>
+Cc: Jonathan Corbet <corbet@lwn.net>
+Cc: Nathan Chancellor <nathan@kernel.org>
+Cc: Shuah Khan <shuah@kernel.org>
+Cc: Peter Zijlstra <peterz@infradead.org>
+Cc: Marcos Paulo de Souza <mpdesouza@suse.com>
+Cc: <stable@vger.kernel.org>
+Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ tools/testing/selftests/Makefile | 21 --------------------
+ tools/testing/selftests/lib.mk | 40 ++-------------------------------------
+ 2 files changed, 4 insertions(+), 57 deletions(-)
+
+--- a/tools/testing/selftests/Makefile
++++ b/tools/testing/selftests/Makefile
+@@ -152,12 +152,10 @@ ifneq ($(KBUILD_OUTPUT),)
+ abs_objtree := $(realpath $(abs_objtree))
+ BUILD := $(abs_objtree)/kselftest
+ KHDR_INCLUDES := -isystem ${abs_objtree}/usr/include
+- KHDR_DIR := ${abs_objtree}/usr/include
+ else
+ BUILD := $(CURDIR)
+ abs_srctree := $(shell cd $(top_srcdir) && pwd)
+ KHDR_INCLUDES := -isystem ${abs_srctree}/usr/include
+- KHDR_DIR := ${abs_srctree}/usr/include
+ DEFAULT_INSTALL_HDR_PATH := 1
+ endif
+
+@@ -171,7 +169,7 @@ export KHDR_INCLUDES
+ # all isn't the first target in the file.
+ .DEFAULT_GOAL := all
+
+-all: kernel_header_files
++all:
+ @ret=1; \
+ for TARGET in $(TARGETS); do \
+ BUILD_TARGET=$$BUILD/$$TARGET; \
+@@ -182,23 +180,6 @@ all: kernel_header_files
+ ret=$$((ret * $$?)); \
+ done; exit $$ret;
+
+-kernel_header_files:
+- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \
+- if [ $$? -ne 0 ]; then \
+- RED='\033[1;31m'; \
+- NOCOLOR='\033[0m'; \
+- echo; \
+- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \
+- echo "Please run this and try again:"; \
+- echo; \
+- echo " cd $(top_srcdir)"; \
+- echo " make headers"; \
+- echo; \
+- exit 1; \
+- fi
+-
+-.PHONY: kernel_header_files
+-
+ run_tests: all
+ @for TARGET in $(TARGETS); do \
+ BUILD_TARGET=$$BUILD/$$TARGET; \
+--- a/tools/testing/selftests/lib.mk
++++ b/tools/testing/selftests/lib.mk
+@@ -44,26 +44,10 @@ endif
+ selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST))))
+ top_srcdir = $(selfdir)/../../..
+
+-ifeq ("$(origin O)", "command line")
+- KBUILD_OUTPUT := $(O)
++ifeq ($(KHDR_INCLUDES),)
++KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include
+ endif
+
+-ifneq ($(KBUILD_OUTPUT),)
+- # Make's built-in functions such as $(abspath ...), $(realpath ...) cannot
+- # expand a shell special character '~'. We use a somewhat tedious way here.
+- abs_objtree := $(shell cd $(top_srcdir) && mkdir -p $(KBUILD_OUTPUT) && cd $(KBUILD_OUTPUT) && pwd)
+- $(if $(abs_objtree),, \
+- $(error failed to create output directory "$(KBUILD_OUTPUT)"))
+- # $(realpath ...) resolves symlinks
+- abs_objtree := $(realpath $(abs_objtree))
+- KHDR_DIR := ${abs_objtree}/usr/include
+-else
+- abs_srctree := $(shell cd $(top_srcdir) && pwd)
+- KHDR_DIR := ${abs_srctree}/usr/include
+-endif
+-
+-KHDR_INCLUDES := -isystem $(KHDR_DIR)
+-
+ # The following are built by lib.mk common compile rules.
+ # TEST_CUSTOM_PROGS should be used by tests that require
+ # custom build rule and prevent common build rule use.
+@@ -74,25 +58,7 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)
+ TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
+ TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
+
+-all: kernel_header_files $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) \
+- $(TEST_GEN_FILES)
+-
+-kernel_header_files:
+- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \
+- if [ $$? -ne 0 ]; then \
+- RED='\033[1;31m'; \
+- NOCOLOR='\033[0m'; \
+- echo; \
+- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \
+- echo "Please run this and try again:"; \
+- echo; \
+- echo " cd $(top_srcdir)"; \
+- echo " make headers"; \
+- echo; \
+- exit 1; \
+- fi
+-
+-.PHONY: kernel_header_files
++all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+
+ define RUN_TESTS
+ BASE_DIR="$(selfdir)"; \
 cxl-hdm-fix-dpa-translation-locking.patch
 soundwire-stream-fix-null-pointer-dereference-for-multi_link.patch
 ext4-prevent-the-normalized-size-from-exceeding-ext_max_blocks.patch
+revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch
+arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch
+team-fix-use-after-free-when-an-option-instance-allocation-fails.patch
+drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch
+drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch
+dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch
+dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch
+mm-mglru-fix-underprotected-page-cache.patch
+mm-mglru-try-to-stop-at-high-watermarks.patch
+mm-mglru-respect-min_ttl_ms-with-memcgs.patch
+mm-mglru-reclaim-offlined-memcgs-harder.patch
+mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch
+kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch
+btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch
+btrfs-fix-qgroup_free_reserved_data-int-overflow.patch
+btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch
+drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch
+drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch
+drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch
+drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch
+drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch
+drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch
+drm-i915-fix-remapped-stride-with-ccs-on-adl.patch
+smb-client-fix-oob-in-receive_encrypted_standard.patch
+smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch
+smb-client-fix-null-deref-in-asn1_ber_decoder.patch
+smb-client-fix-oob-in-smb2_query_reparse_point.patch
--- /dev/null
+From 90d025c2e953c11974e76637977c473200593a46 Mon Sep 17 00:00:00 2001
+From: Paulo Alcantara <pc@manguebit.com>
+Date: Mon, 11 Dec 2023 10:26:42 -0300
+Subject: smb: client: fix NULL deref in asn1_ber_decoder()
+
+From: Paulo Alcantara <pc@manguebit.com>
+
+commit 90d025c2e953c11974e76637977c473200593a46 upstream.
+
+If the server replied to SMB2_NEGOTIATE with a zero SecurityBufferOffset,
+smb2_get_data_area() sets @len to non-zero but returns NULL, so
+decode_negTokenInit() ends up being called with a NULL @security_blob:
+
+ BUG: kernel NULL pointer dereference, address: 0000000000000000
+ #PF: supervisor read access in kernel mode
+ #PF: error_code(0x0000) - not-present page
+ PGD 0 P4D 0
+ Oops: 0000 [#1] PREEMPT SMP NOPTI
+ CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
+ RIP: 0010:asn1_ber_decoder+0x173/0xc80
+ Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00
+ RSP: 0018:ffffc9000063f950 EFLAGS: 00010202
+ RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a
+ RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000
+ RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
+ R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000
+ R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000
+ FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000
+ CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0
+ PKRU: 55555554
+ Call Trace:
+ <TASK>
+ ? __die+0x23/0x70
+ ? page_fault_oops+0x181/0x480
+ ? __stack_depot_save+0x1e6/0x480
+ ? exc_page_fault+0x6f/0x1c0
+ ? asm_exc_page_fault+0x26/0x30
+ ? asn1_ber_decoder+0x173/0xc80
+ ? check_object+0x40/0x340
+ decode_negTokenInit+0x1e/0x30 [cifs]
+ SMB2_negotiate+0xc99/0x17c0 [cifs]
+ ? smb2_negotiate+0x46/0x60 [cifs]
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ smb2_negotiate+0x46/0x60 [cifs]
+ cifs_negotiate_protocol+0xae/0x130 [cifs]
+ cifs_get_smb_ses+0x517/0x1040 [cifs]
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? queue_delayed_work_on+0x5d/0x90
+ cifs_mount_get_session+0x78/0x200 [cifs]
+ dfs_mount_share+0x13a/0x9f0 [cifs]
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? lock_acquire+0xbf/0x2b0
+ ? find_nls+0x16/0x80
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ cifs_mount+0x7e/0x350 [cifs]
+ cifs_smb3_do_mount+0x128/0x780 [cifs]
+ smb3_get_tree+0xd9/0x290 [cifs]
+ vfs_get_tree+0x2c/0x100
+ ? capable+0x37/0x70
+ path_mount+0x2d7/0xb80
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? _raw_spin_unlock_irqrestore+0x44/0x60
+ __x64_sys_mount+0x11a/0x150
+ do_syscall_64+0x47/0xf0
+ entry_SYSCALL_64_after_hwframe+0x6f/0x77
+ RIP: 0033:0x7fce52c2ab1e
+
+Fix this by setting @len to zero when @off == 0 so callers won't
+attempt to dereference non-existing data areas.
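+
+In short, the validation added below (names from the diff) rejects bad
+offsets and lengths together and treats a zero offset as "no data area":
+
+	if (unlikely(*off < 0 || *off > max_off ||
+		     *len < 0 || *len > max_len)) {
+		*off = 0;
+		*len = 0;
+	} else if (*off == 0) {
+		*len = 0;
+	}
+
+	/* only a strictly positive off/len pair yields a data area */
+	if (*off > 0 && *len > 0)
+		return (char *)shdr + *off;
+	return NULL;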
+
+Reported-by: Robert Morris <rtm@csail.mit.edu>
+Cc: stable@vger.kernel.org
+Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
+Signed-off-by: Steve French <stfrench@microsoft.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/smb/client/smb2misc.c | 26 ++++++++++----------------
+ 1 file changed, 10 insertions(+), 16 deletions(-)
+
+--- a/fs/smb/client/smb2misc.c
++++ b/fs/smb/client/smb2misc.c
+@@ -313,6 +313,9 @@ static const bool has_smb2_data_area[NUM
+ char *
+ smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr)
+ {
++ const int max_off = 4096;
++ const int max_len = 128 * 1024;
++
+ *off = 0;
+ *len = 0;
+
+@@ -384,29 +387,20 @@ smb2_get_data_area_len(int *off, int *le
+ * Invalid length or offset probably means data area is invalid, but
+ * we have little choice but to ignore the data area in this case.
+ */
+- if (*off > 4096) {
+- cifs_dbg(VFS, "offset %d too large, data area ignored\n", *off);
+- *len = 0;
+- *off = 0;
+- } else if (*off < 0) {
+- cifs_dbg(VFS, "negative offset %d to data invalid ignore data area\n",
+- *off);
++ if (unlikely(*off < 0 || *off > max_off ||
++ *len < 0 || *len > max_len)) {
++ cifs_dbg(VFS, "%s: invalid data area (off=%d len=%d)\n",
++ __func__, *off, *len);
+ *off = 0;
+ *len = 0;
+- } else if (*len < 0) {
+- cifs_dbg(VFS, "negative data length %d invalid, data area ignored\n",
+- *len);
+- *len = 0;
+- } else if (*len > 128 * 1024) {
+- cifs_dbg(VFS, "data area larger than 128K: %d\n", *len);
++ } else if (*off == 0) {
+ *len = 0;
+ }
+
+ /* return pointer to beginning of data area, ie offset from SMB start */
+- if ((*off != 0) && (*len != 0))
++ if (*off > 0 && *len > 0)
+ return (char *)shdr + *off;
+- else
+- return NULL;
++ return NULL;
+ }
+
+ /*
--- /dev/null
+From eec04ea119691e65227a97ce53c0da6b9b74b0b7 Mon Sep 17 00:00:00 2001
+From: Paulo Alcantara <pc@manguebit.com>
+Date: Mon, 11 Dec 2023 10:26:40 -0300
+Subject: smb: client: fix OOB in receive_encrypted_standard()
+
+From: Paulo Alcantara <pc@manguebit.com>
+
+commit eec04ea119691e65227a97ce53c0da6b9b74b0b7 upstream.
+
+Fix a potential OOB in receive_encrypted_standard() if the server
+returned a large shdr->NextCommand that would end up writing off the
+end of @next_buffer.
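+
+The missing bound, in short (matching the diff below): @next_cmd is
+read once and checked against the PDU size before it is used to offset
+and size the copy into @next_buffer:
+
+	next_cmd = le32_to_cpu(shdr->NextCommand);
+	if (next_cmd) {
+		if (WARN_ON_ONCE(next_cmd > pdu_length))
+			return -1;
+		/* allocate next_buffer, then: */
+		memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd);
+	}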
+
+Fixes: b24df3e30cbf ("cifs: update receive_encrypted_standard to handle compounded responses")
+Cc: stable@vger.kernel.org
+Reported-by: Robert Morris <rtm@csail.mit.edu>
+Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
+Signed-off-by: Steve French <stfrench@microsoft.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/smb/client/smb2ops.c | 14 ++++++++------
+ 1 file changed, 8 insertions(+), 6 deletions(-)
+
+--- a/fs/smb/client/smb2ops.c
++++ b/fs/smb/client/smb2ops.c
+@@ -4941,6 +4941,7 @@ receive_encrypted_standard(struct TCP_Se
+ struct smb2_hdr *shdr;
+ unsigned int pdu_length = server->pdu_size;
+ unsigned int buf_size;
++ unsigned int next_cmd;
+ struct mid_q_entry *mid_entry;
+ int next_is_large;
+ char *next_buffer = NULL;
+@@ -4969,14 +4970,15 @@ receive_encrypted_standard(struct TCP_Se
+ next_is_large = server->large_buf;
+ one_more:
+ shdr = (struct smb2_hdr *)buf;
+- if (shdr->NextCommand) {
++ next_cmd = le32_to_cpu(shdr->NextCommand);
++ if (next_cmd) {
++ if (WARN_ON_ONCE(next_cmd > pdu_length))
++ return -1;
+ if (next_is_large)
+ next_buffer = (char *)cifs_buf_get();
+ else
+ next_buffer = (char *)cifs_small_buf_get();
+- memcpy(next_buffer,
+- buf + le32_to_cpu(shdr->NextCommand),
+- pdu_length - le32_to_cpu(shdr->NextCommand));
++ memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd);
+ }
+
+ mid_entry = smb2_find_mid(server, buf);
+@@ -5000,8 +5002,8 @@ one_more:
+ else
+ ret = cifs_handle_standard(server, mid_entry);
+
+- if (ret == 0 && shdr->NextCommand) {
+- pdu_length -= le32_to_cpu(shdr->NextCommand);
++ if (ret == 0 && next_cmd) {
++ pdu_length -= next_cmd;
+ server->large_buf = next_is_large;
+ if (next_is_large)
+ server->bigbuf = buf = next_buffer;
--- /dev/null
+From 3a42709fa909e22b0be4bb1e2795aa04ada732a3 Mon Sep 17 00:00:00 2001
+From: Paulo Alcantara <pc@manguebit.com>
+Date: Mon, 11 Dec 2023 10:26:43 -0300
+Subject: smb: client: fix OOB in smb2_query_reparse_point()
+
+From: Paulo Alcantara <pc@manguebit.com>
+
+commit 3a42709fa909e22b0be4bb1e2795aa04ada732a3 upstream.
+
+Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that
+their sum does not wrap to a number that is smaller than @reparse_buf
+and we end up with a wild pointer as follows:
+
+ BUG: unable to handle page fault for address: ffff88809c5cd45f
+ #PF: supervisor read access in kernel mode
+ #PF: error_code(0x0000) - not-present page
+ PGD 4a01067 P4D 4a01067 PUD 0
+ Oops: 0000 [#1] PREEMPT SMP NOPTI
+ CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+ rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
+ RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
+ Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff
+ 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b>
+ 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff
+ RSP: 0018:ffffc90000347a90 EFLAGS: 00010212
+ RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec
+ RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4
+ RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001
+ R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100
+ R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000
+ FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000)
+ knlGS:0000000000000000
+ CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0
+ PKRU: 55555554
+ Call Trace:
+ <TASK>
+ ? __die+0x23/0x70
+ ? page_fault_oops+0x181/0x480
+ ? search_module_extables+0x19/0x60
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? exc_page_fault+0x1b6/0x1c0
+ ? asm_exc_page_fault+0x26/0x30
+ ? _raw_spin_unlock_irqrestore+0x44/0x60
+ ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
+ cifs_get_fattr+0x16e/0xa50 [cifs]
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? lock_acquire+0xbf/0x2b0
+ cifs_root_iget+0x163/0x5f0 [cifs]
+ cifs_smb3_do_mount+0x5bd/0x780 [cifs]
+ smb3_get_tree+0xd9/0x290 [cifs]
+ vfs_get_tree+0x2c/0x100
+ ? capable+0x37/0x70
+ path_mount+0x2d7/0xb80
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? _raw_spin_unlock_irqrestore+0x44/0x60
+ __x64_sys_mount+0x11a/0x150
+ do_syscall_64+0x47/0xf0
+ entry_SYSCALL_64_after_hwframe+0x6f/0x77
+ RIP: 0033:0x7f02c08d5b1e
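+
+The added validation, in short (names from the diff below):
+
+	off = le32_to_cpu(ioctl_rsp->OutputOffset);
+	count = le32_to_cpu(ioctl_rsp->OutputCount);
+	if (check_add_overflow(off, count, &len) ||
+	    len > rsp_iov[1].iov_len) {
+		rc = -EIO;	/* off + count wrapped or exceeds the response */
+		goto query_rp_exit;
+	}
+	reparse_buf = (void *)((u8 *)ioctl_rsp + off);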
+
+Fixes: 2e4564b31b64 ("smb3: add support for stat of WSL reparse points for special file types")
+Cc: stable@vger.kernel.org
+Reported-by: Robert Morris <rtm@csail.mit.edu>
+Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
+Signed-off-by: Steve French <stfrench@microsoft.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/smb/client/smb2ops.c | 26 ++++++++++++++++----------
+ 1 file changed, 16 insertions(+), 10 deletions(-)
+
+--- a/fs/smb/client/smb2ops.c
++++ b/fs/smb/client/smb2ops.c
+@@ -3001,7 +3001,7 @@ static int smb2_query_reparse_point(cons
+ struct kvec *rsp_iov;
+ struct smb2_ioctl_rsp *ioctl_rsp;
+ struct reparse_data_buffer *reparse_buf;
+- u32 plen;
++ u32 off, count, len;
+
+ cifs_dbg(FYI, "%s: path: %s\n", __func__, full_path);
+
+@@ -3082,16 +3082,22 @@ static int smb2_query_reparse_point(cons
+ */
+ if (rc == 0) {
+ /* See MS-FSCC 2.3.23 */
++ off = le32_to_cpu(ioctl_rsp->OutputOffset);
++ count = le32_to_cpu(ioctl_rsp->OutputCount);
++ if (check_add_overflow(off, count, &len) ||
++ len > rsp_iov[1].iov_len) {
++ cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n",
++ __func__, off, count);
++ rc = -EIO;
++ goto query_rp_exit;
++ }
+
+- reparse_buf = (struct reparse_data_buffer *)
+- ((char *)ioctl_rsp +
+- le32_to_cpu(ioctl_rsp->OutputOffset));
+- plen = le32_to_cpu(ioctl_rsp->OutputCount);
+-
+- if (plen + le32_to_cpu(ioctl_rsp->OutputOffset) >
+- rsp_iov[1].iov_len) {
+- cifs_tcon_dbg(FYI, "srv returned invalid ioctl len: %d\n",
+- plen);
++ reparse_buf = (void *)((u8 *)ioctl_rsp + off);
++ len = sizeof(*reparse_buf);
++ if (count < len ||
++ count < le16_to_cpu(reparse_buf->ReparseDataLength) + len) {
++ cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n",
++ __func__, off, count);
+ rc = -EIO;
+ goto query_rp_exit;
+ }
--- /dev/null
+From af1689a9b7701d9907dfc84d2a4b57c4bc907144 Mon Sep 17 00:00:00 2001
+From: Paulo Alcantara <pc@manguebit.com>
+Date: Mon, 11 Dec 2023 10:26:41 -0300
+Subject: smb: client: fix potential OOBs in smb2_parse_contexts()
+
+From: Paulo Alcantara <pc@manguebit.com>
+
+commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 upstream.
+
+Validate offsets and lengths before dereferencing create contexts in
+smb2_parse_contexts().
+
+This fixes the following oops when accessing invalid create contexts
+from the server:
+
+ BUG: unable to handle page fault for address: ffff8881178d8cc3
+ #PF: supervisor read access in kernel mode
+ #PF: error_code(0x0000) - not-present page
+ PGD 4a01067 P4D 4a01067 PUD 0
+ Oops: 0000 [#1] PREEMPT SMP NOPTI
+ CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1
+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
+ rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
+ RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs]
+ Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00
+ 00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7
+ 7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00
+ RSP: 0018:ffffc900007939e0 EFLAGS: 00010216
+ RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90
+ RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000
+ RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000
+ R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000
+ R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22
+ FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000)
+ knlGS:0000000000000000
+ CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0
+ PKRU: 55555554
+ Call Trace:
+ <TASK>
+ ? __die+0x23/0x70
+ ? page_fault_oops+0x181/0x480
+ ? search_module_extables+0x19/0x60
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? exc_page_fault+0x1b6/0x1c0
+ ? asm_exc_page_fault+0x26/0x30
+ ? smb2_parse_contexts+0xa0/0x3a0 [cifs]
+ SMB2_open+0x38d/0x5f0 [cifs]
+ ? smb2_is_path_accessible+0x138/0x260 [cifs]
+ smb2_is_path_accessible+0x138/0x260 [cifs]
+ cifs_is_path_remote+0x8d/0x230 [cifs]
+ cifs_mount+0x7e/0x350 [cifs]
+ cifs_smb3_do_mount+0x128/0x780 [cifs]
+ smb3_get_tree+0xd9/0x290 [cifs]
+ vfs_get_tree+0x2c/0x100
+ ? capable+0x37/0x70
+ path_mount+0x2d7/0xb80
+ ? srso_alias_return_thunk+0x5/0xfbef5
+ ? _raw_spin_unlock_irqrestore+0x44/0x60
+ __x64_sys_mount+0x11a/0x150
+ do_syscall_64+0x47/0xf0
+ entry_SYSCALL_64_after_hwframe+0x6f/0x77
+ RIP: 0033:0x7f8737657b1e
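+
+Sketch of the bounded walk added below: the whole CreateContexts region
+is checked against the response buffer, then each context's data range
+is checked against what remains:
+
+	off = le32_to_cpu(rsp->CreateContextsOffset);
+	rem = le32_to_cpu(rsp->CreateContextsLength);
+	if (check_add_overflow(off, rem, &len) || len > rsp_iov->iov_len)
+		return -EINVAL;
+	cc = (struct create_context *)((u8 *)rsp + off);
+
+	while (rem >= sizeof(*cc)) {
+		doff = le16_to_cpu(cc->DataOffset);
+		dlen = le32_to_cpu(cc->DataLength);
+		if (check_add_overflow(doff, dlen, &len) || len > rem)
+			return -EINVAL;
+		...
+	}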
+
+Reported-by: Robert Morris <rtm@csail.mit.edu>
+Cc: stable@vger.kernel.org
+Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
+Signed-off-by: Steve French <stfrench@microsoft.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ fs/smb/client/cached_dir.c | 17 +++++---
+ fs/smb/client/smb2pdu.c | 91 +++++++++++++++++++++++++++------------------
+ fs/smb/client/smb2proto.h | 12 +++--
+ 3 files changed, 74 insertions(+), 46 deletions(-)
+
+--- a/fs/smb/client/cached_dir.c
++++ b/fs/smb/client/cached_dir.c
+@@ -291,16 +291,23 @@ int open_cached_dir(unsigned int xid, st
+ oparms.fid->mid = le64_to_cpu(o_rsp->hdr.MessageId);
+ #endif /* CIFS_DEBUG2 */
+
+- rc = -EINVAL;
++
+ if (o_rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) {
+ spin_unlock(&cfids->cfid_list_lock);
++ rc = -EINVAL;
+ goto oshr_free;
+ }
+
+- smb2_parse_contexts(server, o_rsp,
+- &oparms.fid->epoch,
+- oparms.fid->lease_key, &oplock,
+- NULL, NULL);
++ rc = smb2_parse_contexts(server, rsp_iov,
++ &oparms.fid->epoch,
++ oparms.fid->lease_key,
++ &oplock, NULL, NULL);
++ if (rc) {
++ spin_unlock(&cfids->cfid_list_lock);
++ goto oshr_free;
++ }
++
++ rc = -EINVAL;
+ if (!(oplock & SMB2_LEASE_READ_CACHING_HE)) {
+ spin_unlock(&cfids->cfid_list_lock);
+ goto oshr_free;
+--- a/fs/smb/client/smb2pdu.c
++++ b/fs/smb/client/smb2pdu.c
+@@ -2141,17 +2141,18 @@ parse_posix_ctxt(struct create_context *
+ posix->nlink, posix->mode, posix->reparse_tag);
+ }
+
+-void
+-smb2_parse_contexts(struct TCP_Server_Info *server,
+- struct smb2_create_rsp *rsp,
+- unsigned int *epoch, char *lease_key, __u8 *oplock,
+- struct smb2_file_all_info *buf,
+- struct create_posix_rsp *posix)
++int smb2_parse_contexts(struct TCP_Server_Info *server,
++ struct kvec *rsp_iov,
++ unsigned int *epoch,
++ char *lease_key, __u8 *oplock,
++ struct smb2_file_all_info *buf,
++ struct create_posix_rsp *posix)
+ {
+- char *data_offset;
++ struct smb2_create_rsp *rsp = rsp_iov->iov_base;
+ struct create_context *cc;
+- unsigned int next;
+- unsigned int remaining;
++ size_t rem, off, len;
++ size_t doff, dlen;
++ size_t noff, nlen;
+ char *name;
+ static const char smb3_create_tag_posix[] = {
+ 0x93, 0xAD, 0x25, 0x50, 0x9C,
+@@ -2160,45 +2161,63 @@ smb2_parse_contexts(struct TCP_Server_In
+ };
+
+ *oplock = 0;
+- data_offset = (char *)rsp + le32_to_cpu(rsp->CreateContextsOffset);
+- remaining = le32_to_cpu(rsp->CreateContextsLength);
+- cc = (struct create_context *)data_offset;
++
++ off = le32_to_cpu(rsp->CreateContextsOffset);
++ rem = le32_to_cpu(rsp->CreateContextsLength);
++ if (check_add_overflow(off, rem, &len) || len > rsp_iov->iov_len)
++ return -EINVAL;
++ cc = (struct create_context *)((u8 *)rsp + off);
+
+ /* Initialize inode number to 0 in case no valid data in qfid context */
+ if (buf)
+ buf->IndexNumber = 0;
+
+- while (remaining >= sizeof(struct create_context)) {
+- name = le16_to_cpu(cc->NameOffset) + (char *)cc;
+- if (le16_to_cpu(cc->NameLength) == 4 &&
+- strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4) == 0)
+- *oplock = server->ops->parse_lease_buf(cc, epoch,
+- lease_key);
+- else if (buf && (le16_to_cpu(cc->NameLength) == 4) &&
+- strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4) == 0)
+- parse_query_id_ctxt(cc, buf);
+- else if ((le16_to_cpu(cc->NameLength) == 16)) {
+- if (posix &&
+- memcmp(name, smb3_create_tag_posix, 16) == 0)
++ while (rem >= sizeof(*cc)) {
++ doff = le16_to_cpu(cc->DataOffset);
++ dlen = le32_to_cpu(cc->DataLength);
++ if (check_add_overflow(doff, dlen, &len) || len > rem)
++ return -EINVAL;
++
++ noff = le16_to_cpu(cc->NameOffset);
++ nlen = le16_to_cpu(cc->NameLength);
++ if (noff + nlen >= doff)
++ return -EINVAL;
++
++ name = (char *)cc + noff;
++ switch (nlen) {
++ case 4:
++ if (!strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4)) {
++ *oplock = server->ops->parse_lease_buf(cc, epoch,
++ lease_key);
++ } else if (buf &&
++ !strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4)) {
++ parse_query_id_ctxt(cc, buf);
++ }
++ break;
++ case 16:
++ if (posix && !memcmp(name, smb3_create_tag_posix, 16))
+ parse_posix_ctxt(cc, buf, posix);
++ break;
++ default:
++ cifs_dbg(FYI, "%s: unhandled context (nlen=%zu dlen=%zu)\n",
++ __func__, nlen, dlen);
++ if (IS_ENABLED(CONFIG_CIFS_DEBUG2))
++ cifs_dump_mem("context data: ", cc, dlen);
++ break;
+ }
+- /* else {
+- cifs_dbg(FYI, "Context not matched with len %d\n",
+- le16_to_cpu(cc->NameLength));
+- cifs_dump_mem("Cctxt name: ", name, 4);
+- } */
+
+- next = le32_to_cpu(cc->Next);
+- if (!next)
++ off = le32_to_cpu(cc->Next);
++ if (!off)
+ break;
+- remaining -= next;
+- cc = (struct create_context *)((char *)cc + next);
++ if (check_sub_overflow(rem, off, &rem))
++ return -EINVAL;
++ cc = (struct create_context *)((u8 *)cc + off);
+ }
+
+ if (rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE)
+ *oplock = rsp->OplockLevel;
+
+- return;
++ return 0;
+ }
+
+ static int
+@@ -3029,8 +3048,8 @@ SMB2_open(const unsigned int xid, struct
+ }
+
+
+- smb2_parse_contexts(server, rsp, &oparms->fid->epoch,
+- oparms->fid->lease_key, oplock, buf, posix);
++ rc = smb2_parse_contexts(server, &rsp_iov, &oparms->fid->epoch,
++ oparms->fid->lease_key, oplock, buf, posix);
+ creat_exit:
+ SMB2_open_free(&rqst);
+ free_rsp_buf(resp_buftype, rsp);
+--- a/fs/smb/client/smb2proto.h
++++ b/fs/smb/client/smb2proto.h
+@@ -251,11 +251,13 @@ extern int smb3_validate_negotiate(const
+
+ extern enum securityEnum smb2_select_sectype(struct TCP_Server_Info *,
+ enum securityEnum);
+-extern void smb2_parse_contexts(struct TCP_Server_Info *server,
+- struct smb2_create_rsp *rsp,
+- unsigned int *epoch, char *lease_key,
+- __u8 *oplock, struct smb2_file_all_info *buf,
+- struct create_posix_rsp *posix);
++int smb2_parse_contexts(struct TCP_Server_Info *server,
++ struct kvec *rsp_iov,
++ unsigned int *epoch,
++ char *lease_key, __u8 *oplock,
++ struct smb2_file_all_info *buf,
++ struct create_posix_rsp *posix);
++
+ extern int smb3_encryption_required(const struct cifs_tcon *tcon);
+ extern int smb2_validate_iov(unsigned int offset, unsigned int buffer_length,
+ struct kvec *iov, unsigned int min_buf_size);
--- /dev/null
+From c12296bbecc488623b7d1932080e394d08f3226b Mon Sep 17 00:00:00 2001
+From: Florent Revest <revest@chromium.org>
+Date: Wed, 6 Dec 2023 13:37:18 +0100
+Subject: team: Fix use-after-free when an option instance allocation fails
+
+From: Florent Revest <revest@chromium.org>
+
+commit c12296bbecc488623b7d1932080e394d08f3226b upstream.
+
+In __team_options_register, team_options are allocated and appended to
+the team's option_list.
+If one option instance allocation fails, the "inst_rollback" cleanup
+path frees the previously allocated options but doesn't remove them from
+the team's option_list.
+This leaves dangling pointers that can be dereferenced later by other
+parts of the team driver that iterate over options.
+
+This patch fixes the cleanup path to remove the dangling pointers from
+the list.
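+
+The fix, in short (matching the diff below): each option is unlinked
+from the list as it is cleaned up during rollback:
+
+inst_rollback:
+	for (i--; i >= 0; i--) {
+		__team_option_inst_del_option(team, dst_opts[i]);
+		/* unlink from team->option_list to avoid dangling pointers */
+		list_del(&dst_opts[i]->list);
+	}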
+
+As far as I can tell, this UAF doesn't have many security implications,
+since it would be fairly hard to exploit (an attacker would need to make
+the allocation of that specific small object fail), but it's still nice
+to fix.
+
+Cc: stable@vger.kernel.org
+Fixes: 80f7c6683fe0 ("team: add support for per-port options")
+Signed-off-by: Florent Revest <revest@chromium.org>
+Reviewed-by: Jiri Pirko <jiri@nvidia.com>
+Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
+Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org
+Signed-off-by: Jakub Kicinski <kuba@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/net/team/team.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+--- a/drivers/net/team/team.c
++++ b/drivers/net/team/team.c
+@@ -281,8 +281,10 @@ static int __team_options_register(struc
+ return 0;
+
+ inst_rollback:
+- for (i--; i >= 0; i--)
++ for (i--; i >= 0; i--) {
+ __team_option_inst_del_option(team, dst_opts[i]);
++ list_del(&dst_opts[i]->list);
++ }
+
+ i = option_count;
+ alloc_rollback: