From: Greg Kroah-Hartman Date: Mon, 18 Dec 2023 07:33:14 +0000 (+0100) Subject: 6.6-stable patches X-Git-Tag: v5.15.144~30 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=048e3bf93ee8ad8438e083e5e01e67084b0e7fdb;p=thirdparty%2Fkernel%2Fstable-queue.git 6.6-stable patches added patches: arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch btrfs-fix-qgroup_free_reserved_data-int-overflow.patch btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch drm-i915-fix-remapped-stride-with-ccs-on-adl.patch drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch mm-mglru-fix-underprotected-page-cache.patch mm-mglru-reclaim-offlined-memcgs-harder.patch mm-mglru-respect-min_ttl_ms-with-memcgs.patch mm-mglru-try-to-stop-at-high-watermarks.patch mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch smb-client-fix-null-deref-in-asn1_ber_decoder.patch smb-client-fix-oob-in-receive_encrypted_standard.patch smb-client-fix-oob-in-smb2_query_reparse_point.patch smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch team-fix-use-after-free-when-an-option-instance-allocation-fails.patch --- diff --git 
a/queue-6.6/arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch b/queue-6.6/arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch new file mode 100644 index 00000000000..057119ddb60 --- /dev/null +++ b/queue-6.6/arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch @@ -0,0 +1,76 @@ +From 3c0696076aad60a2f04c019761921954579e1b0e Mon Sep 17 00:00:00 2001 +From: James Houghton +Date: Mon, 4 Dec 2023 17:26:46 +0000 +Subject: arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify + +From: James Houghton + +commit 3c0696076aad60a2f04c019761921954579e1b0e upstream. + +It is currently possible for a userspace application to enter an +infinite page fault loop when using HugeTLB pages implemented with +contiguous PTEs when HAFDBS is not available. This happens because: + +1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean + (PTE_DIRTY | PTE_RDONLY | PTE_WRITE). + +2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling + the memory access on a system without HAFDBS, we will get a page + fault. + +3. HugeTLB will check if it needs to update the dirty bits on the PTE. + For contiguous PTEs, it will check to see if the pgprot bits need + updating. In this case, HugeTLB wants to write a sequence of + sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about + to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()), + so it thinks no update is necessary. + +We can get the kernel to write a sw-dirty, hw-clean PTE with the +following steps (showing the relevant VMA flags and pgprot bits): + +i. Create a valid, writable contiguous PTE. + VMA vmflags: VM_SHARED | VM_READ | VM_WRITE + VMA pgprot bits: PTE_RDONLY | PTE_WRITE + PTE pgprot bits: PTE_DIRTY | PTE_WRITE + +ii. mprotect the VMA to PROT_NONE. + VMA vmflags: VM_SHARED + VMA pgprot bits: PTE_RDONLY + PTE pgprot bits: PTE_DIRTY | PTE_RDONLY + +iii. mprotect the VMA back to PROT_READ | PROT_WRITE. 
+ VMA vmflags: VM_SHARED | VM_READ | VM_WRITE + VMA pgprot bits: PTE_RDONLY | PTE_WRITE + PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY + +Make it impossible to create a writeable sw-dirty, hw-clean PTE with +pte_modify(). Such a PTE should be impossible to create, and there may +be places that assume that pte_dirty() implies pte_hw_dirty(). + +Signed-off-by: James Houghton +Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags") +Cc: +Acked-by: Will Deacon +Reviewed-by: Ryan Roberts +Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com +Signed-off-by: Catalin Marinas +Signed-off-by: Greg Kroah-Hartman +--- + arch/arm64/include/asm/pgtable.h | 6 ++++++ + 1 file changed, 6 insertions(+) + +--- a/arch/arm64/include/asm/pgtable.h ++++ b/arch/arm64/include/asm/pgtable.h +@@ -826,6 +826,12 @@ static inline pte_t pte_modify(pte_t pte + pte = set_pte_bit(pte, __pgprot(PTE_DIRTY)); + + pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask); ++ /* ++ * If we end up clearing hw dirtiness for a sw-dirty PTE, set hardware ++ * dirtiness again. ++ */ ++ if (pte_sw_dirty(pte)) ++ pte = pte_mkdirty(pte); + return pte; + } + diff --git a/queue-6.6/btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch b/queue-6.6/btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch new file mode 100644 index 00000000000..d5827413a5e --- /dev/null +++ b/queue-6.6/btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch @@ -0,0 +1,38 @@ +From a86805504b88f636a6458520d85afdf0634e3c6b Mon Sep 17 00:00:00 2001 +From: Boris Burkov +Date: Fri, 1 Dec 2023 13:00:12 -0800 +Subject: btrfs: don't clear qgroup reserved bit in release_folio + +From: Boris Burkov + +commit a86805504b88f636a6458520d85afdf0634e3c6b upstream. + +The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for +duplicate reservations. 
That is two writes to that range in one +transaction shouldn't create two reservations, as the reservation will +only be freed once when the write finally goes down. Therefore, it is +never OK to clear that bit without freeing the associated qgroup +reserve. At this point, we don't want to be freeing the reserve, so mask +off the bit. + +CC: stable@vger.kernel.org # 5.15+ +Reviewed-by: Qu Wenruo +Signed-off-by: Boris Burkov +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman +--- + fs/btrfs/extent_io.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/fs/btrfs/extent_io.c ++++ b/fs/btrfs/extent_io.c +@@ -2303,7 +2303,8 @@ static int try_release_extent_state(stru + ret = 0; + } else { + u32 clear_bits = ~(EXTENT_LOCKED | EXTENT_NODATASUM | +- EXTENT_DELALLOC_NEW | EXTENT_CTLBITS); ++ EXTENT_DELALLOC_NEW | EXTENT_CTLBITS | ++ EXTENT_QGROUP_RESERVED); + + /* + * At this point we can safely clear everything except the diff --git a/queue-6.6/btrfs-fix-qgroup_free_reserved_data-int-overflow.patch b/queue-6.6/btrfs-fix-qgroup_free_reserved_data-int-overflow.patch new file mode 100644 index 00000000000..a4605f3db51 --- /dev/null +++ b/queue-6.6/btrfs-fix-qgroup_free_reserved_data-int-overflow.patch @@ -0,0 +1,254 @@ +From 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 Mon Sep 17 00:00:00 2001 +From: Boris Burkov +Date: Fri, 1 Dec 2023 13:00:10 -0800 +Subject: btrfs: fix qgroup_free_reserved_data int overflow + +From: Boris Burkov + +commit 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 upstream. + +The reserved data counter and input parameter is a u64, but we +inadvertently accumulate it in an int. Overflowing that int results in +freeing the wrong amount of data and breaking reserve accounting. + +Unfortunately, this overflow rot spreads from there, as the qgroup +release/free functions rely on returning an int to take advantage of +negative values for error codes. 
+ +Therefore, the full fix is to return the "released" or "freed" amount by +a u64 argument and to return 0 or negative error code via the return +value. + +Most of the call sites simply ignore the return value, though some +of them handle the error and count the returned bytes. Change all of +them accordingly. + +CC: stable@vger.kernel.org # 6.1+ +Reviewed-by: Qu Wenruo +Signed-off-by: Boris Burkov +Reviewed-by: David Sterba +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman +--- + fs/btrfs/delalloc-space.c | 2 +- + fs/btrfs/file.c | 2 +- + fs/btrfs/inode.c | 16 ++++++++-------- + fs/btrfs/ordered-data.c | 7 ++++--- + fs/btrfs/qgroup.c | 25 +++++++++++++++---------- + fs/btrfs/qgroup.h | 4 ++-- + 6 files changed, 31 insertions(+), 25 deletions(-) + +--- a/fs/btrfs/delalloc-space.c ++++ b/fs/btrfs/delalloc-space.c +@@ -199,7 +199,7 @@ void btrfs_free_reserved_data_space(stru + start = round_down(start, fs_info->sectorsize); + + btrfs_free_reserved_data_space_noquota(fs_info, len); +- btrfs_qgroup_free_data(inode, reserved, start, len); ++ btrfs_qgroup_free_data(inode, reserved, start, len, NULL); + } + + /* +--- a/fs/btrfs/file.c ++++ b/fs/btrfs/file.c +@@ -3187,7 +3187,7 @@ static long btrfs_fallocate(struct file + qgroup_reserved -= range->len; + } else if (qgroup_reserved > 0) { + btrfs_qgroup_free_data(BTRFS_I(inode), data_reserved, +- range->start, range->len); ++ range->start, range->len, NULL); + qgroup_reserved -= range->len; + } + list_del(&range->list); +--- a/fs/btrfs/inode.c ++++ b/fs/btrfs/inode.c +@@ -687,7 +687,7 @@ out: + * And at reserve time, it's always aligned to page size, so + * just free one page here. 
+ */ +- btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE); ++ btrfs_qgroup_free_data(inode, NULL, 0, PAGE_SIZE, NULL); + btrfs_free_path(path); + btrfs_end_transaction(trans); + return ret; +@@ -5129,7 +5129,7 @@ static void evict_inode_truncate_pages(s + */ + if (state_flags & EXTENT_DELALLOC) + btrfs_qgroup_free_data(BTRFS_I(inode), NULL, start, +- end - start + 1); ++ end - start + 1, NULL); + + clear_extent_bit(io_tree, start, end, + EXTENT_CLEAR_ALL_BITS | EXTENT_DO_ACCOUNTING, +@@ -8051,7 +8051,7 @@ next: + * reserved data space. + * Since the IO will never happen for this page. + */ +- btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur); ++ btrfs_qgroup_free_data(inode, NULL, cur, range_end + 1 - cur, NULL); + if (!inode_evicting) { + clear_extent_bit(tree, cur, range_end, EXTENT_LOCKED | + EXTENT_DELALLOC | EXTENT_UPTODATE | +@@ -9481,7 +9481,7 @@ static struct btrfs_trans_handle *insert + struct btrfs_path *path; + u64 start = ins->objectid; + u64 len = ins->offset; +- int qgroup_released; ++ u64 qgroup_released = 0; + int ret; + + memset(&stack_fi, 0, sizeof(stack_fi)); +@@ -9494,9 +9494,9 @@ static struct btrfs_trans_handle *insert + btrfs_set_stack_file_extent_compression(&stack_fi, BTRFS_COMPRESS_NONE); + /* Encryption and other encoding is reserved and all 0 */ + +- qgroup_released = btrfs_qgroup_release_data(inode, file_offset, len); +- if (qgroup_released < 0) +- return ERR_PTR(qgroup_released); ++ ret = btrfs_qgroup_release_data(inode, file_offset, len, &qgroup_released); ++ if (ret < 0) ++ return ERR_PTR(ret); + + if (trans) { + ret = insert_reserved_file_extent(trans, inode, +@@ -10391,7 +10391,7 @@ out_delalloc_release: + btrfs_delalloc_release_metadata(inode, disk_num_bytes, ret < 0); + out_qgroup_free_data: + if (ret < 0) +- btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes); ++ btrfs_qgroup_free_data(inode, data_reserved, start, num_bytes, NULL); + out_free_data_space: + /* + * If btrfs_reserve_extent() succeeded, then 
we already decremented +--- a/fs/btrfs/ordered-data.c ++++ b/fs/btrfs/ordered-data.c +@@ -153,11 +153,12 @@ static struct btrfs_ordered_extent *allo + { + struct btrfs_ordered_extent *entry; + int ret; ++ u64 qgroup_rsv = 0; + + if (flags & + ((1 << BTRFS_ORDERED_NOCOW) | (1 << BTRFS_ORDERED_PREALLOC))) { + /* For nocow write, we can release the qgroup rsv right now */ +- ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes); ++ ret = btrfs_qgroup_free_data(inode, NULL, file_offset, num_bytes, &qgroup_rsv); + if (ret < 0) + return ERR_PTR(ret); + } else { +@@ -165,7 +166,7 @@ static struct btrfs_ordered_extent *allo + * The ordered extent has reserved qgroup space, release now + * and pass the reserved number for qgroup_record to free. + */ +- ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes); ++ ret = btrfs_qgroup_release_data(inode, file_offset, num_bytes, &qgroup_rsv); + if (ret < 0) + return ERR_PTR(ret); + } +@@ -183,7 +184,7 @@ static struct btrfs_ordered_extent *allo + entry->inode = igrab(&inode->vfs_inode); + entry->compress_type = compress_type; + entry->truncated_len = (u64)-1; +- entry->qgroup_rsv = ret; ++ entry->qgroup_rsv = qgroup_rsv; + entry->flags = flags; + refcount_set(&entry->refs, 1); + init_waitqueue_head(&entry->wait); +--- a/fs/btrfs/qgroup.c ++++ b/fs/btrfs/qgroup.c +@@ -3855,13 +3855,14 @@ int btrfs_qgroup_reserve_data(struct btr + + /* Free ranges specified by @reserved, normally in error path */ + static int qgroup_free_reserved_data(struct btrfs_inode *inode, +- struct extent_changeset *reserved, u64 start, u64 len) ++ struct extent_changeset *reserved, ++ u64 start, u64 len, u64 *freed_ret) + { + struct btrfs_root *root = inode->root; + struct ulist_node *unode; + struct ulist_iterator uiter; + struct extent_changeset changeset; +- int freed = 0; ++ u64 freed = 0; + int ret; + + extent_changeset_init(&changeset); +@@ -3902,7 +3903,9 @@ static int qgroup_free_reserved_data(str + } + 
btrfs_qgroup_free_refroot(root->fs_info, root->root_key.objectid, freed, + BTRFS_QGROUP_RSV_DATA); +- ret = freed; ++ if (freed_ret) ++ *freed_ret = freed; ++ ret = 0; + out: + extent_changeset_release(&changeset); + return ret; +@@ -3910,7 +3913,7 @@ out: + + static int __btrfs_qgroup_release_data(struct btrfs_inode *inode, + struct extent_changeset *reserved, u64 start, u64 len, +- int free) ++ u64 *released, int free) + { + struct extent_changeset changeset; + int trace_op = QGROUP_RELEASE; +@@ -3922,7 +3925,7 @@ static int __btrfs_qgroup_release_data(s + /* In release case, we shouldn't have @reserved */ + WARN_ON(!free && reserved); + if (free && reserved) +- return qgroup_free_reserved_data(inode, reserved, start, len); ++ return qgroup_free_reserved_data(inode, reserved, start, len, released); + extent_changeset_init(&changeset); + ret = clear_record_extent_bits(&inode->io_tree, start, start + len -1, + EXTENT_QGROUP_RESERVED, &changeset); +@@ -3937,7 +3940,8 @@ static int __btrfs_qgroup_release_data(s + btrfs_qgroup_free_refroot(inode->root->fs_info, + inode->root->root_key.objectid, + changeset.bytes_changed, BTRFS_QGROUP_RSV_DATA); +- ret = changeset.bytes_changed; ++ if (released) ++ *released = changeset.bytes_changed; + out: + extent_changeset_release(&changeset); + return ret; +@@ -3956,9 +3960,10 @@ out: + * NOTE: This function may sleep for memory allocation. + */ + int btrfs_qgroup_free_data(struct btrfs_inode *inode, +- struct extent_changeset *reserved, u64 start, u64 len) ++ struct extent_changeset *reserved, ++ u64 start, u64 len, u64 *freed) + { +- return __btrfs_qgroup_release_data(inode, reserved, start, len, 1); ++ return __btrfs_qgroup_release_data(inode, reserved, start, len, freed, 1); + } + + /* +@@ -3976,9 +3981,9 @@ int btrfs_qgroup_free_data(struct btrfs_ + * + * NOTE: This function may sleep for memory allocation. 
+ */ +-int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len) ++int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released) + { +- return __btrfs_qgroup_release_data(inode, NULL, start, len, 0); ++ return __btrfs_qgroup_release_data(inode, NULL, start, len, released, 0); + } + + static void add_root_meta_rsv(struct btrfs_root *root, int num_bytes, +--- a/fs/btrfs/qgroup.h ++++ b/fs/btrfs/qgroup.h +@@ -363,10 +363,10 @@ int btrfs_verify_qgroup_counts(struct bt + /* New io_tree based accurate qgroup reserve API */ + int btrfs_qgroup_reserve_data(struct btrfs_inode *inode, + struct extent_changeset **reserved, u64 start, u64 len); +-int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len); ++int btrfs_qgroup_release_data(struct btrfs_inode *inode, u64 start, u64 len, u64 *released); + int btrfs_qgroup_free_data(struct btrfs_inode *inode, + struct extent_changeset *reserved, u64 start, +- u64 len); ++ u64 len, u64 *freed); + int btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, + enum btrfs_qgroup_rsv_type type, bool enforce); + int __btrfs_qgroup_reserve_meta(struct btrfs_root *root, int num_bytes, diff --git a/queue-6.6/btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch b/queue-6.6/btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch new file mode 100644 index 00000000000..191d5888c0a --- /dev/null +++ b/queue-6.6/btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch @@ -0,0 +1,38 @@ +From f63e1164b90b385cd832ff0fdfcfa76c3cc15436 Mon Sep 17 00:00:00 2001 +From: Boris Burkov +Date: Fri, 1 Dec 2023 13:00:09 -0800 +Subject: btrfs: free qgroup reserve when ORDERED_IOERR is set + +From: Boris Burkov + +commit f63e1164b90b385cd832ff0fdfcfa76c3cc15436 upstream. + +An ordered extent completing is a critical moment in qgroup reserve +handling, as the ownership of the reservation is handed off from the +ordered extent to the delayed ref. 
In the happy path we release (unlock) +but do not free (decrement counter) the reservation, and the delayed ref +drives the free. However, on an error, we don't create a delayed ref, +since there is no ref to add. Therefore, free on the error path. + +CC: stable@vger.kernel.org # 6.1+ +Reviewed-by: Qu Wenruo +Signed-off-by: Boris Burkov +Signed-off-by: David Sterba +Signed-off-by: Greg Kroah-Hartman +--- + fs/btrfs/ordered-data.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/fs/btrfs/ordered-data.c ++++ b/fs/btrfs/ordered-data.c +@@ -603,7 +603,9 @@ void btrfs_remove_ordered_extent(struct + release = entry->disk_num_bytes; + else + release = entry->num_bytes; +- btrfs_delalloc_release_metadata(btrfs_inode, release, false); ++ btrfs_delalloc_release_metadata(btrfs_inode, release, ++ test_bit(BTRFS_ORDERED_IOERR, ++ &entry->flags)); + } + + percpu_counter_add_batch(&fs_info->ordered_bytes, -entry->num_bytes, diff --git a/queue-6.6/dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch b/queue-6.6/dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch new file mode 100644 index 00000000000..dad319bbb7f --- /dev/null +++ b/queue-6.6/dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch @@ -0,0 +1,40 @@ +From 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 Mon Sep 17 00:00:00 2001 +From: Frank Li +Date: Mon, 27 Nov 2023 16:43:25 -0500 +Subject: dmaengine: fsl-edma: fix DMA channel leak in eDMAv4 + +From: Frank Li + +commit 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 upstream. + +Allocate channel count consistently increases due to a missing source ID +(srcid) cleanup in the fsl_edma_free_chan_resources() function at imx93 +eDMAv4. + +Reset 'srcid' at fsl_edma_free_chan_resources(). 
+ +Cc: stable@vger.kernel.org +Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support") +Signed-off-by: Frank Li +Link: https://lore.kernel.org/r/20231127214325.2477247-1-Frank.Li@nxp.com +Signed-off-by: Vinod Koul +Signed-off-by: Greg Kroah-Hartman +--- + drivers/dma/fsl-edma-common.c | 1 + + 1 file changed, 1 insertion(+) + +diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c +index 6a3abe5b1790..b53f46245c37 100644 +--- a/drivers/dma/fsl-edma-common.c ++++ b/drivers/dma/fsl-edma-common.c +@@ -828,6 +828,7 @@ void fsl_edma_free_chan_resources(struct dma_chan *chan) + dma_pool_destroy(fsl_chan->tcd_pool); + fsl_chan->tcd_pool = NULL; + fsl_chan->is_sw = false; ++ fsl_chan->srcid = 0; + } + + void fsl_edma_cleanup_vchan(struct dma_device *dmadev) +-- +2.43.0 + diff --git a/queue-6.6/dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch b/queue-6.6/dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch new file mode 100644 index 00000000000..6fa28256ca5 --- /dev/null +++ b/queue-6.6/dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch @@ -0,0 +1,78 @@ +From 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 Mon Sep 17 00:00:00 2001 +From: Amelie Delaunay +Date: Mon, 6 Nov 2023 14:48:32 +0100 +Subject: dmaengine: stm32-dma: avoid bitfield overflow assertion + +From: Amelie Delaunay + +commit 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 upstream. 
+ +stm32_dma_get_burst() returns a negative error for invalid input, which +gets turned into a large u32 value in stm32_dma_prep_dma_memcpy() that +in turn triggers an assertion because it does not fit into a two-bit field: +drivers/dma/stm32-dma.c: In function 'stm32_dma_prep_dma_memcpy': +include/linux/compiler_types.h:354:38: error: call to '__compiletime_assert_282' declared with attribute error: FIELD_PREP: value too large for the field + _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) + ^ + include/linux/compiler_types.h:335:4: note: in definition of macro '__compiletime_assert' + prefix ## suffix(); \ + ^~~~~~ + include/linux/compiler_types.h:354:2: note: in expansion of macro '_compiletime_assert' + _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) + ^~~~~~~~~~~~~~~~~~~ + include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' + #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) + ^~~~~~~~~~~~~~~~~~ + include/linux/bitfield.h:68:3: note: in expansion of macro 'BUILD_BUG_ON_MSG' + BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ? \ + ^~~~~~~~~~~~~~~~ + include/linux/bitfield.h:114:3: note: in expansion of macro '__BF_FIELD_CHECK' + __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \ + ^~~~~~~~~~~~~~~~ + drivers/dma/stm32-dma.c:1237:4: note: in expansion of macro 'FIELD_PREP' + FIELD_PREP(STM32_DMA_SCR_PBURST_MASK, dma_burst) | + ^~~~~~~~~~ + +As an easy workaround, assume the error can happen, so try to handle this +by failing stm32_dma_prep_dma_memcpy() before the assertion. It replicates +what is done in stm32_dma_set_xfer_param() where stm32_dma_get_burst() is +also used. 
+ +Fixes: 1c32d6c37cc2 ("dmaengine: stm32-dma: use bitfield helpers") +Fixes: a2b6103b7a8a ("dmaengine: stm32-dma: Improve memory burst management") +Signed-off-by: Arnd Bergmann +Signed-off-by: Amelie Delaunay +Cc: stable@vger.kernel.org +Reported-by: kernel test robot +Closes: https://lore.kernel.org/oe-kbuild-all/202311060135.Q9eMnpCL-lkp@intel.com/ +Link: https://lore.kernel.org/r/20231106134832.1470305-1-amelie.delaunay@foss.st.com +Signed-off-by: Vinod Koul +Signed-off-by: Greg Kroah-Hartman +--- + drivers/dma/stm32-dma.c | 8 ++++++-- + 1 file changed, 6 insertions(+), 2 deletions(-) + +--- a/drivers/dma/stm32-dma.c ++++ b/drivers/dma/stm32-dma.c +@@ -1249,8 +1249,8 @@ static struct dma_async_tx_descriptor *s + enum dma_slave_buswidth max_width; + struct stm32_dma_desc *desc; + size_t xfer_count, offset; +- u32 num_sgs, best_burst, dma_burst, threshold; +- int i; ++ u32 num_sgs, best_burst, threshold; ++ int dma_burst, i; + + num_sgs = DIV_ROUND_UP(len, STM32_DMA_ALIGNED_MAX_DATA_ITEMS); + desc = kzalloc(struct_size(desc, sg_req, num_sgs), GFP_NOWAIT); +@@ -1268,6 +1268,10 @@ static struct dma_async_tx_descriptor *s + best_burst = stm32_dma_get_best_burst(len, STM32_DMA_MAX_BURST, + threshold, max_width); + dma_burst = stm32_dma_get_burst(chan, best_burst); ++ if (dma_burst < 0) { ++ kfree(desc); ++ return NULL; ++ } + + stm32_dma_clear_reg(&desc->sg_req[i].chan_reg); + desc->sg_req[i].chan_reg.dma_scr = diff --git a/queue-6.6/drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch b/queue-6.6/drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch new file mode 100644 index 00000000000..a7a9716bb61 --- /dev/null +++ b/queue-6.6/drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch @@ -0,0 +1,48 @@ +From e7ab758741672acb21c5d841a9f0309d30e48a06 Mon Sep 17 00:00:00 2001 +From: Mario Limonciello +Date: Mon, 19 Jun 2023 15:04:24 -0500 +Subject: drm/amd/display: Disable PSR-SU on Parade 0803 TCON again + +From: Mario Limonciello + +commit 
e7ab758741672acb21c5d841a9f0309d30e48a06 upstream. + +When screen brightness is rapidly changed and PSR-SU is enabled the +display hangs on panels with this TCON even on the latest DCN 3.1.4 +microcode (0x8002a81 at this time). + +This was disabled previously as commit 072030b17830 ("drm/amd: Disable +PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert +"drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for +a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum +requirement for using PSR-SU on Phoenix")). + +As hangs are still happening specifically with this TCON, disable PSR-SU +again for it until it can be root caused. + +Cc: stable@vger.kernel.org +Cc: aaron.ma@canonical.com +Cc: binli@gnome.org +Cc: Marc Rossi +Cc: Hamza Mahfooz +Signed-off-by: Mario Limonciello +Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131 +Acked-by: Alex Deucher +Reviewed-by: Harry Wentland +Signed-off-by: Alex Deucher +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/amd/display/modules/power/power_helpers.c | 2 ++ + 1 file changed, 2 insertions(+) + +--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c ++++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c +@@ -839,6 +839,8 @@ bool is_psr_su_specific_panel(struct dc_ + ((dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x08) || + (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x07))) + isPSRSUSupported = false; ++ else if (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x03) ++ isPSRSUSupported = false; + else if (dpcd_caps->psr_info.force_psrsu_cap == 0x1) + isPSRSUSupported = true; + } diff --git a/queue-6.6/drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch b/queue-6.6/drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch new file mode 100644 index 00000000000..d527a9323ee --- /dev/null +++ 
b/queue-6.6/drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch @@ -0,0 +1,47 @@ +From b96ab339ee50470d13a1faa6ad94d2218a7cd49f Mon Sep 17 00:00:00 2001 +From: Mario Limonciello +Date: Wed, 6 Dec 2023 12:08:26 -0600 +Subject: drm/amd/display: Restore guard against default backlight value < 1 nit + +From: Mario Limonciello + +commit b96ab339ee50470d13a1faa6ad94d2218a7cd49f upstream. + +Mark reports that brightness is not restored after Xorg dpms screen blank. + +This behavior was introduced by commit d9e865826c20 ("drm/amd/display: +Simplify brightness initialization") which dropped the cached backlight +value in display code, but also removed code for when the default value +read back was less than 1 nit. + +Restore this code so that the backlight brightness is restored to the +correct default value in this circumstance. + +Reported-by: Mark Herbert +Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031 +Cc: stable@vger.kernel.org +Cc: Camille Cho +Cc: Krunoslav Kovac +Cc: Hamza Mahfooz +Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") +Acked-by: Alex Deucher +Signed-off-by: Mario Limonciello +Signed-off-by: Alex Deucher +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c | 4 ++-- + 1 file changed, 2 insertions(+), 2 deletions(-) + +--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c ++++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c +@@ -280,8 +280,8 @@ bool set_default_brightness_aux(struct d + if (link && link->dpcd_sink_ext_caps.bits.oled == 1) { + if (!read_default_bl_aux(link, &default_backlight)) + default_backlight = 150000; +- // if > 5000, it might be wrong readback +- if (default_backlight > 5000000) ++ // if < 1 nits or > 5000, it might be wrong readback ++ if (default_backlight < 1000 || default_backlight > 5000000) + default_backlight = 150000; + + return 
edp_set_backlight_level_nits(link, true, diff --git a/queue-6.6/drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch b/queue-6.6/drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch new file mode 100644 index 00000000000..f23473a37c5 --- /dev/null +++ b/queue-6.6/drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch @@ -0,0 +1,47 @@ +From ceb9a321e7639700844aa3bf234a4e0884f13b77 Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Christian=20K=C3=B6nig?= +Date: Fri, 8 Dec 2023 13:43:09 +0100 +Subject: drm/amdgpu: fix tear down order in amdgpu_vm_pt_free +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Christian König + +commit ceb9a321e7639700844aa3bf234a4e0884f13b77 upstream. + +When freeing PD/PT with shadows it can happen that the shadow +destruction races with detaching the PD/PT from the VM causing a NULL +pointer dereference in the invalidation code. + +Fix this by detaching the the PD/PT from the VM first and then +freeing the shadow instead. 
+ +Signed-off-by: Christian König +Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867 +Cc: +Reviewed-by: Alex Deucher +Signed-off-by: Alex Deucher +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c ++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c +@@ -642,13 +642,14 @@ static void amdgpu_vm_pt_free(struct amd + + if (!entry->bo) + return; ++ ++ entry->bo->vm_bo = NULL; + shadow = amdgpu_bo_shadowed(entry->bo); + if (shadow) { + ttm_bo_set_bulk_move(&shadow->tbo, NULL); + amdgpu_bo_unref(&shadow); + } + ttm_bo_set_bulk_move(&entry->bo->tbo, NULL); +- entry->bo->vm_bo = NULL; + + spin_lock(&entry->vm->status_lock); + list_del(&entry->vm_status); diff --git a/queue-6.6/drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch b/queue-6.6/drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch new file mode 100644 index 00000000000..ef8ba313406 --- /dev/null +++ b/queue-6.6/drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch @@ -0,0 +1,81 @@ +From ab4750332dbe535243def5dcebc24ca00c1f98ac Mon Sep 17 00:00:00 2001 +From: Alex Deucher +Date: Thu, 7 Dec 2023 10:14:41 -0500 +Subject: drm/amdgpu/sdma5.2: add begin/end_use ring callbacks +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Alex Deucher + +commit ab4750332dbe535243def5dcebc24ca00c1f98ac upstream. + +Add begin/end_use ring callbacks to disallow GFXOFF when +SDMA work is submitted and allow it again afterward. + +This should avoid corner cases where GFXOFF is erroneously +entered when SDMA is still active. For now just allow/disallow +GFXOFF in the begin and end helpers until we root cause the +issue. This should not impact power as SDMA usage is pretty +minimal and GFXOSS should not be active when SDMA is active +anyway, this just makes it explicit. + +v2: move everything into sdma5.2 code. 
No reason for this +to be generic at this point. +v3: Add comments in new code + +Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220 +Reviewed-by: Mario Limonciello (v1) +Tested-by: Mario Limonciello (v1) +Reviewed-by: Christian König +Signed-off-by: Alex Deucher +Cc: stable@vger.kernel.org # 5.15+ +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 28 ++++++++++++++++++++++++++++ + 1 file changed, 28 insertions(+) + +--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c ++++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +@@ -1651,6 +1651,32 @@ static void sdma_v5_2_get_clockgating_st + *flags |= AMD_CG_SUPPORT_SDMA_LS; + } + ++static void sdma_v5_2_ring_begin_use(struct amdgpu_ring *ring) ++{ ++ struct amdgpu_device *adev = ring->adev; ++ ++ /* SDMA 5.2.3 (RMB) FW doesn't seem to properly ++ * disallow GFXOFF in some cases leading to ++ * hangs in SDMA. Disallow GFXOFF while SDMA is active. ++ * We can probably just limit this to 5.2.3, ++ * but it shouldn't hurt for other parts since ++ * this GFXOFF will be disallowed anyway when SDMA is ++ * active, this just makes it explicit. ++ */ ++ amdgpu_gfx_off_ctrl(adev, false); ++} ++ ++static void sdma_v5_2_ring_end_use(struct amdgpu_ring *ring) ++{ ++ struct amdgpu_device *adev = ring->adev; ++ ++ /* SDMA 5.2.3 (RMB) FW doesn't seem to properly ++ * disallow GFXOFF in some cases leading to ++ * hangs in SDMA. Allow GFXOFF when SDMA is complete. 
++ */ ++ amdgpu_gfx_off_ctrl(adev, true); ++} ++ + const struct amd_ip_funcs sdma_v5_2_ip_funcs = { + .name = "sdma_v5_2", + .early_init = sdma_v5_2_early_init, +@@ -1698,6 +1724,8 @@ static const struct amdgpu_ring_funcs sd + .test_ib = sdma_v5_2_ring_test_ib, + .insert_nop = sdma_v5_2_ring_insert_nop, + .pad_ib = sdma_v5_2_ring_pad_ib, ++ .begin_use = sdma_v5_2_ring_begin_use, ++ .end_use = sdma_v5_2_ring_end_use, + .emit_wreg = sdma_v5_2_ring_emit_wreg, + .emit_reg_wait = sdma_v5_2_ring_emit_reg_wait, + .emit_reg_write_reg_wait = sdma_v5_2_ring_emit_reg_write_reg_wait, diff --git a/queue-6.6/drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch b/queue-6.6/drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch new file mode 100644 index 00000000000..4b6fa5d50c1 --- /dev/null +++ b/queue-6.6/drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch @@ -0,0 +1,40 @@ +From 759f14e20891de72e676d9d738eb2c573aa15f52 Mon Sep 17 00:00:00 2001 +From: Jani Nikula +Date: Thu, 7 Dec 2023 11:38:21 +0200 +Subject: drm/edid: also call add modes in EDID connector update fallback +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Jani Nikula + +commit 759f14e20891de72e676d9d738eb2c573aa15f52 upstream. + +When the separate add modes call was added back in commit c533b5167c7e +("drm/edid: add separate drm_edid_connector_add_modes()"), it failed to +address drm_edid_override_connector_update(). Also call add modes there. 
+ +Reported-by: bbaa +Closes: https://lore.kernel.org/r/930E9B4C7D91FDFF+29b34d89-8658-4910-966a-c772f320ea03@bbaa.fun +Fixes: c533b5167c7e ("drm/edid: add separate drm_edid_connector_add_modes()") +Cc: # v6.3+ +Signed-off-by: Jani Nikula +Reviewed-by: Ville Syrjälä +Link: https://patchwork.freedesktop.org/patch/msgid/20231207093821.2654267-1-jani.nikula@intel.com +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/drm_edid.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/drivers/gpu/drm/drm_edid.c ++++ b/drivers/gpu/drm/drm_edid.c +@@ -2308,7 +2308,8 @@ int drm_edid_override_connector_update(s + + override = drm_edid_override_get(connector); + if (override) { +- num_modes = drm_edid_connector_update(connector, override); ++ if (drm_edid_connector_update(connector, override) == 0) ++ num_modes = drm_edid_connector_add_modes(connector); + + drm_edid_free(override); + diff --git a/queue-6.6/drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch b/queue-6.6/drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch new file mode 100644 index 00000000000..ce02a3ac2e6 --- /dev/null +++ b/queue-6.6/drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch @@ -0,0 +1,54 @@ +From 324b70e997aab0a7deab8cb90711faccda4e98c8 Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= +Date: Mon, 4 Dec 2023 22:24:43 +0200 +Subject: drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Ville Syrjälä + +commit 324b70e997aab0a7deab8cb90711faccda4e98c8 upstream. + +plane_view_scanout_stride() currently assumes that we had to pad the +mapping stride with dummy pages in order to align it. 
But that is not +the case if the original fb stride exceeds the aligned stride used +to populate the remapped view, which is calculated from the user +specified framebuffer width rather than the user specified framebuffer +stride. + +Ignore the original fb stride in this case and just stick to the POT +aligned stride. Getting this wrong will cause the plane to fetch the +wrong data, and can lead to fault errors if the page tables at the +bogus location aren't even populated. + +TODO: figure out if this is OK for CCS, or if we should instead increase +the width of the view to cover the entire user specified fb stride +instead... + +Cc: Imre Deak +Cc: Juha-Pekka Heikkila +Signed-off-by: Ville Syrjälä +Link: https://patchwork.freedesktop.org/patch/msgid/20231204202443.31247-1-ville.syrjala@linux.intel.com +Reviewed-by: Imre Deak +Reviewed-by: Juha-Pekka Heikkila +(cherry picked from commit 01a39f1c4f1220a4e6a25729fae87ff5794cbc52) +Cc: stable@vger.kernel.org +Signed-off-by: Jani Nikula +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/i915/display/intel_fb.c | 3 ++- + 1 file changed, 2 insertions(+), 1 deletion(-) + +--- a/drivers/gpu/drm/i915/display/intel_fb.c ++++ b/drivers/gpu/drm/i915/display/intel_fb.c +@@ -1370,7 +1370,8 @@ plane_view_scanout_stride(const struct i + struct drm_i915_private *i915 = to_i915(fb->base.dev); + unsigned int stride_tiles; + +- if (IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14) ++ if ((IS_ALDERLAKE_P(i915) || DISPLAY_VER(i915) >= 14) && ++ src_stride_tiles < dst_stride_tiles) + stride_tiles = src_stride_tiles; + else + stride_tiles = dst_stride_tiles; diff --git a/queue-6.6/drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch b/queue-6.6/drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch new file mode 100644 index 00000000000..97f7ebcdaa8 --- /dev/null +++ b/queue-6.6/drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch @@ -0,0 +1,57 @@ +From c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 
Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= +Date: Thu, 7 Dec 2023 21:34:34 +0200 +Subject: drm/i915: Fix intel_atomic_setup_scalers() plane_state handling +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Ville Syrjälä + +commit c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 upstream. + +Since the plane_state variable is declared outside the scaler_users +loop in intel_atomic_setup_scalers(), and it's never reset back to +NULL inside the loop we may end up calling intel_atomic_setup_scaler() +with a non-NULL plane state for the pipe scaling case. That is bad +because intel_atomic_setup_scaler() determines whether we are doing +plane scaling or pipe scaling based on plane_state!=NULL. The end +result is that we may miscalculate the scaler mode for pipe scaling. + +The hardware becomes somewhat upset if we end up in this situation +when scanning out a planar format on a SDR plane. We end up +programming the pipe scaler into planar mode as well, and the +result is a screenfull of garbage. + +Fix the situation by making sure we pass the correct plane_state==NULL +when calculating the scaler mode for pipe scaling. 
+ +Cc: stable@vger.kernel.org +Signed-off-by: Ville Syrjälä +Link: https://patchwork.freedesktop.org/patch/msgid/20231207193441.20206-2-ville.syrjala@linux.intel.com +Reviewed-by: Jani Nikula +(cherry picked from commit e81144106e21271c619f0c722a09e27ccb8c043d) +Signed-off-by: Jani Nikula +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/i915/display/skl_scaler.c | 2 +- + 1 file changed, 1 insertion(+), 1 deletion(-) + +--- a/drivers/gpu/drm/i915/display/skl_scaler.c ++++ b/drivers/gpu/drm/i915/display/skl_scaler.c +@@ -504,7 +504,6 @@ int intel_atomic_setup_scalers(struct dr + { + struct drm_plane *plane = NULL; + struct intel_plane *intel_plane; +- struct intel_plane_state *plane_state = NULL; + struct intel_crtc_scaler_state *scaler_state = + &crtc_state->scaler_state; + struct drm_atomic_state *drm_state = crtc_state->uapi.state; +@@ -536,6 +535,7 @@ int intel_atomic_setup_scalers(struct dr + + /* walkthrough scaler_users bits and start assigning scalers */ + for (i = 0; i < sizeof(scaler_state->scaler_users) * 8; i++) { ++ struct intel_plane_state *plane_state = NULL; + int *scaler_id; + const char *name; + int idx, ret; diff --git a/queue-6.6/drm-i915-fix-remapped-stride-with-ccs-on-adl.patch b/queue-6.6/drm-i915-fix-remapped-stride-with-ccs-on-adl.patch new file mode 100644 index 00000000000..46638cc4748 --- /dev/null +++ b/queue-6.6/drm-i915-fix-remapped-stride-with-ccs-on-adl.patch @@ -0,0 +1,67 @@ +From 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= +Date: Tue, 5 Dec 2023 20:03:08 +0200 +Subject: drm/i915: Fix remapped stride with CCS on ADL+ +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Ville Syrjälä + +commit 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 upstream. 
+ +On ADL+ the hardware automagically calculates the CCS AUX surface +stride from the main surface stride, so when remapping we can't +really play a lot of tricks with the main surface stride, or else +the AUX surface stride would get miscalculated and no longer +match the actual data layout in memory. + +Supposedly we could remap in 256 main surface tile units +(AUX page(4096)/cachline(64)*4(4x1 main surface tiles per +AUX cacheline)=256 main surface tiles), but the extra complexity +is probably not worth the hassle. + +So let's just make sure our mapping stride is calculated from +the full framebuffer stride (instead of the framebuffer width). +This way the stride we program into PLANE_STRIDE will be the +original framebuffer stride, and thus there will be no change +to the AUX stride/layout. + +Cc: stable@vger.kernel.org +Cc: Imre Deak +Cc: Juha-Pekka Heikkila +Signed-off-by: Ville Syrjälä +Link: https://patchwork.freedesktop.org/patch/msgid/20231205180308.7505-1-ville.syrjala@linux.intel.com +Reviewed-by: Imre Deak +(cherry picked from commit 2c12eb36f849256f5eb00ffaee9bf99396fd3814) +Signed-off-by: Jani Nikula +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/i915/display/intel_fb.c | 16 ++++++++++++++-- + 1 file changed, 14 insertions(+), 2 deletions(-) + +--- a/drivers/gpu/drm/i915/display/intel_fb.c ++++ b/drivers/gpu/drm/i915/display/intel_fb.c +@@ -1498,8 +1498,20 @@ static u32 calc_plane_remap_info(const s + + size += remap_info->size; + } else { +- unsigned int dst_stride = plane_view_dst_stride_tiles(fb, color_plane, +- remap_info->width); ++ unsigned int dst_stride; ++ ++ /* ++ * The hardware automagically calculates the CCS AUX surface ++ * stride from the main surface stride so can't really remap a ++ * smaller subset (unless we'd remap in whole AUX page units). 
++ */ ++ if (intel_fb_needs_pot_stride_remap(fb) && ++ intel_fb_is_ccs_modifier(fb->base.modifier)) ++ dst_stride = remap_info->src_stride; ++ else ++ dst_stride = remap_info->width; ++ ++ dst_stride = plane_view_dst_stride_tiles(fb, color_plane, dst_stride); + + assign_chk_ovf(i915, remap_info->dst_stride, dst_stride); + color_plane_info->mapping_stride = dst_stride * diff --git a/queue-6.6/drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch b/queue-6.6/drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch new file mode 100644 index 00000000000..51841c660f7 --- /dev/null +++ b/queue-6.6/drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch @@ -0,0 +1,47 @@ +From b6961d187fcd138981b8707dac87b9fcdbfe75d1 Mon Sep 17 00:00:00 2001 +From: Stuart Lee +Date: Fri, 10 Nov 2023 09:29:14 +0800 +Subject: drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get + +From: Stuart Lee + +commit b6961d187fcd138981b8707dac87b9fcdbfe75d1 upstream. + +Add error handling to check NULL input in +mtk_drm_crtc_dma_dev_get function. + +While display path is not configured correctly, none of crtc is +established. So the caller of mtk_drm_crtc_dma_dev_get may pass +input parameter *crtc as NULL, Which may cause coredump when +we try to get the container of NULL pointer. 
+ +Fixes: cb1d6bcca542 ("drm/mediatek: Add dma dev get function") +Signed-off-by: Stuart Lee +Cc: stable@vger.kernel.org +Reviewed-by: AngeloGioacchino DEl Regno +Tested-by: Macpaul Lin +Link: https://patchwork.kernel.org/project/dri-devel/patch/20231110012914.14884-2-stuart.lee@mediatek.com/ +Signed-off-by: Chun-Kuang Hu +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 9 ++++++++- + 1 file changed, 8 insertions(+), 1 deletion(-) + +--- a/drivers/gpu/drm/mediatek/mtk_drm_crtc.c ++++ b/drivers/gpu/drm/mediatek/mtk_drm_crtc.c +@@ -885,7 +885,14 @@ static int mtk_drm_crtc_init_comp_planes + + struct device *mtk_drm_crtc_dma_dev_get(struct drm_crtc *crtc) + { +- struct mtk_drm_crtc *mtk_crtc = to_mtk_crtc(crtc); ++ struct mtk_drm_crtc *mtk_crtc = NULL; ++ ++ if (!crtc) ++ return NULL; ++ ++ mtk_crtc = to_mtk_crtc(crtc); ++ if (!mtk_crtc) ++ return NULL; + + return mtk_crtc->dma_dev; + } diff --git a/queue-6.6/kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch b/queue-6.6/kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch new file mode 100644 index 00000000000..520f4ecde8e --- /dev/null +++ b/queue-6.6/kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch @@ -0,0 +1,104 @@ +From c41bd2514184d75db087fe4c1221237fb7922875 Mon Sep 17 00:00:00 2001 +From: Ignat Korchagin +Date: Wed, 29 Nov 2023 22:04:09 +0000 +Subject: kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP + +From: Ignat Korchagin + +commit c41bd2514184d75db087fe4c1221237fb7922875 upstream. + +In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for +CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP +required CONFIG_KEXEC. + +However, it was not enough at least for arm64 platforms. While further +testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP +is unavailable in menuconfig. 
This is because CONFIG_CRASH_DUMP still +depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit +91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on +arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in +turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of +which are set in our config. + +Given that we already established that CONFIG_KEXEC (which is a switch for +kexec system call itself) is not required for CONFIG_CRASH_DUMP drop +CONFIG_ARCH_SUPPORTS_KEXEC dependency as well. The arm64 kernel builds +just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and +CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select +of KEXEC for CRASH_DUMP") and this patch are applied given that the +necessary shared bits are included via CONFIG_KEXEC_CORE dependency. + +[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n] + Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv +[bhe@redhat.com: riscv, kexec: fix dependency of two items] + Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv +Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com +Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") +Signed-off-by: Ignat Korchagin +Signed-off-by: Baoquan He +Acked-by: Baoquan He +Cc: Alexander Gordeev +Cc: # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP +Cc: # 6.6+ +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + arch/riscv/Kconfig | 4 ++-- + arch/riscv/kernel/crash_core.c | 4 +++- + kernel/Kconfig.kexec | 1 - + 3 files changed, 5 insertions(+), 4 deletions(-) + +--- a/arch/riscv/Kconfig ++++ b/arch/riscv/Kconfig +@@ -669,7 +669,7 @@ config RISCV_BOOT_SPINWAIT + If unsure what to do here, say N. 
+ + config ARCH_SUPPORTS_KEXEC +- def_bool MMU ++ def_bool y + + config ARCH_SELECTS_KEXEC + def_bool y +@@ -677,7 +677,7 @@ config ARCH_SELECTS_KEXEC + select HOTPLUG_CPU if SMP + + config ARCH_SUPPORTS_KEXEC_FILE +- def_bool 64BIT && MMU ++ def_bool 64BIT + + config ARCH_SELECTS_KEXEC_FILE + def_bool y +--- a/arch/riscv/kernel/crash_core.c ++++ b/arch/riscv/kernel/crash_core.c +@@ -5,18 +5,20 @@ + + void arch_crash_save_vmcoreinfo(void) + { +- VMCOREINFO_NUMBER(VA_BITS); + VMCOREINFO_NUMBER(phys_ram_base); + + vmcoreinfo_append_str("NUMBER(PAGE_OFFSET)=0x%lx\n", PAGE_OFFSET); + vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START); + vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END); ++#ifdef CONFIG_MMU ++ VMCOREINFO_NUMBER(VA_BITS); + vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START); + vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END); + #ifdef CONFIG_64BIT + vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR); + vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END); + #endif ++#endif + vmcoreinfo_append_str("NUMBER(KERNEL_LINK_ADDR)=0x%lx\n", KERNEL_LINK_ADDR); + vmcoreinfo_append_str("NUMBER(va_kernel_pa_offset)=0x%lx\n", + kernel_map.va_kernel_pa_offset); +--- a/kernel/Kconfig.kexec ++++ b/kernel/Kconfig.kexec +@@ -94,7 +94,6 @@ config KEXEC_JUMP + config CRASH_DUMP + bool "kernel crash dumps" + depends on ARCH_SUPPORTS_CRASH_DUMP +- depends on ARCH_SUPPORTS_KEXEC + select CRASH_CORE + select KEXEC_CORE + help diff --git a/queue-6.6/mm-mglru-fix-underprotected-page-cache.patch b/queue-6.6/mm-mglru-fix-underprotected-page-cache.patch new file mode 100644 index 00000000000..42884c75939 --- /dev/null +++ b/queue-6.6/mm-mglru-fix-underprotected-page-cache.patch @@ -0,0 +1,128 @@ +From 081488051d28d32569ebb7c7a23572778b2e7d57 Mon Sep 17 00:00:00 2001 +From: Yu Zhao +Date: Thu, 7 Dec 2023 23:14:04 -0700 +Subject: mm/mglru: fix underprotected page cache + +From: Yu 
Zhao + +commit 081488051d28d32569ebb7c7a23572778b2e7d57 upstream. + +Unmapped folios accessed through file descriptors can be underprotected. +Those folios are added to the oldest generation based on: + +1. The fact that they are less costly to reclaim (no need to walk the + rmap and flush the TLB) and have less impact on performance (don't + cause major PFs and can be non-blocking if needed again). +2. The observation that they are likely to be single-use. E.g., for + client use cases like Android, its apps parse configuration files + and store the data in heap (anon); for server use cases like MySQL, + it reads from InnoDB files and holds the cached data for tables in + buffer pools (anon). + +However, the oldest generation can be very short lived, and if so, it +doesn't provide the PID controller with enough time to respond to a surge +of refaults. (Note that the PID controller uses weighted refaults and +those from evicted generations only take a half of the whole weight.) In +other words, for a short lived generation, the moving average smooths out +the spike quickly. + +To fix the problem: +1. For folios that are already on LRU, if they can be beyond the + tracking range of tiers, i.e., five accesses through file + descriptors, move them to the second oldest generation to give them + more time to age. (Note that tiers are used by the PID controller + to statistically determine whether folios accessed multiple times + through file descriptors are worth protecting.) +2. When adding unmapped folios to LRU, adjust the placement of them so + that they are not too close to the tail. The effect of this is + similar to the above. 
+ +On Android, launching 55 apps sequentially: + Before After Change + workingset_refault_anon 25641024 25598972 0% + workingset_refault_file 115016834 106178438 -8% + +Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com +Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation") +Signed-off-by: Yu Zhao +Reported-by: Charan Teja Kalla +Tested-by: Kalesh Singh +Cc: T.J. Mercier +Cc: Kairui Song +Cc: Hillf Danton +Cc: Jaroslav Pulchart +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/mm_inline.h | 23 ++++++++++++++--------- + mm/vmscan.c | 2 +- + mm/workingset.c | 6 +++--- + 3 files changed, 18 insertions(+), 13 deletions(-) + +--- a/include/linux/mm_inline.h ++++ b/include/linux/mm_inline.h +@@ -231,22 +231,27 @@ static inline bool lru_gen_add_folio(str + if (folio_test_unevictable(folio) || !lrugen->enabled) + return false; + /* +- * There are three common cases for this page: +- * 1. If it's hot, e.g., freshly faulted in or previously hot and +- * migrated, add it to the youngest generation. +- * 2. If it's cold but can't be evicted immediately, i.e., an anon page +- * not in swapcache or a dirty page pending writeback, add it to the +- * second oldest generation. +- * 3. Everything else (clean, cold) is added to the oldest generation. ++ * There are four common cases for this page: ++ * 1. If it's hot, i.e., freshly faulted in, add it to the youngest ++ * generation, and it's protected over the rest below. ++ * 2. If it can't be evicted immediately, i.e., a dirty page pending ++ * writeback, add it to the second youngest generation. ++ * 3. If it should be evicted first, e.g., cold and clean from ++ * folio_rotate_reclaimable(), add it to the oldest generation. ++ * 4. Everything else falls between 2 & 3 above and is added to the ++ * second oldest generation if it's considered inactive, or the ++ * oldest generation otherwise. See lru_gen_is_active(). 
+ */ + if (folio_test_active(folio)) + seq = lrugen->max_seq; + else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) || + (folio_test_reclaim(folio) && + (folio_test_dirty(folio) || folio_test_writeback(folio)))) +- seq = lrugen->min_seq[type] + 1; +- else ++ seq = lrugen->max_seq - 1; ++ else if (reclaiming || lrugen->min_seq[type] + MIN_NR_GENS >= lrugen->max_seq) + seq = lrugen->min_seq[type]; ++ else ++ seq = lrugen->min_seq[type] + 1; + + gen = lru_gen_from_seq(seq); + flags = (gen + 1UL) << LRU_GEN_PGOFF; +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -4933,7 +4933,7 @@ static bool sort_folio(struct lruvec *lr + } + + /* protected */ +- if (tier > tier_idx) { ++ if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) { + int hist = lru_hist_from_seq(lrugen->min_seq[type]); + + gen = folio_inc_gen(lruvec, folio, false); +--- a/mm/workingset.c ++++ b/mm/workingset.c +@@ -313,10 +313,10 @@ static void lru_gen_refault(struct folio + * 1. For pages accessed through page tables, hotter pages pushed out + * hot pages which refaulted immediately. + * 2. For pages accessed multiple times through file descriptors, +- * numbers of accesses might have been out of the range. ++ * they would have been protected by sort_folio(). 
+ */ +- if (lru_gen_in_fault() || refs == BIT(LRU_REFS_WIDTH)) { +- folio_set_workingset(folio); ++ if (lru_gen_in_fault() || refs >= BIT(LRU_REFS_WIDTH) - 1) { ++ set_mask_bits(&folio->flags, 0, LRU_REFS_MASK | BIT(PG_workingset)); + mod_lruvec_state(lruvec, WORKINGSET_RESTORE_BASE + type, delta); + } + unlock: diff --git a/queue-6.6/mm-mglru-reclaim-offlined-memcgs-harder.patch b/queue-6.6/mm-mglru-reclaim-offlined-memcgs-harder.patch new file mode 100644 index 00000000000..865bf1e5d3b --- /dev/null +++ b/queue-6.6/mm-mglru-reclaim-offlined-memcgs-harder.patch @@ -0,0 +1,116 @@ +From 4376807bf2d5371c3e00080c972be568c3f8a7d1 Mon Sep 17 00:00:00 2001 +From: Yu Zhao +Date: Thu, 7 Dec 2023 23:14:07 -0700 +Subject: mm/mglru: reclaim offlined memcgs harder + +From: Yu Zhao + +commit 4376807bf2d5371c3e00080c972be568c3f8a7d1 upstream. + +In the effort to reduce zombie memcgs [1], it was discovered that the +memcg LRU doesn't apply enough pressure on offlined memcgs. Specifically, +instead of rotating them to the tail of the current generation +(MEMCG_LRU_TAIL) for a second attempt, it moves them to the next +generation (MEMCG_LRU_YOUNG) after the first attempt. + +Not applying enough pressure on offlined memcgs can cause them to build +up, and this can be particularly harmful to memory-constrained systems. + +On Pixel 8 Pro, launching apps for 50 cycles: + Before After Change + Zombie memcgs 45 35 -22% + +[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/ + +Link: https://lkml.kernel.org/r/20231208061407.2125867-4-yuzhao@google.com +Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") +Signed-off-by: Yu Zhao +Reported-by: T.J. Mercier +Tested-by: T.J. 
Mercier +Cc: Charan Teja Kalla +Cc: Hillf Danton +Cc: Jaroslav Pulchart +Cc: Kairui Song +Cc: Kalesh Singh +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/mmzone.h | 8 ++++---- + mm/vmscan.c | 24 ++++++++++++++++-------- + 2 files changed, 20 insertions(+), 12 deletions(-) + +--- a/include/linux/mmzone.h ++++ b/include/linux/mmzone.h +@@ -519,10 +519,10 @@ void lru_gen_look_around(struct page_vma + * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; + * 2. The first attempt to reclaim a memcg below low, which triggers + * MEMCG_LRU_TAIL; +- * 3. The first attempt to reclaim a memcg below reclaimable size threshold, +- * which triggers MEMCG_LRU_TAIL; +- * 4. The second attempt to reclaim a memcg below reclaimable size threshold, +- * which triggers MEMCG_LRU_YOUNG; ++ * 3. The first attempt to reclaim a memcg offlined or below reclaimable size ++ * threshold, which triggers MEMCG_LRU_TAIL; ++ * 4. The second attempt to reclaim a memcg offlined or below reclaimable size ++ * threshold, which triggers MEMCG_LRU_YOUNG; + * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; + * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; + * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -5291,7 +5291,12 @@ static bool should_run_aging(struct lruv + } + + /* try to scrape all its memory if this memcg was deleted */ +- *nr_to_scan = mem_cgroup_online(memcg) ? 
(total >> sc->priority) : total; ++ if (!mem_cgroup_online(memcg)) { ++ *nr_to_scan = total; ++ return false; ++ } ++ ++ *nr_to_scan = total >> sc->priority; + + /* + * The aging tries to be lazy to reduce the overhead, while the eviction +@@ -5412,14 +5417,9 @@ static int shrink_one(struct lruvec *lru + bool success; + unsigned long scanned = sc->nr_scanned; + unsigned long reclaimed = sc->nr_reclaimed; +- int seg = lru_gen_memcg_seg(lruvec); + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + struct pglist_data *pgdat = lruvec_pgdat(lruvec); + +- /* see the comment on MEMCG_NR_GENS */ +- if (!lruvec_is_sizable(lruvec, sc)) +- return seg != MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; +- + mem_cgroup_calculate_protection(NULL, memcg); + + if (mem_cgroup_below_min(NULL, memcg)) +@@ -5427,7 +5427,7 @@ static int shrink_one(struct lruvec *lru + + if (mem_cgroup_below_low(NULL, memcg)) { + /* see the comment on MEMCG_NR_GENS */ +- if (seg != MEMCG_LRU_TAIL) ++ if (lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL) + return MEMCG_LRU_TAIL; + + memcg_memory_event(memcg, MEMCG_LOW); +@@ -5443,7 +5443,15 @@ static int shrink_one(struct lruvec *lru + + flush_reclaim_state(sc); + +- return success ? MEMCG_LRU_YOUNG : 0; ++ if (success && mem_cgroup_online(memcg)) ++ return MEMCG_LRU_YOUNG; ++ ++ if (!success && lruvec_is_sizable(lruvec, sc)) ++ return 0; ++ ++ /* one retry if offlined or too small */ ++ return lru_gen_memcg_seg(lruvec) != MEMCG_LRU_TAIL ? 
++ MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; + } + + #ifdef CONFIG_MEMCG diff --git a/queue-6.6/mm-mglru-respect-min_ttl_ms-with-memcgs.patch b/queue-6.6/mm-mglru-respect-min_ttl_ms-with-memcgs.patch new file mode 100644 index 00000000000..6efe0448f08 --- /dev/null +++ b/queue-6.6/mm-mglru-respect-min_ttl_ms-with-memcgs.patch @@ -0,0 +1,214 @@ +From 8aa420617918d12d1f5d55030a503c9418e73c2c Mon Sep 17 00:00:00 2001 +From: Yu Zhao +Date: Thu, 7 Dec 2023 23:14:06 -0700 +Subject: mm/mglru: respect min_ttl_ms with memcgs + +From: Yu Zhao + +commit 8aa420617918d12d1f5d55030a503c9418e73c2c upstream. + +While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru: +try to stop at high watermarks"), it was discovered that the memcg LRU can +breach the thrashing protection imposed by min_ttl_ms. + +Before the memcg LRU: + kswapd() + shrink_node_memcgs() + mem_cgroup_iter() + inc_max_seq() // always hit a different memcg + lru_gen_age_node() + mem_cgroup_iter() + check the timestamp of the oldest generation + +After the memcg LRU: + kswapd() + shrink_many() + restart: + iterate the memcg LRU: + inc_max_seq() // occasionally hit the same memcg + if raced with lru_gen_rotate_memcg(): + goto restart + lru_gen_age_node() + mem_cgroup_iter() + check the timestamp of the oldest generation + +Specifically, when the restart happens in shrink_many(), it needs to stick +with the (memcg LRU) generation it began with. In other words, it should +neither re-read memcg_lru->seq nor age an lruvec of a different +generation. Otherwise it can hit the same memcg multiple times without +giving lru_gen_age_node() a chance to check the timestamp of that memcg's +oldest generation (against min_ttl_ms). + +[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/ + +Link: https://lkml.kernel.org/r/20231208061407.2125867-3-yuzhao@google.com +Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists") +Signed-off-by: Yu Zhao +Tested-by: T.J. 
Mercier +Cc: Charan Teja Kalla +Cc: Hillf Danton +Cc: Jaroslav Pulchart +Cc: Kairui Song +Cc: Kalesh Singh +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + include/linux/mmzone.h | 30 +++++++++++++++++------------- + mm/vmscan.c | 30 ++++++++++++++++-------------- + 2 files changed, 33 insertions(+), 27 deletions(-) + +--- a/include/linux/mmzone.h ++++ b/include/linux/mmzone.h +@@ -505,33 +505,37 @@ void lru_gen_look_around(struct page_vma + * the old generation, is incremented when all its bins become empty. + * + * There are four operations: +- * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in its ++ * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its + * current generation (old or young) and updates its "seg" to "head"; +- * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in its ++ * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its + * current generation (old or young) and updates its "seg" to "tail"; +- * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in the old ++ * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old + * generation, updates its "gen" to "old" and resets its "seg" to "default"; +- * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in the ++ * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the + * young generation, updates its "gen" to "young" and resets its "seg" to + * "default". + * + * The events that trigger the above operations are: + * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; +- * 2. The first attempt to reclaim an memcg below low, which triggers ++ * 2. The first attempt to reclaim a memcg below low, which triggers + * MEMCG_LRU_TAIL; +- * 3. The first attempt to reclaim an memcg below reclaimable size threshold, ++ * 3. 
The first attempt to reclaim a memcg below reclaimable size threshold, + * which triggers MEMCG_LRU_TAIL; +- * 4. The second attempt to reclaim an memcg below reclaimable size threshold, ++ * 4. The second attempt to reclaim a memcg below reclaimable size threshold, + * which triggers MEMCG_LRU_YOUNG; +- * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_YOUNG; ++ * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG; + * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG; +- * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD. ++ * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. + * +- * Note that memcg LRU only applies to global reclaim, and the round-robin +- * incrementing of their max_seq counters ensures the eventual fairness to all +- * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). ++ * Notes: ++ * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing ++ * of their max_seq counters ensures the eventual fairness to all eligible ++ * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). ++ * 2. There are only two valid generations: old (seq) and young (seq+1). ++ * MEMCG_NR_GENS is set to three so that when reading the generation counter ++ * locklessly, a stale value (seq-1) does not wraparound to young. 
+ */ +-#define MEMCG_NR_GENS 2 ++#define MEMCG_NR_GENS 3 + #define MEMCG_NR_BINS 8 + + struct lru_gen_memcg { +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -4790,6 +4790,9 @@ static void lru_gen_rotate_memcg(struct + else + VM_WARN_ON_ONCE(true); + ++ WRITE_ONCE(lruvec->lrugen.seg, seg); ++ WRITE_ONCE(lruvec->lrugen.gen, new); ++ + hlist_nulls_del_rcu(&lruvec->lrugen.list); + + if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD) +@@ -4800,9 +4803,6 @@ static void lru_gen_rotate_memcg(struct + pgdat->memcg_lru.nr_memcgs[old]--; + pgdat->memcg_lru.nr_memcgs[new]++; + +- lruvec->lrugen.gen = new; +- WRITE_ONCE(lruvec->lrugen.seg, seg); +- + if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq)) + WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); + +@@ -4825,11 +4825,11 @@ void lru_gen_online_memcg(struct mem_cgr + + gen = get_memcg_gen(pgdat->memcg_lru.seq); + ++ lruvec->lrugen.gen = gen; ++ + hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]); + pgdat->memcg_lru.nr_memcgs[gen]++; + +- lruvec->lrugen.gen = gen; +- + spin_unlock_irq(&pgdat->memcg_lru.lock); + } + } +@@ -5328,7 +5328,7 @@ static long get_nr_to_scan(struct lruvec + DEFINE_MAX_SEQ(lruvec); + + if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) +- return 0; ++ return -1; + + if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan)) + return nr_to_scan; +@@ -5403,7 +5403,7 @@ static bool try_to_shrink_lruvec(struct + cond_resched(); + } + +- /* whether try_to_inc_max_seq() was successful */ ++ /* whether this lruvec should be rotated */ + return nr_to_scan < 0; + } + +@@ -5457,13 +5457,13 @@ static void shrink_many(struct pglist_da + struct lruvec *lruvec; + struct lru_gen_folio *lrugen; + struct mem_cgroup *memcg; +- const struct hlist_nulls_node *pos; ++ struct hlist_nulls_node *pos; + ++ gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq)); + bin = first_bin = get_random_u32_below(MEMCG_NR_BINS); + restart: + op = 0; + memcg = 
NULL; +- gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq)); + + rcu_read_lock(); + +@@ -5474,6 +5474,10 @@ restart: + } + + mem_cgroup_put(memcg); ++ memcg = NULL; ++ ++ if (gen != READ_ONCE(lrugen->gen)) ++ continue; + + lruvec = container_of(lrugen, struct lruvec, lrugen); + memcg = lruvec_memcg(lruvec); +@@ -5558,16 +5562,14 @@ static void set_initial_priority(struct + if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH) + return; + /* +- * Determine the initial priority based on ((total / MEMCG_NR_GENS) >> +- * priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, where the +- * estimated reclaimed_to_scanned_ratio = inactive / total. ++ * Determine the initial priority based on ++ * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim, ++ * where reclaimed_to_scanned_ratio = inactive / total. + */ + reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE); + if (get_swappiness(lruvec, sc)) + reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON); + +- reclaimable /= MEMCG_NR_GENS; +- + /* round down reclaimable and round up sc->nr_to_reclaim */ + priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1); + diff --git a/queue-6.6/mm-mglru-try-to-stop-at-high-watermarks.patch b/queue-6.6/mm-mglru-try-to-stop-at-high-watermarks.patch new file mode 100644 index 00000000000..59e7c4a5ab6 --- /dev/null +++ b/queue-6.6/mm-mglru-try-to-stop-at-high-watermarks.patch @@ -0,0 +1,143 @@ +From 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 Mon Sep 17 00:00:00 2001 +From: Yu Zhao +Date: Thu, 7 Dec 2023 23:14:05 -0700 +Subject: mm/mglru: try to stop at high watermarks + +From: Yu Zhao + +commit 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 upstream. + +The initial MGLRU patchset didn't include the memcg LRU support, and it +relied on should_abort_scan(), added by commit f76c83378851 ("mm: +multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid +overshooting their aggregate reclaim target by too much". 
+ +Later on when the memcg LRU was added, should_abort_scan() was deemed +unnecessary, and the test results [1] showed no side effects after it was +removed by commit a579086c99ed ("mm: multi-gen LRU: remove eviction +fairness safeguard"). + +However, that test used memory.reclaim, which sets nr_to_reclaim to +SWAP_CLUSTER_MAX. So it can overshoot only by SWAP_CLUSTER_MAX-1 pages, +i.e., from nr_reclaimed=nr_to_reclaim-1 to +nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1. Compared with the batch +size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny. Therefore +that test isn't able to reproduce the worst case scenario, i.e., kswapd +overshooting GBs on large systems and "consuming 100% CPU" (see the Closes +tag). + +Bring back a simplified version of should_abort_scan() on top of the memcg +LRU, so that kswapd stops when all eligible zones are above their +respective high watermarks plus a small delta to lower the chance of +KSWAPD_HIGH_WMARK_HIT_QUICKLY. Note that this only applies to order-0 +reclaim, meaning compaction-induced reclaim can still run wild (which is a +different problem). + +On Android, launching 55 apps sequentially: + Before After Change + pgpgin 838377172 802955040 -4% + pgpgout 38037080 34336300 -10% + +[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/ + +Link: https://lkml.kernel.org/r/20231208061407.2125867-2-yuzhao@google.com +Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard") +Signed-off-by: Yu Zhao +Reported-by: Charan Teja Kalla +Reported-by: Jaroslav Pulchart +Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/ +Tested-by: Jaroslav Pulchart +Tested-by: Kalesh Singh +Cc: Hillf Danton +Cc: Kairui Song +Cc: T.J. 
Mercier +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + mm/vmscan.c | 36 ++++++++++++++++++++++++++++-------- + 1 file changed, 28 insertions(+), 8 deletions(-) + +--- a/mm/vmscan.c ++++ b/mm/vmscan.c +@@ -5341,20 +5341,41 @@ static long get_nr_to_scan(struct lruvec + return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0; + } + +-static unsigned long get_nr_to_reclaim(struct scan_control *sc) ++static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc) + { ++ int i; ++ enum zone_watermarks mark; ++ + /* don't abort memcg reclaim to ensure fairness */ + if (!root_reclaim(sc)) +- return -1; ++ return false; ++ ++ if (sc->nr_reclaimed >= max(sc->nr_to_reclaim, compact_gap(sc->order))) ++ return true; ++ ++ /* check the order to exclude compaction-induced reclaim */ ++ if (!current_is_kswapd() || sc->order) ++ return false; ++ ++ mark = sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING ? ++ WMARK_PROMO : WMARK_HIGH; ++ ++ for (i = 0; i <= sc->reclaim_idx; i++) { ++ struct zone *zone = lruvec_pgdat(lruvec)->node_zones + i; ++ unsigned long size = wmark_pages(zone, mark) + MIN_LRU_BATCH; ++ ++ if (managed_zone(zone) && !zone_watermark_ok(zone, 0, size, sc->reclaim_idx, 0)) ++ return false; ++ } + +- return max(sc->nr_to_reclaim, compact_gap(sc->order)); ++ /* kswapd should abort if all eligible zones are safe */ ++ return true; + } + + static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) + { + long nr_to_scan; + unsigned long scanned = 0; +- unsigned long nr_to_reclaim = get_nr_to_reclaim(sc); + int swappiness = get_swappiness(lruvec, sc); + + /* clean file folios are more likely to exist */ +@@ -5376,7 +5397,7 @@ static bool try_to_shrink_lruvec(struct + if (scanned >= nr_to_scan) + break; + +- if (sc->nr_reclaimed >= nr_to_reclaim) ++ if (should_abort_scan(lruvec, sc)) + break; + + cond_resched(); +@@ -5437,7 +5458,6 @@ static void shrink_many(struct pglist_da + 
struct lru_gen_folio *lrugen; + struct mem_cgroup *memcg; + const struct hlist_nulls_node *pos; +- unsigned long nr_to_reclaim = get_nr_to_reclaim(sc); + + bin = first_bin = get_random_u32_below(MEMCG_NR_BINS); + restart: +@@ -5470,7 +5490,7 @@ restart: + + rcu_read_lock(); + +- if (sc->nr_reclaimed >= nr_to_reclaim) ++ if (should_abort_scan(lruvec, sc)) + break; + } + +@@ -5481,7 +5501,7 @@ restart: + + mem_cgroup_put(memcg); + +- if (sc->nr_reclaimed >= nr_to_reclaim) ++ if (!is_a_nulls(pos)) + return; + + /* restart if raced with lru_gen_rotate_memcg() */ diff --git a/queue-6.6/mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch b/queue-6.6/mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch new file mode 100644 index 00000000000..5b5d9804b62 --- /dev/null +++ b/queue-6.6/mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch @@ -0,0 +1,56 @@ +From 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 Mon Sep 17 00:00:00 2001 +From: David Stevens +Date: Tue, 18 Apr 2023 17:40:31 +0900 +Subject: mm/shmem: fix race in shmem_undo_range w/THP + +From: David Stevens + +commit 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 upstream. + +Split folios during the second loop of shmem_undo_range. It's not +sufficient to only split folios when dealing with partial pages, since +it's possible for a THP to be faulted in after that point. Calling +truncate_inode_folio in that situation can result in throwing away data +outside of the range being targeted. 
+ +[akpm@linux-foundation.org: tidy up comment layout] +Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com +Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios") +Signed-off-by: David Stevens +Cc: Matthew Wilcox (Oracle) +Cc: Suleiman Souhlal +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + mm/shmem.c | 19 ++++++++++++++++++- + 1 file changed, 18 insertions(+), 1 deletion(-) + +--- a/mm/shmem.c ++++ b/mm/shmem.c +@@ -1098,7 +1098,24 @@ whole_folios: + } + VM_BUG_ON_FOLIO(folio_test_writeback(folio), + folio); +- truncate_inode_folio(mapping, folio); ++ ++ if (!folio_test_large(folio)) { ++ truncate_inode_folio(mapping, folio); ++ } else if (truncate_inode_partial_folio(folio, lstart, lend)) { ++ /* ++ * If we split a page, reset the loop so ++ * that we pick up the new sub pages. ++ * Otherwise the THP was entirely ++ * dropped or the target range was ++ * zeroed, so just continue the loop as ++ * is. ++ */ ++ if (!folio_test_large(folio)) { ++ folio_unlock(folio); ++ index = start; ++ break; ++ } ++ } + } + folio_unlock(folio); + } diff --git a/queue-6.6/revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch b/queue-6.6/revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch new file mode 100644 index 00000000000..15ec918a22f --- /dev/null +++ b/queue-6.6/revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch @@ -0,0 +1,155 @@ +From 43e8832fed08438e2a27afed9bac21acd0ceffe5 Mon Sep 17 00:00:00 2001 +From: John Hubbard +Date: Fri, 8 Dec 2023 18:01:44 -0800 +Subject: Revert "selftests: error out if kernel header files are not yet built" + +From: John Hubbard + +commit 43e8832fed08438e2a27afed9bac21acd0ceffe5 upstream. + +This reverts commit 9fc96c7c19df ("selftests: error out if kernel header +files are not yet built"). 
+ +It turns out that requiring the kernel headers to be built as a +prerequisite to building selftests, does not work in many cases. For +example, Peter Zijlstra writes: + +"My biggest beef with the whole thing is that I simply do not want to use +'make headers', it doesn't work for me. + +I have a ton of output directories and I don't care to build tools into +the output dirs, in fact some of them flat out refuse to work that way +(bpf comes to mind)." [1] + +Therefore, stop erroring out on the selftests build. Additional patches +will be required in order to change over to not requiring the kernel +headers. + +[1] https://lore.kernel.org/20231208221007.GO28727@noisy.programming.kicks-ass.net + +Link: https://lkml.kernel.org/r/20231209020144.244759-1-jhubbard@nvidia.com +Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built") +Signed-off-by: John Hubbard +Cc: Anders Roxell +Cc: Muhammad Usama Anjum +Cc: David Hildenbrand +Cc: Peter Xu +Cc: Jonathan Corbet +Cc: Nathan Chancellor +Cc: Shuah Khan +Cc: Peter Zijlstra +Cc: Marcos Paulo de Souza +Cc: +Signed-off-by: Andrew Morton +Signed-off-by: Greg Kroah-Hartman +--- + tools/testing/selftests/Makefile | 21 -------------------- + tools/testing/selftests/lib.mk | 40 ++------------------------------------- + 2 files changed, 4 insertions(+), 57 deletions(-) + +--- a/tools/testing/selftests/Makefile ++++ b/tools/testing/selftests/Makefile +@@ -152,12 +152,10 @@ ifneq ($(KBUILD_OUTPUT),) + abs_objtree := $(realpath $(abs_objtree)) + BUILD := $(abs_objtree)/kselftest + KHDR_INCLUDES := -isystem ${abs_objtree}/usr/include +- KHDR_DIR := ${abs_objtree}/usr/include + else + BUILD := $(CURDIR) + abs_srctree := $(shell cd $(top_srcdir) && pwd) + KHDR_INCLUDES := -isystem ${abs_srctree}/usr/include +- KHDR_DIR := ${abs_srctree}/usr/include + DEFAULT_INSTALL_HDR_PATH := 1 + endif + +@@ -171,7 +169,7 @@ export KHDR_INCLUDES + # all isn't the first target in the file. 
+ .DEFAULT_GOAL := all + +-all: kernel_header_files ++all: + @ret=1; \ + for TARGET in $(TARGETS); do \ + BUILD_TARGET=$$BUILD/$$TARGET; \ +@@ -182,23 +180,6 @@ all: kernel_header_files + ret=$$((ret * $$?)); \ + done; exit $$ret; + +-kernel_header_files: +- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \ +- if [ $$? -ne 0 ]; then \ +- RED='\033[1;31m'; \ +- NOCOLOR='\033[0m'; \ +- echo; \ +- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \ +- echo "Please run this and try again:"; \ +- echo; \ +- echo " cd $(top_srcdir)"; \ +- echo " make headers"; \ +- echo; \ +- exit 1; \ +- fi +- +-.PHONY: kernel_header_files +- + run_tests: all + @for TARGET in $(TARGETS); do \ + BUILD_TARGET=$$BUILD/$$TARGET; \ +--- a/tools/testing/selftests/lib.mk ++++ b/tools/testing/selftests/lib.mk +@@ -44,26 +44,10 @@ endif + selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST)))) + top_srcdir = $(selfdir)/../../.. + +-ifeq ("$(origin O)", "command line") +- KBUILD_OUTPUT := $(O) ++ifeq ($(KHDR_INCLUDES),) ++KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include + endif + +-ifneq ($(KBUILD_OUTPUT),) +- # Make's built-in functions such as $(abspath ...), $(realpath ...) cannot +- # expand a shell special character '~'. We use a somewhat tedious way here. +- abs_objtree := $(shell cd $(top_srcdir) && mkdir -p $(KBUILD_OUTPUT) && cd $(KBUILD_OUTPUT) && pwd) +- $(if $(abs_objtree),, \ +- $(error failed to create output directory "$(KBUILD_OUTPUT)")) +- # $(realpath ...) resolves symlinks +- abs_objtree := $(realpath $(abs_objtree)) +- KHDR_DIR := ${abs_objtree}/usr/include +-else +- abs_srctree := $(shell cd $(top_srcdir) && pwd) +- KHDR_DIR := ${abs_srctree}/usr/include +-endif +- +-KHDR_INCLUDES := -isystem $(KHDR_DIR) +- + # The following are built by lib.mk common compile rules. + # TEST_CUSTOM_PROGS should be used by tests that require + # custom build rule and prevent common build rule use. 
+@@ -74,25 +58,7 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT) + TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED)) + TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES)) + +-all: kernel_header_files $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) \ +- $(TEST_GEN_FILES) +- +-kernel_header_files: +- @ls $(KHDR_DIR)/linux/*.h >/dev/null 2>/dev/null; \ +- if [ $$? -ne 0 ]; then \ +- RED='\033[1;31m'; \ +- NOCOLOR='\033[0m'; \ +- echo; \ +- echo -e "$${RED}error$${NOCOLOR}: missing kernel header files."; \ +- echo "Please run this and try again:"; \ +- echo; \ +- echo " cd $(top_srcdir)"; \ +- echo " make headers"; \ +- echo; \ +- exit 1; \ +- fi +- +-.PHONY: kernel_header_files ++all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) + + define RUN_TESTS + BASE_DIR="$(selfdir)"; \ diff --git a/queue-6.6/series b/queue-6.6/series index 778b35bc678..082863de08c 100644 --- a/queue-6.6/series +++ b/queue-6.6/series @@ -128,3 +128,30 @@ btrfs-do-not-allow-non-subvolume-root-targets-for-snapshot.patch cxl-hdm-fix-dpa-translation-locking.patch soundwire-stream-fix-null-pointer-dereference-for-multi_link.patch ext4-prevent-the-normalized-size-from-exceeding-ext_max_blocks.patch +revert-selftests-error-out-if-kernel-header-files-are-not-yet-built.patch +arm64-mm-always-make-sw-dirty-ptes-hw-dirty-in-pte_modify.patch +team-fix-use-after-free-when-an-option-instance-allocation-fails.patch +drm-amdgpu-sdma5.2-add-begin-end_use-ring-callbacks.patch +drm-mediatek-fix-access-violation-in-mtk_drm_crtc_dma_dev_get.patch +dmaengine-stm32-dma-avoid-bitfield-overflow-assertion.patch +dmaengine-fsl-edma-fix-dma-channel-leak-in-edmav4.patch +mm-mglru-fix-underprotected-page-cache.patch +mm-mglru-try-to-stop-at-high-watermarks.patch +mm-mglru-respect-min_ttl_ms-with-memcgs.patch +mm-mglru-reclaim-offlined-memcgs-harder.patch +mm-shmem-fix-race-in-shmem_undo_range-w-thp.patch +kexec-drop-dependency-on-arch_supports_kexec-from-crash_dump.patch 
+btrfs-free-qgroup-reserve-when-ordered_ioerr-is-set.patch +btrfs-fix-qgroup_free_reserved_data-int-overflow.patch +btrfs-don-t-clear-qgroup-reserved-bit-in-release_folio.patch +drm-amdgpu-fix-tear-down-order-in-amdgpu_vm_pt_free.patch +drm-edid-also-call-add-modes-in-edid-connector-update-fallback.patch +drm-amd-display-restore-guard-against-default-backlight-value-1-nit.patch +drm-amd-display-disable-psr-su-on-parade-0803-tcon-again.patch +drm-i915-fix-adl-tiled-plane-stride-when-the-pot-stride-is-smaller-than-the-original.patch +drm-i915-fix-intel_atomic_setup_scalers-plane_state-handling.patch +drm-i915-fix-remapped-stride-with-ccs-on-adl.patch +smb-client-fix-oob-in-receive_encrypted_standard.patch +smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch +smb-client-fix-null-deref-in-asn1_ber_decoder.patch +smb-client-fix-oob-in-smb2_query_reparse_point.patch diff --git a/queue-6.6/smb-client-fix-null-deref-in-asn1_ber_decoder.patch b/queue-6.6/smb-client-fix-null-deref-in-asn1_ber_decoder.patch new file mode 100644 index 00000000000..a180fe5750e --- /dev/null +++ b/queue-6.6/smb-client-fix-null-deref-in-asn1_ber_decoder.patch @@ -0,0 +1,131 @@ +From 90d025c2e953c11974e76637977c473200593a46 Mon Sep 17 00:00:00 2001 +From: Paulo Alcantara +Date: Mon, 11 Dec 2023 10:26:42 -0300 +Subject: smb: client: fix NULL deref in asn1_ber_decoder() + +From: Paulo Alcantara + +commit 90d025c2e953c11974e76637977c473200593a46 upstream. 
+ +If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset, +smb2_get_data_area() sets @len to non-zero but return NULL, so +decode_negTokeninit() ends up being called with a NULL @security_blob: + + BUG: kernel NULL pointer dereference, address: 0000000000000000 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 0 P4D 0 + Oops: 0000 [#1] PREEMPT SMP NOPTI + CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 + RIP: 0010:asn1_ber_decoder+0x173/0xc80 + Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00 + RSP: 0018:ffffc9000063f950 EFLAGS: 00010202 + RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a + RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000 + RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 + R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000 + R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000 + FS: 00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0 + PKRU: 55555554 + Call Trace: + + ? __die+0x23/0x70 + ? page_fault_oops+0x181/0x480 + ? __stack_depot_save+0x1e6/0x480 + ? exc_page_fault+0x6f/0x1c0 + ? asm_exc_page_fault+0x26/0x30 + ? asn1_ber_decoder+0x173/0xc80 + ? check_object+0x40/0x340 + decode_negTokenInit+0x1e/0x30 [cifs] + SMB2_negotiate+0xc99/0x17c0 [cifs] + ? smb2_negotiate+0x46/0x60 [cifs] + ? srso_alias_return_thunk+0x5/0xfbef5 + smb2_negotiate+0x46/0x60 [cifs] + cifs_negotiate_protocol+0xae/0x130 [cifs] + cifs_get_smb_ses+0x517/0x1040 [cifs] + ? srso_alias_return_thunk+0x5/0xfbef5 + ? 
srso_alias_return_thunk+0x5/0xfbef5 + ? queue_delayed_work_on+0x5d/0x90 + cifs_mount_get_session+0x78/0x200 [cifs] + dfs_mount_share+0x13a/0x9f0 [cifs] + ? srso_alias_return_thunk+0x5/0xfbef5 + ? lock_acquire+0xbf/0x2b0 + ? find_nls+0x16/0x80 + ? srso_alias_return_thunk+0x5/0xfbef5 + cifs_mount+0x7e/0x350 [cifs] + cifs_smb3_do_mount+0x128/0x780 [cifs] + smb3_get_tree+0xd9/0x290 [cifs] + vfs_get_tree+0x2c/0x100 + ? capable+0x37/0x70 + path_mount+0x2d7/0xb80 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? _raw_spin_unlock_irqrestore+0x44/0x60 + __x64_sys_mount+0x11a/0x150 + do_syscall_64+0x47/0xf0 + entry_SYSCALL_64_after_hwframe+0x6f/0x77 + RIP: 0033:0x7fce52c2ab1e + +Fix this by setting @len to zero when @off == 0 so callers won't +attempt to dereference non-existing data areas. + +Reported-by: Robert Morris +Cc: stable@vger.kernel.org +Signed-off-by: Paulo Alcantara (SUSE) +Signed-off-by: Steve French +Signed-off-by: Greg Kroah-Hartman +--- + fs/smb/client/smb2misc.c | 26 ++++++++++---------------- + 1 file changed, 10 insertions(+), 16 deletions(-) + +--- a/fs/smb/client/smb2misc.c ++++ b/fs/smb/client/smb2misc.c +@@ -313,6 +313,9 @@ static const bool has_smb2_data_area[NUM + char * + smb2_get_data_area_len(int *off, int *len, struct smb2_hdr *shdr) + { ++ const int max_off = 4096; ++ const int max_len = 128 * 1024; ++ + *off = 0; + *len = 0; + +@@ -384,29 +387,20 @@ smb2_get_data_area_len(int *off, int *le + * Invalid length or offset probably means data area is invalid, but + * we have little choice but to ignore the data area in this case. 
+ */ +- if (*off > 4096) { +- cifs_dbg(VFS, "offset %d too large, data area ignored\n", *off); +- *len = 0; +- *off = 0; +- } else if (*off < 0) { +- cifs_dbg(VFS, "negative offset %d to data invalid ignore data area\n", +- *off); ++ if (unlikely(*off < 0 || *off > max_off || ++ *len < 0 || *len > max_len)) { ++ cifs_dbg(VFS, "%s: invalid data area (off=%d len=%d)\n", ++ __func__, *off, *len); + *off = 0; + *len = 0; +- } else if (*len < 0) { +- cifs_dbg(VFS, "negative data length %d invalid, data area ignored\n", +- *len); +- *len = 0; +- } else if (*len > 128 * 1024) { +- cifs_dbg(VFS, "data area larger than 128K: %d\n", *len); ++ } else if (*off == 0) { + *len = 0; + } + + /* return pointer to beginning of data area, ie offset from SMB start */ +- if ((*off != 0) && (*len != 0)) ++ if (*off > 0 && *len > 0) + return (char *)shdr + *off; +- else +- return NULL; ++ return NULL; + } + + /* diff --git a/queue-6.6/smb-client-fix-oob-in-receive_encrypted_standard.patch b/queue-6.6/smb-client-fix-oob-in-receive_encrypted_standard.patch new file mode 100644 index 00000000000..ba3f1ae7e16 --- /dev/null +++ b/queue-6.6/smb-client-fix-oob-in-receive_encrypted_standard.patch @@ -0,0 +1,64 @@ +From eec04ea119691e65227a97ce53c0da6b9b74b0b7 Mon Sep 17 00:00:00 2001 +From: Paulo Alcantara +Date: Mon, 11 Dec 2023 10:26:40 -0300 +Subject: smb: client: fix OOB in receive_encrypted_standard() + +From: Paulo Alcantara + +commit eec04ea119691e65227a97ce53c0da6b9b74b0b7 upstream. + +Fix potential OOB in receive_encrypted_standard() if server returned a +large shdr->NextCommand that would end up writing off the end of +@next_buffer. 
+ +Fixes: b24df3e30cbf ("cifs: update receive_encrypted_standard to handle compounded responses") +Cc: stable@vger.kernel.org +Reported-by: Robert Morris +Signed-off-by: Paulo Alcantara (SUSE) +Signed-off-by: Steve French +Signed-off-by: Greg Kroah-Hartman +--- + fs/smb/client/smb2ops.c | 14 ++++++++------ + 1 file changed, 8 insertions(+), 6 deletions(-) + +--- a/fs/smb/client/smb2ops.c ++++ b/fs/smb/client/smb2ops.c +@@ -4941,6 +4941,7 @@ receive_encrypted_standard(struct TCP_Se + struct smb2_hdr *shdr; + unsigned int pdu_length = server->pdu_size; + unsigned int buf_size; ++ unsigned int next_cmd; + struct mid_q_entry *mid_entry; + int next_is_large; + char *next_buffer = NULL; +@@ -4969,14 +4970,15 @@ receive_encrypted_standard(struct TCP_Se + next_is_large = server->large_buf; + one_more: + shdr = (struct smb2_hdr *)buf; +- if (shdr->NextCommand) { ++ next_cmd = le32_to_cpu(shdr->NextCommand); ++ if (next_cmd) { ++ if (WARN_ON_ONCE(next_cmd > pdu_length)) ++ return -1; + if (next_is_large) + next_buffer = (char *)cifs_buf_get(); + else + next_buffer = (char *)cifs_small_buf_get(); +- memcpy(next_buffer, +- buf + le32_to_cpu(shdr->NextCommand), +- pdu_length - le32_to_cpu(shdr->NextCommand)); ++ memcpy(next_buffer, buf + next_cmd, pdu_length - next_cmd); + } + + mid_entry = smb2_find_mid(server, buf); +@@ -5000,8 +5002,8 @@ one_more: + else + ret = cifs_handle_standard(server, mid_entry); + +- if (ret == 0 && shdr->NextCommand) { +- pdu_length -= le32_to_cpu(shdr->NextCommand); ++ if (ret == 0 && next_cmd) { ++ pdu_length -= next_cmd; + server->large_buf = next_is_large; + if (next_is_large) + server->bigbuf = buf = next_buffer; diff --git a/queue-6.6/smb-client-fix-oob-in-smb2_query_reparse_point.patch b/queue-6.6/smb-client-fix-oob-in-smb2_query_reparse_point.patch new file mode 100644 index 00000000000..74a2066f1e0 --- /dev/null +++ b/queue-6.6/smb-client-fix-oob-in-smb2_query_reparse_point.patch @@ -0,0 +1,115 @@ +From 
3a42709fa909e22b0be4bb1e2795aa04ada732a3 Mon Sep 17 00:00:00 2001 +From: Paulo Alcantara +Date: Mon, 11 Dec 2023 10:26:43 -0300 +Subject: smb: client: fix OOB in smb2_query_reparse_point() + +From: Paulo Alcantara + +commit 3a42709fa909e22b0be4bb1e2795aa04ada732a3 upstream. + +Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that +their sum does not wrap to a number that is smaller than @reparse_buf +and we end up with a wild pointer as follows: + + BUG: unable to handle page fault for address: ffff88809c5cd45f + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 4a01067 P4D 4a01067 PUD 0 + Oops: 0000 [#1] PREEMPT SMP NOPTI + CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS + rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 + RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs] + Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff + 49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b> + 04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff + RSP: 0018:ffffc90000347a90 EFLAGS: 00010212 + RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec + RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4 + RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001 + R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100 + R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000 + FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000) + knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0 + PKRU: 55555554 + Call Trace: + + ? __die+0x23/0x70 + ? page_fault_oops+0x181/0x480 + ? search_module_extables+0x19/0x60 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? exc_page_fault+0x1b6/0x1c0 + ? asm_exc_page_fault+0x26/0x30 + ? _raw_spin_unlock_irqrestore+0x44/0x60 + ? 
smb2_query_reparse_point+0x3e0/0x4c0 [cifs] + cifs_get_fattr+0x16e/0xa50 [cifs] + ? srso_alias_return_thunk+0x5/0xfbef5 + ? lock_acquire+0xbf/0x2b0 + cifs_root_iget+0x163/0x5f0 [cifs] + cifs_smb3_do_mount+0x5bd/0x780 [cifs] + smb3_get_tree+0xd9/0x290 [cifs] + vfs_get_tree+0x2c/0x100 + ? capable+0x37/0x70 + path_mount+0x2d7/0xb80 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? _raw_spin_unlock_irqrestore+0x44/0x60 + __x64_sys_mount+0x11a/0x150 + do_syscall_64+0x47/0xf0 + entry_SYSCALL_64_after_hwframe+0x6f/0x77 + RIP: 0033:0x7f02c08d5b1e + +Fixes: 2e4564b31b64 ("smb3: add support for stat of WSL reparse points for special file types") +Cc: stable@vger.kernel.org +Reported-by: Robert Morris +Signed-off-by: Paulo Alcantara (SUSE) +Signed-off-by: Steve French +Signed-off-by: Greg Kroah-Hartman +--- + fs/smb/client/smb2ops.c | 26 ++++++++++++++++---------- + 1 file changed, 16 insertions(+), 10 deletions(-) + +--- a/fs/smb/client/smb2ops.c ++++ b/fs/smb/client/smb2ops.c +@@ -3001,7 +3001,7 @@ static int smb2_query_reparse_point(cons + struct kvec *rsp_iov; + struct smb2_ioctl_rsp *ioctl_rsp; + struct reparse_data_buffer *reparse_buf; +- u32 plen; ++ u32 off, count, len; + + cifs_dbg(FYI, "%s: path: %s\n", __func__, full_path); + +@@ -3082,16 +3082,22 @@ static int smb2_query_reparse_point(cons + */ + if (rc == 0) { + /* See MS-FSCC 2.3.23 */ ++ off = le32_to_cpu(ioctl_rsp->OutputOffset); ++ count = le32_to_cpu(ioctl_rsp->OutputCount); ++ if (check_add_overflow(off, count, &len) || ++ len > rsp_iov[1].iov_len) { ++ cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n", ++ __func__, off, count); ++ rc = -EIO; ++ goto query_rp_exit; ++ } + +- reparse_buf = (struct reparse_data_buffer *) +- ((char *)ioctl_rsp + +- le32_to_cpu(ioctl_rsp->OutputOffset)); +- plen = le32_to_cpu(ioctl_rsp->OutputCount); +- +- if (plen + le32_to_cpu(ioctl_rsp->OutputOffset) > +- rsp_iov[1].iov_len) { +- cifs_tcon_dbg(FYI, "srv returned invalid ioctl len: %d\n", +- plen); ++ reparse_buf = (void 
*)((u8 *)ioctl_rsp + off); ++ len = sizeof(*reparse_buf); ++ if (count < len || ++ count < le16_to_cpu(reparse_buf->ReparseDataLength) + len) { ++ cifs_tcon_dbg(VFS, "%s: invalid ioctl: off=%d count=%d\n", ++ __func__, off, count); + rc = -EIO; + goto query_rp_exit; + } diff --git a/queue-6.6/smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch b/queue-6.6/smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch new file mode 100644 index 00000000000..41edd75f1b7 --- /dev/null +++ b/queue-6.6/smb-client-fix-potential-oobs-in-smb2_parse_contexts.patch @@ -0,0 +1,257 @@ +From af1689a9b7701d9907dfc84d2a4b57c4bc907144 Mon Sep 17 00:00:00 2001 +From: Paulo Alcantara +Date: Mon, 11 Dec 2023 10:26:41 -0300 +Subject: smb: client: fix potential OOBs in smb2_parse_contexts() + +From: Paulo Alcantara + +commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 upstream. + +Validate offsets and lengths before dereferencing create contexts in +smb2_parse_contexts(). + +This fixes following oops when accessing invalid create contexts from +server: + + BUG: unable to handle page fault for address: ffff8881178d8cc3 + #PF: supervisor read access in kernel mode + #PF: error_code(0x0000) - not-present page + PGD 4a01067 P4D 4a01067 PUD 0 + Oops: 0000 [#1] PREEMPT SMP NOPTI + CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1 + Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS + rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 + RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs] + Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00 + 00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7 + 7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00 + RSP: 0018:ffffc900007939e0 EFLAGS: 00010216 + RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90 + RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000 + RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000 + R10: ffff8880180cc000 R11: 
0000000000000024 R12: 0000000000000000 + R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22 + FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000) + knlGS:0000000000000000 + CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 + CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0 + PKRU: 55555554 + Call Trace: + + ? __die+0x23/0x70 + ? page_fault_oops+0x181/0x480 + ? search_module_extables+0x19/0x60 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? exc_page_fault+0x1b6/0x1c0 + ? asm_exc_page_fault+0x26/0x30 + ? smb2_parse_contexts+0xa0/0x3a0 [cifs] + SMB2_open+0x38d/0x5f0 [cifs] + ? smb2_is_path_accessible+0x138/0x260 [cifs] + smb2_is_path_accessible+0x138/0x260 [cifs] + cifs_is_path_remote+0x8d/0x230 [cifs] + cifs_mount+0x7e/0x350 [cifs] + cifs_smb3_do_mount+0x128/0x780 [cifs] + smb3_get_tree+0xd9/0x290 [cifs] + vfs_get_tree+0x2c/0x100 + ? capable+0x37/0x70 + path_mount+0x2d7/0xb80 + ? srso_alias_return_thunk+0x5/0xfbef5 + ? _raw_spin_unlock_irqrestore+0x44/0x60 + __x64_sys_mount+0x11a/0x150 + do_syscall_64+0x47/0xf0 + entry_SYSCALL_64_after_hwframe+0x6f/0x77 + RIP: 0033:0x7f8737657b1e + +Reported-by: Robert Morris +Cc: stable@vger.kernel.org +Signed-off-by: Paulo Alcantara (SUSE) +Signed-off-by: Steve French +Signed-off-by: Greg Kroah-Hartman +--- + fs/smb/client/cached_dir.c | 17 +++++--- + fs/smb/client/smb2pdu.c | 91 +++++++++++++++++++++++++++------------------ + fs/smb/client/smb2proto.h | 12 +++-- + 3 files changed, 74 insertions(+), 46 deletions(-) + +--- a/fs/smb/client/cached_dir.c ++++ b/fs/smb/client/cached_dir.c +@@ -291,16 +291,23 @@ int open_cached_dir(unsigned int xid, st + oparms.fid->mid = le64_to_cpu(o_rsp->hdr.MessageId); + #endif /* CIFS_DEBUG2 */ + +- rc = -EINVAL; ++ + if (o_rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) { + spin_unlock(&cfids->cfid_list_lock); ++ rc = -EINVAL; + goto oshr_free; + } + +- smb2_parse_contexts(server, o_rsp, +- &oparms.fid->epoch, +- oparms.fid->lease_key, &oplock, +- NULL, NULL); ++ rc = 
smb2_parse_contexts(server, rsp_iov, ++ &oparms.fid->epoch, ++ oparms.fid->lease_key, ++ &oplock, NULL, NULL); ++ if (rc) { ++ spin_unlock(&cfids->cfid_list_lock); ++ goto oshr_free; ++ } ++ ++ rc = -EINVAL; + if (!(oplock & SMB2_LEASE_READ_CACHING_HE)) { + spin_unlock(&cfids->cfid_list_lock); + goto oshr_free; +--- a/fs/smb/client/smb2pdu.c ++++ b/fs/smb/client/smb2pdu.c +@@ -2141,17 +2141,18 @@ parse_posix_ctxt(struct create_context * + posix->nlink, posix->mode, posix->reparse_tag); + } + +-void +-smb2_parse_contexts(struct TCP_Server_Info *server, +- struct smb2_create_rsp *rsp, +- unsigned int *epoch, char *lease_key, __u8 *oplock, +- struct smb2_file_all_info *buf, +- struct create_posix_rsp *posix) ++int smb2_parse_contexts(struct TCP_Server_Info *server, ++ struct kvec *rsp_iov, ++ unsigned int *epoch, ++ char *lease_key, __u8 *oplock, ++ struct smb2_file_all_info *buf, ++ struct create_posix_rsp *posix) + { +- char *data_offset; ++ struct smb2_create_rsp *rsp = rsp_iov->iov_base; + struct create_context *cc; +- unsigned int next; +- unsigned int remaining; ++ size_t rem, off, len; ++ size_t doff, dlen; ++ size_t noff, nlen; + char *name; + static const char smb3_create_tag_posix[] = { + 0x93, 0xAD, 0x25, 0x50, 0x9C, +@@ -2160,45 +2161,63 @@ smb2_parse_contexts(struct TCP_Server_In + }; + + *oplock = 0; +- data_offset = (char *)rsp + le32_to_cpu(rsp->CreateContextsOffset); +- remaining = le32_to_cpu(rsp->CreateContextsLength); +- cc = (struct create_context *)data_offset; ++ ++ off = le32_to_cpu(rsp->CreateContextsOffset); ++ rem = le32_to_cpu(rsp->CreateContextsLength); ++ if (check_add_overflow(off, rem, &len) || len > rsp_iov->iov_len) ++ return -EINVAL; ++ cc = (struct create_context *)((u8 *)rsp + off); + + /* Initialize inode number to 0 in case no valid data in qfid context */ + if (buf) + buf->IndexNumber = 0; + +- while (remaining >= sizeof(struct create_context)) { +- name = le16_to_cpu(cc->NameOffset) + (char *)cc; +- if 
(le16_to_cpu(cc->NameLength) == 4 && +- strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4) == 0) +- *oplock = server->ops->parse_lease_buf(cc, epoch, +- lease_key); +- else if (buf && (le16_to_cpu(cc->NameLength) == 4) && +- strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4) == 0) +- parse_query_id_ctxt(cc, buf); +- else if ((le16_to_cpu(cc->NameLength) == 16)) { +- if (posix && +- memcmp(name, smb3_create_tag_posix, 16) == 0) ++ while (rem >= sizeof(*cc)) { ++ doff = le16_to_cpu(cc->DataOffset); ++ dlen = le32_to_cpu(cc->DataLength); ++ if (check_add_overflow(doff, dlen, &len) || len > rem) ++ return -EINVAL; ++ ++ noff = le16_to_cpu(cc->NameOffset); ++ nlen = le16_to_cpu(cc->NameLength); ++ if (noff + nlen >= doff) ++ return -EINVAL; ++ ++ name = (char *)cc + noff; ++ switch (nlen) { ++ case 4: ++ if (!strncmp(name, SMB2_CREATE_REQUEST_LEASE, 4)) { ++ *oplock = server->ops->parse_lease_buf(cc, epoch, ++ lease_key); ++ } else if (buf && ++ !strncmp(name, SMB2_CREATE_QUERY_ON_DISK_ID, 4)) { ++ parse_query_id_ctxt(cc, buf); ++ } ++ break; ++ case 16: ++ if (posix && !memcmp(name, smb3_create_tag_posix, 16)) + parse_posix_ctxt(cc, buf, posix); ++ break; ++ default: ++ cifs_dbg(FYI, "%s: unhandled context (nlen=%zu dlen=%zu)\n", ++ __func__, nlen, dlen); ++ if (IS_ENABLED(CONFIG_CIFS_DEBUG2)) ++ cifs_dump_mem("context data: ", cc, dlen); ++ break; + } +- /* else { +- cifs_dbg(FYI, "Context not matched with len %d\n", +- le16_to_cpu(cc->NameLength)); +- cifs_dump_mem("Cctxt name: ", name, 4); +- } */ + +- next = le32_to_cpu(cc->Next); +- if (!next) ++ off = le32_to_cpu(cc->Next); ++ if (!off) + break; +- remaining -= next; +- cc = (struct create_context *)((char *)cc + next); ++ if (check_sub_overflow(rem, off, &rem)) ++ return -EINVAL; ++ cc = (struct create_context *)((u8 *)cc + off); + } + + if (rsp->OplockLevel != SMB2_OPLOCK_LEVEL_LEASE) + *oplock = rsp->OplockLevel; + +- return; ++ return 0; + } + + static int +@@ -3029,8 +3048,8 @@ SMB2_open(const unsigned int xid, struct 
+ } + + +- smb2_parse_contexts(server, rsp, &oparms->fid->epoch, +- oparms->fid->lease_key, oplock, buf, posix); ++ rc = smb2_parse_contexts(server, &rsp_iov, &oparms->fid->epoch, ++ oparms->fid->lease_key, oplock, buf, posix); + creat_exit: + SMB2_open_free(&rqst); + free_rsp_buf(resp_buftype, rsp); +--- a/fs/smb/client/smb2proto.h ++++ b/fs/smb/client/smb2proto.h +@@ -251,11 +251,13 @@ extern int smb3_validate_negotiate(const + + extern enum securityEnum smb2_select_sectype(struct TCP_Server_Info *, + enum securityEnum); +-extern void smb2_parse_contexts(struct TCP_Server_Info *server, +- struct smb2_create_rsp *rsp, +- unsigned int *epoch, char *lease_key, +- __u8 *oplock, struct smb2_file_all_info *buf, +- struct create_posix_rsp *posix); ++int smb2_parse_contexts(struct TCP_Server_Info *server, ++ struct kvec *rsp_iov, ++ unsigned int *epoch, ++ char *lease_key, __u8 *oplock, ++ struct smb2_file_all_info *buf, ++ struct create_posix_rsp *posix); ++ + extern int smb3_encryption_required(const struct cifs_tcon *tcon); + extern int smb2_validate_iov(unsigned int offset, unsigned int buffer_length, + struct kvec *iov, unsigned int min_buf_size); diff --git a/queue-6.6/team-fix-use-after-free-when-an-option-instance-allocation-fails.patch b/queue-6.6/team-fix-use-after-free-when-an-option-instance-allocation-fails.patch new file mode 100644 index 00000000000..91ee12fe3ec --- /dev/null +++ b/queue-6.6/team-fix-use-after-free-when-an-option-instance-allocation-fails.patch @@ -0,0 +1,51 @@ +From c12296bbecc488623b7d1932080e394d08f3226b Mon Sep 17 00:00:00 2001 +From: Florent Revest +Date: Wed, 6 Dec 2023 13:37:18 +0100 +Subject: team: Fix use-after-free when an option instance allocation fails + +From: Florent Revest + +commit c12296bbecc488623b7d1932080e394d08f3226b upstream. + +In __team_options_register, team_options are allocated and appended to +the team's option_list. 
+If one option instance allocation fails, the "inst_rollback" cleanup +path frees the previously allocated options but doesn't remove them from +the team's option_list. +This leaves dangling pointers that can be dereferenced later by other +parts of the team driver that iterate over options. + +This patch fixes the cleanup path to remove the dangling pointers from +the list. + +As far as I can tell, this uaf doesn't have much security implications +since it would be fairly hard to exploit (an attacker would need to make +the allocation of that specific small object fail) but it's still nice +to fix. + +Cc: stable@vger.kernel.org +Fixes: 80f7c6683fe0 ("team: add support for per-port options") +Signed-off-by: Florent Revest +Reviewed-by: Jiri Pirko +Reviewed-by: Hangbin Liu +Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org +Signed-off-by: Jakub Kicinski +Signed-off-by: Greg Kroah-Hartman +--- + drivers/net/team/team.c | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/drivers/net/team/team.c ++++ b/drivers/net/team/team.c +@@ -281,8 +281,10 @@ static int __team_options_register(struc + return 0; + + inst_rollback: +- for (i--; i >= 0; i--) ++ for (i--; i >= 0; i--) { + __team_option_inst_del_option(team, dst_opts[i]); ++ list_del(&dst_opts[i]->list); ++ } + + i = option_count; + alloc_rollback:
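The rollback bug class fixed by the team patch above can be sketched in simplified, userspace C. When unwinding a partially completed registration, each already-registered entry must be both freed and unlinked from the list, otherwise the list keeps dangling pointers that are dereferenced later (the use-after-free). The names below (`node_add_tail`, `rollback_demo`, etc.) are illustrative only, not the team driver's or the kernel list API:

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, mirroring the shape of the
 * kernel's struct list_head for illustration purposes. */
struct node {
	struct node *prev, *next;
};

static void node_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static int node_count(const struct node *head)
{
	int count = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		count++;
	return count;
}

/* Simulate registering three entries, then rolling back as if a later
 * allocation failed.  Returns how many (now-freed) entries are still
 * reachable from the list: 0 with the fixed rollback, since each entry
 * is unlinked before being freed. */
static int rollback_demo(void)
{
	struct node head = { &head, &head };
	struct node *objs[3];
	int i;

	for (i = 0; i < 3; i++) {
		objs[i] = malloc(sizeof(*objs[i]));
		node_add_tail(&head, objs[i]);
	}

	/* The same loop shape as the patched inst_rollback path. */
	for (i--; i >= 0; i--) {
		node_del(objs[i]);	/* the list_del() the patch adds */
		free(objs[i]);
	}
	return node_count(&head);
}
```

Without the `node_del()` call, `node_count()` would still walk freed memory, which is exactly the dangling-pointer situation the one-line `list_del()` in the patch removes.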
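Separately, the overflow-safe bounds validation introduced by the smb2_parse_contexts() patch earlier in this series follows a pattern that can be sketched in plain C using the compiler builtin that the kernel's check_add_overflow() wraps. `ctx_range_ok()` is an illustrative name, not a kernel function:

```c
#include <stddef.h>
#include <stdint.h>

/* Reject an (offset, length) pair that either overflows size_t or
 * extends past the end of a buf_len-byte response buffer -- the same
 * shape of check the patch applies to CreateContextsOffset/Length and
 * to each context's DataOffset/DataLength before dereferencing. */
static int ctx_range_ok(size_t off, size_t len, size_t buf_len)
{
	size_t end;

	if (__builtin_add_overflow(off, len, &end))
		return 0;		/* off + len wrapped around */
	return end <= buf_len;		/* must stay inside the buffer */
}
```

Checking the sum with an overflow-aware helper, rather than computing `off + len` directly, is the key point: with attacker-controlled 32-bit offsets and lengths, a plain addition can wrap and make an out-of-bounds range appear in-bounds.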