]> git.ipfire.org Git - thirdparty/kernel/linux.git/log
thirdparty/kernel/linux.git
3 months agof2fs: Add f2fs_grab_cache_folio()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:44 +0000 (05:51 +0000)] 
f2fs: Add f2fs_grab_cache_folio()

Convert f2fs_grab_cache_page() into f2fs_grab_cache_folio()
and add a wrapper.  Removes several calls to deprecated functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Return a folio from last_fsync_dnode()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:43 +0000 (05:51 +0000)] 
f2fs: Return a folio from last_fsync_dnode()

Convert last_page to last_folio in f2fs_fsync_node_pages() and
use folio APIs where they exist.  Saves a few hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Convert last_fsync_dnode() to use a folio
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:42 +0000 (05:51 +0000)] 
f2fs: Convert last_fsync_dnode() to use a folio

Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Convert f2fs_fsync_node_pages() to use a folio
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:41 +0000 (05:51 +0000)] 
f2fs: Convert f2fs_fsync_node_pages() to use a folio

Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Pass a folio to flush_dirty_inode()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:40 +0000 (05:51 +0000)] 
f2fs: Pass a folio to flush_dirty_inode()

Its one caller now has a folio; pass it in and do page conversions where
necessary inside flush_dirty_inode().  Saves two hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Convert f2fs_sync_node_pages() to use a folio
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:39 +0000 (05:51 +0000)] 
f2fs: Convert f2fs_sync_node_pages() to use a folio

Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Convert f2fs_flush_inline_data() to use a folio
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:38 +0000 (05:51 +0000)] 
f2fs: Convert f2fs_flush_inline_data() to use a folio

Use the folio APIs where they exist.  Saves several hidden calls to
compound_head().  Also removes a reference to page->mapping.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Add f2fs_folio_put()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:37 +0000 (05:51 +0000)] 
f2fs: Add f2fs_folio_put()

Convert f2fs_put_page() to f2fs_folio_put() and add a wrapper.
Replaces three calls to compound_head() with one.

[Jaegeuk Kim: fix missing null pointer check in f2fs_put_page]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agomm: Remove wait_for_stable_page()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:36 +0000 (05:51 +0000)] 
mm: Remove wait_for_stable_page()

The last caller has been converted to call folio_wait_stable(), so
we can remove this wrapper.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: Add f2fs_folio_wait_writeback()
Matthew Wilcox (Oracle) [Tue, 18 Feb 2025 05:51:35 +0000 (05:51 +0000)] 
f2fs: Add f2fs_folio_wait_writeback()

Convert f2fs_wait_on_page_writeback() to f2fs_folio_wait_writeback()
and add a compatibiility wrapper.  Replaces five calls to
compound_head() with one.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks()
Chao Yu [Mon, 3 Mar 2025 03:47:38 +0000 (11:47 +0800)] 
f2fs: fix to avoid out-of-bounds access in f2fs_truncate_inode_blocks()

syzbot reports an UBSAN issue as below:

------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in fs/f2fs/node.h:381:10
index 18446744073709550692 is out of range for type '__le32[5]' (aka 'unsigned int[5]')
CPU: 0 UID: 0 PID: 5318 Comm: syz.0.0 Not tainted 6.14.0-rc3-syzkaller-00060-g6537cfb395f3 #0
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
 get_nid fs/f2fs/node.h:381 [inline]
 f2fs_truncate_inode_blocks+0xa5e/0xf60 fs/f2fs/node.c:1181
 f2fs_do_truncate_blocks+0x782/0x1030 fs/f2fs/file.c:808
 f2fs_truncate_blocks+0x10d/0x300 fs/f2fs/file.c:836
 f2fs_truncate+0x417/0x720 fs/f2fs/file.c:886
 f2fs_file_write_iter+0x1bdb/0x2550 fs/f2fs/file.c:5093
 aio_write+0x56b/0x7c0 fs/aio.c:1633
 io_submit_one+0x8a7/0x18a0 fs/aio.c:2052
 __do_sys_io_submit fs/aio.c:2111 [inline]
 __se_sys_io_submit+0x171/0x2e0 fs/aio.c:2081
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f238798cde9

index 18446744073709550692 (decimal, unsigned long long)
= 0xfffffffffffffc64 (hexadecimal, unsigned long long)
= -924 (decimal, long long)

In f2fs_truncate_inode_blocks(), UBSAN detects that get_nid() tries to
access .i_nid[-924], it means both offset[0] and level should zero.

The possible case should be in f2fs_do_truncate_blocks(), we try to
truncate inode size to zero, however, dn.ofs_in_node is zero and
dn.node_page is not an inode page, so it fails to truncate inode page,
and then pass zeroed free_from to f2fs_truncate_inode_blocks(), result
in this issue.

if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
f2fs_truncate_data_blocks_range(&dn, count);
free_from += count;
}

I guess the reason why dn.node_page is not an inode page could be: there
are multiple nat entries share the same node block address, once the node
block address was reused, f2fs_get_node_page() may load a non-inode block.

Let's add a sanity check for such condition to avoid out-of-bounds access
issue.

Reported-by: syzbot+6653f10281a1badc749e@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/66fdcdf3.050a0220.40bef.0025.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: fix to call f2fs_recover_quota_end() correctly
Chao Yu [Mon, 3 Mar 2025 03:25:00 +0000 (11:25 +0800)] 
f2fs: fix to call f2fs_recover_quota_end() correctly

f2fs_recover_quota_begin() and f2fs_recover_quota_end() should be called
in pair, there is some cases we may skip calling f2fs_recover_quota_end(),
fix it.

Fixes: e1bb7d3d9cbf ("f2fs: fix to recover quota data correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: fix potential deadloop in prepare_compress_overwrite()
Chao Yu [Mon, 3 Mar 2025 03:23:29 +0000 (11:23 +0800)] 
f2fs: fix potential deadloop in prepare_compress_overwrite()

Jan Prusakowski reported a kernel hang issue as below:

When running xfstests on linux-next kernel (6.14.0-rc3, 6.12) I
encountered a problem in generic/475 test where fsstress process
gets blocked in __f2fs_write_data_pages() and the test hangs.
The options I used are:

MKFS_OPTIONS  -- -O compression -O extra_attr -O project_quota -O quota /dev/vdc
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,compress_extension=* /dev/vdc /vdc

INFO: task kworker/u8:0:11 blocked for more than 122 seconds.
      Not tainted 6.14.0-rc3-xfstests-lockdep #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0    state:D stack:0     pid:11    tgid:11    ppid:2      task_flags:0x4208160 flags:0x00004000
Workqueue: writeback wb_workfn (flush-253:0)
Call Trace:
 <TASK>
 __schedule+0x309/0x8e0
 schedule+0x3a/0x100
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock+0x59a/0xdb0
 __f2fs_write_data_pages+0x3ac/0x400
 do_writepages+0xe8/0x290
 __writeback_single_inode+0x5c/0x360
 writeback_sb_inodes+0x22f/0x570
 wb_writeback+0xb0/0x410
 wb_do_writeback+0x47/0x2f0
 wb_workfn+0x5a/0x1c0
 process_one_work+0x223/0x5b0
 worker_thread+0x1d5/0x3c0
 kthread+0xfd/0x230
 ret_from_fork+0x31/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>

The root cause is: once generic/475 starts toload error table to dm
device, f2fs_prepare_compress_overwrite() will loop reading compressed
cluster pages due to IO error, meanwhile it has held .writepages lock,
it can block all other writeback tasks.

Let's fix this issue w/ below changes:
- add f2fs_handle_page_eio() in prepare_compress_overwrite() to
detect IO error.
- detect cp_error earler in f2fs_read_multi_pages().

Fixes: 4c8ff7095bef ("f2fs: support data compression")
Reported-by: Jan Prusakowski <jprusakowski@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: add check for deleted inode
Leo Stone [Thu, 27 Feb 2025 15:54:20 +0000 (23:54 +0800)] 
f2fs: add check for deleted inode

The syzbot reproducer mounts a f2fs image, then tries to unlink an
existing file. However, the unlinked file already has a link count of 0
when it is read for the first time in do_read_inode().

Add a check to sanity_check_inode() for i_nlink == 0.

[Chao Yu: rebase the code and fix orphan inode recovery issue]
Reported-by: syzbot+b01a36acd7007e273a83@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b01a36acd7007e273a83
Fixes: 39a53e0ce0df ("f2fs: add superblock and major in-memory structure")
Signed-off-by: Leo Stone <leocstone@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: fix the missing write pointer correction
Jaegeuk Kim [Thu, 27 Feb 2025 19:00:35 +0000 (19:00 +0000)] 
f2fs: fix the missing write pointer correction

If checkpoint was disabled, we missed to fix the write pointers.

Cc: <stable@vger.kernel.org>
Fixes: 1015035609e4 ("f2fs: fix changing cursegs if recovery fails on zoned device")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
3 months agof2fs: fix to set .discard_granularity correctly
Chao Yu [Mon, 24 Feb 2025 06:20:07 +0000 (14:20 +0800)] 
f2fs: fix to set .discard_granularity correctly

commit 4f993264fe29 ("f2fs: introduce discard_unit mount option") introduced
a bug, when we enable discard_unit=section option, it will set
.discard_granularity to BLKS_PER_SEC(), however discard granularity only
supports [1, 512], once section size is not equal to segment size, it will
cause issue_discard_thread() in DPOLICY_BG mode will not select discard entry
w/ any granularity to issue.

Fixes: 4f993264fe29 ("f2fs: introduce discard_unit mount option")
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Yohan Joung <yohan.joung@sk.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages
Jaegeuk Kim [Fri, 31 Jan 2025 22:27:57 +0000 (22:27 +0000)] 
f2fs: add a sysfs entry to reclaim POSIX_FADV_NOREUSE pages

1. fadvise(fd1, POSIX_FADV_NOREUSE, {0,3});
2. fadvise(fd2, POSIX_FADV_NOREUSE, {1,2});
3. fadvise(fd3, POSIX_FADV_NOREUSE, {3,1});
4. echo 1024 > /sys/fs/f2fs/tuning/reclaim_caches_kb

This gives a way to reclaim file-backed pages by iterating all f2fs mounts until
reclaiming 1MB page cache ranges, registered by #1, #2, and #3.

5. cat /sys/fs/f2fs/tuning/reclaim_caches_kb
-> gives total number of registered file ranges.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: keep POSIX_FADV_NOREUSE ranges
Jaegeuk Kim [Fri, 31 Jan 2025 22:27:56 +0000 (22:27 +0000)] 
f2fs: keep POSIX_FADV_NOREUSE ranges

This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: fix to avoid panic once fallocation fails for pinfile
Chao Yu [Tue, 11 Feb 2025 06:36:57 +0000 (14:36 +0800)] 
f2fs: fix to avoid panic once fallocation fails for pinfile

syzbot reports a f2fs bug as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:2746!
CPU: 0 UID: 0 PID: 5323 Comm: syz.0.0 Not tainted 6.13.0-rc2-syzkaller-00018-g7cb1b4663150 #0
RIP: 0010:get_new_segment fs/f2fs/segment.c:2746 [inline]
RIP: 0010:new_curseg+0x1f52/0x1f70 fs/f2fs/segment.c:2876
Call Trace:
 <TASK>
 __allocate_new_segment+0x1ce/0x940 fs/f2fs/segment.c:3210
 f2fs_allocate_new_section fs/f2fs/segment.c:3224 [inline]
 f2fs_allocate_pinning_section+0xfa/0x4e0 fs/f2fs/segment.c:3238
 f2fs_expand_inode_data+0x696/0xca0 fs/f2fs/file.c:1830
 f2fs_fallocate+0x537/0xa10 fs/f2fs/file.c:1940
 vfs_fallocate+0x569/0x6e0 fs/open.c:327
 do_vfs_ioctl+0x258c/0x2e40 fs/ioctl.c:885
 __do_sys_ioctl fs/ioctl.c:904 [inline]
 __se_sys_ioctl+0x80/0x170 fs/ioctl.c:892
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Concurrent pinfile allocation may run out of free section, result in
panic in get_new_segment(), let's expand pin_sem lock coverage to
include f2fs_gc(), so that we can make sure to reclaim enough free
space for following allocation.

In addition, do below changes to enhance error path handling:
- call f2fs_bug_on() only in non-pinfile allocation path in
get_new_segment().
- call reset_curseg_fields() to reset all fields of curseg in
new_curseg()

Fixes: f5a53edcf01e ("f2fs: support aligned pinned file")
Reported-by: syzbot+15669ec8c35ddf6c3d43@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/675cd64e.050a0220.37aaf.00bb.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: add ioctl to get IO priority hint
Jaegeuk Kim [Tue, 4 Feb 2025 18:06:35 +0000 (10:06 -0800)] 
f2fs: add ioctl to get IO priority hint

This patch adds an ioctl to give a per-file priority hint to attach
REQ_PRIO.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: add dump_stack() in f2fs_handle_critical_error()
Chao Yu [Wed, 12 Feb 2025 01:54:13 +0000 (09:54 +0800)] 
f2fs: add dump_stack() in f2fs_handle_critical_error()

To show call stack, so that we can see who causes critical error, note
that it won't call dump_stack() for shutdown path.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: don't retry IO for corrupted data scenario
Chao Yu [Mon, 10 Feb 2025 07:36:32 +0000 (15:36 +0800)] 
f2fs: don't retry IO for corrupted data scenario

F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]

If node block is loaded successfully, but its content is inconsistent, it
doesn't need to retry IO.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: fix to return SHRINK_EMPTY if no objects to free
Zhiguo Niu [Mon, 10 Feb 2025 01:24:09 +0000 (09:24 +0800)] 
f2fs: fix to return SHRINK_EMPTY if no objects to free

Quoted from include/linux/shrinker.h
"count_objects should return the number of freeable items in the cache. If
 there are no objects to free, it should return SHRINK_EMPTY, while 0 is
 returned in cases of the number of freeable items cannot be determined
 or shrinker should skip this cache for this time (e.g., their number
 is below shrinkable limit)."

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: quota: fix to avoid warning in dquot_writeback_dquots()
Chao Yu [Sat, 8 Feb 2025 02:33:21 +0000 (10:33 +0800)] 
f2fs: quota: fix to avoid warning in dquot_writeback_dquots()

F2FS-fs (dm-59): checkpoint=enable has some unwritten data.

------------[ cut here ]------------
WARNING: CPU: 6 PID: 8013 at fs/quota/dquot.c:691 dquot_writeback_dquots+0x2fc/0x308
pc : dquot_writeback_dquots+0x2fc/0x308
lr : f2fs_quota_sync+0xcc/0x1c4
Call trace:
dquot_writeback_dquots+0x2fc/0x308
f2fs_quota_sync+0xcc/0x1c4
f2fs_write_checkpoint+0x3d4/0x9b0
f2fs_issue_checkpoint+0x1bc/0x2c0
f2fs_sync_fs+0x54/0x150
f2fs_do_sync_file+0x2f8/0x814
__f2fs_ioctl+0x1960/0x3244
f2fs_ioctl+0x54/0xe0
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x114

checkpoint and f2fs_remount may race as below, resulting triggering warning
in dquot_writeback_dquots().

atomic write                                    remount
                                                - do_remount
                                                 - down_write(&sb->s_umount);
                                                  - f2fs_remount
- ioctl
 - f2fs_do_sync_file
  - f2fs_sync_fs
   - f2fs_write_checkpoint
    - block_operations
     - locked = down_read_trylock(&sbi->sb->s_umount)
       : fail to lock due to the write lock was held by remount
                                                 - up_write(&sb->s_umount);
     - f2fs_quota_sync
      - dquot_writeback_dquots
       - WARN_ON_ONCE(!rwsem_is_locked(&sb->s_umount))
       : trigger warning because s_umount lock was unlocked by remount

If checkpoint comes from mount/umount/remount/freeze/quotactl, caller of
checkpoint has already held s_umount lock, calling dquot_writeback_dquots()
in the context should be safe.

So let's record task to sbi->umount_lock_holder, so that checkpoint can
know whether the lock has held in the context or not by checking current
w/ it.

In addition, in order to not misrepresent caller of checkpoint, we should
not allow to trigger async checkpoint for those callers: mount/umount/remount/
freeze/quotactl.

Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: remove unnecessary null checking
Kohei Enju [Sun, 2 Feb 2025 04:32:53 +0000 (13:32 +0900)] 
f2fs: remove unnecessary null checking

When __GFP_DIRECT_RECLAIM (included in both GFP_NOIO and GFP_KERNEL) is
specified, bio_alloc_bioset() never fails to allocate a bio.
Commit 67883ade7a98 ("f2fs: remove FAULT_ALLOC_BIO") replaced
f2fs_bio_alloc() with bio_alloc_bioset(), but null checking after
bio_alloc_bioset() was still left.

Fixes: 67883ade7a98 ("f2fs: remove FAULT_ALLOC_BIO")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agof2fs: introduce f2fs_base_attr for global sysfs entries
Jaegeuk Kim [Thu, 30 Jan 2025 05:06:07 +0000 (05:06 +0000)] 
f2fs: introduce f2fs_base_attr for global sysfs entries

In /sys/fs/f2fs/features, there's no f2fs_sb_info, so let's avoid to get
the pointer.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
4 months agoMerge tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 3 Feb 2025 17:10:56 +0000 (09:10 -0800)] 
Merge tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:

 - Properly cast the input to secs_to_jiffies() to unsigned long as
   otherwise the result uses the data type of the input variable, which
   causes result range checks to fail if the input data type is signed
   and smaller than unsigned long.

 - Handle late armed hrtimers gracefully on CPU hotplug

   There are legitimate cases where a hrtimer is (re)armed on an
   outgoing CPU after the timers have been migrated away. This triggers
   warnings and caused people to implement horrible workarounds in RCU.
   But those workarounds are incomplete and do not cover e.g. the
   scheduler hrtimers.

   Stop this by force moving timer which are enqueued on the current CPU
   after timer migration to be queued on a remote online CPU.

   This allows to undo the workarounds in a seperate step.

 - Demote a warning level printk() to info level in the clocksource
   watchdog code as there is no point to emit a warning level message
   for a purely informational message.

 - Mark a helper function __always_inline and move it into the existing
   #ifdef block to avoid 'unused function' warnings from CLANG

* tag 'timers-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jiffies: Cast to unsigned long in secs_to_jiffies() conversion
  clocksource: Use pr_info() for "Checking clocksource synchronization" message
  hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
  hrtimers: Mark is_migration_base() with __always_inline

4 months agoMerge tag 'irq-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 3 Feb 2025 17:04:21 +0000 (09:04 -0800)] 
Merge tag 'irq-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:

 - Ensure ordering of memory and device I/O for IPIs on RISCV

   The RISCV interrupt controllers use writel_relaxed() for generating
   an IPI. That's a device I/O write which is not guaranteed to be
   ordered against preceding memory writes. As a consequence a IPI
   receiving CPU might not be able to observe the actual IPI data which
   is required to handle it. Switch to writel() which contains the
   necessary memory barriers to enforce ordering.

 - Fix up the fallout of the MSI conversion in the MVEVBU ICU driver.

   The conversion failed to handle the change of the data storage and
   kept the original code which uses the domain::host_data pointer
   unchanged. After the conversion domain::host_data points to the new
   msi_domain_info structure and not longer to the MVEBU specific MSI
   data, which is now stored in a member of msi_domain_info. This leads
   to malfunction of the transalate() callback.

 - Only handle the PMC in FIQ mode when it is configured that way.

   The original check was incorrect as it did not explicitely check for
   the proper conditions, which led to malfunctions of the PMU
   interrupt.

 - Improve Kconfig dependencies for the LAN966x Outband Interrupt
   controller to avoid pointless pronmpts.

* tag 'irq-urgent-2025-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/apple-aic: Only handle PMC interrupt as FIQ when configured so
  irqchip/irq-mvebu-icu: Fix access to msi_data from irq_domain::host_data
  irqchip/riscv: Ensure ordering of memory writes and IPI writes
  irqchip/lan966x-oic: Make CONFIG_LAN966X_OIC depend on CONFIG_MCHP_LAN966X_PCI
  dt-bindings: interrupt-controller: microchip,lan966x-oic: Clarify endpoint use

4 months agoMerge tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Mon, 3 Feb 2025 16:51:24 +0000 (08:51 -0800)] 
Merge tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs bug fixes from Carlos Maiolino:
 "A few fixes for XFS, but the most notable one is:

   - xfs: remove xfs_buf_cache.bc_lock

  which has been hit by different persons including syzbot"

* tag 'xfs-fixes-6.14-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove xfs_buf_cache.bc_lock
  xfs: Add error handling for xfs_reflink_cancel_cow_range
  xfs: Propagate errors from xfs_reflink_cancel_cow_range in xfs_dax_write_iomap_end
  xfs: don't call remap_verify_area with sb write protection held
  xfs: remove an out of data comment in _xfs_buf_alloc
  xfs: fix the entry condition of exact EOF block allocation optimization

4 months agoLinux 6.14-rc1 v6.14-rc1
Linus Torvalds [Sun, 2 Feb 2025 23:39:26 +0000 (15:39 -0800)] 
Linux 6.14-rc1

4 months agoMerge tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 2 Feb 2025 18:49:13 +0000 (10:49 -0800)] 
Merge tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Fix regression that affinitized forked child in one-shot mode.

 - Harden one-shot mode against hotplug online/offline

 - Enable RAPL SysWatt column by default

 - Add initial PTL, CWF platform support

 - Harden initial PMT code in response to early use

 - Enable first built-in PMT counter: CWF c1e residency

 - Refuse to run on unsupported platforms without --force, to encourage
   updating to a version that supports the system, and to avoid
   no-so-useful measurement results

* tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits)
  tools/power turbostat: version 2025.02.02
  tools/power turbostat: Add CPU%c1e BIC for CWF
  tools/power turbostat: Harden one-shot mode against cpu offline
  tools/power turbostat: Fix forked child affinity regression
  tools/power turbostat: Add tcore clock PMT type
  tools/power turbostat: version 2025.01.14
  tools/power turbostat: Allow adding PMT counters directly by sysfs path
  tools/power turbostat: Allow mapping multiple PMT files with the same GUID
  tools/power turbostat: Add PMT directory iterator helper
  tools/power turbostat: Extend PMT identification with a sequence number
  tools/power turbostat: Return default value for unmapped PMT domains
  tools/power turbostat: Check for non-zero value when MSR probing
  tools/power turbostat: Enhance turbostat self-performance visibility
  tools/power turbostat: Add fixed RAPL PSYS divisor for SPR
  tools/power turbostat: Fix PMT mmaped file size rounding
  tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT
  tools/power turbostat: Add an NMI column
  tools/power turbostat: add Busy% to "show idle"
  tools/power turbostat: Introduce --force parameter
  tools/power turbostat: Improve --help output
  ...

4 months agoMerge tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubi...
Linus Torvalds [Sun, 2 Feb 2025 18:40:27 +0000 (10:40 -0800)] 
Merge tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux

Pull sh updates from John Paul Adrian Glaubitz:
 "Fixes and improvements for sh:

   - replace seq_printf() with the more efficient
     seq_put_decimal_ull_width() to increase performance when stress
     reading /proc/interrupts (David Wang)

   - migrate sh to the generic rule for built-in DTB to help avoid race
     conditions during parallel builds which can occur because Kbuild
     decends into arch/*/boot/dts twice (Masahiro Yamada)

   - replace select with imply in the board Kconfig for enabling
     hardware with complex dependencies. This addresses warnings which
     were reported by the kernel test robot (Geert Uytterhoeven)"

* tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: boards: Use imply to enable hardware with complex dependencies
  sh: Migrate to the generic rule for built-in DTB
  sh: irq: Use seq_put_decimal_ull_width() for decimal values

4 months agotools/power turbostat: version 2025.02.02
Len Brown [Sun, 2 Feb 2025 16:43:02 +0000 (10:43 -0600)] 
tools/power turbostat: version 2025.02.02

Summary of Changes since 2024.11.30:

Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.

Harden one-shot mode against hotplug online/offline

Enable RAPL SysWatt column by default.

Add initial PTL, CWF platform support.

Harden initial PMT code in response to early use.

Enable first built-in PMT counter: CWF c1e residency

Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid no-so-useful measurement results.

Signed-off-by: Len Brown <len.brown@intel.com>
4 months agoMerge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 1 Feb 2025 23:07:56 +0000 (15:07 -0800)] 
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc vfs cleanups from Al Viro:
 "Two unrelated patches - one is a removal of long-obsolete include in
  overlayfs (it used to need fs/internal.h, but the extern it wanted has
  been moved back to include/linux/namei.h) and another introduces
  convenience helper constructing struct qstr by a NUL-terminated
  string"

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  add a string-to-qstr constructor
  fs/overlayfs/namei.c: get rid of include ../internal.h

4 months agoMerge tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds [Sat, 1 Feb 2025 22:54:33 +0000 (14:54 -0800)] 
Merge tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fix from Thomas Bogendoerfer:
 "Revert commit breaking sysv ipc for o32 ABI"

* tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  Revert "mips: fix shmctl/semctl/msgctl syscall for o32"

4 months agoMerge tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 1 Feb 2025 19:30:41 +0000 (11:30 -0800)] 
Merge tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

   - various updates for special file handling: symlink handling,
     support for creating sockets, cleanups, new mount options (e.g. to
     allow disabling using reparse points for them, and to allow
     overriding the way symlinks are saved), and fixes to error paths

   - fix for kerberos mounts (allow IAKerb)

   - SMB1 fix for stat and for setting SACL (auditing)

   - fix an incorrect error code mapping

   - cleanups"

* tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits)
  cifs: Fix parsing native symlinks directory/file type
  cifs: update internal version number
  cifs: Add support for creating WSL-style symlinks
  smb3: add support for IAKerb
  cifs: Fix struct FILE_ALL_INFO
  cifs: Add support for creating NFS-style symlinks
  cifs: Add support for creating native Windows sockets
  cifs: Add mount option -o reparse=none
  cifs: Add mount option -o symlink= for choosing symlink create type
  cifs: Fix creating and resolving absolute NT-style symlinks
  cifs: Simplify reparse point check in cifs_query_path_info() function
  cifs: Remove symlink member from cifs_open_info_data union
  cifs: Update description about ACL permissions
  cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h
  cifs: Remove struct reparse_posix_data from struct cifs_open_info_data
  cifs: Remove unicode parameter from parse_reparse_point() function
  cifs: Fix getting and setting SACLs over SMB1
  cifs: Remove intermediate object of failed create SFU call
  cifs: Validate EAs for WSL reparse points
  cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM
  ...

4 months agoMerge tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Feb 2025 18:04:29 +0000 (10:04 -0800)] 
Merge tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull debugfs fix from Greg KH:
 "Here is a single debugfs fix from Al to resolve a reported regression
  in the driver-core tree. It has been reported to fix the issue"

* tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: Fix the missing initializations in __debugfs_file_get()

4 months agoMerge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 1 Feb 2025 17:49:20 +0000 (09:49 -0800)] 
Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "21 hotfixes. 8 are cc:stable and the remainder address post-6.13
  issues. 13 are for MM and 8 are for non-MM.

  All are singletons, please see the changelogs for details"

* tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
  MAINTAINERS: include linux-mm for xarray maintenance
  revert "xarray: port tests to kunit"
  MAINTAINERS: add lib/test_xarray.c
  mailmap, MAINTAINERS, docs: update Carlos's email address
  mm/hugetlb: fix hugepage allocation for interleaved memory nodes
  mm: gup: fix infinite loop within __get_longterm_locked
  mm, swap: fix reclaim offset calculation error during allocation
  .mailmap: update email address for Christopher Obbard
  kfence: skip __GFP_THISNODE allocations on NUMA systems
  nilfs2: fix possible int overflows in nilfs_fiemap()
  mm: compaction: use the proper flag to determine watermarks
  kernel: be more careful about dup_mmap() failures and uprobe registering
  mm/fake-numa: handle cases with no SRAT info
  mm: kmemleak: fix upper boundary check for physical address objects
  mailmap: add an entry for Hamza Mahfooz
  MAINTAINERS: mailmap: update Yosry Ahmed's email address
  scripts/gdb: fix aarch64 userspace detection in get_current_task
  mm/vmscan: accumulate nr_demoted for accurate demotion statistics
  ocfs2: fix incorrect CPU endianness conversion causing mount failure
  mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
  ...

4 months agoMerge tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Sat, 1 Feb 2025 17:15:01 +0000 (09:15 -0800)] 
Merge tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fix from Mauro Carvalho Chehab:
 "A revert for a regression in the uvcvideo driver"

* tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  Revert "media: uvcvideo: Require entities to have a non-zero unique ID"

4 months agoMAINTAINERS: include linux-mm for xarray maintenance
Andrew Morton [Fri, 31 Jan 2025 00:16:20 +0000 (16:16 -0800)] 
MAINTAINERS: include linux-mm for xarray maintenance

MM developers have an interest in the xarray code.

Cc: David Gow <davidgow@google.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Tamir Duberstein <tamird@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agorevert "xarray: port tests to kunit"
Andrew Morton [Fri, 31 Jan 2025 00:09:20 +0000 (16:09 -0800)] 
revert "xarray: port tests to kunit"

Revert c7bb5cf9fc4e ("xarray: port tests to kunit").  It broke the build
when compiing the xarray userspace test harness code.

Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com
Cc: David Gow <davidgow@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Tamir Duberstein <tamird@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoMAINTAINERS: add lib/test_xarray.c
Tamir Duberstein [Wed, 29 Jan 2025 21:13:49 +0000 (16:13 -0500)] 
MAINTAINERS: add lib/test_xarray.c

Ensure test-only changes are sent to the relevant maintainer.

Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomailmap, MAINTAINERS, docs: update Carlos's email address
Carlos Bilbao [Thu, 30 Jan 2025 01:22:44 +0000 (19:22 -0600)] 
mailmap, MAINTAINERS, docs: update Carlos's email address

Update .mailmap to reflect my new (and final) primary email address,
carlos.bilbao@kernel.org.  Also update contact information in files
Documentation/translations/sp_SP/index.rst and MAINTAINERS.

Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org
Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
Cc: Carlos Bilbao <bilbao@vt.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mattew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/hugetlb: fix hugepage allocation for interleaved memory nodes
Ritesh Harjani (IBM) [Sat, 11 Jan 2025 11:06:55 +0000 (16:36 +0530)] 
mm/hugetlb: fix hugepage allocation for interleaved memory nodes

gather_bootmem_prealloc() assumes the start nid as 0 and size as
num_node_state(N_MEMORY).  That means in case if memory attached numa
nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail
to scan few of these nodes.

Since memory attached numa nodes can be interleaved in any fashion, hence
ensure that the current code checks for all numa node ids
(.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that
it can distributes all nr_node_ids among the these many no. threads.

e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"

w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo  |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:         2097152 kB

Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com
Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com>
Suggested-by: Muchun Song <muchun.song@linux.dev>
Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Reviewed-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: gup: fix infinite loop within __get_longterm_locked
Zhaoyang Huang [Tue, 21 Jan 2025 02:01:59 +0000 (10:01 +0800)] 
mm: gup: fix infinite loop within __get_longterm_locked

We can run into an infinite loop in __get_longterm_locked() when
collect_longterm_unpinnable_folios() finds only folios that are isolated
from the LRU or were never added to the LRU.  This can happen when all
folios to be pinned are never added to the LRU, for example when
vm_ops->fault allocated pages using cma_alloc() and never added them to
the LRU.

Fix it by simply taking a look at the list in the single caller, to see if
anything was added.

[zhaoyang.huang@unisoc.com: move definition of local]
Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com
Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com
Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()")
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aijun Sun <aijun.sun@unisoc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm, swap: fix reclaim offset calculation error during allocation
Kairui Song [Thu, 30 Jan 2025 11:51:31 +0000 (19:51 +0800)] 
mm, swap: fix reclaim offset calculation error during allocation

There is a code error that will cause the swap entry allocator to reclaim
and check the whole cluster with an unexpected tail offset instead of the
part that needs to be reclaimed.  This may cause corruption of the swap
map, so fix it.

Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com
Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock")
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Chris Li <chrisl@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months ago.mailmap: update email address for Christopher Obbard
Christopher Obbard [Wed, 22 Jan 2025 12:04:27 +0000 (12:04 +0000)] 
.mailmap: update email address for Christopher Obbard

Update my email address.

Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org
Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agokfence: skip __GFP_THISNODE allocations on NUMA systems
Marco Elver [Fri, 24 Jan 2025 12:01:38 +0000 (13:01 +0100)] 
kfence: skip __GFP_THISNODE allocations on NUMA systems

On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on
a particular node, and failure to allocate on the desired node will result
in a failed allocation.

Skip __GFP_THISNODE allocations if we are running on a NUMA system, since
KFENCE can't guarantee which node its pool pages are allocated on.

Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com
Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations")
Signed-off-by: Marco Elver <elver@google.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Chistoph Lameter <cl@linux.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agonilfs2: fix possible int overflows in nilfs_fiemap()
Nikita Zhandarovich [Fri, 24 Jan 2025 22:20:53 +0000 (07:20 +0900)] 
nilfs2: fix possible int overflows in nilfs_fiemap()

Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result
by being prepared to go through potentially maxblocks == INT_MAX blocks,
the value in n may experience an overflow caused by left shift of blkbits.

While it is extremely unlikely to occur, play it safe and cast right hand
expression to wider type to mitigate the issue.

Found by Linux Verification Center (linuxtesting.org) with static analysis
tool SVACE.

Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com
Fixes: 622daaff0a89 ("nilfs2: fiemap support")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: compaction: use the proper flag to determine watermarks
yangge [Sat, 25 Jan 2025 06:53:57 +0000 (14:53 +0800)] 
mm: compaction: use the proper flag to determine watermarks

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of
memory.  I have configured 16GB of CMA memory on each NUMA node, and
starting a 32GB virtual machine with device passthrough is extremely slow,
taking almost an hour.

Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB
of no-CMA memory on a NUMA node can be used as virtual machine memory.
There is 16GB of free CMA memory on a NUMA node, which is sufficient to
pass the order-0 watermark check, causing the __compaction_suitable()
function to consistently return true.

For costly allocations, if the __compaction_suitable() function always
returns true, it causes the __alloc_pages_slowpath() function to fail to
exit at the appropriate point.  This prevents timely fallback to
allocating memory on other nodes, ultimately resulting in excessively long
virtual machine startup times.

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

We could use the real unmovable allocation context to have
__zone_watermark_unusable_free() subtract CMA pages, and thus we won't
pass the order-0 check anymore once the non-CMA part is exhausted.  There
is some risk that in some different scenario the compaction could in fact
migrate pages from the exhausted non-CMA part of the zone to the CMA part
and succeed, and we'll skip it instead.  But only __GFP_NORETRY
allocations should be affected in the immediate "goto nopage" when
compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
anyway and won't fail without trying to compact-migrate the non-CMA
pageblocks into CMA pageblocks first, so it should be fine.

After this fix, it only takes a few tens of seconds to start a 32GB
virtual machine with device passthrough functionality.

Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com
Signed-off-by: yangge <yangge1116@126.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agokernel: be more careful about dup_mmap() failures and uprobe registering
Liam R. Howlett [Mon, 27 Jan 2025 17:02:21 +0000 (12:02 -0500)] 
kernel: be more careful about dup_mmap() failures and uprobe registering

If a memory allocation fails during dup_mmap(), the maple tree can be left
in an unsafe state for other iterators besides the exit path.  All the
locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
incomplete mm_struct can be reached through (at least) the rmap finding
the vmas which have a pointer back to the mm_struct.

Up to this point, there have been no issues with being able to find an
mm_struct that was only partially initialised.  Syzbot was able to make
the incomplete mm_struct fail with recent forking changes, so it has been
proven unsafe to use the mm_struct that hasn't been initialised, as
referenced in the link below.

Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
invalid mm") fixed the uprobe access, it does not completely remove the
race.

This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
oom side (even though this is extremely unlikely to be selected as an oom
victim in the race window), and sets MMF_UNSTABLE to avoid other potential
users from using a partially initialised mm_struct.

When registering vmas for uprobe, skip the vmas in an mm that is marked
unstable.  Modifying a vma in an unstable mm may cause issues if the mm
isn't fully initialised.

Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/
Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com
Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/fake-numa: handle cases with no SRAT info
Bruno Faccini [Mon, 27 Jan 2025 17:16:23 +0000 (09:16 -0800)] 
mm/fake-numa: handle cases with no SRAT info

Handle more gracefully cases where no SRAT information is available, like
in VMs with no Numa support, and allow fake-numa configuration to complete
successfully in these cases

Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com
Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”)
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm: kmemleak: fix upper boundary check for physical address objects
Catalin Marinas [Mon, 27 Jan 2025 18:42:33 +0000 (18:42 +0000)] 
mm: kmemleak: fix upper boundary check for physical address objects

Memblock allocations are registered by kmemleak separately, based on their
physical address.  During the scanning stage, it checks whether an object
is within the min_low_pfn and max_low_pfn boundaries and ignores it
otherwise.

With the recent addition of __percpu pointer leak detection (commit
6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak
started reporting leaks in setup_zone_pageset() and
setup_per_cpu_pageset().  These were caused by the node_data[0] object
(initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn)
boundary.  The non-strict upper boundary check introduced by commit
84c326299191 ("mm: kmemleak: check physical address when scan") causes the
pg_data_t object to be ignored (not scanned) and the __percpu pointers it
contains to be reported as leaks.

Make the max_low_pfn upper boundary check strict when deciding whether to
ignore a physical address object and not scan it.

Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com
Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: Patrick Wang <patrick.wang.shcn@gmail.com>
Cc: <stable@vger.kernel.org> [6.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomailmap: add an entry for Hamza Mahfooz
Hamza Mahfooz [Mon, 20 Jan 2025 20:56:59 +0000 (15:56 -0500)] 
mailmap: add an entry for Hamza Mahfooz

Map my previous work email to my current one.

Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com
Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hans verkuil <hverkuil@xs4all.nl>
Cc: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoMAINTAINERS: mailmap: update Yosry Ahmed's email address
Yosry Ahmed [Thu, 23 Jan 2025 23:13:44 +0000 (23:13 +0000)] 
MAINTAINERS: mailmap: update Yosry Ahmed's email address

Moving to a linux.dev email address.

Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoscripts/gdb: fix aarch64 userspace detection in get_current_task
Jan Kiszka [Fri, 10 Jan 2025 10:36:33 +0000 (11:36 +0100)] 
scripts/gdb: fix aarch64 userspace detection in get_current_task

At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long
which lets the right-shift always return 0.

Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Kieran Bingham <kbingham@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/vmscan: accumulate nr_demoted for accurate demotion statistics
Li Zhijian [Fri, 10 Jan 2025 12:21:32 +0000 (20:21 +0800)] 
mm/vmscan: accumulate nr_demoted for accurate demotion statistics

In shrink_folio_list(), demote_folio_list() can be called 2 times.
Currently stat->nr_demoted will only store the last nr_demoted( the later
nr_demoted is always zero, the former nr_demoted will get lost), as a
result number of demoted pages is not accurate.

Accumulate the nr_demoted count across multiple calls to
demote_folio_list(), ensuring accurate reporting of demotion statistics.

[lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting]
Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com
Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com
Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu>
Tested-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agoocfs2: fix incorrect CPU endianness conversion causing mount failure
Heming Zhao [Tue, 21 Jan 2025 11:22:03 +0000 (19:22 +0800)] 
ocfs2: fix incorrect CPU endianness conversion causing mount failure

Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
introduced a regression bug.  The blksz_bits value is already converted to
CPU endian in the previous code; therefore, the code shouldn't use
le32_to_cpu() anymore.

Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com
Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()
Hyeonggon Yoo [Mon, 27 Jan 2025 23:16:31 +0000 (08:16 +0900)] 
mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()

Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in
trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function.
However, the function is only used when CONFIG_DEBUG_VM=y.

When building with LLVM=1 and W=1 option, the following warning is
generated:
  $ make -j12 W=1 LLVM=1 mm/zsmalloc.o
  mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration]
    455 | static inline bool is_first_zpdesc(struct zpdesc *zpdesc)
        |                    ^~~~~~~~~~~~~~~
  1 error generated.

Fix the warning by adding __maybe_unused attribute to the function.
No functional change intended.

Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com
Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/
Cc: Alex Shi <alexs@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agomm/vmscan: fix hard LOCKUP in function isolate_lru_folios
liuye [Tue, 19 Nov 2024 06:08:42 +0000 (14:08 +0800)] 
mm/vmscan: fix hard LOCKUP in function isolate_lru_folios

This fixes the following hard lockup in isolate_lru_folios() during memory
reclaim.  If the LRU mostly contains ineligible folios this may trigger
watchdog.

watchdog: Watchdog detected hard LOCKUP on cpu 173
RIP: 0010:native_queued_spin_lock_slowpath+0x255/0x2a0
Call Trace:
_raw_spin_lock_irqsave+0x31/0x40
folio_lruvec_lock_irqsave+0x5f/0x90
folio_batch_move_lru+0x91/0x150
lru_add_drain_per_cpu+0x1c/0x40
process_one_work+0x17d/0x350
worker_thread+0x27b/0x3a0
kthread+0xe8/0x120
ret_from_fork+0x34/0x50
ret_from_fork_asm+0x1b/0x30

lruvec->lru_lock owner:

PID: 2865     TASK: ffff888139214d40  CPU: 40   COMMAND: "kswapd0"
 #0 [fffffe0000945e60] crash_nmi_callback at ffffffffa567a555
 #1 [fffffe0000945e68] nmi_handle at ffffffffa563b171
 #2 [fffffe0000945eb0] default_do_nmi at ffffffffa6575920
 #3 [fffffe0000945ed0] exc_nmi at ffffffffa6575af4
 #4 [fffffe0000945ef0] end_repeat_nmi at ffffffffa6601dde
    [exception RIP: isolate_lru_folios+403]
    RIP: ffffffffa597df53  RSP: ffffc90006fb7c28  RFLAGS: 00000002
    RAX: 0000000000000001  RBX: ffffc90006fb7c60  RCX: ffffea04a2196f88
    RDX: ffffc90006fb7c60  RSI: ffffc90006fb7c60  RDI: ffffea04a2197048
    RBP: ffff88812cbd3010   R8: ffffea04a2197008   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000001  R12: ffffea04a2197008
    R13: ffffea04a2197048  R14: ffffc90006fb7de8  R15: 0000000003e3e937
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    <NMI exception stack>
 #5 [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
 #6 [ffffc90006fb7cf8] shrink_active_list at ffffffffa597f788
 #7 [ffffc90006fb7da8] balance_pgdat at ffffffffa5986db0
 #8 [ffffc90006fb7ec0] kswapd at ffffffffa5987354
 #9 [ffffc90006fb7ef8] kthread at ffffffffa5748238
crash>

Scenario:
User processe are requesting a large amount of memory and keep page active.
Then a module continuously requests memory from ZONE_DMA32 area.
Memory reclaim will be triggered due to ZONE_DMA32 watermark alarm reached.
However pages in the LRU(active_anon) list are mostly from
the ZONE_NORMAL area.

Reproduce:
Terminal 1: Construct to continuously increase pages active(anon).
mkdir /tmp/memory
mount -t tmpfs -o size=1024000M tmpfs /tmp/memory
dd if=/dev/zero of=/tmp/memory/block bs=4M
tail /tmp/memory/block

Terminal 2:
vmstat -a 1
active will increase.
procs ---memory--- ---swap-- ---io---- -system-- ---cpu--- ...
 r  b   swpd   free  inact active   si   so    bi    bo
 1  0   0 1445623076 45898836 83646008    0    0     0
 1  0   0 1445623076 43450228 86094616    0    0     0
 1  0   0 1445623076 41003480 88541364    0    0     0
 1  0   0 1445623076 38557088 90987756    0    0     0
 1  0   0 1445623076 36109688 93435156    0    0     0
 1  0   0 1445619552 33663256 95881632    0    0     0
 1  0   0 1445619804 31217140 98327792    0    0     0
 1  0   0 1445619804 28769988 100774944    0    0     0
 1  0   0 1445619804 26322348 103222584    0    0     0
 1  0   0 1445619804 23875592 105669340    0    0     0

cat /proc/meminfo | head
Active(anon) increase.
MemTotal:       1579941036 kB
MemFree:        1445618500 kB
MemAvailable:   1453013224 kB
Buffers:            6516 kB
Cached:         128653956 kB
SwapCached:            0 kB
Active:         118110812 kB
Inactive:       11436620 kB
Active(anon):   115345744 kB
Inactive(anon):   945292 kB

When the Active(anon) is 115345744 kB, insmod module triggers
the ZONE_DMA32 watermark.

perf record -e vmscan:mm_vmscan_lru_isolate -aR
perf script
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=2
nr_skipped=2 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=1 nr_requested=32 nr_scanned=28835844
nr_skipped=28835844 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=29
nr_skipped=29 nr_taken=0 lru=active_anon
isolate_mode=0 classzone=1 order=0 nr_requested=32 nr_scanned=0
nr_skipped=0 nr_taken=0 lru=active_anon

See nr_scanned=28835844.
28835844 * 4k = 115343376KB approximately equal to 115345744 kB.

If increase Active(anon) to 1000G then insmod module triggers
the ZONE_DMA32 watermark. hard lockup will occur.

In my device nr_scanned = 0000000003e3e937 when hard lockup.
Convert to memory size 0x0000000003e3e937 * 4KB = 261072092 KB.

   [ffffc90006fb7c28] isolate_lru_folios at ffffffffa597df53
    ffffc90006fb7c300000000000000020 0000000000000000
    ffffc90006fb7c40ffffc90006fb7d40 ffff88812cbd3000
    ffffc90006fb7c50ffffc90006fb7d30 0000000106fb7de8
    ffffc90006fb7c60ffffea04a2197008 ffffea0006ed4a48
    ffffc90006fb7c700000000000000000 0000000000000000
    ffffc90006fb7c800000000000000000 0000000000000000
    ffffc90006fb7c900000000000000000 0000000000000000
    ffffc90006fb7ca00000000000000000 0000000003e3e937
    ffffc90006fb7cb00000000000000000 0000000000000000
    ffffc90006fb7cc08d7c0b56b7874b00 ffff88812cbd3000

About the Fixes:
Why did it take eight years to be discovered?

The problem requires the following conditions to occur:
1. The device memory should be large enough.
2. Pages in the LRU(active_anon) list are mostly from the ZONE_NORMAL area.
3. The memory in ZONE_DMA32 needs to reach the watermark.

If the memory is not large enough, or if the usage design of ZONE_DMA32
area memory is reasonable, this problem is difficult to detect.

notes:
The problem is most likely to occur in ZONE_DMA32 and ZONE_NORMAL,
but other suitable scenarios may also trigger the problem.

Link: https://lkml.kernel.org/r/20241119060842.274072-1-liuye@kylinos.cn
Fixes: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Signed-off-by: liuye <liuye@kylinos.cn>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <yang@os.amperecomputing.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
4 months agosh: boards: Use imply to enable hardware with complex dependencies
Geert Uytterhoeven [Fri, 24 Jan 2025 08:39:19 +0000 (09:39 +0100)] 
sh: boards: Use imply to enable hardware with complex dependencies

If CONFIG_I2C=n:

    WARNING: unmet direct dependencies detected for SND_SOC_AK4642
      Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n]
      Selected by [y]:
      - SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]

    WARNING: unmet direct dependencies detected for SND_SOC_DA7210
      Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n]
      Selected by [y]:
      - SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y]

Fix this by replacing select by imply, instead of adding a dependency on
I2C.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: Migrate to the generic rule for built-in DTB
Masahiro Yamada [Sun, 22 Dec 2024 00:32:07 +0000 (09:32 +0900)] 
sh: Migrate to the generic rule for built-in DTB

Commit 654102df2ac2 ("kbuild: add generic support for built-in
boot DTBs") introduced generic support for built-in DTBs.

Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled.

To keep consistency across architectures, this commit also renames
CONFIG_USE_BUILTIN_DTB to CONFIG_BUILTIN_DTB, and
CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agosh: irq: Use seq_put_decimal_ull_width() for decimal values
David Wang [Sat, 30 Nov 2024 13:49:09 +0000 (21:49 +0800)] 
sh: irq: Use seq_put_decimal_ull_width() for decimal values

On a system with n CPUs and m interrupts, there will be n*m decimal
values yielded via seq_printf(.."%10u "..) which has significant costs
parsing format string and is less efficient than seq_put_decimal_ull_width().
Stress reading /proc/interrupts indicates ~30% performance improvement with
this patch.

Signed-off-by: David Wang <00107082@163.com>
Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
4 months agoMerge tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Feb 2025 04:11:24 +0000 (20:11 -0800)] 
Merge tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux

Pull hexagon updates from Brian Cain:

 - Move kernel prototypes out of uapi header to internal header

 - Fix to address an unbalanced spinlock

 - Miscellaneous patches to fix static checks

 - Update bcain@quicinc.com->brian.cain@oss.qualcomm.com

* tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux:
  MAINTAINERS: Update my email address
  hexagon: Fix unbalanced spinlock in die()
  hexagon: Fix warning comparing pointer to 0
  hexagon: Move kernel prototypes out of uapi/asm/setup.h header
  hexagon: time: Remove redundant null check for resource
  hexagon: fix using plain integer as NULL pointer warning in cmpxchg

4 months agoRemove stale generated 'genheaders' file
Linus Torvalds [Sat, 1 Feb 2025 03:49:17 +0000 (19:49 -0800)] 
Remove stale generated 'genheaders' file

This bogus stale file was added in commit 101971298be2 ("riscv: add a
warning when physical memory address overflows").  It's the old location
for what is now 'security/selinux/genheaders'.

It looks like it got incorrectly committed back when that file was in
the old location, and then rebasing kept the bogus file alive.

Reported-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/linux-riscv/20250201020003.GA77370@sol.localdomain/
Fixes: 101971298be2 ("riscv: add a warning when physical memory address overflows")
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 months agoMerge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sat, 1 Feb 2025 01:12:31 +0000 (17:12 -0800)] 
Merge tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull AT_EXECVE_CHECK selftest fix from Kees Cook:
 "Fixes the AT_EXECVE_CHECK selftests which didn't run on old versions
  of glibc"

* tag 'AT_EXECVE_CHECK-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  selftests: Handle old glibc without execveat(2)

4 months agoMerge tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Feb 2025 01:10:26 +0000 (17:10 -0800)] 
Merge tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:
 "This is a fix for the soon to be released GCC 15 which has regressed
  its initialization of unions when performing explicit initialization
  (i.e. a general problem, not specifically a hardening problem; we're
  just carrying the fix).

  Details in the final patch, Acked by Masahiro, with updated selftests
  to validate the fix"

* tag 'hardening-v6.14-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: Use -fzero-init-padding-bits=all
  stackinit: Add union initialization to selftests
  stackinit: Add old-style zero-init syntax to struct tests

4 months agoMerge tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 31 Jan 2025 23:45:41 +0000 (15:45 -0800)] 
Merge tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
 "This is only AMD fixes:

  amdgpu:
   - GC 12 fix
   - Aldebaran fix
   - DCN 3.5 fix
   - Freesync fix

  amdkfd:
   - Per queue reset fix
   - MES fix"

* tag 'drm-next-2025-02-01' of https://gitlab.freedesktop.org/drm/kernel:
  drm/amd/display: restore invalid MSA timing check for freesync
  drm/amdkfd: only flush the validate MES contex
  drm/amd/display: Correct register address in dcn35
  drm/amd/pm: Mark MM activity as unsupported
  drm/amd/amdgpu: change the config of cgcg on gfx12
  drm/amdkfd: Block per-queue reset when halt_if_hws_hang=1

4 months agoMerge tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 31 Jan 2025 23:39:50 +0000 (15:39 -0800)] 
Merge tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fix from Bjorn Helgaas:

 - Save the original INTX_DISABLE bit at the first pcim_intx() call and
   restore that at devres cleanup instead of restoring the opposite of
   the most recent enable/disable pcim_intx() argument, which was wrong
   when a driver called pcim_intx() multiple times or with the already
   enabled state (Takashi Iwai)

* tag 'pci-v6.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Restore original INTX_DISABLE bit by pcim_intx()

4 months agoMerge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Jan 2025 23:13:25 +0000 (15:13 -0800)] 
Merge tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - The PH1520 pinctrl and dwmac drivers are enabeled in defconfig

 - A redundant AQRL barrier has been removed from the futex cmpxchg
   implementation

 - Support for the T-Head vector extensions, which includes exposing
   these extensions to userspace on systems that implement them

 - Some more page table information is now printed on die() and systems
   that cause PA overflows

* tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: add a warning when physical memory address overflows
  riscv/mm/fault: add show_pte() before die()
  riscv: Add ghostwrite vulnerability
  selftests: riscv: Support xtheadvector in vector tests
  selftests: riscv: Fix vector tests
  riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
  riscv: hwprobe: Add thead vendor extension probing
  riscv: vector: Support xtheadvector save/restore
  riscv: Add xtheadvector instruction definitions
  riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
  RISC-V: define the elements of the VCSR vector CSR
  riscv: vector: Use vlenb from DT for thead
  riscv: Add thead and xtheadvector as a vendor extension
  riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
  dt-bindings: cpus: add a thead vlen register length property
  dt-bindings: riscv: Add xtheadvector ISA extension description
  RISC-V: Mark riscv_v_init() as __init
  riscv: defconfig: drop RT_GROUP_SCHED=y
  riscv/futex: Optimize atomic cmpxchg
  riscv: defconfig: enable pinctrl and dwmac support for TH1520

4 months agoRevert "media: uvcvideo: Require entities to have a non-zero unique ID"
Thadeu Lima de Souza Cascardo [Tue, 14 Jan 2025 20:00:45 +0000 (17:00 -0300)] 
Revert "media: uvcvideo: Require entities to have a non-zero unique ID"

This reverts commit 3dd075fe8ebbc6fcbf998f81a75b8c4b159a6195.

Tomasz has reported that his device, Generalplus Technology Inc. 808 Camera,
with ID 1b3f:2002, stopped being detected:

$ ls -l /dev/video*
zsh: no matches found: /dev/video*
[    7.230599] usb 3-2: Found multiple Units with ID 5

This particular device is non-compliant, having both the Output Terminal
and Processing Unit with ID 5. uvc_scan_fallback, though, is able to build
a chain. However, when media elements are added and uvc_mc_create_links
call uvc_entity_by_id, it will get the incorrect entity,
media_create_pad_link will WARN, and it will fail to register the entities.

In order to reinstate support for such devices in a timely fashion,
reverting the fix for these warnings is appropriate. A proper fix that
considers the existence of such non-compliant devices will be submitted in
a later development cycle.

Reported-by: Tomasz Sikora <sikora.tomus@gmail.com>
Fixes: 3dd075fe8ebb ("media: uvcvideo: Require entities to have a non-zero unique ID")
Cc: stable@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Link: https://lore.kernel.org/r/20250114200045.1401644-1-cascardo@igalia.com
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
4 months agoMerge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Fri, 31 Jan 2025 20:07:07 +0000 (12:07 -0800)] 
Merge tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Support multiple hook locations for maint scripts of Debian package

 - Remove 'cpio' from the build tool requirement

 - Introduce gendwarfksyms tool, which computes CRCs for export symbols
   based on the DWARF information

 - Support CONFIG_MODVERSIONS for Rust

 - Resolve all conflicts in the genksyms parser

 - Fix several syntax errors in genksyms

* tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits)
  kbuild: fix Clang LTO with CONFIG_OBJTOOL=n
  kbuild: Strip runtime const RELA sections correctly
  kconfig: fix memory leak in sym_warn_unmet_dep()
  kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST
  genksyms: fix syntax error for attribute before init-declarator
  genksyms: fix syntax error for builtin (u)int*x*_t types
  genksyms: fix syntax error for attribute after 'union'
  genksyms: fix syntax error for attribute after 'struct'
  genksyms: fix syntax error for attribute after abstact_declarator
  genksyms: fix syntax error for attribute before nested_declarator
  genksyms: fix syntax error for attribute before abstract_declarator
  genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier
  genksyms: record attributes consistently for init-declarator
  genksyms: restrict direct-declarator to take one parameter-type-list
  genksyms: restrict direct-abstract-declarator to take one parameter-type-list
  genksyms: remove Makefile hack
  genksyms: fix last 3 shift/reduce conflicts
  genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts
  genksyms: reduce type_qualifier directly to decl_specifier
  genksyms: rename cvar_qualifier to type_qualifier
  ...

4 months agoMerge tag 'block-6.14-20250131' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Jan 2025 19:49:30 +0000 (11:49 -0800)] 
Merge tag 'block-6.14-20250131' of git://git.kernel.dk/linux

Pull more block updates from Jens Axboe:

 - MD pull request via Song:
      - Fix a md-cluster regression introduced

 - More sysfs race fixes

 - Mark anything inside queue freezing as not being able to do IO for
   memory allocations

 - Fix for a regression introduced in loop in this merge window

 - Fix for a regression in queue mapping setups introduced in this merge
   window

 - Fix for the block dio fops attempting an iov_iter revert upton
   getting -EIOCBQUEUED on the read side. This one is going to stable as
   well

* tag 'block-6.14-20250131' of git://git.kernel.dk/linux:
  block: force noio scope in blk_mq_freeze_queue
  block: fix nr_hw_queue update racing with disk addition/removal
  block: get rid of request queue ->sysfs_dir_lock
  loop: don't clear LO_FLAGS_PARTSCAN on LOOP_SET_STATUS{,64}
  md/md-bitmap: Synchronize bitmap_get_stats() with bitmap lifetime
  blk-mq: create correct map for fallback case
  block: don't revert iter for -EIOCBQUEUED

4 months agoMerge tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 31 Jan 2025 19:29:23 +0000 (11:29 -0800)] 
Merge tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux

Pull more io_uring updates from Jens Axboe:

 - Series cleaning up the alloc cache changes from this merge window,
   and then another series on top making it better yet.

   This also solves an issue with KASAN_EXTRA_INFO, by making io_uring
   resilient to KASAN using parts of the freed struct for storage

 - Cleanups and simplications to buffer cloning and io resource node
   management

 - Fix an issue introduced in this merge window where READ/WRITE_ONCE
   was used on an atomic_t, which made some archs complain

 - Fix for an errant connect retry when the socket has been shut down

 - Fix for multishot and provided buffers

* tag 'io_uring-6.14-20250131' of git://git.kernel.dk/linux:
  io_uring/net: don't retry connect operation on EPOLLERR
  io_uring/rw: simplify io_rw_recycle()
  io_uring: remove !KASAN guards from cache free
  io_uring/net: extract io_send_select_buffer()
  io_uring/net: clean io_msg_copy_hdr()
  io_uring/net: make io_net_vec_assign() return void
  io_uring: add alloc_cache.c
  io_uring: dont ifdef io_alloc_cache_kasan()
  io_uring: include all deps for alloc_cache.h
  io_uring: fix multishots with selected buffers
  io_uring/register: use atomic_read/write for sq_flags migration
  io_uring/alloc_cache: get rid of _nocache() helper
  io_uring: get rid of alloc cache init_once handling
  io_uring/uring_cmd: cleanup struct io_uring_cmd_data layout
  io_uring/uring_cmd: use cached cmd_op in io_uring_cmd_sock()
  io_uring/msg_ring: don't leave potentially dangling ->tctx pointer
  io_uring/rsrc: Move lockdep assert from io_free_rsrc_node() to caller
  io_uring/rsrc: remove unused parameter ctx for io_rsrc_node_alloc()
  io_uring: clean up io_uring_register_get_file()
  io_uring/rsrc: Simplify buffer cloning by locking both rings

4 months agokbuild: fix Clang LTO with CONFIG_OBJTOOL=n
Masahiro Yamada [Fri, 31 Jan 2025 14:04:01 +0000 (23:04 +0900)] 
kbuild: fix Clang LTO with CONFIG_OBJTOOL=n

Since commit bede169618c6 ("kbuild: enable objtool for *.mod.o and
additional kernel objects"), Clang LTO builds do not perform any
optimizations when CONFIG_OBJTOOL is disabled (e.g., for ARCH=arm64).
This is because every LLVM bitcode file is immediately converted to
ELF format before the object files are linked together.

This commit fixes the breakage.

Fixes: bede169618c6 ("kbuild: enable objtool for *.mod.o and additional kernel objects")
Reported-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
4 months agokbuild: Strip runtime const RELA sections correctly
Ard Biesheuvel [Mon, 13 Jan 2025 15:53:07 +0000 (16:53 +0100)] 
kbuild: Strip runtime const RELA sections correctly

Due to the fact that runtime const ELF sections are named without a
leading period or double underscore, the RSTRIP logic that removes the
static RELA sections from vmlinux fails to identify them. This results
in a situation like below, where some sections that were supposed to get
removed are left behind.

  [Nr] Name                              Type            Address          Off     Size   ES Flg Lk Inf Al

  [58] runtime_shift_d_hash_shift        PROGBITS        ffffffff83500f50 2900f50 000014 00   A  0   0  1
  [59] .relaruntime_shift_d_hash_shift   RELA            0000000000000000 55b6f00 000078 18   I 70  58  8
  [60] runtime_ptr_dentry_hashtable      PROGBITS        ffffffff83500f68 2900f68 000014 00   A  0   0  1
  [61] .relaruntime_ptr_dentry_hashtable RELA            0000000000000000 55b6f78 000078 18   I 70  60  8
  [62] runtime_ptr_USER_PTR_MAX          PROGBITS        ffffffff83500f80 2900f80 000238 00   A  0   0  1
  [63] .relaruntime_ptr_USER_PTR_MAX     RELA            0000000000000000 55b6ff0 000d50 18   I 70  62  8

So tweak the match expression to strip all sections starting with .rel.
While at it, consolidate the logic used by RISC-V, s390 and x86 into a
single shared Makefile library command.

Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
4 months agoMerge tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libat...
Linus Torvalds [Fri, 31 Jan 2025 19:07:56 +0000 (11:07 -0800)] 
Merge tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull more ata updates from Niklas Cassel:

 - Add ATA_QUIRK_NOLPM for Samsung SSD 870 QVO drives (Daniel)

 - Ensure that PIO transfers using libata-sff cannot write outside the
   allocated buffer (me)

* tag 'ata-6.14-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-sff: Ensure that we cannot write outside the allocated buffer
  ata: libata-core: Add ATA_QUIRK_NOLPM for Samsung SSD 870 QVO drives

4 months agocifs: Fix parsing native symlinks directory/file type
Pali Rohár [Mon, 23 Sep 2024 20:29:30 +0000 (22:29 +0200)] 
cifs: Fix parsing native symlinks directory/file type

As SMB protocol distinguish between symlink to directory and symlink to
file, add some mechanism to disallow resolving incompatible types.

When SMB symlink is of the directory type, ensure that its target path ends
with slash. This forces Linux to not allow resolving such symlink to file.

And when SMB symlink is of the file type and its target path ends with
slash then returns an error as such symlink is unresolvable. Such symlink
always points to invalid location as file cannot end with slash.

As POSIX server does not distinguish between symlinks to file and symlink
directory, do not apply this change for symlinks from POSIX SMB server. For
POSIX SMB servers, this change does nothing.

This mimics Windows behavior of native SMB symlinks.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: update internal version number
Steve French [Mon, 27 Jan 2025 23:45:57 +0000 (17:45 -0600)] 
cifs: update internal version number

To 2.53

Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating WSL-style symlinks
Pali Rohár [Sat, 28 Sep 2024 11:24:26 +0000 (13:24 +0200)] 
cifs: Add support for creating WSL-style symlinks

This change implements support for creating new symlink in WSL-style by
Linux cifs client when -o reparse=wsl mount option is specified. WSL-style
symlink uses reparse point with tag IO_REPARSE_TAG_LX_SYMLINK and symlink
target location is stored in reparse buffer in UTF-8 encoding prefixed by
32-bit flags. Flags bits are unknown, but it was observed that WSL always
sets flags to value 0x02000000. Do same in Linux cifs client.

New symlinks would be created in WSL-style only in case the mount option
-o reparse=wsl is specified, which is not by default. So default CIFS
mounts are not affected by this change.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agosmb3: add support for IAKerb
Steve French [Tue, 28 Jan 2025 07:04:23 +0000 (01:04 -0600)] 
smb3: add support for IAKerb

There are now more servers which advertise support for IAKerb (passthrough
Kerberos authentication via proxy).  IAKerb is a public extension industry
standard Kerberos protocol that allows a client without line-of-sight
to a Domain Controller to authenticate. There can be cases where we
would fail to mount if the server only advertises the OID for IAKerb
in SPNEGO/GSSAPI.  Add code to allow us to still upcall to userspace
in these cases to obtain the Kerberos ticket.

Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Fix struct FILE_ALL_INFO
Pali Rohár [Sun, 29 Dec 2024 14:31:05 +0000 (15:31 +0100)] 
cifs: Fix struct FILE_ALL_INFO

struct FILE_ALL_INFO for level 263 (0x107) used by QPathInfo does not have
any IndexNumber, AccessFlags, IndexNumber1, CurrentByteOffset, Mode or
AlignmentRequirement members. So remove all of them.

Also adjust code in move_cifs_info_to_smb2() function which converts struct
FILE_ALL_INFO to struct smb2_file_all_info.

Fixed content of struct FILE_ALL_INFO was verified that is correct against:
* [MS-CIFS] section 2.2.8.3.10 SMB_QUERY_FILE_ALL_INFO
* Samba server implementation of trans2 query file/path for level 263
* Packet structure tests against Windows SMB servers

This change fixes CIFSSMBQFileInfo() and CIFSSMBQPathInfo() functions which
directly copy received FILE_ALL_INFO network buffers into kernel structures
of FILE_ALL_INFO type.

struct FILE_ALL_INFO is the response structure returned by the SMB server.
So the incorrect definition of this structure can lead to returning bogus
information in stat() call.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating NFS-style symlinks
Pali Rohár [Fri, 13 Sep 2024 09:42:59 +0000 (11:42 +0200)] 
cifs: Add support for creating NFS-style symlinks

CIFS client is currently able to parse NFS-style symlinks, but is not able
to create them. This functionality is useful when the mounted SMB share is
used also by Windows NFS server (on Windows Server 2012 or new). It allows
interop of symlinks between SMB share mounted by Linux CIFS client and same
export from Windows NFS server mounted by some NFS client.

New symlinks would be created in NFS-style only in case the mount option
-o reparse=nfs is specified, which is not by default. So default CIFS
mounts are not affected by this change.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agocifs: Add support for creating native Windows sockets
Pali Rohár [Mon, 16 Sep 2024 18:21:52 +0000 (20:21 +0200)] 
cifs: Add support for creating native Windows sockets

Native Windows sockets created by WinSock on Windows 10 April 2018 Update
(version 1803) or Windows Server 2019 (version 1809) or later versions is
reparse point with IO_REPARSE_TAG_AF_UNIX tag, with empty reparse point
data buffer and without any EAs.

Create AF_UNIX sockets in this native format if -o nonativesocket was not
specified.

This change makes AF_UNIX sockets created by Linux CIFS client compatible
with AF_UNIX sockets created by Windows applications on NTFS volumes.

Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
4 months agoMerge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Fri, 31 Jan 2025 18:39:07 +0000 (10:39 -0800)] 
Merge tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 mm updates from Ingo Molnar:

 - The biggest changes are the TLB flushing scalability optimizations,
   to update the mm_cpumask lazily and related changes.

   This feature has both a track record and a continued risk of
   performance regressions, so it was already delayed by a cycle - but
   it's all 100% perfect now™ (Rik van Riel)

 - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill
   Shutemov, Sebastian Andrzej Siewior)

* tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Remove unnecessary include of <linux/extable.h>
  x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state()
  x86/mm/selftests: Fix typo in lam.c
  x86/mm/tlb: Only trim the mm_cpumask once a second
  x86/mm/tlb: Also remove local CPU from mm_cpumask if stale
  x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU
  x86/mm/tlb: Update mm_cpumask lazily

4 months agoMerge tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client
Linus Torvalds [Fri, 31 Jan 2025 18:30:34 +0000 (10:30 -0800)] 
Merge tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A fix for a memory leak from Antoine (marked for stable) and two
  cleanups from Liang and Slava"

* tag 'ceph-for-6.14-rc1' of https://github.com/ceph/ceph-client:
  ceph: exchange hardcoded value on NAME_MAX
  ceph: streamline request head structures in MDS client
  ceph: fix memory leak in ceph_mds_auth_match()

4 months agoMerge tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Jan 2025 17:40:31 +0000 (09:40 -0800)] 
Merge tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs fix from Mike Marshall:
 "Fix a oob in orangefs_debug_write

  I got a syzbot report: "slab-out-of-bounds Read in orangefs_debug_write"

  Several people suggested fixes, I tested Al Viro's suggestion and made
  this patch"

* tag 'for-linus-6.14-ofs4' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: fix a oob in orangefs_debug_write

4 months agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Fri, 31 Jan 2025 17:33:54 +0000 (09:33 -0800)] 
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull hostfs fix from Al Viro:
 "Fix hostfs __dentry_name() string handling.

  The use of strcpy() with overlapping source and destination is a UB;
  original loop hadn't been. More to the point, the whole thing is much
  easier done with memcpy() + memmove()"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  hostfs: fix string handling in __dentry_name()

4 months agoMerge tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 31 Jan 2025 17:17:02 +0000 (09:17 -0800)] 
Merge tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here is a collection of fixes that have been gathered since the
  previous pull request.

  All about device-specific fixes and quirks, and most of them are
  pretty small and trivial"

* tag 'sound-fix-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (25 commits)
  ALSA: hda/realtek: Workaround for resume on Dell Venue 11 Pro 7130
  ALSA: hda: Fix headset detection failure due to unstable sort
  ALSA: pcm: use new array-copying-wrapper
  ASoC: codec: es8316: "DAC Soft Ramp Rate" is just a 2 bit control
  ASoC: amd: acp: Fix possible deadlock
  firmware: cs_dsp: FW_CS_DSP_KUNIT_TEST should not select REGMAP
  ALSA: usb-audio: Add delay quirk for iBasso DC07 Pro
  ALSA: hda/realtek: Fix quirk matching for Legion Pro 7
  ASoC: renesas: SND_SIU_MIGOR should depend on DMADEVICES
  ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet 5V
  ASoC: da7213: Initialize the mutex
  ASoC: use to_platform_device() instead of container_of()
  ASoC: acp: Support microphone from Lenovo Go S
  ASoC: SOF: imx8m: Add entry for new 8M Plus revision
  ASoC: SOF: imx8: Add entries for new 8QM and 8QXP revisions
  ASoC: SOF: imx: Add mach entry to select cs42888 topology
  dt-bindings: arm: imx: Add board revisions for i.MX8MP, i.MX8QM and i.MX8QXP
  ASoC: fsl_asrc_m2m: select CONFIG_DMA_SHARED_BUFFER
  ASoC: audio-graph-card2: use correct endpoint when getting link parameters
  ASoC: SOF: imx8m: add SAI2,5,6,7
  ...

4 months agoblock: force noio scope in blk_mq_freeze_queue
Christoph Hellwig [Fri, 31 Jan 2025 12:03:47 +0000 (13:03 +0100)] 
block: force noio scope in blk_mq_freeze_queue

When block drivers or the core block code perform allocations with a
frozen queue, this could try to recurse into the block device to
reclaim memory and deadlock.  Thus all allocations done by a process
that froze a queue need to be done without __GFP_IO and __GFP_FS.
Instead of tying to track all of them down, force a noio scope as
part of freezing the queue.

Note that nvme is a bit of a mess here due to the non-owner freezes,
and they will be addressed separately.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 months agojiffies: Cast to unsigned long in secs_to_jiffies() conversion
Easwar Hariharan [Thu, 30 Jan 2025 19:26:58 +0000 (19:26 +0000)] 
jiffies: Cast to unsigned long in secs_to_jiffies() conversion

While converting users of msecs_to_jiffies(), lkp reported that some range
checks would always be true because of the mismatch between the implied int
value of secs_to_jiffies() vs the unsigned long return value of the
msecs_to_jiffies() calls it was replacing.

Fix this by casting the secs_to_jiffies() input value to unsigned long.

Fixes: b35108a51cf7ba ("jiffies: Define secs_to_jiffies()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250130192701.99626-1-eahariha@linux.microsoft.com
Closes: https://lore.kernel.org/oe-kbuild-all/202501301334.NB6NszQR-lkp@intel.com/
4 months agoRevert "mips: fix shmctl/semctl/msgctl syscall for o32"
Thomas Bogendoerfer [Thu, 30 Jan 2025 10:48:56 +0000 (11:48 +0100)] 
Revert "mips: fix shmctl/semctl/msgctl syscall for o32"

This reverts commit bc7584e009c39375294794f7ca751a6b2622c425.

The split IPC system calls for o32 have been introduced with modern
version only. Changing this breaks ABI.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
4 months agoMAINTAINERS: Update my email address
Brian Cain [Mon, 27 Jan 2025 20:51:03 +0000 (12:51 -0800)] 
MAINTAINERS: Update my email address

Qualcomm is migrating away from quicinc.com email addresses towards ones
with *.qualcomm.com.

Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
Signed-off-by: Brian Cain <bcain@quicinc.com>
4 months agohexagon: Fix unbalanced spinlock in die()
Lin Yujun [Mon, 22 May 2023 02:56:08 +0000 (02:56 +0000)] 
hexagon: Fix unbalanced spinlock in die()

die executes holding the spinlock of &die.lock and unlock
it after printing the oops message.
However in the code if the notify_die() returns NOTIFY_STOP
, die() exit with returning 1 but never unlocked the spinlock.

Fix this by adding spin_unlock_irq(&die.lock) before returning.

Fixes: cf9750bae262 ("Hexagon: Provide basic debugging and system trap support.")
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Link: https://lore.kernel.org/r/20230522025608.2515558-1-linyujun809@huawei.com
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
4 months agohexagon: Fix warning comparing pointer to 0
Yang Li [Wed, 8 Feb 2023 01:11:05 +0000 (09:11 +0800)] 
hexagon: Fix warning comparing pointer to 0

./arch/hexagon/kernel/traps.c:138:6-7: WARNING comparing pointer to 0

Avoid pointer type value compared with 0 to make code clear.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3978
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230208011105.80219-1-yang.lee@linux.alibaba.com
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
4 months agohexagon: Move kernel prototypes out of uapi/asm/setup.h header
Thomas Huth [Thu, 2 May 2024 17:38:18 +0000 (19:38 +0200)] 
hexagon: Move kernel prototypes out of uapi/asm/setup.h header

The kernel function prototypes are of no use for userspace and
shouldn't get exposed in an uapi header, so let's move them into
an internal header instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20240502173818.58152-1-thuth@redhat.com
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
4 months agohexagon: time: Remove redundant null check for resource
Hardevsinh Palaniya [Mon, 11 Nov 2024 14:24:10 +0000 (19:54 +0530)] 
hexagon: time: Remove redundant null check for resource

Null check for 'resource' before assignment is unnecessary because the
variable 'resource' is initialized to NULL at the beginning of the function.

Signed-off-by: Hardevsinh Palaniya <hardevsinh.palaniya@siliconsignals.io>
Link: https://lore.kernel.org/r/20241111142458.67854-1-hardevsinh.palaniya@siliconsignals.io
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
4 months agohexagon: fix using plain integer as NULL pointer warning in cmpxchg
Willem de Bruijn [Tue, 3 Dec 2024 22:17:34 +0000 (17:17 -0500)] 
hexagon: fix using plain integer as NULL pointer warning in cmpxchg

Sparse reports

    net/ipv4/inet_diag.c:1511:17: sparse: sparse: Using plain integer as NULL pointer

Due to this code calling cmpxchg on a non-integer type
struct inet_diag_handler *

    return !cmpxchg((const struct inet_diag_handler**)&inet_diag_table[type],
                    NULL, h) ? 0 : -EEXIST;

While hexagon's cmpxchg assigns an integer value to a variable of this
type.

    __typeof__(*(ptr)) __oldval = 0;

Update this assignment to cast 0 to the correct type.

The original issue is easily reproduced at head with the below block,
and is absent after this change.

    make LLVM=1 ARCH=hexagon defconfig
    make C=1 LLVM=1 ARCH=hexagon net/ipv4/inet_diag.o

Fixes: 99a70aa051d2 ("Hexagon: Add processor and system headers")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411091538.PGSTqUBi-lkp@intel.com/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Link: https://lore.kernel.org/r/20241203221736.282020-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Brian Cain <brian.cain@oss.qualcomm.com>
4 months agoMerge tag 'uml-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Jan 2025 02:29:40 +0000 (18:29 -0800)] 
Merge tag 'uml-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - hostfs: Convert to writepages

 - many cleanups: removal of dead macros, missing __init

* tag 'uml-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
  um: Remove unused asm/archparam.h header
  um: Include missing headers in asm/pgtable.h
  hostfs: Convert to writepages
  um: rtc: use RTC time when calculating the alarm
  um: Remove unused user_context function
  um: Remove unused THREAD_NAME_LEN macro
  um: Remove unused PGD_BOUND macro
  um: Mark setup_env_path as __init
  um: Mark install_fatal_handler as __init
  um: Mark set_stklim as __init
  um: Mark get_top_address as __init
  um: Mark parse_cache_line as __init
  um: Mark parse_host_cpu_flags as __init
  um: Count iomem_size only once in physmem calculation
  um: Remove obsolete fixmap support
  um: Remove unused MODULES_LEN macro

4 months agoMerge tag 'ubifs-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Jan 2025 02:27:02 +0000 (18:27 -0800)] 
Merge tag 'ubifs-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI and UBIFS updates from Richard Weinberger:
 "UBI:
   - New interface to dump detailed erase counters
   - Fixes around wear-leveling

  UBIFS:
   - Minor cleanups
   - Fix for TNC dumping code"

* tag 'ubifs-for-linus-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: ubi_get_ec_info: Fix compiling error 'cast specifies array type'
  ubi: Implement ioctl for detailed erase counters
  ubi: Expose interface for detailed erase counters
  ubifs: skip dumping tnc tree when zroot is null
  ubi: Revert "ubi: wl: Close down wear-leveling before nand is suspended"
  ubifs: ubifs_dump_leb: remove return from end of void function
  ubifs: dump_lpt_leb: remove return at end of void function
  ubi: Add a check for ubi_num