Previously, some sanity check have been done in dump_mapping() before
the print facility parsing '%pd' though, it's still possible to run into
an invalid dentry.d_name.name.
Since dump_mapping() only needs to dump the filename only, retrieve it
by itself in a safer way to prevent an unnecessary crash.
Note that either retrieving the filename with '%pd' or
strncpy_from_kernel_nofault(), the filename could be unreliable.
Baokun Li [Mon, 26 Aug 2024 11:34:04 +0000 (19:34 +0800)]
netfs: Delete subtree of 'fs/netfs' when netfs module exits
In netfs_init() or fscache_proc_init(), we create dentry under 'fs/netfs',
but in netfs_exit(), we only delete the proc entry of 'fs/netfs' without
deleting its subtree. This triggers the following WARNING:
Therefore use remove_proc_subtree() instead of remove_proc_entry() to
fix the above problem.
Fixes: 7eb5b3e3a0a5 ("netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826113404.3214786-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
I've recently looked for some free space in struct inode again because
of some exec kerfuffle we had and while my idea didn't turn into
anything I noticed that we often waste bytes when using wait bit
operations. So I set out to switch that to another mechanism that would
allow us to free up bytes. So this is an attempt to turn i_state from an
unsigned long into an u32 using the individual bytes of i_state as
addresses for the wait var event mechanism (Thanks to Linus for that idea.).
This survives LTP, xfstests on various filesystems, and will-it-scale.
* patches from https://lore.kernel.org/r/20240823-work-i_state-v3-1-5cd5fd207a57@kernel.org:
inode: make i_state a u32
inode: port __I_LRU_ISOLATING to var event
inode: port __I_NEW to var event
inode: port __I_SYNC to var event
fs: reorder i_state bits
fs: add i_state helpers
Now that we use the wait var event mechanism make i_state a u32 and free
up 4 bytes. This means we currently have two 4 byte holes in struct
inode which we can pack.
Julian Sun [Fri, 23 Aug 2024 13:07:30 +0000 (21:07 +0800)]
vfs: fix race between evice_inodes() and find_inode()&iput()
Hi, all
Recently I noticed a bug[1] in btrfs, after digged it into
and I believe it'a race in vfs.
Let's assume there's a inode (ie ino 261) with i_count 1 is
called by iput(), and there's a concurrent thread calling
generic_shutdown_super().
cpu0: cpu1:
iput() // i_count is 1
->spin_lock(inode)
->dec i_count to 0
->iput_final() generic_shutdown_super()
->__inode_add_lru() ->evict_inodes()
// cause some reason[2] ->if (atomic_read(inode->i_count)) continue;
// return before // inode 261 passed the above check
// list_lru_add_obj() // and then schedule out
->spin_unlock()
// note here: the inode 261
// was still at sb list and hash list,
// and I_FREEING|I_WILL_FREE was not been set
btrfs_iget()
// after some function calls
->find_inode()
// found the above inode 261
->spin_lock(inode)
// check I_FREEING|I_WILL_FREE
// and passed
->__iget()
->spin_unlock(inode) // schedule back
->spin_lock(inode)
// check (I_NEW|I_FREEING|I_WILL_FREE) flags,
// passed and set I_FREEING
iput() ->spin_unlock(inode)
->spin_lock(inode) ->evict()
// dec i_count to 0
->iput_final()
->spin_unlock()
->evict()
Now, we have two threads simultaneously evicting
the same inode, which may trigger the BUG(inode->i_state & I_CLEAR)
statement both within clear_inode() and iput().
To fix the bug, recheck the inode->i_count after holding i_lock.
Because in the most scenarios, the first check is valid, and
the overhead of spin_lock() can be reduced.
If there is any misunderstanding, please let me know, thanks.
[1]: https://lore.kernel.org/linux-btrfs/000000000000eabe1d0619c48986@google.com/
[2]: The reason might be 1. SB_ACTIVE was removed or 2. mapping_shrinkable()
return false when I reproduced the bug.
Reported-by: syzbot+67ba3c42bcbb4665d3ad@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=67ba3c42bcbb4665d3ad CC: stable@vger.kernel.org Fixes: 63997e98a3be ("split invalidate_inodes()") Signed-off-by: Julian Sun <sunjunchao2870@gmail.com> Link: https://lore.kernel.org/r/20240823130730.658881-1-sunjunchao2870@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
The i_state member is an unsigned long so that it can be used with the
wait bit infrastructure which expects unsigned long. This wastes 4 bytes
which we're unlikely to ever use. Switch to using the var event wait
mechanism using the address of the bit. Thanks to Linus for the address
idea.
After its conversion to the new mount API, debugfs displays "none" in
/proc/mounts instead of the actual source. Fix this by recognising its
"source" mount option.
Signed-off-by: Marc Aurèle La France <tsi@tuyoix.net> Link: https://lore.kernel.org/r/e439fae2-01da-234b-75b9-2a7951671e27@tuyoix.net Fixes: a20971c18752 ("vfs: Convert debugfs to use the new mount API") Cc: stable@vger.kernel.org # 6.10.x: 49abee5991e1: debugfs: Convert to new uid/gid option parsing helpers Signed-off-by: Christian Brauner <brauner@kernel.org>
Afaict, we can just rely on inode->i_dio_count for waiting instead of
this awkward indirection through __I_DIO_WAKEUP. This survives LTP dio
and xfstests dio tests.
Mateusz Guzik [Thu, 15 Aug 2024 08:33:10 +0000 (10:33 +0200)]
vfs: elide smp_mb in iversion handling in the common case
According to bpftrace on these routines most calls result in cmpxchg,
which already provides the same guarantee.
In inode_maybe_inc_iversion elision is possible because even if the
wrong value was read due to now missing smp_mb fence, the issue is going
to correct itself after cmpxchg. If it appears cmpxchg wont be issued,
the fence + reload are there bringing back previous behavior.
Ian Kent [Wed, 14 Aug 2024 09:02:31 +0000 (17:02 +0800)]
autofs: add per dentry expire timeout
Add ability to set per-dentry mount expire timeout to autofs.
There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).
Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.
One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as the
current autofs default).
2) "nounmount", don't expire this mount (same as setting the autofs
timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
timeout (again same as setting the autofs timeout but only for
this mount).
To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map keys
(mounts) for autofs indirect mounts use an expire timeout stored in
the autofs mount super block info. structure and all indirect mounts
use the same expire timeout.
Now I have a request to add the "nounmount" option so I need to add
the per-dentry expire handling to the kernel implementation to do this.
The implementation uses the trailing path component to identify the
mount (and is also used as the autofs map key) which is passed in the
autofs_dev_ioctl structure path field. The expire timeout is passed
in autofs_dev_ioctl timeout field (well, of the timeout union).
If the passed in timeout is equal to -1 the per-dentry timeout and
flag are cleared providing for the "unmount" option. If the timeout
is greater than or equal to 0 the timeout is set to the value and the
flag is also set. If the dentry timeout is 0 the dentry will not expire
by timeout which enables the implementation of the "nounmount" option
for the specific mount. When the dentry timeout is greater than zero it
allows for the implementation of the "utimeout=<seconds>" option.
A soft lockup in ilookup was reported when stress-testing a 512-way
system [1] (see [2] for full context) and it was verified that not
taking the lock shifts issues back to mm.
This is another flag that is statically set and doesn't need to use up
an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit.
(1) mem_open() used from proc_mem_operations
(2) adi_open() used from adi_fops
(3) drm_open_helper():
(3.1) accel_open() used from DRM_ACCEL_FOPS
(3.2) drm_open() used from
(3.2.1) amdgpu_driver_kms_fops
(3.2.2) psb_gem_fops
(3.2.3) i915_driver_fops
(3.2.4) nouveau_driver_fops
(3.2.5) panthor_drm_driver_fops
(3.2.6) radeon_driver_kms_fops
(3.2.7) tegra_drm_fops
(3.2.8) vmwgfx_driver_fops
(3.2.9) xe_driver_fops
(3.2.10) DRM_GEM_FOPS
(3.2.11) DEFINE_DRM_GEM_DMA_FOPS
(4) struct memdev sets fmode flags based on type of device opened. For
devices using struct mem_fops unsigned offset is used.
Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts
into the open helper to ensure that the flag is always set.
Mateusz Guzik [Sat, 10 Aug 2024 06:47:53 +0000 (08:47 +0200)]
vfs: only read fops once in fops_get/put
In do_dentry_open() the usage is:
f->f_op = fops_get(inode->i_fop);
In generated asm the compiler emits 2 reads from inode->i_fop instead of
just one.
This popped up due to false-sharing where loads from that offset end up
bouncing a cacheline during parallel open. While this is going to be fixed,
the spurious load does not need to be there.
This makes do_dentry_open() go down from 1177 to 1154 bytes.
fops_put() is patched to maintain some consistency.
Thorsten Blum [Thu, 8 Aug 2024 15:00:24 +0000 (17:00 +0200)]
fs/select: Annotate struct poll_list with __counted_by()
Add the __counted_by compiler attribute to the flexible array member
entries to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
CONFIG_FORTIFY_SOURCE.
During O_CREAT we unconditionally audit the parent inode. This makes it
difficult to support a fastpath for O_CREAT when the file already exists
because we have to drop out of RCU lookup needlessly.
We worked around this by checking whether audit was actually active but
that's also suboptimal. Instead, move the audit of the parent inode down
into lookup_open() at a point where it's mostly certain that the file
needs to be created.
This also reduced the inconsistency that currently exists: while audit
on the parent is done independent of whether or no the file already
existed an audit on the file is only performed if it has been created.
By moving the audit down a bit we emit the audit a little later but it
will allow us to simplify the fastpath for O_CREAT significantly.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Jeff Layton [Wed, 7 Aug 2024 12:10:27 +0000 (08:10 -0400)]
fs: try an opportunistic lookup for O_CREAT opens too
Today, when opening a file we'll typically do a fast lookup, but if
O_CREAT is set, the kernel always takes the exclusive inode lock. I
assume this was done with the expectation that O_CREAT means that we
always expect to do the create, but that's often not the case. Many
programs set O_CREAT even in scenarios where the file already exists.
This patch rearranges the pathwalk-for-open code to also attempt a
fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
inode_lock can be avoided altogether, and if auditing isn't enabled, it
can stay in rcuwalk mode for the last step_into.
One notable exception that is hopefully temporary: if we're doing an
rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
dentry in that case is more expensive than taking the i_rwsem for now.
Martin Karsten [Tue, 6 Aug 2024 12:33:01 +0000 (12:33 +0000)]
eventpoll: Annotate data-race of busy_poll_usecs
A struct eventpoll's busy_poll_usecs field can be modified via a user
ioctl at any time. All reads of this field should be annotated with
READ_ONCE.
Fixes: 85455c795c07 ("eventpoll: support busy poll per epoll instance") Cc: stable@vger.kernel.org Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Link: https://lore.kernel.org/r/20240806123301.167557-1-jdamato@fastly.com Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
Joe Damato [Wed, 7 Aug 2024 10:52:31 +0000 (10:52 +0000)]
eventpoll: Don't re-zero eventpoll fields
Remove redundant and unnecessary code.
ep_alloc uses kzalloc to create struct eventpoll, so there is no need to
set fields to defaults of 0. This was accidentally introduced in commit 85455c795c07 ("eventpoll: support busy poll per epoll instance") and
expanded on in follow-up commits.
Xiaxi Shen [Wed, 7 Aug 2024 07:05:36 +0000 (00:05 -0700)]
Fix spelling and gramatical errors
Fixed 3 typos in design.rst
Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com> Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
Mateusz Guzik [Tue, 6 Aug 2024 17:28:46 +0000 (19:28 +0200)]
vfs: dodge smp_mb in break_lease and break_deleg in the common case
These inlines show up in the fast path (e.g., in do_dentry_open()) and
induce said full barrier regarding i_flctx access when in most cases the
pointer is NULL.
The pointer can be safely checked before issuing the barrier, dodging it
in most cases as a result.
It is plausible the consume fence would be sufficient, but I don't want
to go audit all callers regarding what they before calling here.
Joel Savitz [Sat, 3 Aug 2024 02:54:55 +0000 (22:54 -0400)]
file: remove outdated comment after close_fd()
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org
The comment on EXPORT_SYMBOL(close_fd) was added in commit 2ca2a09d6215
("fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()"),
before commit 8760c909f54a ("file: Rename __close_fd to close_fd and remove
the files parameter") gave the function its current name, however commit 1572bfdf21d4 ("file: Replace ksys_close with close_fd") removes the
referenced caller entirely, obsoleting this comment.
Mateusz Guzik [Mon, 5 Aug 2024 13:17:21 +0000 (15:17 +0200)]
exec: don't WARN for racy path_noexec check
Both i_mode and noexec checks wrapped in WARN_ON stem from an artifact
of the previous implementation. They used to legitimately check for the
condition, but that got moved up in two commits: 633fb6ac3980 ("exec: move S_ISREG() check earlier") 0fd338b2d2cd ("exec: move path_noexec() check earlier")
Instead of being removed said checks are WARN_ON'ed instead, which
has some debug value.
However, the spurious path_noexec check is racy, resulting in
unwarranted warnings should someone race with setting the noexec flag.
One can note there is more to perm-checking whether execve is allowed
and none of the conditions are guaranteed to still hold after they were
tested for.
Additionally this does not validate whether the code path did any perm
checking to begin with -- it will pass if the inode happens to be
regular.
Keep the redundant path_noexec() check even though it's mindless
nonsense checking for guarantee that isn't given so drop the WARN.
Reword the commentary and do small tidy ups while here.
Put __percpu annotation at the right place to fix these warnings.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20240730121915.4514-1-ubizjak@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
Fixed: fs: file_table_c: Missing blank line warnings and struct declaration improved
Fixed-
WARNING: Missing a blank line after declarations
WARNING: Missing a blank line after declarations
Declaration format: improved struct file declaration format
fs: don't flush in-flight wb switches for superblocks without cgroup writeback
When deactivating any type of superblock, it had to wait for the in-flight
wb switches to be completed. wb switches are executed in inode_switch_wbs_work_fn()
which needs to acquire the wb_switch_rwsem and races against sync_inodes_sb().
If there are too much dirty data in the superblock, the waiting time may increase
significantly.
For superblocks without cgroup writeback such as tmpfs, they have nothing to
do with the wb swithes, so the flushing can be avoided.
Merge patch series "Add an fcntl() to check file creation"
Christian Brauner <brauner@kernel.org> says:
Systemd has a helper called openat_report_new() that returns whether a
file was created anew or it already existed before for cases where
O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't
something that's specific to systemd but it's where I noticed it.
The current logic is that it first attempts to open the file without
O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both
flags. If that succeeds all is well. If it now reports EEXIST it
retries.
That works fairly well but some corner cases make this more involved. If
this operates on a dangling symlink the first openat() without O_CREAT |
O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL
will fail with EEXIST. The reason is that openat() without O_CREAT |
O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security
reasons. So it's not something we can really change unless we add an
explicit opt-in via O_FOLLOW which seems really ugly.
The caller could try and use fanotify() to register to listen for
creation events in the directory before calling openat(). The caller
could then compare the returned tid to its own tid to ensure that even
in threaded environments it actually created the file. That might work
but is a lot of work for something that should be fairly simple and I'm
uncertain about it's reliability.
The caller could use a bpf lsm hook to hook into security_file_open() to
figure out whether they created the file. That also seems a bit wild.
So let's add F_CREATED_QUERY which allows the caller to check whether
they actually did create the file. That has caveats of course but I
don't
think they are problematic:
* In multi-threaded environments a thread can only be sure that it did
create the file if it calls openat() with O_CREAT. In other words,
it's obviously not enough to just go through it's fdtable and check
these fds because another thread could've created the file.
* If there's any codepaths where an openat() with O_CREAT would yield
the same struct file as that of another thread it would obviously
cause wrong results. I'm not aware of any such codepaths from openat()
itself. Imho, that would be a bug.
* Related to the previous point, calling the new fcntl() on files created
and opened via special-purpose system calls or ioctl()s would cause
wrong results only if the affected subsystem a) raises FMODE_CREATED
and b) may return the same struct file for two different calls. I'm
not seeing anything outside of regular VFS code that raises
FMODE_CREATED.
There is code for b) in e.g., the drm layer where the same struct file
is resurfaced but again FMODE_CREATED isn't used and it would be very
misleading if it did.
Systemd has a helper called openat_report_new() that returns whether a
file was created anew or it already existed before for cases where
O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't
something that's specific to systemd but it's where I noticed it.
The current logic is that it first attempts to open the file without
O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both
flags. If that succeeds all is well. If it now reports EEXIST it
retries.
That works fairly well but some corner cases make this more involved. If
this operates on a dangling symlink the first openat() without O_CREAT |
O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL
will fail with EEXIST. The reason is that openat() without O_CREAT |
O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security
reasons. So it's not something we can really change unless we add an
explicit opt-in via O_FOLLOW which seems really ugly.
The caller could try and use fanotify() to register to listen for
creation events in the directory before calling openat(). The caller
could then compare the returned tid to its own tid to ensure that even
in threaded environments it actually created the file. That might work
but is a lot of work for something that should be fairly simple and I'm
uncertain about it's reliability.
The caller could use a bpf lsm hook to hook into security_file_open() to
figure out whether they created the file. That also seems a bit wild.
So let's add F_CREATED_QUERY which allows the caller to check whether
they actually did create the file. That has caveats of course but I
don't think they are problematic:
* In multi-threaded environments a thread can only be sure that it did
create the file if it calls openat() with O_CREAT. In other words,
it's obviously not enough to just go through it's fdtable and check
these fds because another thread could've created the file.
* If there's any codepaths where an openat() with O_CREAT would yield
the same struct file as that of another thread it would obviously
cause wrong results. I'm not aware of any such codepaths from openat()
itself. Imho, that would be a bug.
* Related to the previous point, calling the new fcntl() on files created
and opened via special-purpose system calls or ioctl()s would cause
wrong results only if the affected subsystem a) raises FMODE_CREATED
and b) may return the same struct file for two different calls. I'm
not seeing anything outside of regular VFS code that raises
FMODE_CREATED.
There is code for b) in e.g., the drm layer where the same struct file
is resurfaced but again FMODE_CREATED isn't used and it would be very
misleading if it did.
Linus Torvalds [Sun, 18 Aug 2024 17:19:49 +0000 (10:19 -0700)]
Merge tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two driver fixes for regressions from 6.11-rc1 due to the
driver core change making a structure in a driver core callback const.
These were missed by all testing EXCEPT for what Bart happened to be
running, so I appreciate the fixes provided here for some
odd/not-often-used driver subsystems that nothing else happened to
catch.
Both of these fixes have been in linux-next all week with no reported
issues"
* tag 'driver-core-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
mips: sgi-ip22: Fix the build
ARM: riscpc: ecard: Fix the build
Linus Torvalds [Sun, 18 Aug 2024 17:16:34 +0000 (10:16 -0700)]
Merge tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc fixes from Greg KH:
"Here are some small char/misc fixes for 6.11-rc4 to resolve reported
problems. Included in here are:
- fastrpc revert of a change that broke userspace
- xillybus fixes for reported issues
Half of these have been in linux-next this week with no reported
problems, I don't know if the last bit of xillybus driver changes made
it in, but they are 'obviously correct' so will be safe :)"
* tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
char: xillybus: Check USB endpoints when probing device
char: xillybus: Refine workqueue handling
Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
char: xillybus: Don't destroy workqueue from work item running on it
Linus Torvalds [Sun, 18 Aug 2024 17:10:48 +0000 (10:10 -0700)]
Merge tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.11-rc4 to
resolve some reported problems. Included in here are:
- conmakehash.c userspace build issues
- fsl_lpuart driver fix
- 8250_omap revert for reported regression
- atmel_serial rts flag fix
All of these have been in linux-next this week with no reported
issues"
* tag 'tty-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
tty: atmel_serial: use the correct RTS flag.
tty: vt: conmakehash: remove non-portable code printing comment header
tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
Linus Torvalds [Sun, 18 Aug 2024 16:59:06 +0000 (09:59 -0700)]
Merge tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB / Thunderbolt driver fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 6.11-rc4 to
resolve some reported issues. Included in here are:
- thunderbolt driver fixes for reported problems
- typec driver fixes
- xhci fixes
- new device id for ljca usb driver
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
usb: typec: ucsi: Fix the return value of ucsi_run_command()
usb: xhci: fix duplicate stall handling in handle_tx_event()
usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
thunderbolt: Mark XDomain as unplugged when router is removed
thunderbolt: Fix memory leaks in {port|retimer}_sb_regs_write()
Linus Torvalds [Sun, 18 Aug 2024 15:50:36 +0000 (08:50 -0700)]
Merge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs fixes from David Sterba:
"A more fixes. We got reports that shrinker added in 6.10 still causes
latency spikes and the fixes don't handle all corner cases. Due to
summer holidays we're taking a shortcut to disable it for release
builds and will fix it in the near future.
- only enable extent map shrinker for DEBUG builds, temporary quick
fix to avoid latency spikes for regular builds
- update target inode's ctime on unlink, mandated by POSIX
- properly take lock to read/update block group's zoned variables
- add counted_by() annotations"
* tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: only enable extent map shrinker for DEBUG builds
btrfs: zoned: properly take lock to read/update block group's zoned variables
btrfs: tree-checker: add dev extent item checks
btrfs: update target inode's ctime on unlink
btrfs: send: annotate struct name_cache_entry with __counted_by()
Jann Horn [Tue, 6 Aug 2024 19:51:42 +0000 (21:51 +0200)]
fuse: Initialize beyond-EOF page contents before setting uptodate
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page
zeroing (because it can be used to change partial page contents).
So fuse_notify_store() must be more careful to fully initialize page
contents (including parts of the page that are beyond end-of-file)
before marking the page uptodate.
The current code can leave beyond-EOF page contents uninitialized, which
makes these uninitialized page contents visible to userspace via mmap().
This is an information leak, but only affects systems which do not
enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the
corresponding kernel command line parameter).
Linus Torvalds [Sun, 18 Aug 2024 02:50:16 +0000 (19:50 -0700)]
Merge tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"16 hotfixes. All except one are for MM. 10 of these are cc:stable and
the others pertain to post-6.10 issues.
As usual with these merges, singletons and doubletons all over the
place, no identifiable-by-me theme. Please see the lovingly curated
changelogs to get the skinny"
* tag 'mm-hotfixes-stable-2024-08-17-19-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/migrate: fix deadlock in migrate_pages_batch() on large folios
alloc_tag: mark pages reserved during CMA activation as not tagged
alloc_tag: introduce clear_page_tag_ref() helper function
crash: fix riscv64 crash memory reserve dead loop
selftests: memfd_secret: don't build memfd_secret test on unsupported arches
mm: fix endless reclaim on machines with unaccepted memory
selftests/mm: compaction_test: fix off by one in check_compaction()
mm/numa: no task_numa_fault() call if PMD is changed
mm/numa: no task_numa_fault() call if PTE is changed
mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
mm: don't account memmap per-node
mm: add system wide stats items category
mm: don't account memmap on failure
mm/hugetlb: fix hugetlb vs. core-mm PT locking
mseal: fix is_madv_discard()
Linus Torvalds [Sun, 18 Aug 2024 02:23:02 +0000 (19:23 -0700)]
Merge tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix crashes on 85xx with some configs since the recent hugepd rework.
- Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL on some
platforms.
- Don't enable offline cores when changing SMT modes, to match existing
userspace behaviour.
Thanks to Christophe Leroy, Dr. David Alan Gilbert, Guenter Roeck, Nysal
Jan K.A, Shrikanth Hegde, Thomas Gleixner, and Tyrel Datwyler.
* tag 'powerpc-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/topology: Check if a core is online
cpu/SMT: Enable SMT only if a core is online
powerpc/mm: Fix boot warning with hugepages and CONFIG_DEBUG_VIRTUAL
powerpc/mm: Fix size of allocated PGDIR
soc: fsl: qbman: remove unused struct 'cgr_comp'
Linus Torvalds [Sat, 17 Aug 2024 23:31:12 +0000 (16:31 -0700)]
Merge tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- fix for clang warning - additional null check
- fix for cached write with posix locks
- flexible structure fix
* tag 'v6.11-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: smb2pdu.h: Use static_assert() to check struct sizes
smb3: fix lock breakage for cached writes
smb/client: avoid possible NULL dereference in cifs_free_subrequest()
Linus Torvalds [Sat, 17 Aug 2024 23:23:05 +0000 (16:23 -0700)]
Merge tag 'i2c-for-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C core fix replacing IS_ENABLED() with IS_REACHABLE()
For host drivers, there are two fixes:
- Tegra I2C Controller: Addresses a potential double-locking issue
during probe. ACPI devices are not IRQ-safe when invoking runtime
suspend and resume functions, so the irq_safe flag should not be
set.
- Qualcomm GENI I2C Controller: Fixes an oversight in the exit path
of the runtime_resume() function, which was missed in the previous
release"
* tag 'i2c-for-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: tegra: Do not mark ACPI devices as irq safe
i2c: Use IS_REACHABLE() for substituting empty ACPI functions
i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
Linus Torvalds [Sat, 17 Aug 2024 17:04:01 +0000 (10:04 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes to the mpi3mr driver. One to avoid oversize
allocations in tracing and the other to fix an uninitialized spinlock
in the user to driver feature request code (used to trigger dumps and
the like)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpi3mr: Avoid MAX_PAGE_ORDER WARNING for buffer allocations
scsi: mpi3mr: Add missing spin_lock_init() for mrioc->trigger_lock
Linus Torvalds [Sat, 17 Aug 2024 16:51:28 +0000 (09:51 -0700)]
Merge tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Check for presence of only 'attr' feature before scrubbing an inode's
attribute fork.
- Restore the behaviour of setting AIL thread to TASK_INTERRUPTIBLE for
long (i.e. 50ms) sleep durations to prevent high load averages.
- Do not allow users to change the realtime flag of a file unless the
datadev and rtdev both support fsdax access modes.
* tag 'xfs-6.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set
xfs: revert AIL TASK_KILLABLE threshold
xfs: attr forks require attr, not attr2
Linus Torvalds [Sat, 17 Aug 2024 16:46:10 +0000 (09:46 -0700)]
Merge tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs
Pull bcachefs fixes from Kent OverstreetL
- New on disk format version, bcachefs_metadata_version_disk_accounting_inum
This adds one more disk accounting counter, which counts disk usage
and number of extents per inode number. This lets us track
fragmentation, for implementing defragmentation later, and it also
counts disk usage per inode in all snapshots, which will be a useful
thing to expose to users.
- One performance issue we've observed is threads spinning when they
should be waiting for dirty keys in the key cache to be flushed by
journal reclaim, so we now have hysteresis for the waiting thread, as
well as improving the tracepoint and a new time_stat, for tracking
time blocked waiting on key cache flushing.
... and various assorted smaller fixes.
* tag 'bcachefs-2024-08-16' of git://evilpiepirate.org/bcachefs:
bcachefs: Fix locking in __bch2_trans_mark_dev_sb()
bcachefs: fix incorrect i_state usage
bcachefs: avoid overflowing LRU_TIME_BITS for cached data lru
bcachefs: Fix forgetting to pass trans to fsck_err()
bcachefs: Increase size of cuckoo hash table on too many rehashes
bcachefs: bcachefs_metadata_version_disk_accounting_inum
bcachefs: Kill __bch2_accounting_mem_mod()
bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()
bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in
bcachefs: Add a time_stat for blocked on key cache flush
bcachefs: Improve trans_blocked_journal_reclaim tracepoint
bcachefs: Add hysteresis to waiting on btree key cache flush
lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()
bcachefs: Convert for_each_btree_node() to lockrestart_do()
bcachefs: Add missing downgrade table entry
bcachefs: disk accounting: ignore unknown types
bcachefs: bch2_accounting_invalid() fixup
bcachefs: Fix bch2_trigger_alloc when upgrading from old versions
bcachefs: delete faulty fastpath in bch2_btree_path_traverse_cached()
Linus Torvalds [Sat, 17 Aug 2024 00:02:32 +0000 (17:02 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix the arm64 __get_mem_asm() to use the _ASM_EXTABLE_##type##ACCESS()
macro instead of the *_ERR() one in order to avoid writing -EFAULT to
the value register in case of a fault
- Initialise all elements of the acpi_early_node_map[] to NUMA_NO_NODE.
Prior to this fix, only the first element was initialised
- Move the KASAN random tag seed initialisation after the per-CPU areas
have been initialised (prng_state is __percpu)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix KASAN random tag seed initialization
arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
arm64: uaccess: correct thinko in __get_mem_asm()
Linus Torvalds [Fri, 16 Aug 2024 23:59:05 +0000 (16:59 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One fix for the new T-Head TH1520 clk driver that marks a bus clk
critical so that it isn't turned off during late init which breaks
emmc-sdio"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: thead: fix dependency on clk_ignore_unused
Linus Torvalds [Fri, 16 Aug 2024 21:03:31 +0000 (14:03 -0700)]
Merge tag 'block-6.11-20240824' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Fix corruption issues with s390/dasd (Eric, Stefan)
- Fix a misuse of non irq locking grab of a lock (Li)
- MD pull request with a single data corruption fix for raid1 (Yu)
* tag 'block-6.11-20240824' of git://git.kernel.dk/linux:
block: Fix lockdep warning in blk_mq_mark_tag_wait
md/raid1: Fix data corruption for degraded array with slow disk
s390/dasd: fix error recovery leading to data corruption on ESE devices
s390/dasd: Remove DMA alignment
Linus Torvalds [Fri, 16 Aug 2024 21:00:05 +0000 (14:00 -0700)]
Merge tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix a comment in the uapi header using the wrong member name (Caleb)
- Fix KCSAN warning for a debug check in sqpoll (me)
- Two more NAPI tweaks (Olivier)
* tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux:
io_uring: fix user_data field name in comment
io_uring/sqpoll: annotate debug task == current with data_race()
io_uring/napi: remove duplicate io_napi_entry timeout assignation
io_uring/napi: check napi_enabled in io_napi_add() before proceeding
Qu Wenruo [Fri, 16 Aug 2024 01:10:38 +0000 (10:40 +0930)]
btrfs: only enable extent map shrinker for DEBUG builds
Although there are several patches improving the extent map shrinker,
there are still reports of too frequent shrinker behavior, taking too
much CPU for the kswapd process.
So let's only enable extent shrinker for now, until we got more
comprehensive understanding and a better solution.
Linus Torvalds [Fri, 16 Aug 2024 18:49:07 +0000 (11:49 -0700)]
Merge tag 'thermal-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix a Bang-bang thermal governor issue causing it to fail to reset the
state of cooling devices if they are 'on' to start with, but the
thermal zone temperature is always below the corresponding trip point
(Rafael Wysocki)"
* tag 'thermal-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: gov_bang_bang: Use governor_data to reduce overhead
thermal: gov_bang_bang: Add .manage() callback
thermal: gov_bang_bang: Split bang_bang_control()
thermal: gov_bang_bang: Call __thermal_cdev_update() directly
Linus Torvalds [Fri, 16 Aug 2024 18:43:54 +0000 (11:43 -0700)]
Merge tag 'acpi-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix an issue related to the ACPI EC device handling that causes the
_REG control method to be evaluated for EC operation regions that are
not expected to be used.
This confuses the platform firmware and provokes various types of
misbehavior on some systems (Rafael Wysocki)"
* tag 'acpi-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: EC: Evaluate _REG outside the EC scope more carefully
ACPICA: Add a depth argument to acpi_execute_reg_methods()
Revert "ACPI: EC: Evaluate orphan _REG under EC device"
Linus Torvalds [Fri, 16 Aug 2024 18:36:40 +0000 (11:36 -0700)]
Merge tag 'libnvdimm-fixes-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Ira Weiny:
"Commit f467fee48da4 ("block: move the dax flag to queue_limits") broke
the DAX tests by skipping over the legacy pmem mapping pages case.
Set the DAX flag in this case as well"
* tag 'libnvdimm-fixes-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/pmem: Set dax flag for all 'PFN_MAP' cases
Linus Torvalds [Fri, 16 Aug 2024 18:24:06 +0000 (11:24 -0700)]
Merge tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux
Pull rust fixes from Miguel Ojeda:
- Fix '-Os' Rust 1.80.0+ builds adding more intrinsics (also tweaked in
upstream Rust for the upcoming 1.82.0).
- Fix support for the latest version of rust-analyzer due to a change
on rust-analyzer config file semantics (considered a fix since most
developers use the latest version of the tool, which is the only one
actually supported by upstream). I am discussing stability of the
config file with upstream -- they may be able to start versioning it.
- Fix GCC 14 builds due to '-fmin-function-alignment' not skipped for
libclang (bindgen).
- A couple Kconfig fixes around '{RUSTC,BINDGEN}_VERSION_TEXT' to
suppress error messages in a foreign architecture chroot and to use a
proper default format.
- Clean 'rust-analyzer' target warning due to missing recursive make
invocation mark.
- Clean Clippy warning due to missing indentation in docs.
- Clean LLVM 19 build warning due to removed 3dnow feature upstream.
* tag 'rust-fixes-6.11' of https://github.com/Rust-for-Linux/linux:
rust: x86: remove `-3dnow{,a}` from target features
kbuild: rust-analyzer: mark `rust_is_available.sh` invocation as recursive
rust: add intrinsics to fix `-Os` builds
kbuild: rust: skip -fmin-function-alignment in bindgen flags
rust: Support latest version of `rust-analyzer`
rust: macros: indent list item in `module!`'s docs
rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Linus Torvalds [Fri, 16 Aug 2024 18:18:09 +0000 (11:18 -0700)]
Merge tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- reintroduce the text patching global icache flush
- fix syscall entry code to correctly initialize a0, which manifested
as a strace bug
- XIP kernels now map the entire kernel, which fixes boot under at
least DEBUG_VIRTUAL=y
- initialize all nodes in the acpi_early_node_map initializer
- fix OOB access in the Andes vendor extension probing code
- A new key for scalar misaligned access performance in hwprobe, which
correctly treat the values as an enum (as opposed to a bitmap)
* tag 'riscv-for-linus-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Fix out-of-bounds when accessing Andes per hart vendor extension array
RISC-V: hwprobe: Add SCALAR to misaligned perf defines
RISC-V: hwprobe: Add MISALIGNED_PERF key
RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: change XIP's kernel_map.size to be size of the entire kernel
riscv: entry: always initialize regs->a0 to -ENOSYS
riscv: Re-introduce global icache flush in patch_text_XXX()
Linus Torvalds [Fri, 16 Aug 2024 18:12:29 +0000 (11:12 -0700)]
Merge tag 'trace-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"A couple of fixes for tracing:
- Prevent a NULL pointer dereference in the error path of RTLA tool
- Fix an infinite loop bug when reading from the ring buffer when
closed. If there's a thread trying to read the ring buffer and it
gets closed by another thread, the one reading will go into an
infinite loop when the buffer is empty instead of exiting back to
user space"
* tag 'trace-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rtla/osnoise: Prevent NULL dereference in error handling
tracing: Return from tracing_buffers_read() if the file has been closed
Linus Torvalds [Fri, 16 Aug 2024 15:56:45 +0000 (08:56 -0700)]
Merge tag 'iommu-fixes-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Bring back a lost return statement in io-page-fault code
- Remove an unused function declaration
* tag 'iommu-fixes-v6.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu: Remove unused declaration iommu_sva_unbind_gpasid()
iommu: Restore lost return in iommu_report_device_fault()
Linus Torvalds [Fri, 16 Aug 2024 15:39:41 +0000 (08:39 -0700)]
Merge tag 'sound-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"All small fixes, mostly for usual suspects, HD-audio and USB-audio
device-specific fixes / quirks. The Cirrus codec support took the
update of SPI header as well. Other than that, there is a regression
fix in the sanity check of ALSA timer code"
* tag 'sound-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/tas2781: Use correct endian conversion
ALSA: usb-audio: Support Yamaha P-125 quirk entry
ALSA: hda: cs35l41: Remove redundant call to hda_cs_dsp_control_remove()
ALSA: hda: cs35l56: Remove redundant call to hda_cs_dsp_control_remove()
ALSA: hda/tas2781: fix wrong calibrated data order
ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
ALSA: hda/realtek: Add support for new HP G12 laptops
ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
ALSA: timer: Relax start tick time check for slave timer elements
spi: Add empty versions of ACPI functions
Linus Torvalds [Fri, 16 Aug 2024 15:35:50 +0000 (08:35 -0700)]
Merge tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes, mostly amdgpu and xe. The larger amdgpu fix is for a
new IP block introduced in rc1, so should be fine. The xe fixes
contain some missed fixes from the end of the previous round along
with some fixes which required precursor changes, but otherwise
everything seems fine,
xe:
- Validate user fence during creation
- Fix use after free when client stats are captured
- SRIOV fixes
- Runtime PM fixes"
* tag 'drm-fixes-2024-08-16' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
drm/xe: Hold a PM ref when GT TLB invalidations are inflight
drm/xe: Drop xe_gt_tlb_invalidation_wait
drm/xe: Add xe_gt_tlb_invalidation_fence_init helper
drm/xe/pf: Fix VF config validation on multi-GT platforms
drm/xe: Build PM into GuC CT layer
drm/xe/vf: Fix register value lookup
drm/xe: Fix use after free when client stats are captured
drm/xe: Take a ref to xe file when user creates a VM
drm/xe: Add ref counting for xe_file
drm/xe: Move part of xe_file cleanup to a helper
drm/xe: Validate user fence during creation
drm/rockchip: inno-hdmi: Fix infoframe upload
drm/amd/amdgpu: add HDP_SD support on gc 12.0.0/1
drm/amdgpu: Update kmd_fw_shared for VCN5
drm/amd/amdgpu: command submission parser for JPEG
drm/amdgpu/mes12: fix suspend issue
drm/amdgpu/mes12: sw/hw fini for unified mes
drm/amdgpu/mes12: configure two pipes hardware resources
drm/amdgpu/mes12: adjust mes12 sw/hw init for multiple pipes
drm/amdgpu/mes12: add mes pipe switch support
...
Wolfram Sang [Fri, 16 Aug 2024 14:23:51 +0000 (16:23 +0200)]
Merge tag 'i2c-host-fixes-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current
Two fixes in this update:
Tegra I2C Controller: Addresses a potential double-locking issue
during probe. ACPI devices are not IRQ-safe when invoking runtime
suspend and resume functions, so the irq_safe flag should not be
set.
Qualcomm GENI I2C Controller: Fixes an oversight in the exit path
of the runtime_resume() function, which was missed in the
previous release.
thermal: gov_bang_bang: Use governor_data to reduce overhead
After running once, the for_each_trip_desc() loop in
bang_bang_manage() is pure needless overhead because it is not going to
make any changes unless a new cooling device has been bound to one of
the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the
thermal zone and check it upfront to decide whether or not it needs to
do anything.
However, governor_data needs to be reset in some cases to let
bang_bang_manage() know that it should walk the trips again, so add an
.update_tz() callback to the governor and make the core additionally
invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add
a special notification reason for system resume, THERMAL_TZ_RESUME, and
also pass it to __thermal_zone_device_update() called during system
resume for consistency.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Kästle <peter@piie.net> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net
After recent changes, the Bang-bang governor may not adjust the
initial configuration of cooling devices to the actual situation.
Namely, if a cooling device bound to a certain trip point starts in
the "on" state and the thermal zone temperature is below the threshold
of that trip point, the trip point may never be crossed on the way up
in which case the state of the cooling device will never be adjusted
because the thermal core will never invoke the governor's
.trip_crossed() callback. [Note that there is no issue if the zone
temperature is at the trip threshold or above it to start with because
.trip_crossed() will be invoked then to indicate the start of thermal
mitigation for the given trip.]
To address this, add a .manage() callback to the Bang-bang governor
and use it to ensure that all of the thermal instances managed by the
governor have been initialized properly and the states of all of the
cooling devices involved have been adjusted to the current zone
temperature as appropriate.
Move the setting of the thermal instance target state from
bang_bang_control() into a separate function that will be also called
in a different place going forward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Kästle <peter@piie.net> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net
Instead of clearing the "updated" flag for each cooling device
affected by the trip point crossing in bang_bang_control() and
walking all thermal instances to run thermal_cdev_update() for all
of the affected cooling devices, call __thermal_cdev_update()
directly for each of them.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Kästle <peter@piie.net> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Cc: 6.10+ <stable@vger.kernel.org> # 6.10+ Link: https://patch.msgid.link/13583081.uLZWGnKmhe@rjwysocki.net
Eli Billauer [Fri, 16 Aug 2024 07:02:00 +0000 (10:02 +0300)]
char: xillybus: Check USB endpoints when probing device
Ensure, as the driver probes the device, that all endpoints that the
driver may attempt to access exist and are of the correct type.
All XillyUSB devices must have a Bulk IN and Bulk OUT endpoint at
address 1. This is verified in xillyusb_setup_base_eps().
On top of that, a XillyUSB device may have additional Bulk OUT
endpoints. The information about these endpoints' addresses is deduced
from a data structure (the IDT) that the driver fetches from the device
while probing it. These endpoints are checked in setup_channels().
A XillyUSB device never has more than one IN endpoint, as all data
towards the host is multiplexed in this single Bulk IN endpoint. This is
why setup_channels() only checks OUT endpoints.
Reported-by: syzbot+eac39cba052f2e750dbe@syzkaller.appspotmail.com Cc: stable <stable@kernel.org> Closes: https://lore.kernel.org/all/0000000000001d44a6061f7a54ee@google.com/T/ Fixes: a53d1202aef1 ("char: xillybus: Add driver for XillyUSB (Xillybus variant for USB)"). Signed-off-by: Eli Billauer <eli.billauer@gmail.com> Link: https://lore.kernel.org/r/20240816070200.50695-2-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mm/migrate: fix deadlock in migrate_pages_batch() on large folios
Currently, migrate_pages_batch() can lock multiple locked folios with an
arbitrary order. Although folio_trylock() is used to avoid deadlock as
commit 2ef7dbb26990 ("migrate_pages: try migrate in batch asynchronously
firstly") mentioned, it seems try_split_folio() is still missing.
It was found by compaction stress test when I explicitly enable EROFS
compressed files to use large folios, which case I cannot reproduce with
the same workload if large folio support is off (current mainline).
Typically, filesystem reads (with locked file-backed folios) could use
another bdev/meta inode to load some other I/Os (e.g. inode extent
metadata or caching compressed data), so the locking order will be:
file-backed folios (A)
bdev/meta folios (B)
The following calltrace shows the deadlock:
Thread 1 takes (B) lock and tries to take folio (A) lock
Thread 2 takes (A) lock and tries to take folio (B) lock
[Thread 1]
INFO: task stress:1824 blocked for more than 30 seconds.
Tainted: G OE 6.10.0-rc7+ #6
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:stress state:D stack:0 pid:1824 tgid:1824 ppid:1822 flags:0x0000000c
Call trace:
__switch_to+0xec/0x138
__schedule+0x43c/0xcb0
schedule+0x54/0x198
io_schedule+0x44/0x70
folio_wait_bit_common+0x184/0x3f8
<-- folio mapping ffff00036d69cb18 index 996 (**)
__folio_lock+0x24/0x38
migrate_pages_batch+0x77c/0xea0 // try_split_folio (mm/migrate.c:1486:2)
// migrate_pages_batch (mm/migrate.c:1734:16)
<--- LIST_HEAD(unmap_folios) has
..
folio mapping 0xffff0000d184f1d8 index 1711; (*)
folio mapping 0xffff0000d184f1d8 index 1712;
..
migrate_pages+0xb28/0xe90
compact_zone+0xa08/0x10f0
compact_node+0x9c/0x180
sysctl_compaction_handler+0x8c/0x118
proc_sys_call_handler+0x1a8/0x280
proc_sys_write+0x1c/0x30
vfs_write+0x240/0x380
ksys_write+0x78/0x118
__arm64_sys_write+0x24/0x38
invoke_syscall+0x78/0x108
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x3c/0x148
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
[Thread 2]
INFO: task stress:1825 blocked for more than 30 seconds.
Tainted: G OE 6.10.0-rc7+ #6
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:stress state:D stack:0 pid:1825 tgid:1825 ppid:1822 flags:0x0000000c
Call trace:
__switch_to+0xec/0x138
__schedule+0x43c/0xcb0
schedule+0x54/0x198
io_schedule+0x44/0x70
folio_wait_bit_common+0x184/0x3f8
<-- folio = 0xfffffdffc6b503c0 (mapping == 0xffff0000d184f1d8 index == 1711) (*)
__folio_lock+0x24/0x38
z_erofs_runqueue+0x384/0x9c0 [erofs]
z_erofs_readahead+0x21c/0x350 [erofs] <-- folio mapping 0xffff00036d69cb18 range from [992, 1024] (**)
read_pages+0x74/0x328
page_cache_ra_order+0x26c/0x348
ondemand_readahead+0x1c0/0x3a0
page_cache_sync_ra+0x9c/0xc0
filemap_get_pages+0xc4/0x708
filemap_read+0x104/0x3a8
generic_file_read_iter+0x4c/0x150
vfs_read+0x27c/0x330
ksys_pread64+0x84/0xd0
__arm64_sys_pread64+0x28/0x40
invoke_syscall+0x78/0x108
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x3c/0x148
el0t_64_sync_handler+0x100/0x130
el0t_64_sync+0x190/0x198
Link: https://lkml.kernel.org/r/20240729021306.398286-1-hsiangkao@linux.alibaba.com Fixes: 5dfab109d519 ("migrate_pages: batch _unmap and _move") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>