There are two check of 'mreplace' in raid10_sync_request(). In the first
check, 'need_replace' will be set and 'mreplace' will be used later if
no-Faulty 'mreplace' exists, In the second check, 'mreplace' will be
set to NULL if it is Faulty, but 'need_replace' will not be changed
accordingly. null-ptr-deref occurs if Faulty is set between two check.
Fix it by merging two checks into one. And replace 'need_replace' with
'mreplace' because their values are always the same.
Fixes: ee37d7314a32 ("md/raid10: Fix raid10 replace hang when new added disk faulty") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230527072218.2365857-2-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
There is no input check when echo md/safe_mode_delay in safe_delay_store().
And msec might also overflow when HZ < 1000 in safe_delay_show(), Fix it by
checking overflow in safe_delay_store() and use unsigned long conversion in
safe_delay_show().
If we write a large number to md/bitmap_set_bits, md_bitmap_checkpage()
will return -EINVAL because 'page >= bitmap->pages', but the return value
was not checked immediately in md_bitmap_get_counter() in order to set
*blocks value and slab-out-of-bounds occurs.
Move check of 'page >= bitmap->pages' to md_bitmap_get_counter() and
return directly if true.
Fixes: ef4256733506 ("md/bitmap: optimise scanning of empty bitmaps.") Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230515134808.3936750-2-linan666@huaweicloud.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The below-mentioned patch was intended to simplify refcounting on the
svc_serv used by locked. The goal was to only ever have a single
reference from the single thread. To that end we dropped a call to
lockd_start_svc() (except when creating thread) which would take a
reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from
lockd_create_svc() in the case where the svc_serv already existed.
So after the patch:
- on the first call the svc_serv was allocated and the one reference
was given to the thread, so there are no extra references
- on subsequent calls svc_get() was called so there is now an extra
reference.
This is clearly not consistent.
The inconsistency is also clear in the current code in lockd_get()
takes *two* references, one on nlmsvc_serv and one by incrementing
nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in
lockd_create_svc().
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating
'wake_batch' is not atomic:
t1: t2:
_blk_mq_tag_busy blk_mq_tag_busy
inc active_queues
// assume 1->2
inc active_queues
// 2 -> 3
blk_mq_update_wake_batch
// calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */
Fix this problem by protecting them wih 'tags->lock', this is not a hot
path, so performance should not be concerned. And now that all writers
are inside the lock, switch 'actives_queues' from atomic to unsigned
int.
Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When blkg_alloc() is called to allocate a blkcg_gq structure
with the associated blkg_iostat_set's, there are 2 fields within
blkg_iostat_set that requires proper initialization - blkg & sync.
The former field was introduced by commit 3b8cc6298724 ("blk-cgroup:
Optimize blkcg_rstat_flush()") while the later one was introduced by
commit f73316482977 ("blk-cgroup: reimplement basic IO stats using
cgroup rstat").
Unfortunately those fields in the blkg_iostat_set's are not properly
re-initialized when they are cleared in v1's blkcg_reset_stats(). This
can lead to a kernel panic due to NULL pointer access of the blkg
pointer. The missing initialization of sync is less problematic and
can be a problem in a debug kernel due to missing lockdep initialization.
Fix these problems by re-initializing them after memory clearing.
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Fixes: f73316482977 ("blk-cgroup: reimplement basic IO stats using cgroup rstat") Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230606180724.2455066-1-longman@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When calculating an end address based on an unsigned int number of pages,
any value greater than or equal to 0x100000 that is shift PAGE_SHIFT bits
results in a 0 value, resulting in an invalid end address. Change the
number of pages variable in various routines from an unsigned int to an
unsigned long to calculate the end address correctly.
Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes") Fixes: dc3f3d2474b8 ("x86/mm: Validate memory when changing the C-bit") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/6a6e4eea0e1414402bac747744984fa4e9c01bb6.1686063086.git.thomas.lendacky@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
adjust_inuse_and_calc_cost() use spin_lock_irq() and IRQ will be enabled
when unlock. DEADLOCK might happen if we have held other locks and disabled
IRQ before invoking it.
Fix it by using spin_lock_irqsave() instead, which can keep IRQ state
consistent with before when unlock.
================================
WARNING: inconsistent lock state 5.10.0-02758-g8e5f91fd772f #26 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
{IN-HARDIRQ-W} state was registered at:
__lock_acquire+0x3d7/0x1070
lock_acquire+0x197/0x4a0
__raw_spin_lock_irqsave
_raw_spin_lock_irqsave+0x3b/0x60
bfq_idle_slice_timer_body
bfq_idle_slice_timer+0x53/0x1d0
__run_hrtimer+0x477/0xa70
__hrtimer_run_queues+0x1c6/0x2d0
hrtimer_interrupt+0x302/0x9e0
local_apic_timer_interrupt
__sysvec_apic_timer_interrupt+0xfd/0x420
run_sysvec_on_irqstack_cond
sysvec_apic_timer_interrupt+0x46/0xa0
asm_sysvec_apic_timer_interrupt+0x12/0x20
irq event stamp: 837522
hardirqs last enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore
hardirqs last enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40
hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq
hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50
softirqs last enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec
softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
When writing a task id to the "tasks" file in an rdtgroup,
rdtgroup_tasks_write() treats the pid as a number in the current pid
namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
the list of global pids from the init namespace, which is confusing and
incorrect.
To be more robust, let the "tasks" file only show pids in the current pid
namespace.
After heavily stressing EROFS with several images which include a
hand-crafted image of repeated patterns for more than 46 days, I found
two chains could be linked with each other almost simultaneously and
form a loop so that the entire loop won't be submitted. As a
consequence, the corresponding file pages will remain locked forever.
It can be _only_ observed on data-deduplicated compressed images.
For example, consider two chains with five pclusters in total:
Chain 1: 2->3->4->5 -- The tail pcluster is 5;
Chain 2: 5->1->2 -- The tail pcluster is 2.
Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link
to Chain 2 at the same time with pcluster 2.
Since hooked chains are all linked locklessly now, I have no idea how
to simply avoid the race. Instead, let's avoid hooked chains completely
until I could work out a proper way to fix this and end users finally
tell us that it's needed to add it back.
Actually, this optimization can be found with multi-threaded workloads
(especially even more often on deduplicated compressed images), yet I'm
not sure about the overall system impacts of not having this compared
with implementation complexity.
Fix filemap_splice_read() to use file->f_mapping->host, not file->f_inode,
as the source of the file size because in the case of a block device,
file->f_inode points to the block-special file (which is typically 0
length) and not the backing store.
Fixes: 07073eb01c5f ("splice: Add a func to do a splice from a buffered file without ITER_PIPE") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Steve French <stfrench@microsoft.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-2-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change the type of the second argument of bdev_op_is_zoned_write() from
blk_opf_t into enum req_op because this function expects an operation
without flags as second argument.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Cc: Ming Lei <ming.lei@redhat.com> Fixes: 8cafdb5ab94c ("block: adapt blk_mq_plug() to not plug for writes that require a zone lock") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230517174230.897144-4-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
A couple of functions from fs/pipe.c are used both internally
and for the watch queue code, but the declaration is only
visible when the latter is enabled:
fs/pipe.c:1254:5: error: no previous prototype for 'pipe_resize_ring'
fs/pipe.c:758:15: error: no previous prototype for 'account_pipe_buffers'
fs/pipe.c:764:6: error: no previous prototype for 'too_many_pipe_buffers_soft'
fs/pipe.c:771:6: error: no previous prototype for 'too_many_pipe_buffers_hard'
fs/pipe.c:777:6: error: no previous prototype for 'pipe_is_unprivileged_user'
Make the visible unconditionally to avoid these warnings.
Commit c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance to
restore them. Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().
Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965 Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()") Signed-off-by: Peter Collingbourne <pcc@google.com> Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com> Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> [6.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why & How]
In the commit 32953485c558 ("drm/amd/display: Do not update DRR while
BW optimizations pending"), a modification was added to avoid adjusting
DRR if optimized bandwidth is set. This change was only intended for
DCN, but one part of the patch changed the code path for DCE devices and
caused regressions to the kms_vrr test. To address this problem, this
commit adds a modification in which dc_stream_adjust_vmin_vmax will be
fully executed in DCE devices.
Fixes: 32953485c558 ("drm/amd/display: Do not update DRR while BW optimizations pending") Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The version is fetched once in check_version(), which then does some
validation and then overwrites the version in userspace with the API
version supported by the kernel. copy_params() then fetches the version
from userspace *again*, and this time no validation is done. The result
is that the kernel's version number is completely controllable by
userspace, provided that userspace can win a race condition.
Fix this flaw by not copying the version back to the kernel the second
time. This is not exploitable as the version is not further used in the
kernel. However, it could become a problem if future patches start
relying on the version field.
Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gtags considers any file outside of its current working directory
"outside the source tree" and refuses to index it. For O= kernel builds,
or when "make" is invoked from a directory other then the kernel source
tree, gtags ignores the entire kernel source and generates an empty
index.
Force-set gtags current working directory to the kernel source tree.
Due to commit 9da0763bdd82 ("kbuild: Use relative path when building in
a subdir of the source tree"), if the kernel build is done in a
sub-directory of the kernel source tree, the kernel Makefile will set
the kernel's $srctree to ".." for shorter compile-time and run-time
warnings. Consequently, the list of files to be indexed will be in the
"../*" form, rendering all such paths invalid once gtags switches to the
kernel source tree as its current working directory.
If gtags indexing is requested and the build directory is not the kernel
source tree, index all files in absolute-path form.
Note, indexing in absolute-path form will not affect the generated
index, as paths in gtags indices are always relative to the gtags "root
directory" anyway (as evidenced by "gtags --dump").
Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The proc_create_single_data() conversion does not work because
single_open(file, nubus_proc_rsrc_show, PDE_DATA(inode)) is not
equivalent to the original code.
commit eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports")
...was added on the observation of CXL memory not being accessible after
setting up a region on a "cold-plugged" device. A "cold-plugged" CXL
device is one that was not present at boot, so platform-firmware/BIOS
has no chance to set it up.
While it is true that the debug found the enable bit clear in the
host-bridge's instance of the global control register (CXL 3.0
8.2.4.19.2 CXL HDM Decoder Global Control Register), that bit is
described as:
"This bit is only applicable to CXL.mem devices and shall
return 0 on CXL Host Bridges and Upstream Switch Ports."
So it is meant to be zero, and further testing confirmed that this "fix"
had no effect on the failure. Revert it, and be more vigilant about
proposed fixes in the future. Since the original copied stable@, flag
this revert for stable@ as well.
Cc: <stable@vger.kernel.org> Fixes: eb0764b822b9 ("cxl/port: Enable the HDM decoder capability for switch ports") Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/168685882012.3475336.16733084892658264991.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NFS doesn't properly support reporting the btime in getattr (yet), but 61a968b4f05e mistakenly added it to the request_mask. This causes statx
for STATX_BTIME to report a zeroed out btime instead of properly
clearing the flag.
Cc: stable@vger.kernel.org # v6.3+ Fixes: 61a968b4f05e ("nfs: report the inode version in getattr if requested") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2214134 Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While our user stacks can grow either down (all common architectures) or
up (parisc and the ia64 register stack), the initial stack setup when we
copy the argument and environment strings to the new stack at execve()
time is always done by extending the stack downwards.
But it turns out that in commit 8d7071af8907 ("mm: always expand the
stack with the mmap write lock held"), as part of making the stack
growing code more robust, 'expand_downwards()' was now made to actually
check the vma flags:
if (!(vma->vm_flags & VM_GROWSDOWN))
return -EFAULT;
and that meant that this execve-time stack expansion started failing on
parisc, because on that architecture, the stack flags do not contain the
VM_GROWSDOWN bit.
At the same time the new check in expand_downwards() is clearly correct,
and simplified the callers, so let's not remove it.
The solution is instead to just codify the fact that yes, during
execve(), the stack grows down. This not only matches reality, it ends
up being particularly simple: we already have special execve-time flags
for the stack (VM_STACK_INCOMPLETE_SETUP) and use those flags to avoid
page migration during this setup time (see vma_is_temporary_stack() and
invalid_migration_vma()).
So just add VM_GROWSDOWN to that set of temporary flags, and now our
stack flags automatically match reality, and the parisc stack expansion
works again.
Note that the VM_STACK_INCOMPLETE_SETUP bits will be cleared when the
stack is finalized, so we only add the extra VM_GROWSDOWN bit on
CONFIG_STACK_GROWSUP architectures (ie parisc) rather than adding it in
general.
ACPI r6.5, sec 6.5.4, describes how AML is unable to access an
OperationRegion unless _REG has been called to connect a handler:
The OS runs _REG control methods to inform AML code of a change in the
availability of an operation region. When an operation region handler is
unavailable, AML cannot access data fields in that region. (Operation
region writes will be ignored and reads will return indeterminate data.)
The PCI core does not call _REG at any time, leading to the undefined
behavior mentioned in the spec.
The spec explains that _REG should be executed to indicate whether a
given region can be accessed:
Once _REG has been executed for a particular operation region, indicating
that the operation region handler is ready, a control method can access
fields in the operation region. Conversely, control methods must not
access fields in operation regions when _REG method execution has not
indicated that the operation region handler is ready.
An example included in the spec demonstrates calling _REG when devices are
turned off: "when the host controller or bridge controller is turned off
or disabled, PCI Config Space Operation Regions for child devices are
no longer available. As such, ETH0’s _REG method will be run when it
is turned off and will again be run when PCI1 is turned off."
It is reported that ASMedia PCIe GPIO controllers fail functional tests
after the system has returning from suspend (S3 or s2idle). This is because
the BIOS checks whether the OSPM has called the _REG method to determine
whether it can interact with the OperationRegion assigned to the device as
part of the other AML called for the device.
To fix this issue, call acpi_evaluate_reg() when devices are transitioning
to D3cold or D0.
Previously acpi_pci_set_power_state() assumed the requested power state was
valid (PCI_D0 ... PCI_D3cold). If a caller supplied something else, we
could index outside the state_conv[] array and pass junk to
acpi_device_set_power().
Validate the pci_power_t parameter and return -EINVAL if it's invalid.
[why]
While bandwidth optimizations are pending, it's possible a pstate change
will occur. During this time, VSYNC handler should not also try to update
DRR parameters causing pstate hang
[how]
Do not adjust DRR if optimize bandwidth is set.
Reviewed-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MMU version of lock_mm_and_find_vma releases the mm lock before
returning when VMA is not found. Do the same in noMMU version.
This fixes hang on an attempt to handle protection fault.
It turns out that xtensa has a really odd configuration situation: you
can do a no-MMU config, but still have the page fault code enabled.
Which doesn't sound all that sensible, but it turns out that xtensa can
have protection faults even without the MMU, and we have this:
config PFAULT
bool "Handle protection faults" if EXPERT && !MMU
default y
help
Handle protection faults. MMU configurations must enable it.
noMMU configurations may disable it if used memory map never
generates protection faults or faults are always fatal.
If unsure, say Y.
which completely violated my expectations of the page fault handling.
End result: Guenter reports that the xtensa no-MMU builds all fail with
arch/xtensa/mm/fault.c: In function ‘do_page_fault’:
arch/xtensa/mm/fault.c:133:8: error: implicit declaration of function ‘lock_mm_and_find_vma’
because I never exposed the new lock_mm_and_find_vma() function for the
no-MMU case.
As already mentioned in my merge message for the 'expand-stack' branch,
we have something like 24 different versions of the page fault path for
all our different architectures, all just _slightly_ different due to
various historical reasons (usually related to exactly when they
branched off the original i386 version, and the details of the other
architectures they had in their history).
And a few of them had some silly mistake in the conversion.
Most of the architectures call the faulting address 'address' in the
fault path. But not all. Some just call it 'addr'. And if you end up
doing a bit too much copy-and-paste, you end up with the wrong version
in the places that do it differently.
In commit 8d7071af8907 ("mm: always expand the stack with the mmap write
lock held") I tried to deal with the remaining odd page fault handling
cases. The oddest one is ia64, which has stacks that grow both up and
down. And because ia64 was _so_ odd, I asked people to verify the end
result.
But a close second oddity is parisc, which is the only one that has a
main stack growing up (our "CONFIG_STACK_GROWSUP" config option). But
it looked obvious enough that I didn't worry about it.
I should have worried a bit more. Not because it was particularly
complex, but because I just used the wrong variable name.
The previous vma isn't called "prev", it's called "prev_vma". Blush.
Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The sparc32 conversion to lock_mm_and_find_vma() in commit a050ba1e7422
("mm/fault: convert remaining simple cases to lock_mm_and_find_vma()")
missed the fact that we didn't actually have a 'regs' pointer available
in the 'force_user_fault()' case.
It's there in the regular page fault path ("do_sparc_fault()"), but not
the window underflow/overflow paths.
Which is all fine - we can just pass in a NULL pointer. The register
state is only used to avoid deadlock with kernel faults, which is not
the case for any of these register window faults.
That change was causing a regression in the generic-adc-thermal-probed
bootrr test as reported in the kernelci-results list [1].
A proper rework will take longer, so revert it for now.
Fixes: f05c7b7d9ea9 ("thermal/drivers/mediatek: Use devm_of_iomap to avoid resource leak in mtk_thermal_probe") Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20230525121811.3360268-1-ricardo.canuelo@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if
not necessary") put restarting communication behind that flag, and this
was apparently necessary on the T651, but the flag was not set for it.
Fixes: 498ba2069035 ("HID: logitech-hidpp: Don't restart communication if not necessary") Cc: stable@vger.kernel.org Signed-off-by: Mike Hommey <mh@glandium.org> Link: https://lore.kernel.org/r/20230617230957.6mx73th4blv7owqk@glandium.org Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Code which interacts with timestamps needs to use the ktime_t type
returned by functions like ktime_get. The int type does not offer
enough space to store these values, and attempting to use it is a
recipe for problems. In this particular case, overflows would occur
when calculating/storing timestamps leading to incorrect values being
reported to userspace. In some cases these bad timestamps cause input
handling in userspace to appear hung.
The hidraw_open() function increments the hidraw device reference
counter. The counter has no dedicated synchronization mechanism,
resulting in a potential data race when concurrently opening a device.
The race is a regression introduced by commit 8590222e4b02 ("HID:
hidraw: Replace hidraw device table mutex with a rwsem"). While
minors_rwsem is intended to protect the hidraw_table itself, by instead
acquiring the lock for writing, the reference counter is also protected.
This is symmetrical to hidraw_release().
There is a potential OOB read at fast_imageblit, for
"colortab[(*src >> 4)]" can become a negative value due to
"const char *s = image->data, *src".
This change makes sure the index for colortab always positive
or zero.
Similar commit:
https://patchwork.kernel.org/patch/11746067
It feels very unlikely that anybody would want to do a GUP in an
unmapped area under the stack pointer, but real users sometimes do some
really strange things. So add a (temporary) warning for the case where
a GUP fails and expanding the stack might have made it work.
It's trivial to do the expansion in the caller as part of getting the mm
lock in the first place - see __access_remote_vm() for ptrace, for
example - it's just that it's unnecessarily painful to do it deep in the
guts of the GUP lookup when we might have to drop and re-take the lock.
I doubt anybody actually does anything quite this strange, but let's be
proactive: adding these warnings is simple, and will make debugging it
much easier if they trigger.
This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.
It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.
Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is a small step towards a model where GUP itself would not expand
the stack, and any user that needs GUP to not look up existing mappings,
but actually expand on them, would have to do so manually before-hand,
and with the mm lock held for writing.
It turns out that execve() already did almost exactly that, except it
didn't take the mm lock at all (it's single-threaded so no locking
technically needed, but it could cause lockdep errors). And it only did
it for the CONFIG_STACK_GROWSUP case, since in that case GUP has
obviously never expanded the stack downwards.
So just make that CONFIG_STACK_GROWSUP case do the right thing with
locking, and enable it generally. This will eventually help GUP, and in
the meantime avoids a special case and the lockdep issue.
Make calls to extend_vma() and find_extend_vma() fail if the write lock
is required.
To avoid making this a flag-day event, this still allows the old
read-locking case for the trivial situations, and passes in a flag to
say "is it write-locked". That way write-lockers can say "yes, I'm
being careful", and legacy users will continue to work in all the common
cases until they have been fully converted to the new world order.
Co-Developed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is one of the simple cases, except there's no pt_regs pointer.
Which is fine, as lock_mm_and_find_vma() is set up to work fine with a
NULL pt_regs.
Powerpc already enabled LOCK_MM_AND_FIND_VMA for the main CPU faulting,
so we can just use the helper without any extra work.
This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper. They all have the regular fault handling pattern without odd
special cases.
The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).
And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer. That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.
Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited cross-
building environment. The cases are all simple, and I went through the
changes several times, but...
arm has an additional check for address < FIRST_USER_ADDRESS before
expanding the stack. Since FIRST_USER_ADDRESS is defined everywhere
(generally as 0), move that check to the generic expand_downwards().
This converts arm64 to use the new page fault helper. It was very
straightforward, but still needed a fix for the "obvious" conversion I
initially did. Thanks to Suren for the fix and testing.
This is done as a separate patch from introducing the new
lock_mm_and_find_vma() helper, because while it's an obvious change,
it's not what x86 used to do in this area.
We already abort the page fault on fatal signals anyway, so why should
we wait for the mmap lock only to then abort later? With the new helper
function that returns without the lock held on failure anyway, this is
particularly easy and straightforward.
This basically extracts the existing x86 "find and expand faulting vma"
code, but extends it to also take the mmap lock for writing in case we
actually do need to expand the vma.
We've historically short-circuited that case, and have some rather ugly
special logic to serialize the stack segment expansion (since we only
hold the mmap lock for reading) that doesn't match the normal VM
locking.
That slight violation of locking worked well, right up until it didn't:
the maple tree code really does want proper locking even for simple
extension of an existing vma.
So extract the code for "look up the vma of the fault" from x86, fix it
up to do the necessary write locking, and make it available as a helper
function for other architectures that can use the common helper.
Note: I say "common helper", but it really only handles the normal
stack-grows-down case. Which is all architectures except for PA-RISC
and IA64. So some rare architectures can't use the helper, but if they
care they'll just need to open-code this logic.
It's also worth pointing out that this code really would like to have an
optimistic "mmap_upgrade_trylock()" to make it quicker to go from a
read-lock (for the common case) to taking the write lock (for having to
extend the vma) in the normal single-threaded situation where there is
no other locking activity.
But that _is_ all the very uncommon special case, so while it would be
nice to have such an operation, it probably doesn't matter in reality.
I did put in the skeleton code for such a possible future expansion,
even if it only acts as pseudo-documentation for what we're doing.
Check the write offset end bounds before using it as the offset into the
pivot array. This avoids a possible out-of-bounds access on the pivot
array if the write extends to the last slot in the node, in which case the
node maximum should be used as the end pivot.
akpm: this doesn't affect any current callers, but new users of mapletree
may encounter this problem if backported into earlier kernels, so let's
fix it in -stable kernels in case of this.
Link: https://lkml.kernel.org/r/20230506024752.2550-1-zhangpeng.00@bytedance.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With commit d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return
error on FC timeout on TX path") the missing correct return value in
the case of a protocol error was introduced.
But the way the error value has been read and sent to the user space
does not follow the common scheme to clear the error after reading
which is provided by the sock_error() function. This leads to an error
report at the following write() attempt although everything should be
working.
Fixes: d674a8f123b4 ("can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path") Reported-by: Carsten Schmidt <carsten.schmidt-achim@t-online.de> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/all/20230607072708.38809-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as
those make it come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel
CPUs return the content of the highest supported leaf when a non-existing
leaf is read, while AMD CPUs return all zeros for unsupported leafs.
So the result of the test on Intel CPUs is lottery.
While harmless it's incorrect and causes the conditional wbinvd() to be
issued where not required.
Check whether the leaf is supported before reading it.
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
CPU0 CPU1
stop_other_cpus()
send_IPIs(REBOOT); stop_this_cpu()
while (num_online_cpus() > 1); set_online(false);
proceed... -> hang
wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows to restrict the NMI fallback IPI to be sent only the
CPUs which have not reported within the timeout window.
If mas_store_gfp() in the gather loop failed, the 'error' variable that
ultimately gets returned was not being set. In many cases, its original
value of -ENOMEM was still in place, and that was fine. But if VMAs had
been split at the start or end of the range, then 'error' could be zero.
Change to the 'error = foo(); if (error) goto â\80¦' idiom to fix the bug.
Also clean up a later case which avoided the same bug by *explicitly*
setting error = -ENOMEM right before calling the function that might
return -ENOMEM.
In a final cosmetic change, move the 'Point of no return' comment to
*after* the goto. That's been in the wrong place since the preallocation
was removed, and this new error path was added.
Fixes: 606c812eb1d5 ("mm/mmap: Fix error path in do_vmi_align_munmap()") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Cc: stable@vger.kernel.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The error unrolling was leaving the VMAs detached in many cases and
leaving the locked_vm statistic altered, and skipping the unrolling
entirely in the case of the vma tree write failing.
Fix the error path by re-attaching the detached VMAs and adding the
necessary goto for the failed vma tree write, and fix the locked_vm
statistic by only updating after the vma tree write succeeds.
Fixes: 763ecb035029 ("mm: remove the vma linked list") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ dwmw2: Strictly, the original patch wasn't *re-attaching* the
detached VMAs. They *were* still attached but just had
the 'detached' flag set, which is an optimisation. Which
doesn't exist in 6.3, so drop that. Also drop the call
to vma_start_write() which came in with the per-VMA
locking in 6.4. ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
last component point filename struct. Currently putname is called after
vfs_path_parent_lookup(). And then last component is used for
lookup_one_qstr_excl(). name in last component is freed by previous
calling putname(). And It cause file lookup failure when testing
generic/464 test of xfstest.
Fixes: 74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name") Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a case that file_present is true and path is uninitialized.
This patch change file_present is set to false by default and set to
true when patch is initialized.
Fixes: 74d7970febf7 ("ksmbd: fix racy issue from using ->d_parent and ->d_name") Reported-by: Coverity Scan <scan-admin@coverity.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When reworking the vgic locking, the vgic distributor registration
got simplified, which was a very good cleanup. But just a tad too
radical, as we now register the *native* vgic only, ignoring the
GICv2-on-GICv3 that allows pre-historic VMs (or so I thought)
to run.
As it turns out, QEMU still defaults to GICv2 in some cases, and
this breaks Nathan's setup!
Fix it by propagating the *requested* vgic type rather than the
host's version.
Fixes b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Claim clkhi and clklo as integer type to avoid possible calculation
errors caused by data overflow.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver") Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Signed-off-by: Carlos Song <carlos.song@nxp.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When booting with "intremap=off" and "x2apic_phys" on the kernel command
line, the physical x2APIC driver ends up being used even when x2APIC
mode is disabled ("intremap=off" disables x2APIC mode). This happens
because the first compound condition check in x2apic_phys_probe() is
false due to x2apic_mode == 0 and so the following one returns true
after default_acpi_madt_oem_check() having already selected the physical
x2APIC driver.
Commits ffb1b4a41016 ("x86/unwind/orc: Add 'signal' field to ORC
metadata") and fb799447ae29 ("x86,objtool: Split UNWIND_HINT_EMPTY in
two") changed the ORC format. Although ORC is internal to the kernel,
it's the only way for external tools to get reliable kernel stack traces
on x86-64. In particular, the drgn debugger [1] uses ORC for stack
unwinding, and these format changes broke it [2]. As the drgn
maintainer, I don't care how often or how much the kernel changes the
ORC format as long as I have a way to detect the change.
It suffices to store a version identifier in the vmlinux and kernel
module ELF files (to use when parsing ORC sections from ELF), and in
kernel memory (to use when parsing ORC from a core dump+symbol table).
Rather than hard-coding a version number that needs to be manually
bumped, Peterz suggested hashing the definitions from orc_types.h. If
there is a format change that isn't caught by this, the hashing script
can be updated.
This patch adds an .orc_header allocated ELF section containing the
20-byte hash to vmlinux and kernel modules, along with the corresponding
__start_orc_header and __stop_orc_header symbols in vmlinux.
Fix possible virtqueue used buffers leak and corresponding stuck
in case of temporary -EIO from sendmsg() which is produced by
tun driver while backend device is not up.
In case of no-retriable error and zcopy do not revert upend_idx
to pass packet data (that is update used_idx in corresponding
vhost_zerocopy_signal_used()) as if packet data has been
transferred successfully.
v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN
in case of fake successful transmit.
Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru>
Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As is done in the net, iscsi, and vsock vhost support, let the vdpa vqs
know about the features that have been negotiated. This allows vhost
to more safely make decisions based on the features, such as when using
PACKED vs split queues.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230424225031.18947-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add cpu_relax() for arm64 instead of directly assert(), and add assert.h
header file. Also, add smp_wmb and smp_mb for arm64.
Compilation error as follows, avoid __always_inline undefined.
$ make
cc -Wall -pthread -O2 -ggdb -flto -fwhole-program -c -o ring.o ring.c
In file included from ring.c:10:
main.h: In function ‘busy_wait’:
main.h:99:21: warning: implicit declaration of function ‘assert’
[-Wimplicit-function-declaration]
99 | #define cpu_relax() assert(0)
| ^~~~~~
main.h:107:17: note: in expansion of macro ‘cpu_relax’
107 | cpu_relax();
| ^~~~~~~~~
main.h:12:1: note: ‘assert’ is defined in header ‘<assert.h>’; did you
forget to ‘#include <assert.h>’?
11 | #include <stdbool.h>
+++ |+#include <assert.h>
12 |
main.h: At top level:
main.h:143:23: error: expected ‘;’ before ‘void’
143 | static __always_inline
| ^
| ;
144 | void __read_once_size(const volatile void *p, void *res, int
size)
| ~~~~
main.h:158:23: error: expected ‘;’ before ‘void’
158 | static __always_inline void __write_once_size(volatile void *p,
void *res, int size)
| ^~~~~
| ;
make: *** [<builtin>: ring.o] Error 1
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Message-Id: <tencent_F53E159DD7925174445D830DA19FACF44B07@qq.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Userspace can race to free the gobj(robj converted from), robj should not
be accessed again after drm_gem_object_put, otherwith it will result in
use-after-free.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Min Li <lm0963hack@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If it is async, runqueue_node is freed in g2d_runqueue_worker on another
worker thread. So in extreme cases, if g2d_runqueue_worker runs first, and
then executes the following if statement, there will be use-after-free.
Signed-off-by: Min Li <lm0963hack@gmail.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix a wrong error return by dropping an error return.
When vidi driver is remvoed, if ctx->raw_edid isn't same as fake_edid_info
then only what we have to is to free ctx->raw_edid so that driver removing
can work correctly - it's not an error case.
Several device tree files get the polarity of the pendown-gpios
wrong: this signal is active low. Fix up all incorrect flags, so
that operating systems can rely on the flag being correctly set.
With IC_INTR_RX_FULL slave interrupt handler reads data in a loop until
RX FIFO is empty. When testing with the slave-eeprom, each transaction
has 2 bytes for address/index and 1 byte for value, the address byte
can be written as data byte due to dropping STOP condition.
In the test below, the master continuously writes to the slave, first 2
bytes are index, 3rd byte is value and follow by a STOP condition.
Upon receiving STOP condition slave eeprom would reset `idx_write_cnt` so
next 2 bytes can be treated as buffer index for upcoming transaction.
Supposedly the slave eeprom buffer would be written as
When CPU load is high the slave irq handler may not read fast enough,
the interrupt status can be seen as 0x204 with both DW_IC_INTR_STOP_DET
(0x200) and DW_IC_INTR_RX_FULL (0x4) bits. The slave device may see
the transactions below.
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1794 : INTR_STAT=0x204
0x1 STATUS SLAVE_ACTIVITY=0x0 : RAW_INTR_STAT=0x1790 : INTR_STAT=0x200
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
0x1 STATUS SLAVE_ACTIVITY=0x1 : RAW_INTR_STAT=0x1594 : INTR_STAT=0x4
After `D1` is received, read loop continues to read `00` which is the
first bype of next index. Since STOP condition is ignored by the loop,
eeprom buffer index increased to `D2` and `00` is written as value.
The fix is to use `FIRST_DATA_BYTE` (bit 11) in `IC_DATA_CMD` to split
the transactions. The first index byte in this case would have bit 11
set. Check this indication to inject I2C_SLAVE_WRITE_REQUESTED event
which will reset `idx_write_cnt` in slave eeprom.
Signed-off-by: David Zheng <david.zheng@intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static?
No functional change intended.
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com> Acked-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
There's an issue on SAI synchronous mode that TX/RX side can't get BCLK
from RX/TX it sync with if BYP bit is asserted. It's a workaround to
fix it that enable SION of IOMUX pad control and assert BCI.
For example if TX sync with RX which means both TX and RX are using clk
form RX and BYP=1. TX can get BCLK only if the following two conditions
are valid:
1. SION of RX BCLK IOMUX pad is set to 1
2. BCI of TX is set to 1
Avoid linker error for randomly generated config file that
has CONFIG_BRANCH_PROFILE_NONE enabled and make it similar
to riscv, x86 and also to commit 4bf3ec384edf ("s390: disable
branch profiling for vdso").
When a direct I/O write is performed, iomap_dio_rw() invalidates the
part of the page cache which the write is going to before carrying out
the write. In the odd case, the direct I/O write will be reading from
the same page it is writing to. gfs2 carries out writes with page
faults disabled, so it should have been obvious that this page
invalidation can cause iomap_dio_rw() to never make any progress.
Currently, gfs2 will end up in an endless retry loop in
gfs2_file_direct_write() instead, though.
Break this endless loop by limiting the number of retries and falling
back to buffered I/O after that.
Also simplify should_fault_in_pages() sightly and add a comment to make
the above case easier to understand.
Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When int3472 is loaded before GPIO driver, acpi_get_and_request_gpiod()
failed but the returned gpio descriptor is not NULL, it will cause panic
in later gpiod_put(), so set the gpio_desc to NULL in register error
handling to avoid such crash.
Signed-off-by: Hao Yao <hao.yao@intel.com> Signed-off-by: Bingbu Cao <bingbu.cao@intel.com> Link: https://lore.kernel.org/r/20230524035135.90315-1-bingbu.cao@intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
regmap-sdw does not support multi register writes, so there is
no point in setting this flag. This also leads to incorrect
programming of WSA codecs with regmap_multi_reg_write() call.
This invalid configuration should have been rejected by regmap-sdw.