[ 0.118053] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
Caused by get_c0_compare_int on secondary CPU.
We also skipped saving IRQ number to struct clock_event_device *cd as
it's never used by clockevent core, as per comments it's only meant
for "non CPU local devices".
Reported-by: Serge Semin <fancer.lancer@gmail.com> Closes: https://lore.kernel.org/linux-mips/6szkkqxpsw26zajwysdrwplpjvhl5abpnmxgu2xuj3dkzjnvsf@4daqrz4mf44k/ Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Tested-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
If we need to increase the tree depth, allocate a new node, and then
race with another thread that increased the tree depth before us, we'll
still have a preallocated node that might be used later.
If we then use that node for a new non-root node, it'll still have a
pointer to the old root instead of being zeroed - fix this by zeroing it
in the cmpxchg failure path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Sasha Levin <sashal@kernel.org>
When of_irq_parse_raw() is invoked with a device address smaller than
the interrupt parent node (from #address-cells property), KASAN detects
the following out-of-bounds read when populating the initial match table
(dyndbg="func of_irq_parse_* +p"):
OF: of_irq_parse_one: dev=/soc@0/picasso/watchdog, index=0
OF: parent=/soc@0/pci@878000000000/gpio0@17,0, intsize=2
OF: intspec=4
OF: of_irq_parse_raw: ipar=/soc@0/pci@878000000000/gpio0@17,0, size=2
OF: -> addrsize=3
==================================================================
BUG: KASAN: slab-out-of-bounds in of_irq_parse_raw+0x2b8/0x8d0
Read of size 4 at addr ffffff81beca5608 by task bash/764
The buggy address belongs to the object at ffffff81beca5600
which belongs to the cache kmalloc-128 of size 128
The buggy address is located 8 bytes inside of
128-byte region [ffffff81beca5600, ffffff81beca5680)
Memory state around the buggy address: ffffff81beca5500: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff81beca5580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffffff81beca5600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffffff81beca5680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffffff81beca5700: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
==================================================================
OF: -> got it !
Prevent the out-of-bounds read by copying the device address into a
buffer of sufficient size.
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug.
This is caused by an uninitialised page, which is ultimately caused
by a corrupted symbolic link size read from disk.
The reason why the corrupted symlink size causes an uninitialised
page is due to the following sequence of events:
1. squashfs_read_inode() is called to read the symbolic
link from disk. This assigns the corrupted value 3875536935 to inode->i_size.
2. Later squashfs_symlink_read_folio() is called, which assigns
this corrupted value to the length variable, which being a
signed int, overflows producing a negative number.
3. The following loop that fills in the page contents checks that
the copied bytes is less than length, which being negative means
the loop is skipped, producing an uninitialised page.
This patch adds a sanity check which checks that the symbolic
link size is not larger than expected.
ipheth_sndbulk_callback() can submit carrier_work
as a part of its error handling. That means that
the driver must make sure that the work is cancelled
after it has made sure that no more URB can terminate
with an error condition.
Hence the order of actions in ipheth_close() needs
to be inverted.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Foster Snowhill <forst@pen.gy> Tested-by: Georgi Valkov <gvalkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When exercising uinput interface syzkaller may try setting up device
with a really large number of slots, which causes memory allocation
failure in input_mt_init_slots(). While this allocation failure is
handled properly and request is rejected, it results in syzkaller
reports. Additionally, such request may put undue burden on the
system which will try to free a lot of memory for a bogus request.
Fix it by limiting allowed number of slots to 100. This can easily
be extended if we see devices that can track more than 100 contacts.
HID driver callbacks aren't called anymore once hid_destroy_device() has
been called. Hence, hid driver_data should be freed only after the
hid_destroy_device() function returned as driver_data is used in several
callbacks.
I observed a crash with kernel 6.10.0 on my T14s Gen 3, after enabling
KASAN to debug memory allocation, I got this output:
[ 13.050438] ==================================================================
[ 13.054060] BUG: KASAN: slab-use-after-free in amd_sfh_get_report+0x3ec/0x530 [amd_sfh]
[ 13.054809] psmouse serio1: trackpoint: Synaptics TrackPoint firmware: 0x02, buttons: 3/3
[ 13.056432] Read of size 8 at addr ffff88813152f408 by task (udev-worker)/479
[ 13.342482] The buggy address belongs to the object at ffff88813152f400
which belongs to the cache kmalloc-64 of size 64
[ 13.347357] The buggy address is located 8 bytes inside of
freed 64-byte region [ffff88813152f400, ffff88813152f440)
KASAN reports a use-after-free of hid->driver_data in function
amd_sfh_get_report(). The backtrace indicates that the function is called
by amdtp_hid_request() which is one of the callbacks of hid device.
The current make sure that driver_data is freed only once
hid_destroy_device() returned.
Note that I observed the crash both on v6.9.9 and v6.10.0. The
code seems to be as it was from the early days of the driver.
The .data.rel.ro and .got section were added between the rodata and
ro_after_init data section, which adds an RW mapping in between all RO
mapping of the kernel image:
Some arch + compiler combinations report a potentially unused variable
location in btrfs_lookup_dentry(). This is a false alert as the variable
is passed by value and always valid or there's an error. The compilers
cannot probably reason about that although btrfs_inode_by_name() is in
the same file.
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5603:9
> + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used
+uninitialized in this function [-Werror=maybe-uninitialized]: => 5674:5
Initialize it to zero, this should fix the warnings and won't change the
behaviour as btrfs_inode_by_name() accepts only a root or inode item
types, otherwise returns an error.
If the value of max_speed_hz is 0, it may cause a division by zero
error in hisi_calc_effective_speed().
The value of max_speed_hz is provided by firmware.
Firmware is generally considered as a trusted domain. However, as
division by zero errors can cause system failure, for defense measure,
the value of max_speed is validated here. So 0 is regarded as invalid
and an error code is returned.
Even if a vgem device is configured in, we will skip the import_vgem_fd()
test almost every time.
TAP version 13
1..11
# Testing heap: system
# =======================================
# Testing allocation and importing:
ok 1 # SKIP Could not open vgem -1
The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
version information but leave the name field a non-null-terminated string.
Terminate it properly to actually test against the vgem device.
While at it, let's check the length of the driver name is exactly 4 bytes
and return early otherwise (in case there is a name like "vgemfoo" that
gets converted to "vgem\0" unexpectedly).
Definitely condition dma_get_cache_alignment * defined value > 256
during driver initialization is not reason to BUG_ON(). Turn that to
graceful error out with -EINVAL.
According to I3C Spec 1.1.1, 11-Jun-2021, section: 5.1.2.2.3:
If the Controller chooses to start an I3C Message with an I3C Dynamic
Address, then special provisions shall be made because that same I3C Target
may be initiating an IBI or a Controller Role Request. So, one of three
things may happen: (skip 1, 2)
3. The Addresses match and the RnW bits also match, and so neither
Controller nor Target will ACK since both are expecting the other side to
provide ACK. As a result, each side might think it had "won" arbitration,
but neither side would continue, as each would subsequently see that the
other did not provide ACK.
...
For either value of RnW: Due to the NACK, the Controller shall defer the
Private Write or Private Read, and should typically transmit the Target
^^^^^^^^^^^^^^^^^^^
Address again after a Repeated START (i.e., the next one or any one prior
^^^^^^^^^^^^^
to a STOP in the Frame). Since the Address Header following a Repeated
START is not arbitrated, the Controller will always win (see Section
5.1.2.2.4).
Resend target address again if address is not 7E and controller get NACK.
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using cachefiles, lockdep may emit something similar to the circular
locking dependency notice below. The problem appears to stem from the
following:
(1) Cachefiles manipulates xattrs on the files in its cache when called
from ->writepages().
(2) The setxattr() and removexattr() system call handlers get the name
(and value) from userspace after taking the sb_writers lock, putting
accesses of the vma->vm_lock and mm->mmap_lock inside of that.
(3) The afs filesystem uses a per-inode lock to prevent multiple
revalidation RPCs and in writeback vs truncate to prevent parallel
operations from deadlocking against the server on one side and local
page locks on the other.
Fix this by moving the getting of the name and value in {get,remove}xattr()
outside of the sb_writers lock. This also has the minor benefits that we
don't need to reget these in the event of a retry and we never try to take
the sb_writers lock in the event we can't pull the name and value into the
kernel.
Alternative approaches that might fix this include moving the dispatch of a
write to the cache off to a workqueue or trying to do without the
validation lock in afs. Note that this might also affect other filesystems
that use netfslib and/or cachefiles.
======================================================
WARNING: possible circular locking dependency detected
6.10.0-build2+ #956 Not tainted
------------------------------------------------------
fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0
but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
With gcc-14.1, there is a false-postive -Wuninitialized warning in
regcache_maple_drop:
drivers/base/regmap/regcache-maple.c: In function 'regcache_maple_drop':
drivers/base/regmap/regcache-maple.c:113:23: error: 'lower_index' is used uninitialized [-Werror=uninitialized]
113 | unsigned long lower_index, lower_last;
| ^~~~~~~~~~~
drivers/base/regmap/regcache-maple.c:113:36: error: 'lower_last' is used uninitialized [-Werror=uninitialized]
113 | unsigned long lower_index, lower_last;
| ^~~~~~~~~~
I've created a reduced test case to see if this needs to be reported
as a gcc, but it appears that the gcc-14.x branch already has a change
that turns this into a more sensible -Wmaybe-uninitialized warning, so
I ended up not reporting it so far.
The reduced test case also produces a warning for gcc-13 and gcc-12
but I don't see that with the version in the kernel.
fw_arg1 is in memory space rather than I/O space, so we should use
early_memremap_ro() instead of early_ioremap() to map the cmdline.
Moreover, we should unmap it after using.
Suggested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Signed-off-by: Sasha Levin <sashal@kernel.org>
The dpaa-eth driver is written for PowerPC and Arm SoCs which have 1-24
CPUs. It depends on CONFIG_NR_CPUS having a reasonably small value in
Kconfig. Otherwise, there are 2 functions which allocate on-stack arrays
of NR_CPUS elements, and these can quickly explode in size, leading to
warnings such as:
The problem is twofold:
- Reducing the array size to the boot-time num_possible_cpus() (rather
than the compile-time NR_CPUS) creates a variable-length array,
which should be avoided in the Linux kernel.
- Using NR_CPUS as an array size makes the driver blow up in stack
consumption with generic, as opposed to hand-crafted, .config files.
A simple solution is to use dynamic allocation for num_possible_cpus()
elements (aka a small number determined at runtime).
This adds a check before freeing the rx->skb in flush and close
functions to handle the kernel crash seen while removing driver after FW
download fails or before FW download completes.
However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
ACK.
For example, the write() syscall in the following packetdrill script fails
with -EAGAIN, and wrong SNMP stats get incremented.
0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
+0 < S 0:0(0) win 1000 <mss 1000>
+0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
+0 < S. 0:0(0) ack 1 win 1000
# packetdrill cross-synack.pkt
cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
# nstat
...
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPSYNChallenge 1 0.0
The problem is that bpf_skops_established() is triggered by the Challenge
ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to
check if the peer supports a TCP option that is expected to be exchanged
in SYN and SYN+ACK.
Let's accept a bare SYN+ACK for active-open TCP_SYN_RECV sockets to avoid
such a situation.
Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to
send an unnecessary ACK, but this could be a bit risky for net.git, so this
targets for net-next.
Where pci_reset_bus() users are triggering unlocked secondary bus resets.
Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
pci_bus_lock() before issuing the reset which locks everything *but* the
bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
pci_dev_lock() for @bus->self to pci_bus_lock().
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
[bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The exception vector of the booting hart is not set before enabling
the mmu and then still points to the value of the previous firmware,
typically _start. That makes it hard to debug setup_vm() when bad
things happen. So fix that by setting the exception vector earlier.
When a root decoder is configured the interleave target list is read
from the BIOS populated CFMWS structure. Per the CXL spec 3.1 Table
9-22 the target list is in interleave order. The CXL driver populates
its decoder target list in the same order and stores it in 'struct
cxl_switch_decoder' field "@target: active ordered target list in
current decoder configuration"
Given the promise of an ordered list, the driver can stop duplicating
the work of BIOS and simply check target positions against the ordered
list during region configuration.
The simplified check against the ordered list is presented here.
A follow-on patch will remove the unused code.
For Modulo arithmetic this is not a fix, only a simplification.
For XOR arithmetic this is a fix for HB IW of 3,6,12.
Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)") Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/35d08d3aba08fee0f9b86ab1cef0c25116ca8a55.1719980933.git.alison.schofield@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Instead of a BUG_ON() just return an error, log an error message and
abort the transaction in case we find an extent buffer belonging to the
relocation tree that doesn't have the full backref flag set. This is
unexpected and should never happen (save for bugs or a potential bad
memory).
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In reada we BUG_ON(refs == 0), which could be unkind since we aren't
holding a lock on the extent leaf and thus could get a transient
incorrect answer. In walk_down_proc we also BUG_ON(refs == 0), which
could happen if we have extent tree corruption. Change that to return
-EUCLEAN. In do_walk_down() we catch this case and handle it correctly,
however we return -EIO, which -EUCLEAN is a more appropriate error code.
Finally in walk_up_proc we have the same BUG_ON(refs == 0), so convert
that to proper error handling. Also adjust the error message so we can
actually do something with the information.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We have a couple of areas where we check to make sure the tree block is
locked before looking up or messing with references. This is old code
so it has this as BUG_ON(). Convert this to ASSERT() for developers.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
For CONFIG_DEBUG_OBJECTS_WORK=y kernels sscs.work defined by
INIT_WORK_ONSTACK() is initialized by debug_object_init_on_stack() for
the debug check in __init_work() to work correctly.
But this lacks the counterpart to remove the tracked object from debug
objects again, which will cause a debug object warning once the stack is
freed.
Add the missing destroy_work_on_stack() invocation to cure that.
A gang submit won't work if the VMID is reserved and we can't flush out
VM changes from multiple engines at the same time.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
mwifiex_get_priv_by_id() returns the priv pointer corresponding to
the bss_num and bss_type, but without checking if the priv is actually
currently in use.
Unused priv pointers do not have a wiphy attached to them which can
lead to NULL pointer dereferences further down the callstack. Fix
this by returning only used priv pointers which have priv->bss_mode
set to something else than NL80211_IFTYPE_UNSPECIFIED.
Said NULL pointer dereference happened when an Accesspoint was started
with wpa_supplicant -i mlan0 with this config:
The test thread will start N benchmark kthreads and then schedule out
until the test time finished and notify the benchmark kthreads to stop.
The benchmark kthreads will keep running until notified to stop.
There's a problem with current implementation when the benchmark
kthreads number is equal to the CPUs on a non-preemptible kernel:
since the scheduler will balance the kthreads across the CPUs and
when the test time's out the test thread won't get a chance to be
scheduled on any CPU then cannot notify the benchmark kthreads to stop.
This can be easily reproduced on a VM (simulated with 16 CPUs) with
PREEMPT_VOLUNTARY:
estuary:/mnt$ ./dma_map_benchmark -t 16 -s 1
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 10-...!: (5221 ticks this GP) idle=ed24/1/0x4000000000000000 softirq=142/142 fqs=0
rcu: (t=5254 jiffies g=-559 q=45 ncpus=16)
rcu: rcu_sched kthread starved for 5255 jiffies! g-559 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=12
rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_sched state:R running task stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008
Call trace
__switch_to+0xec/0x138
__schedule+0x2f8/0x1080
schedule+0x30/0x130
schedule_timeout+0xa0/0x188
rcu_gp_fqs_loop+0x128/0x528
rcu_gp_kthread+0x1c8/0x208
kthread+0xec/0xf8
ret_from_fork+0x10/0x20
Sending NMI from CPU 10 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 332 Comm: dma-map-benchma Not tainted 6.10.0-rc1-vanilla-LSE #8
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x218/0x730
lr : arm_smmu_cmdq_issue_cmdlist+0x488/0x730
sp : ffff80008748b630
x29: ffff80008748b630 x28: 0000000000000000 x27: ffff80008748b780
x26: 0000000000000000 x25: 000000000000bc70 x24: 000000000001bc70
x23: ffff0000c12af080 x22: 0000000000010000 x21: 000000000000ffff
x20: ffff80008748b700 x19: ffff0000c12af0c0 x18: 0000000000010000
x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff
x14: 0001ffffffffffff x13: 000000000000ffff x12: 00000000000002f1
x11: 000000000001ffff x10: 0000000000000031 x9 : ffff800080b6b0b8
x8 : ffff0000c2a48000 x7 : 000000000001bc71 x6 : 0001800000000000
x5 : 00000000000002f1 x4 : 01ffffffffffffff x3 : 000000000009aaf1
x2 : 0000000000000018 x1 : 000000000000000f x0 : ffff0000c12af18c
Call trace:
arm_smmu_cmdq_issue_cmdlist+0x218/0x730
__arm_smmu_tlb_inv_range+0xe0/0x1a8
arm_smmu_iotlb_sync+0xc0/0x128
__iommu_dma_unmap+0x248/0x320
iommu_dma_unmap_page+0x5c/0xe8
dma_unmap_page_attrs+0x38/0x1d0
map_benchmark_thread+0x118/0x2c0
kthread+0xec/0xf8
ret_from_fork+0x10/0x20
Solve this by adding scheduling point in the kthread loop,
so if there're other threads in the system they may have
a chance to run, especially the thread to notify the test
end. However this may degrade the test concurrency so it's
recommended to run this on an idle system.
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
We encountered a problem that the file system could not be mounted in
the power-off scenario. The analysis of the file system mirror shows that
only part of the data is written to the last commit block.
The valid data of the commit block is concentrated in the first sector.
However, the data of the entire block is involved in the checksum calculation.
For different hardware, the minimum atomic unit may be different.
If the checksum of a committed block is incorrect, clear the data except the
'commit_header' and then calculate the checksum. If the checkusm is correct,
it is considered that the block is partially committed, Then continue to replay
journal.
In the fast commit code there are a few places where tid_t variables are
being compared without taking into account the fact that these sequence
numbers may wrap. Fix this issue by using the helper functions tid_gt()
and tid_geq().
Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20240529092030.9557-3-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
If VF request full GPU access and the request failed,
the VF driver can get stuck accessing registers for an extended period during
the unload of KMS.
[How]
Set no_hw_access flag when VF request for full GPU access fails
This prevents further hardware access attempts, avoiding the prolonged
stuck state.
Signed-off-by: Yifan Zha <Yifan.Zha@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While calling bpf_object__find_map_by_name with obj = NULL is
obviously incorrect, this should not lead to a segmentation
fault but rather be handled gracefully.
As __bpf_map__iter already handles this situation correctly, we
can delegate the check for the regular case there and only add
a check in case the prev or next parameter is NULL.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
DIV_ROUND_CLOSEST() after kstrtol() results in an underflow if a large
negative number such as -9223372036854775808 is provided by the user.
Fix it by reordering clamp_val() and DIV_ROUND_CLOSEST() operations.
The logic that detects pending VF2PF interrupts unintentionally clears
the section of the error mask register(s) not related to VF2PF.
This might cause interrupts unrelated to VF2PF, reported through
errsou3 and errsou5, to be reported again after the execution
of the function disable_pending_vf2pf_interrupts() in dh895xcc
and GEN2 devices.
Fix by updating only section of errmsk3 and errmsk5 related to VF2PF.
pm8001_phy_control() populates the enable_completion pointer with a stack
address, sends a PHY_LINK_RESET / PHY_HARD_RESET, waits 300 ms, and
returns. The problem arises when a phy control response comes late. After
300 ms the pm8001_phy_control() function returns and the passed
enable_completion stack address is no longer valid. Late phy control
response invokes complete() on a dangling enable_completion pointer which
leads to a kernel crash.
Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Signed-off-by: Terrence Adams <tadamsjr@google.com> Link: https://lore.kernel.org/r/20240627155924.2361370-2-tadamsjr@google.com Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If host tries to remove ufshcd driver from a UFS device it would cause a
kernel panic if ufshcd_async_scan fails during ufshcd_probe_hba before
adding a SCSI host with scsi_add_host and MCQ is enabled since SCSI host
has been defered after MCQ configuration introduced by commit 0cab4023ec7b
("scsi: ufs: core: Defer adding host to SCSI if MCQ is supported").
To guarantee that SCSI host is removed only if it has been added, set the
scsi_host_added flag to true after adding a SCSI host and check whether it
is set or not before removing it.
Signed-off-by: Kyoungrul Kim <k831.kim@samsung.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Link: https://lore.kernel.org/r/20240627085104epcms2p5897a3870ea5c6416aa44f94df6c543d7@epcms2p5 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Right now it's possible to hit NULL pointer dereference in
rtw_rx_fill_rx_status on hw object and/or its fields because
initialization routine can start getting USB replies before
rtw_dev is fully setup.
So while we do the async stuff rtw_usb_probe continues and calls
rtw_register_hw, which does all kinds of initialization (e.g.
via ieee80211_register_hw) that rtw_rx_fill_rx_status relies on.
Fix this by moving the first usb_submit_urb after everything
is set up.
For me, this bug manifested as:
[ 8.893177] rtw_8821cu 1-1:1.2: band wrong, packet dropped
[ 8.910904] rtw_8821cu 1-1:1.2: hw->conf.chandef.chan NULL in rtw_rx_fill_rx_status
because I'm using Larry's backport of rtw88 driver with the NULL
checks in rtw_rx_fill_rx_status.
Add kmsan for virtqueue_dma_map_single_attrs to fix:
BUG: KMSAN: uninit-value in receive_buf+0x45ca/0x6990
receive_buf+0x45ca/0x6990
virtnet_poll+0x17e0/0x3130
net_rx_action+0x832/0x26e0
handle_softirqs+0x330/0x10f0
[...]
Uninit was created at:
__alloc_pages_noprof+0x62a/0xe60
alloc_pages_noprof+0x392/0x830
skb_page_frag_refill+0x21a/0x5c0
virtnet_rq_alloc+0x50/0x1500
try_fill_recv+0x372/0x54c0
virtnet_open+0x210/0xbe0
__dev_open+0x56e/0x920
__dev_change_flags+0x39c/0x2000
dev_change_flags+0xaa/0x200
do_setlink+0x197a/0x7420
rtnl_setlink+0x77c/0x860
[...]
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Alexander Potapenko <glider@google.com>
Message-Id: <20240606111345.93600-1-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # s390x Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The hotplug driver for powerpc (pci/hotplug/pnv_php.c) causes a kernel
crash when we try to hot-unplug/disable the PCIe switch/bridge from
the PHB.
The crash occurs because although the MSI data structure has been
released during disable/hot-unplug path and it has been assigned
with NULL, still during unregistration the code was again trying to
explicitly disable the MSI which causes the NULL pointer dereference and
kernel crash.
The patch fixes the check during unregistration path to prevent invoking
pci_disable_msi/msix() since its data structure is already freed.
The pointer isn't initialized by callers, but I have
encountered cases where it's still printed; initialize
it in all possible cases in setup_one_line().
The writing of css->cgroup associated with the cgroup root in
rebind_subsystems() is currently protected only by cgroup_mutex.
However, the reading of css->cgroup in both proc_cpuset_show() and
proc_cgroup_show() is protected just by css_set_lock. That makes the
readers susceptible to racing problems like data tearing or caching.
It is also a problem that can be reported by KCSAN.
This can be fixed by using READ_ONCE() and WRITE_ONCE() to access
css->cgroup. Alternatively, the writing of css->cgroup can be moved
under css_set_lock as well which is done by this patch.
Queued invalidation wait descriptor status is volatile in that IOMMU
hardware writes the data upon completion.
Use READ_ONCE() to prevent compiler optimizations which ensures memory
reads every time. As a side effect, READ_ONCE() also enforces strict
types and may add an extra instruction. But it should not have negative
performance impact since we use cpu_relax anyway and the extra time(by
adding an instruction) may allow IOMMU HW request cacheline ownership
easier.
e.g. gcc 12.3
BEFORE:
81 38 ad de 00 00 cmpl $0x2,(%rax)
AFTER (with READ_ONCE())
772f: 8b 00 mov (%rax),%eax
7731: 3d ad de 00 00 cmp $0x2,%eax
//status data is 32 bit
dm_parse_device_entry() simply copies the minor number into dmi.dev, but
the dev_t format splits the minor number between the lowest 8 bytes and
highest 12 bytes. If the minor number is larger than 255, part of it
will end up getting treated as the major number
Fix this by checking that the minor number is valid and then encoding it
as a dev_t.
Take the phy mutex in xlate to protect against concurrent
modification/access to gtr_phy. This does not typically cause any
issues, since in most systems the phys are only xlated once and
thereafter accessed with the phy API (which takes the locks). However,
we are about to allow userspace to access phys for debugging, so it's
important to avoid any data races.
Add a check to cs_dsp_coeff_write_ctrl() to abort if the control
is not writeable.
The cs_dsp code originated as an ASoC driver (wm_adsp) where all
controls were exported as ALSA controls. It relied on ALSA to
enforce the read-only permission. Now that the code has been
separated from ALSA/ASoC it must perform its own permission check.
This isn't currently causing any problems so there shouldn't be any
need to backport this. If the client of cs_dsp exposes the control as
an ALSA control, it should set permissions on that ALSA control to
protect it. The few uses of cs_dsp_coeff_write_ctrl() inside drivers
are for writable controls.
Nothing prevents simultaneous ioctl calls to privcmd_irqfd_assign() and
privcmd_irqfd_deassign(). If that happens, it is possible that a kirqfd
created and added to the irqfds_list by privcmd_irqfd_assign() may get
removed by another thread executing privcmd_irqfd_deassign(), while the
former is still using it after dropping the locks.
This can lead to a situation where an already freed kirqfd instance may
be accessed and cause kernel oops.
Use SRCU locking to prevent the same, as is done for the KVM
implementation for irqfds.
When userspace wants to take over a fdb entry by setting it as
EXTERN_LEARNED, we set both flags BR_FDB_ADDED_BY_EXT_LEARN and
BR_FDB_ADDED_BY_USER in br_fdb_external_learn_add().
If the bridge updates the entry later because its port changed, we clear
the BR_FDB_ADDED_BY_EXT_LEARN flag, but leave the BR_FDB_ADDED_BY_USER
flag set.
If userspace then wants to take over the entry again,
br_fdb_external_learn_add() sees that BR_FDB_ADDED_BY_USER and skips
setting the BR_FDB_ADDED_BY_EXT_LEARN flags, thus silently ignores the
update.
Fix this by always allowing to set BR_FDB_ADDED_BY_EXT_LEARN regardless
if this was a user fdb entry or not.
Fixes: 710ae7287737 ("net: bridge: Mark FDB entries that were added by user as such") Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20240903081958.29951-1-jonas.gorski@bisdn.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
generic_ocp_write() asks the parameter "size" must be 4 bytes align.
Therefore, write the bp would fail, if the mac->bp_num is odd. Align the
size to 4 for fixing it. The way may write an extra bp, but the
rtl8152_is_fw_mac_ok() makes sure the value must be 0 for the bp whose
index is more than mac->bp_num. That is, there is no influence for the
firmware.
Besides, I check the return value of generic_ocp_write() to make sure
everything is correct.
We observed a null-ptr-deref in fou_gro_receive() while shutting down
a host. [0]
The NULL pointer is sk->sk_user_data, and the offset 8 is of protocol
in struct fou.
When fou_release() is called due to netns dismantle or explicit tunnel
teardown, udp_tunnel_sock_release() sets NULL to sk->sk_user_data.
Then, the tunnel socket is destroyed after a single RCU grace period.
So, in-flight udp4_gro_receive() could find the socket and execute the
FOU GRO handler, where sk->sk_user_data could be NULL.
Let's use rcu_dereference_sk_user_data() in fou_from_sock() and add NULL
checks in FOU GRO handlers.
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
Scenario shown as below:
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
directly uses `copy_from_sockptr` to ensure that `max_optlen` is always
set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks") Co-developed-by: Yanghui Li <yanghui.li@mediatek.com> Signed-off-by: Yanghui Li <yanghui.li@mediatek.com> Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.
The same was done for the setsockopt pair.
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231016134750.1381153-5-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231016134750.1381153-4-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Stable-dep-of: 33f339a1ba54 ("bpf, net: Fix a potential race in do_sock_getsockopt()") Signed-off-by: Sasha Levin <sashal@kernel.org>
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by TX timeout handler.
XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked
With an unfortunate timing, such accesses can result in a crash such as the
one below:
The previous way of handling this through returning -EBUSY is not viable,
particularly when destroying AF_XDP socket, because the kernel proceeds
with removal anyway.
There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, same as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().
Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuild
2. when VSI is already rebuild, but not yet open
The latter case is actually already handled through !netif_running() case,
we just need to adjust flag checking a little. The former one is not as
trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
hardware interaction happens, this can make adding/deleting rings exit
with an error. Luckily, VSI rebuild is pending and can apply new
configuration for us in a managed fashion.
Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.
Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
consistent by deconfiguring VSI when coalesce allocation fails.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Call rtnl_unlock() on this error path, before returning.
Fixes: bc23aa949aeb ("igc: Add pcie error handler support") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
MGMT_OP_DISCONNECT can be called while mgmt_device_connected has not
been called yet, which will cause the connection procedure to be
aborted, so mgmt_device_disconnected shall still respond with command
complete to MGMT_OP_DISCONNECT and just not emit
MGMT_EV_DEVICE_DISCONNECTED since MGMT_EV_DEVICE_CONNECTED was never
sent.
To fix this MGMT_OP_DISCONNECT is changed to work similarly to other
command which do use hci_cmd_sync_queue and then use hci_conn_abort to
disconnect and returns the result, in order for hci_conn_abort to be
used from hci_cmd_sync context it now uses hci_cmd_sync_run_once.
Link: https://github.com/bluez/bluez/issues/932 Fixes: 12d4a3b2ccb3 ("Bluetooth: Move check for MGMT_CONNECTED flag into mgmt.c") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This introduces hci_cmd_sync_run/hci_cmd_sync_run_once which acts like
hci_cmd_sync_queue/hci_cmd_sync_queue_once but runs immediately when
already on hdev->cmd_sync_work context.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
If connection is still queued/pending in the cmd_sync queue it means no
command has been generated and it should be safe to just dequeue the
callback when it is being aborted.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
This fixes the UAF on __hci_acl_create_connection_sync caused by
connection abortion, it uses the same logic as to LE_LINK which uses
hci_cmd_sync_cancel to prevent the callback to run if the connection is
abort prematurely.
Reported-by: syzbot+3f0a39be7a2035700868@syzkaller.appspotmail.com Fixes: 45340097ce6e ("Bluetooth: hci_conn: Only do ACL connections sequentially") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
With the last commit we moved to using the hci_sync queue for "Create
Connection" requests, removing the need for retrying the paging after
finished/failed "Create Connection" requests and after the end of
inquiries.
hci_conn_check_pending() was used to trigger this retry, we can remove it
now.
Note that we can also remove the special handling for COMMAND_DISALLOWED
errors in the completion handler of "Create Connection", because "Create
Connection" requests are now always serialized.
This is somewhat reverting commit 4c67bc74f016 ("[Bluetooth] Support
concurrent connect requests").
With this, the BT_CONNECT2 state of ACL hci_conn objects should now be
back to meaning only one thing: That we received a "Connection Request"
from another device (see hci_conn_request_evt), but the response to that
is going to be deferred.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
Pretty much all bluetooth chipsets only support paging a single device at
a time, and if they don't reject a secondary "Create Connection" request
while another is still ongoing, they'll most likely serialize those
requests in the firware.
With commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests") we started adding some serialization of our own in case the
adapter returns "Command Disallowed" HCI error.
This commit was using the BT_CONNECT2 state for the serialization, this
state is also used for a few more things (most notably to indicate we're
waiting for an inquiry to cancel) and therefore a bit unreliable. Also
not all BT firwares would respond with "Command Disallowed" on too many
connection requests, some will also respond with "Hardware Failure"
(BCM4378), and others will error out later and send a "Connect Complete"
event with error "Rejected Limited Resources" (Marvell 88W8897).
We can clean things up a bit and also make the serialization more reliable
by using our hci_sync machinery to always do "Create Connection" requests
in a sequential manner.
This is very similar to what we're already doing for establishing LE
connections, and it works well there.
Note that this causes a test failure in mgmt-tester (test "Pair Device
- Power off 1") because the hci_abort_conn_sync() changes the error we
return on timeout of the "Create Connection". We'll fix this on the
mgmt-tester side by adjusting the expected error for the test.
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Stable-dep-of: 227a0cdf4a02 ("Bluetooth: MGMT: Fix not generating command complete for MGMT_OP_DISCONNECT") Signed-off-by: Sasha Levin <sashal@kernel.org>
On systems in the field, we are seeing this sometimes in the kernel logs:
Bluetooth: qca_controller_memdump() hci0: hci_devcd_init Return:-95
This means that _something_ decided that it wanted to get a memdump
but then hci_devcd_init() returned -EOPNOTSUPP (AKA -95).
The cleanup code in qca_controller_memdump() when we get back an error
from hci_devcd_init() undoes most things but forgets to clear
QCA_IBS_DISABLED. One side effect of this is that, during the next
suspend, qca_suspend() will always get a timeout.
Let's fix it so that we clear the bit.
Fixes: 06d3fdfcdf5c ("Bluetooth: hci_qca: Add qcom devcoredump support") Reviewed-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Kvaser's PCIe cards uses the KCAN FPGA IP block which has dual 4K
buffers for incoming messages shared by all (currently up to eight)
channels. While the driver processes messages in one buffer, new
incoming messages are stored in the other and so on.
The design of KCAN is such that a buffer must be fully read and then
released. Releasing a buffer will make the FPGA switch buffers. If the
other buffer contains at least one incoming message the FPGA will also
instantly issue a new interrupt, if not the interrupt will be issued
after receiving the first new message.
With IRQx interrupts, it takes a little time for the interrupt to
happen, enough for any previous ISR call to do it's business and
return, but MSI interrupts are way faster so this time is reduced to
almost nothing.
So with MSI, releasing the buffer HAS to be the very last action of
the ISR before returning, otherwise the new interrupt might be
"masked" by the kernel because the previous ISR call hasn't returned.
And the interrupts are edge-triggered so we cannot loose one, or the
ping-pong reading process will stop.
This is why this patch modifies the driver to use a single write to
the SRB_CMD register before returning.
Signed-off-by: Martin Jocic <martin.jocic@kvaser.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20240830153113.2081440-1-martin.jocic@kvaser.com Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
A new interrupt is triggered by resetting the DMA RX buffers.
Since MSI interrupts are faster than legacy interrupts, the reset
of the DMA buffers must be moved to the very end of the ISR,
otherwise a new MSI interrupt will be masked by the current one.
This check is already done at the creation of the net devices in
kvaser_pciefd_setup_can_ctrls called from kvaser_pciefd_probe.
If it fails, the driver won't load, so there should be no need to
repeat the check inside the ISR. The number of channels is read
from the FPGA and should be trusted.
When adding devm_regulator_bulk_get_const() I missed adding a stub for
when CONFIG_REGULATOR is not enabled. Under certain conditions (like
randconfig testing) this can cause the compiler to reports errors
like:
error: implicit declaration of function 'devm_regulator_bulk_get_const';
did you mean 'devm_regulator_bulk_get_enable'?
Add the stub.
Fixes: 1de452a0edda ("regulator: core: Allow drivers to define their init data as const") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408301813.TesFuSbh-lkp@intel.com/ Cc: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20240830073511.1.Ib733229a8a19fad8179213c05e1af01b51e42328@changeid Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In case of error in build_tokens_sysfs(), all the memory that has been
allocated is freed at end of this function. But then free_group() is
called which performs memory deallocation again.
Also, instead of free_group() call, there should be exit_dell_smbios_smm()
and exit_dell_smbios_wmi() calls, since there is initialization, but there
is no release of resources in case of an error.
Fix these issues by replacing free_group() call with
exit_dell_smbios_wmi() and exit_dell_smbios_smm().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 33b9ca1e53b4 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Link: https://lore.kernel.org/r/20240830065428.9544-1-amishin@t-argos.ru Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Ethtool callbacks can be executed while reset is in progress and try to
access deleted resources, e.g. getting coalesce settings can result in a
NULL pointer dereference seen below.
Reproduction steps:
Once the driver is fully initialized, trigger reset:
# echo 1 > /sys/class/net/<interface>/device/reset
when reset is in progress try to get coalesce settings using ethtool:
# ethtool -c <interface>
Calling netif_device_detach() before reset makes the net core not call
the driver when ethtool command is issued, the attempt to execute an
ethtool command during reset will result in the following message:
netlink error: No such device
instead of NULL pointer dereference. Once reset is done and
ice_rebuild() is executing, the netif_device_attach() is called to allow
for ethtool operations to occur again in a safe manner.
Fixes: fcea6f3da546 ("ice: Add stats and ethtool support") Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
82580 NICs have a hardware bug that makes it
necessary to write into the TSICR (TimeSync Interrupt Cause) register
to clear it:
https://lore.kernel.org/all/CDCB8BE0.1EC2C%25matthew.vick@intel.com/
Add a conditional so only for 82580 we write into the TSICR register,
so we don't risk losing events for other models.
Without this change, when running ptp4l with an Intel 82580 card,
I get the following output:
> timed out while polling for tx timestamp increasing tx_timestamp_timeout or
> increasing kworker priority may correct this issue, but a driver bug likely
> causes it
This goes away with this change.
This (partially) reverts commit ee14cc9ea19b ("igb: Fix missing time sync events").
Fixes: ee14cc9ea19b ("igb: Fix missing time sync events") Closes: https://lore.kernel.org/intel-wired-lan/CAN0jFd1kO0MMtOh8N2Ztxn6f7vvDKp2h507sMryobkBKe=xk=w@mail.gmail.com/ Tested-by: Daiwei Li <daiweili@google.com> Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Daiwei Li <daiweili@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Under certain conditions, the range to be cleared by FALLOC_FL_ZERO_RANGE
may only be buffered locally and not yet have been flushed to the server.
For example:
will write two 4KiB blocks of data, which get buffered in the pagecache,
and then fallocate() is used to clear the first 4KiB block on the server -
but we don't flush the data first, which means the EOF position on the
server is wrong, and so the FSCTL_SET_ZERO_DATA RPC fails (and xfs_io
ignores the error), but then when we try to read it, we see the old data.
Fix this by preflushing any part of the target region that above the
server's idea of the EOF position to force the server to update its EOF
position.
Note, however, that we don't want to simply expand the file by moving the
EOF before doing the FSCTL_SET_ZERO_DATA[*] because someone else might see
the zeroed region or if the RPC fails we then have to try to clean it up or
risk getting corruption.
[*] And we have to move the EOF first otherwise FSCTL_SET_ZERO_DATA won't
do what we want.
This fixes the generic/008 xfstest.
[!] Note: A better way to do this might be to split the operation into two
parts: we only do FSCTL_SET_ZERO_DATA for the part of the range below the
server's EOF and then, if that worked, invalidate the buffered pages for the
part above the range.
Fixes: 6b69040247e1 ("cifs/smb3: Fix data inconsistent when zero file range") Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <stfrench@microsoft.com>
cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
cc: Pavel Shilovsky <pshilov@microsoft.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Symbols in the bss segment are not currently exported. This is a problem
for Rust modules that link against statics, that are resident in the kernel
image. Thus export symbols in the bss segment.
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Gary Guo <gary@garyguo.net> Link: https://lore.kernel.org/r/20240815074519.2684107-2-nmi@metaspace.dk
[ Reworded slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
`awk` is already required by the kernel build, and the `xargs` feature
used in current Rust detection is not present in all `xargs` (notably,
toybox based xargs, used in the Android kernel build).
Signed-off-by: Matthew Maurer <mmaurer@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Tested-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Link: https://lore.kernel.org/r/20230928205045.2375899-1-mmaurer@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Stable-dep-of: b8673d56935c ("rust: kbuild: fix export of bss symbols") Signed-off-by: Sasha Levin <sashal@kernel.org>
If the ring (rx, tx) and/or coalescing parameters (rx-frames-irq,
tx-frames-irq) have been configured while the interface was in CAN-CC
mode, but the interface is brought up in CAN-FD mode, the ring
parameters might be too big.
[WHAT & HOW]
A denominator cannot be 0, and is checked before used.
This fixes 1 DIVIDE_BY_ZERO issue reported by Coverity.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>