Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb1468 ("I2C: Move hwmon drivers (2/3)") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 8d5d45fb1468 ("I2C: Move hwmon drivers (2/3)") Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: John Garry <john.garry@huawei.com> Cc: John Garry <john.garry@huawei.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Super-IO accesses may fail on a system with no or unmapped LPC bus.
Also, other drivers may attempt to access the LPC bus at the same time,
resulting in undefined behavior.
Use request_muxed_region() to ensure that IO access on the requested
address space is supported, and to ensure that access by multiple drivers
is synchronized.
Fixes: 2219cd81a6cd ("hwmon/vt1211: Add probing of alternate config index port") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The patch 23c7b54ca1cd: "PM / devfreq: Fix devfreq_add_device() when
drivers are built as modules." leads to the following static checker
warning:
drivers/devfreq/devfreq.c:1043 governor_store()
warn: 'governor' can also be NULL
The reason is that the try_then_request_governor() function returns both
error pointers and NULL. It should just return error pointers, so fix
this by returning a ERR_PTR to the error intead of returning NULL.
Fixes: 23c7b54ca1cd ("PM / devfreq: Fix devfreq_add_device() when drivers are built as modules.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently if alloc_skb fails to allocate the skb a null skb is passed to
t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid
the NULL pointer dereference by checking for a NULL skb and returning
early.
Addresses-Coverity: ("Dereference null return") Fixes: b38a0ad8ec11 ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
clock_getres() in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
where 'hrtimer_resolution' depends on whether or not high resolution
timers are enabled, which is a runtime decision.
The vDSO incorrectly returns the constant CLOCK_REALTIME_RES. Fix this
by exposing 'hrtimer_resolution' in the vDSO datapage and returning that
instead.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
[will: Use WRITE_ONCE(), move adr off COARSE path, renumber labels, use 'w' reg] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:
Unable to handle kernel paging request at virtual address 0000000000001388
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
[0000000000001388] user address but active_mm is swapper
Internal error: Oops: 96000004 [#1] SMP
Modules linked in:
CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
pstate: 80c00009 (Nzcv daif +PAN +UAO)
pc : __alloc_pages_nodemask+0x13c/0x1068
lr : __alloc_pages_nodemask+0xdc/0x1068
...
Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
Call trace:
__alloc_pages_nodemask+0x13c/0x1068
new_slab+0xec/0x570
___slab_alloc+0x3e0/0x4f8
__slab_alloc+0x60/0x80
__kmalloc_node_track_caller+0x10c/0x478
devm_kmalloc+0x44/0xb0
pinctrl_bind_pins+0x4c/0x188
really_probe+0x78/0x2b8
driver_probe_device+0x64/0x110
device_driver_attach+0x74/0x98
__driver_attach+0x9c/0xe8
bus_for_each_dev+0x84/0xd8
driver_attach+0x30/0x40
bus_add_driver+0x170/0x218
driver_register+0x64/0x118
__platform_driver_register+0x54/0x60
arm_smmu_driver_init+0x24/0x2c
do_one_initcall+0xbc/0x328
kernel_init_freeable+0x304/0x3ac
kernel_init+0x18/0x110
ret_from_fork+0x10/0x1c
Code: f90013b5b9410fa11a9f0694b50014c2 (b9400804)
---[ end trace dfeaed4c373a32da ]--
Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash.
Modifying the VLAN stripping options when a port VLAN is configured
will break traffic for the VSI, and conceptually doesn't make sense,
so don't allow this.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes the problem with the driver being able to add only 7
multicast MAC address filters instead of 16. The problem is fixed by
changing the maximum number of MAC address filters to 16+1+1 (two extra
are needed because the driver uses 1 for unicast MAC address and 1 for
broadcast).
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
gcc points out that when CONFIG_GPIOLIB is disabled,
gpiod_get_array_value_cansleep() returns 0 but fails to set its output:
drivers/phy/motorola/phy-mapphone-mdm6600.c: In function 'phy_mdm6600_status':
drivers/phy/motorola/phy-mapphone-mdm6600.c:220:24: error: 'values[0]' is used uninitialized in this function [-Werror=uninitialized]
This could be fixed more generally in gpiolib by returning a failure
code, but for this specific case, the easier workaround is to add a
gpiolib dependency.
Fixes: 5d1ebbda0318 ("phy: mapphone-mdm6600: Add USB PHY driver for MDM6600 on Droid 4") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On platforms where the MUSB and HCI controllers share PHY0, PHY passby
is required when using the HCI controller with the PHY, but it must be
disabled when the MUSB controller is used instead.
Without this, PHY0 passby is always enabled, which results in broken
peripheral mode on such platforms (e.g. H3/H5).
Fixes: ba4bdc9e1dc0 ("PHY: sunxi: Add driver for sunxi usb phy") Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a required reset to the SDM845 UFS phy to express the PHY reset
bit inside the UFS controller register space. Before this change, this
reset was not expressed in the DT, and the driver utilized two different
callbacks (phy_init and phy_poweron) to implement a two-phase
initialization procedure that involved deasserting this reset between
init and poweron. This abused the two callbacks and diluted their
purpose.
That scheme does not work as regulators cannot be turned off in
phy_poweroff because they were turned on in init, rather than poweron.
The net result is that regulators are left on in suspend that shouldn't
be.
This new scheme gives the UFS reset to the PHY, so that it can fully
initialize itself in a single callback. We can then turn regulators on
during poweron and off during poweroff.
Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
WARNING: CPU: 0 PID: 19001 at kernel/dma/debug.c:1301 debug_dma_map_sg+0x284/0x3dc
etnaviv etnaviv: DMA-API: mapping sg segment longer than device claims to support [len=3145728] [max=65536]
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_conntrack ip_set nfnetlink ebtable_broute ebtable_nat ip6table_raw ip6table_nat nf_nat_ipv6 ip6table_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 libcrc32c iptable_mangle ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter caam_jr error snd_soc_imx_spdif imx_thermal snd_soc_imx_audmux nvmem_imx_ocotp snd_soc_sgtl5000
caam imx_sdma virt_dma coda rc_cec v4l2_mem2mem snd_soc_fsl_ssi snd_soc_fsl_spdif imx_vdoa imx_pcm_dma videobuf2_dma_contig etnaviv dw_hdmi_cec gpu_sched dw_hdmi_ahb_audio imx6q_cpufreq nfsd sch_fq_codel ip_tables x_tables
CPU: 0 PID: 19001 Comm: Xorg Not tainted 4.20.0+ #307
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<c0019658>] (unwind_backtrace) from [<c001489c>] (show_stack+0x10/0x14)
[<c001489c>] (show_stack) from [<c07fb420>] (dump_stack+0x9c/0xd4)
[<c07fb420>] (dump_stack) from [<c00312dc>] (__warn+0xf8/0x124)
[<c00312dc>] (__warn) from [<c00313d0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313d0>] (warn_slowpath_fmt) from [<c00b14e8>] (debug_dma_map_sg+0x284/0x3dc)
[<c00b14e8>] (debug_dma_map_sg) from [<c046eb40>] (drm_gem_map_dma_buf+0xc4/0x13c)
[<c046eb40>] (drm_gem_map_dma_buf) from [<c04c3314>] (dma_buf_map_attachment+0x38/0x5c)
[<c04c3314>] (dma_buf_map_attachment) from [<c046e728>] (drm_gem_prime_import_dev+0x74/0x104)
[<c046e728>] (drm_gem_prime_import_dev) from [<c046e5bc>] (drm_gem_prime_fd_to_handle+0x84/0x17c)
[<c046e5bc>] (drm_gem_prime_fd_to_handle) from [<c046edd0>] (drm_prime_fd_to_handle_ioctl+0x38/0x4c)
[<c046edd0>] (drm_prime_fd_to_handle_ioctl) from [<c0460efc>] (drm_ioctl_kernel+0x90/0xc8)
[<c0460efc>] (drm_ioctl_kernel) from [<c0461114>] (drm_ioctl+0x1e0/0x3b0)
[<c0461114>] (drm_ioctl) from [<c01cae20>] (do_vfs_ioctl+0x90/0xa48)
[<c01cae20>] (do_vfs_ioctl) from [<c01cb80c>] (ksys_ioctl+0x34/0x60)
[<c01cb80c>] (ksys_ioctl) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xd81a9fa8 to 0xd81a9ff0)
9fa0: b6c69c88bec613f800000009c00c642ebec613f8b86c4600
9fc0: b6c69c88bec613f8c00c642e00000036012762e00127634800000300012d91f8
9fe0: b6989f18bec613dcb697185cb667be5c
irq event stamp: 47905
hardirqs last enabled at (47913): [<c0098824>] console_unlock+0x46c/0x680
hardirqs last disabled at (47922): [<c0098470>] console_unlock+0xb8/0x680
softirqs last enabled at (47754): [<c000a484>] __do_softirq+0x344/0x540
softirqs last disabled at (47701): [<c0038700>] irq_exit+0x124/0x144
---[ end trace af477747acbcc642 ]---
The reason is the contiguous buffer exceeds the default maximum segment
size of 64K as specified by dma_get_max_seg_size() in
linux/dma-mapping.h. Fix this by providing our own segment size, which
is set to 2GiB to cover the window found in MMUv1 GPUs.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
37fe6a42b343 ("x86: Check stack overflow in detail")
added a broad check for the full exception stack area, i.e. it considers
the full exception stack area as valid.
That's wrong in two aspects:
1) It does not check the individual areas one by one
2) #DF, NMI and #MCE are not enabling interrupts which means that a
regular device interrupt cannot happen in their context. In fact if a
device interrupt hits one of those IST stacks that's a bug because some
code path enabled interrupts while handling the exception.
Limit the check to the #DB stack and consider all other IST stacks as
'overflow' or invalid.
The SCSI core does not like to have devices or hosts unregistered
while error recovery is in progress. Trying to do so can lead to
self-deadlock: Part of the removal code tries to obtain a lock already
held by the error handler.
This can cause problems for the usb-storage and uas drivers, because
their error handler routines perform a USB reset, and if the reset
fails then the USB core automatically goes on to unbind all drivers
from the device's interfaces -- all while still in the context of the
SCSI error handler.
As it turns out, practically all the scenarios leading to a USB reset
failure end up causing a device disconnect (the main error pathway in
usb_reset_and_verify_device(), at the end of the routine, calls
hub_port_logical_disconnect() before returning). As a result, the
hub_wq thread will soon become aware of the problem and will unbind
all the device's drivers in its own context, not in the
error-handler's context.
This means that usb_reset_device() does not need to call
usb_unbind_and_rebind_marked_interfaces() in cases where
usb_reset_and_verify_device() has returned an error, because hub_wq
will take care of everything anyway.
This particular problem was observed in somewhat artificial
circumstances, by using usbfs to tell a hub to power-down a port
connected to a USB-3 mass storage device using the UAS protocol. With
the port turned off, the currently executing command timed out and the
error handler started running. The USB reset naturally failed,
because the hub port was off, and the error handler deadlocked as
described above. Not carrying out the call to
usb_unbind_and_rebind_marked_interfaces() fixes this issue.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com> Tested-by: Kento Kobayashi <Kento.A.Kobayashi@sony.com> CC: Bart Van Assche <bvanassche@acm.org> CC: Martin K. Petersen <martin.petersen@oracle.com> CC: Jacky Cao <Jacky.Cao@sony.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Subsequent code relies on the values that qeth_update_from_chp_desc()
reads from the CHP descriptor. Rather than dealing with weird errors
later on, just handle it properly here.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
switch_lock was introduced because it allowed serialization of device
authorization requests from userspace without need to take the big
domain lock (tb->lock). This was fine because device authorization with
ICM is just one command that is sent to the firmware. Now that we start
to handle all tunneling in the driver switch_lock is not enough because
we need to walk over the topology to establish paths.
For this reason drop switch_lock from the driver completely in favour of
big domain lock.
There is one complication, though. If userspace is waiting for the lock
in tb_switch_set_authorized(), it keeps the device_del() from removing
the sysfs attribute because it waits for active users to release the
attribute first which leads into following splat:
We deal this by following what network stack did for some of their
attributes and use mutex_trylock() with restart_syscall(). This makes
userspace release the attribute allowing sysfs attribute removal to
progress before the write is restarted and eventually fail when the
attribute is removed.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The QEMU PowerPC/PSeries machine model was not expecting a self-IPI,
and it may be a bit surprising thing to do, so have irq_work_queue_on
do local queueing when target is the current CPU.
Suggested-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190409093403.20994-1-npiggin@gmail.com
[ Simplified the preprocessor comments.
Fixed unbalanced curly brackets pointed out by Thomas. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Luca Weiss <luca@z3ntu.xyz> Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The frame_busy mask is used in frame_done event handling, which is not
invoked for async commits. So an async commit will leave the
frame_busy mask populated after it completes and future commits will start
with the busy mask incorrect.
This showed up on disable after cursor move. I was hitting the "this should
not happen" comment in the frame event worker since frame_busy was set,
we queued the event, but there were no frames pending (since async
also doesn't set that).
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 47, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:57:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:66:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:118:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 51, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Jordan Crouse <jcrouse@codeaurora.org> Cc: Mamta Shukla <mamtashukla555@gmail.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Sharat Masetty <smasetty@codeaurora.org> Cc: linux-arm-msm@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: freedreno@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org (open list) Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
release resources allocated in mode_set if any of
the hw check fails. Most of these checks are not
necessary and they will be removed in the follow up
patches with state based resource allocations.
The kzalloc here was being used without checking the return - if the
kzalloc fails return VCHIQ_ERROR. The call-site of
vchiq_platform_init_state() vchiq_init_state() was not responding
to an allocation failure so checks for != VCHIQ_SUCCESS
and pass VCHIQ_ERROR up to vchiq_platform_init() which then
will fail with -EINVAL.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-By: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The number of descendant cgroups and the number of dying
descendant cgroups are currently synchronized using the cgroup_mutex.
The number of descendant cgroups will be required by the cgroup v2
freezer, which will use it to determine if a cgroup is frozen
(depending on total number of descendants and number of frozen
descendants). It's not always acceptable to grab the cgroup_mutex,
especially from quite hot paths (e.g. exit()).
To avoid this, let's additionally synchronize these counters using
the css_set_lock.
So, it's safe to read these counters with either cgroup_mutex or
css_set_lock locked, and for changing both locks should be acquired.
The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
This is just a preempt_disable() which ensures that the variable is only
from the local CPU. It does not protect against users on the same CPU
from another context. It is possible that a preemptible context reads
slot 0 and then an interrupt occurs and the same value is read again.
The above scenario is confirmed by lockdep if we add a spinlock:
| ================================
| WARNING: inconsistent lock state
| 5.1.0-rc3+ #42 Not tainted
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes:
| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0
| {SOFTIRQ-ON-W} state was registered at:
| _raw_spin_lock+0x2a/0x40
| get_random_u32+0x3e/0xe0
| new_slab+0x15c/0x7b0
| ___slab_alloc+0x492/0x620
| __slab_alloc.isra.73+0x53/0xa0
| kmem_cache_alloc_node+0xaf/0x2a0
| copy_process.part.41+0x1e1/0x2370
| _do_fork+0xdb/0x6d0
| kernel_thread+0x20/0x30
| kthreadd+0x1ba/0x220
| ret_from_fork+0x3a/0x50
…
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(batched_entropy_u32.lock);
| <Interrupt>
| lock(batched_entropy_u32.lock);
|
| *** DEADLOCK ***
|
| stack backtrace:
| Call Trace:
…
| kmem_cache_alloc_trace+0x20e/0x270
| ipmi_alloc_recv_msg+0x16/0x40
…
| __do_softirq+0xec/0x48d
| run_ksoftirqd+0x37/0x60
| smpboot_thread_fn+0x191/0x290
| kthread+0xfe/0x130
| ret_from_fork+0x3a/0x50
Add a spinlock_t to the batched_entropy data structure and acquire the
lock while accessing it. Acquire the lock with disabled interrupts
because this function may be used from interrupt context.
Remove the batched_entropy_reset_lock lock. Now that we have a lock for
the data scructure, we can access it from a remote CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the system boots with random.trust_cpu=1 it doesn't initialize the
per-NUMA CRNGs because it skips the rest of the CRNG startup code. This
means that the code from 1e7f583af67b ("random: make /dev/urandom scalable
for silly userspace programs") is not used when random.trust_cpu=1.
The call to invalidate_batched_entropy() was also missing. This is
important for architectures like PPC and S390 which only have the
arch_get_random_seed_* functions.
Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng") Signed-off-by: Jon DeVree <nuxi@vault24.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
With STRICT_KERNEL_RWX enabled anything marked __init is placed at a 16M
boundary. This is necessary so that it can be repurposed later with
different permissions. However, in kernels with text larger than 16M,
this pushes early_setup past 32M, incapable of being reached by the
branch instruction.
Fix this by setting the CTR and branching there instead.
Fixes: 1e0fc9d1eb2b ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs") Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Fix it to work on BE by using DOTSYM()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
When booted with "topology_updates=no", or when "off" is written to
/proc/powerpc/topology_updates, NUMA reassignments are inhibited for
PRRN and VPHN events. However, migration and suspend unconditionally
re-enable reassignments via start_topology_update(). This is
incoherent.
Check the topology_updates_enabled flag in
start/stop_topology_update() so that callers of those APIs need not be
aware of whether reassignments are enabled. This allows the
administrative decision on reassignments to remain in force across
migrations and suspensions.
commit 2da78092dda "block: Fix dev_t minor allocation lifetime"
specifically moved blk_free_devt(dev->devt) call to part_release()
to avoid reallocating device number before the device is fully
shutdown.
However, it can cause use-after-free on gendisk in get_gendisk().
We use md device as example to show the race scenes:
Process1 Worker Process2
md_free
blkdev_open
del_gendisk
add delete_partition_work_fn() to wq
__blkdev_get
get_gendisk
put_disk
disk_release
kfree(disk)
find part from ext_devt_idr
get_disk_and_module(disk)
cause use after free
delete_partition_work_fn
put_device(part)
part_release
remove part from ext_devt_idr
Before <devt, hd_struct pointer> is removed from ext_devt_idr by
delete_partition_work_fn(), we can find the devt and then access
gendisk by hd_struct pointer. But, if we access the gendisk after
it have been freed, it can cause in use-after-freeon gendisk in
get_gendisk().
We fix this by adding a new helper blk_invalidate_devt() in
delete_partition() and del_gendisk(). It replaces hd_struct
pointer in idr with value 'NULL', and deletes the entry from
idr in part_release() as we do now.
Thanks to Jan Kara for providing the solution and more clear comments
for the code.
Fixes: 2da78092dda1 ("block: Fix dev_t minor allocation lifetime") Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ctrl_check_input() function is called from pvr2_ctrl_range_check().
It's supposed to validate user supplied input and return true or false
depending on whether the input is valid or not. The problem is that
negative shifts or shifts greater than 31 are undefined in C. In
practice with GCC they result in shift wrapping so this function returns
true for some inputs which are not valid and this could result in a
buffer overflow:
drivers/media/usb/pvrusb2/pvrusb2-ctrl.c:205 pvr2_ctrl_get_valname()
warn: uncapped user index 'names[val]'
The cptr->hdw->input_allowed_mask mask is configured in pvr2_hdw_create()
and the highest valid bit is BIT(4).
Fixes: 7fb20fa38caa ("V4L/DVB (7299): pvrusb2: Improve logic which handles input choice availability") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix au0828_analog_stream_enable() to check if device is in the right
state first. When unbind happens while bind is in progress, usbdev
pointer could be invalid in au0828_analog_stream_enable() and a call
to usb_ifnum_to_if() will result in the null pointer dereference.
This problem is found with the new media_dev_allocator.sh test.
In audit_rule_change(), audit_data_to_entry() is firstly invoked to
translate the payload data to the kernel's rule representation. In
audit_data_to_entry(), depending on the audit field type, an audit tree may
be created in audit_make_tree(), which eventually invokes kmalloc() to
allocate the tree. Since this tree is a temporary tree, it will be then
freed in the following execution, e.g., audit_add_rule() if the message
type is AUDIT_ADD_RULE or audit_del_rule() if the message type is
AUDIT_DEL_RULE. However, if the message type is neither AUDIT_ADD_RULE nor
AUDIT_DEL_RULE, i.e., the default case of the switch statement, this
temporary tree is not freed.
To fix this issue, only allocate the tree when the type is AUDIT_ADD_RULE
or AUDIT_DEL_RULE.
Signed-off-by: Wenwen Wang <wang6495@umn.edu> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This driver returns an error if unsupported media bus pixel code is
requested by VIDIOC_SUBDEV_S_FMT.
But according to Documentation/media/uapi/v4l/vidioc-subdev-g-fmt.rst,
Drivers must not return an error solely because the requested format
doesn't match the device capabilities. They must instead modify the
format to match what the hardware can provide.
So select default format code and return success in that case.
In preparation for adding asynchronous subdevice support to the driver,
don't acquire v4l2_clk from the driver .probe() callback as that may
fail if the clock is provided by a bridge driver which may be not yet
initialized. Move the v4l2_clk_get() to ov6650_video_probe() helper
which is going to be converted to v4l2_subdev_internal_ops.registered()
callback, executed only when the bridge driver is ready.
Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The error return value is not written by some firmware codecs, such as
MPEG-2 decode on CodaHx4. Clear the error return value before starting
the picture run to avoid misinterpreting unrelated values returned by
sequence initialization as error return value.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Even if this case shouldn't happen when controller is properly programmed,
it's still better to avoid dumping a kernel Oops for this.
As the sequence may happen only for debugging purposes, log the error and
just finish the tasklet call.
Uncore PMU drivers face an awkward cyclic dependency wherein:
- They have to pick a valid online CPU to associate with before
registering the PMU device, since it will get exposed to userspace
immediately.
- The PMU registration has to be be at least partly complete before
hotplug events can be handled, since trying to migrate an
uninitialised context would be bad.
- The hotplug handler has to be ready as soon as a CPU is chosen, lest
it go offline without the user-visible cpumask value getting updated.
The arm-cci driver has tried to solve this by using get_cpu() to pick
the current CPU and prevent it from disappearing while both
registrations are performed, but that results in taking mutexes with
preemption disabled, which makes certain configurations very unhappy:
It is not feasible to resolve all the possible races outside of the perf
core itself, so address the immediate bug by following the example of
nearly every other PMU driver and not even trying to do so. Registering
the hotplug notifier first should minimise the window in which things
can go wrong, so that's about as much as we can reasonably do here. This
also revealed an additional race in assigning the global pointer too
late relative to the hotplug notifier, which gets fixed in the process.
Reported-by: Li, Meng <Meng.Li@windriver.com> Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This is mostly a revert of commit 55bb6a633c33 ("clk: rockchip: mark
noc and some special clk as critical on rk3288") except that we're
keeping "pmu_hclk_otg0" as critical still.
NOTE: turning these clocks off doesn't seem to do a whole lot in terms
of power savings (checking the power on the logic rail). It appears
to save maybe 1-2mW. ...but still it seems like we should turn the
clocks off if they aren't needed.
About "pmu_hclk_otg0" (the one clock from the original commit we're
still keeping critical) from an email thread:
> pmu ahb clock
>
> Function: Clock to pmu module when hibernation and/or ADP is
> enabled. Must be greater than or equal to 30 MHz.
>
> If the SOC design does not support hibernation/ADP function, only have
> hclk_otg, this clk can be switched according to the usage of otg.
> If the SOC design support hibernation/ADP, has two clocks, hclk_otg and
> pmu_hclk_otg0.
> Hclk_otg belongs to the closed part of otg logic, which can be switched
> according to the use of otg.
>
> pmu_hclk_otg0 belongs to the always on part.
>
> As for whether pmu_hclk_otg0 can be turned off when otg is not in use,
> we have not tested. IC suggest make pmu_hclk_otg0 always on.
For the rest of the clocks:
atclk: No documentation about this clock other than that it goes to
the CPU. CPU functions fine without it on. Maybe needed for JTAG?
jtag: Presumably this clock is only needed if you're debugging with
JTAG. It doesn't seem like it makes sense to waste power for every
rk3288 user. In any case to do JTAG you'd need private patches to
adjust the pinctrl the mux the JTAG out anyway.
pclk_dbg, pclk_core_niu: On veyron Chromebooks we turn these two
clocks on only during kernel panics in order to access some coresight
registers. Since nothing in the upstream kernel does this we should
be able to leave them off safely. Maybe also needed for JTAG?
hsicphy12m_xin12m: There is no indication of why this clock would need
to be turned on for boards that don't use HSIC.
pclk_ddrupctl[0-1], pclk_publ0[0-1]: On veyron Chromebooks we turn
these 4 clocks on only when doing DDR transitions and they are off
otherwise. I see no reason why they'd need to be on in the upstream
kernel which doesn't support DDRFreq.
Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The call to of_find_compatible_node returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/samsung/pinctrl-exynos-arm.c:76:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 66, but without a corresponding object release within this function.
./drivers/pinctrl/samsung/pinctrl-exynos-arm.c:82:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 66, but without a corresponding object release within this function.
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Tomasz Figa <tomasz.figa@gmail.com> Cc: Sylwester Nawrocki <s.nawrocki@samsung.com> Cc: Kukjin Kim <kgene@kernel.org> Cc: linux-samsung-soc@vger.kernel.org Cc: linux-gpio@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/pinctrl-st.c:1188:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1188:3-9: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1199:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
./drivers/pinctrl/pinctrl-st.c:1199:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1175, but without a corresponding object release within this function.
The call to of_get_child_by_name returns a node pointer with refcount
incremented thus it must be explicitly decremented after the last
usage.
Detected by coccinelle with the following warnings:
./drivers/pinctrl/pinctrl-pistachio.c:1422:1-7: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 1360, but without a corresponding object release within this function.
According to the logitech_hidpp_2.0_specification_draft_2012-06-04.pdf doc:
https://lekensteyn.nl/files/logitech/logitech_hidpp_2.0_specification_draft_2012-06-04.pdf
We should use a register-access-protocol request using the short input /
output report ids. This is necessary because 27MHz HID++ receivers have
a max-packetsize on their HIP++ endpoint of 8, so they cannot support
long reports. Using a feature-access-protocol request (which is always
long or very-long) with these will cause a timeout error, followed by
the hidpp driver treating the device as not being HID++ capable.
This commit fixes this by switching to using a rap request to get the
protocol version.
Besides being tested with a (046d:c517) 27MHz receiver with various
27MHz keyboards and mice, this has also been tested to not cause
regressions on a non-unifying dual-HID++ nano receiver (046d:c534) with
k270 and m185 HID++-2.0 devices connected and on a unifying/dj receiver
(046d:c52b) with a HID++-2.0 Logitech Rechargeable Touchpad T650.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The BCM43341B has the default MAC address 43:34:1B:00:1F:AC if none
is given. This address was found when enabling Bluetooth on multiple
Intel Edison modules. It also contains the sequence 43341B, the name
the chip identifies itself as. Using the same BD_ADDR is problematic
when having multiple Intel Edison modules in each others range.
The default address also has the LAA (locally administered address)
bit set which prevents a BNEP device from being created, needed for
BT tethering.
Add this to the list of black listed default MAC addresses and let
the user configure a valid one using f.i.
`btmgmt -i hci0 public-addr xx:xx:xx:xx:xx:xx`
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Ferry Toth <ftoth@exalondelft.nl> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Randy reported objtool triggered on his (GCC-7.4) build:
lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x315: call to __ubsan_handle_add_overflow() with UACCESS enabled
lib/strnlen_user.o: warning: objtool: strnlen_user()+0x337: call to __ubsan_handle_sub_overflow() with UACCESS enabled
This is due to UBSAN generating signed-overflow-UB warnings where it
should not. Prior to GCC-8 UBSAN ignored -fwrapv (which the kernel
uses through -fno-strict-overflow).
Make the functions use 'unsigned long' throughout.
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Link: http://lkml.kernel.org/r/20190424072208.754094071@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In-NMI warnings have been added to vmalloc_fault() via:
ebc8827f75 ("x86: Barf when vmalloc and kmemcheck faults happen in NMI")
back in the time when our NMI entry code could not cope with nested NMIs.
These days, it's perfectly fine to take a fault in NMI context and we
don't have to care about the fact that IRET from the fault handler might
cause NMI nesting.
This warning has already been removed from 32-bit implementation of
vmalloc_fault() in:
6863ea0cda8 ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
but the 64-bit version was omitted.
Remove the bogus warning also from 64-bit implementation of vmalloc_fault().
The __put_user() macro evaluates it's @ptr argument inside the
__uaccess_begin() / __uaccess_end() region. While this would normally
not be expected to be an issue, an UBSAN bug (it ignored -fwrapv,
fixed in GCC 8+) would transform the @ptr evaluation for:
drivers/gpu/drm/i915/i915_gem_execbuffer.c: if (unlikely(__put_user(offset, &urelocs[r-stack].presumed_offset))) {
into a signed-overflow-UB check and trigger the objtool AC validation.
Finish this commit:
2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation")
and explicitly evaluate all 3 arguments early.
Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: luto@kernel.org Fixes: 2a418cf3f5f1 ("x86/uaccess: Don't leak the AC flag into __put_user() value evaluation") Link: http://lkml.kernel.org/r/20190424072208.695962771@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The test robot reported a wrong assignment of a per-CPU variable which
it detected by using sparse and sent a report. The assignment itself is
correct. The annotation for sparse was wrong and hence the report.
The first pointer is a "normal" pointer and points to the per-CPU memory
area. That means that the __percpu annotation has to be moved.
Move the __percpu annotation to pointer which points to the per-CPU
area. This change affects only the sparse tool (and is ignored by the
compiler).
Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f97f8f06a49fe ("smpboot: Provide infrastructure for percpu hotplug threads") Link: http://lkml.kernel.org/r/20190424085253.12178-1-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When building x86 with Clang LTO and CFI, CFI jump regions are
automatically added to the end of the .text section late in linking. As a
result, the _etext position was being labelled before the appended jump
regions, causing confusion about where the boundaries of the executable
region actually are in the running kernel, and broke at least the fault
injection code. This moves the _etext mark to outside (and immediately
after) the .text area, as it already the case on other architectures
(e.g. arm64, arm).
When releasing the vfio-ccw mdev, we currently do not release
any existing channel program and its pinned pages. This can
lead to the following warning:
[1038876.561565] WARNING: CPU: 2 PID: 144727 at drivers/vfio/vfio_iommu_type1.c:1494 vfio_sanity_check_pfn_list+0x40/0x70 [vfio_iommu_type1]
Similarly we do not free the channel program when we are removing
the vfio-ccw device. Let's fix this by resetting the device and freeing
the channel program and pinned pages in the release path. For the remove
path we can just quiesce the device, since in the remove path the mediated
device is going away for good and so we don't need to do a full reset.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <ae9f20dc8873f2027f7b3c5d2aaa0bdfe06850b8.1554756534.git.alifm@linux.ibm.com> Acked-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently we call flush_workqueue while holding the subchannel
spinlock. But flush_workqueue function can go to sleep, so
do not call the function while holding the spinlock.
When two netdev have same link local addresses (such as vlan and non
vlan), two rdma cm listen id should be able to bind to following different
addresses.
Currently run_cache_set() has no return value, if there is failure in
bch_journal_replay(), the caller of run_cache_set() has no idea about
such failure and just continue to execute following code after
run_cache_set(). The internal failure is triggered inside
bch_journal_replay() and being handled in async way. This behavior is
inefficient, while failure handling inside bch_journal_replay(), cache
register code is still running to start the cache set. Registering and
unregistering code running as same time may introduce some rare race
condition, and make the code to be more hard to be understood.
This patch adds return value to run_cache_set(), and returns -EIO if
bch_journal_rreplay() fails. Then caller of run_cache_set() may detect
such failure and stop registering code flow immedidately inside
register_cache_set().
If journal replay fails, run_cache_set() can report error immediately
to register_cache_set(). This patch makes the failure handling for
bch_journal_replay() be in synchronized way, easier to understand and
debug, and avoid poetential race condition for register-and-unregister
in same time.
The reason is in journal_reclaim(), when discard is enabled, we send
discard command and reclaim those journal buckets whose seq is old
than the last_seq_now, but before we write a journal with last_seq_now,
the machine is restarted, so the journal with the last_seq_now is not
written to the journal bucket, and the last_seq_wrote in the newest
journal is old than last_seq_now which we expect to be, so when we doing
replay, journals from last_seq_wrote to last_seq_now are missing.
It's hard to write a journal immediately after journal_reclaim(),
and it harmless if those missed journal are caused by discarding
since those journals are already wrote to btree node. So, if miss
seqs are started from the beginning journal, we treat it as normal,
and only print a message to show the miss journal, and point out
it maybe caused by discarding.
Patch v2 add a judgement condition to ignore the missed journal
only when discard enabled as Coly suggested.
(Coly Li: rebase the patch with other changes in bch_journal_replay())
Signed-off-by: Tang Junhui <tang.junhui.linux@gmail.com> Tested-by: Dennis Schridde <devurandom@gmx.net> Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When failure happens inside bch_journal_replay(), calling
cache_set_err_on() and handling the failure in async way is not a good
idea. Because after bch_journal_replay() returns, registering code will
continue to execute following steps, and unregistering code triggered
by cache_set_err_on() is running in same time. First it is unnecessary
to handle failure and unregister cache set in an async way, second there
might be potential race condition to run register and unregister code
for same cache set.
So in this patch, if failure happens in bch_journal_replay(), we don't
call cache_set_err_on(), and just print out the same error message to
kernel message buffer, then return -EIO immediately caller. Then caller
can detect such failure and handle it in synchrnozied way.
Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
When nbytes < 4, end is wronlgy set to a negative value which, due to
uint, is then interpreted to a large value leading to a deadlock in the
following code.
If we timeout the admin startup sequence we might not yet have
an I/O tagset allocated which causes the teardown sequence to crash.
Make nvme_tcp_teardown_io_queues safe by not iterating inflight tags
if the tagset wasn't allocated.
If we timeout the admin startup sequence we might not yet have
an I/O tagset allocated which causes the teardown sequence to crash.
Make nvme_tcp_teardown_io_queues safe by not iterating inflight tags
if the tagset wasn't allocated.
kmalloc can fail in rsi_register_rates_channels but memcpy still attempts
to write to channels. The patch replaces these calls with kmemdup and
passes the error upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The "rate_index" is only used as an index into the phist_data->rx_rate[]
array in the mwifiex_hist_data_set() function. That array has
MWIFIEX_MAX_AC_RX_RATES (74) elements and it's used to generate some
debugfs information. The "rate_index" variable comes from the network
skb->data[] and it is a u8 so it's in the 0-255 range. We need to cap
it to prevent an array overflow.
Fixes: cbf6e05527a7 ("mwifiex: add rx histogram statistics support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
is_slave_mode defaults to false because sai structure
that contains it is kzalloc'ed.
Anyhow, if we decide to set the following configuration
SAI slave -> SAI master, is_slave_mode will remain set on true
although SAI being master it should be set to false.
Fix this by updating is_slave_mode for each call of
fsl_sai_set_dai_fmt.
Signed-off-by: Daniel Baluta <daniel.baluta@nxp.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, building bpf samples will cause the following error.
./tools/lib/bpf/bpf.h:132:27: error: 'UINT32_MAX' undeclared here (not in a function) ..
#define BPF_LOG_BUF_SIZE (UINT32_MAX >> 8) /* verifier maximum in kernels <= 5.1 */
^
./samples/bpf/bpf_load.h:31:25: note: in expansion of macro 'BPF_LOG_BUF_SIZE'
extern char bpf_log_buf[BPF_LOG_BUF_SIZE];
^~~~~~~~~~~~~~~~
Due to commit 4519efa6f8ea ("libbpf: fix BPF_LOG_BUF_SIZE off-by-one error")
hard-coded size of BPF_LOG_BUF_SIZE has been replaced with UINT32_MAX which is
defined in <stdint.h> header.
Even with this change, bpf selftests are running fine since these are built
with clang and it includes header(-idirafter) from clang/6.0.0/include.
(it has <stdint.h>)
But bpf samples are compiled with GCC, and it only searches and includes
headers declared at the target file. As '#include <stdint.h>' hasn't been
declared in tools/lib/bpf/bpf.h, it causes build failure of bpf samples.
This commit add declaration of '#include <stdint.h>' to tools/lib/bpf/bpf.h
to fix this problem.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
FullMAC STAs have no way to update bss channel after CSA channel switch
completion. As a result, user-space tools may provide inconsistent
channel info. For instance, consider the following two commands:
$ sudo iw dev wlan0 link
$ sudo iw dev wlan0 info
The latter command gets channel info from the hardware, so most probably
its output will be correct. However the former command gets channel info
from scan cache, so its output will contain outdated channel info.
In fact, current bss channel info will not be updated until the
next [re-]connect.
Note that mac80211 STAs have a workaround for this, but it requires
access to internal cfg80211 data, see ieee80211_chswitch_work:
This patch kill instructs the DMAC to immediately terminate
execution of a thread. and then clear the interrupt status,
at last, stop generating interrupts for DMA_SEV. to guarantee
the next dma start is clean. otherwise, one interrupt maybe leave
to next start and make some mistake.
we can reporduce the problem as follows:
DMASEV: modify the event-interrupt resource, and if the INTEN sets
function as interrupt, the DMAC will set irq<event_num> HIGH to
generate interrupt. write INTCLR to clear interrupt.
Since irq handler and mailbox task will both update arq's count,
so arq's count should use atomic_t instead of u32, otherwise
its value may go wrong finally.
Fixes: 07a0556a3a73 ("net: hns3: Changes to support ARQ(Asynchronous Receive Queue)") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
clang produces a harmless warning for each use for the qeth_adp_supported
macro:
drivers/s390/net/qeth_l2_main.c:559:31: warning: implicit conversion from enumeration type 'enum qeth_ipa_setadp_cmd' to
different enumeration type 'enum qeth_ipa_funcs' [-Wenum-conversion]
if (qeth_adp_supported(card, IPA_SETADP_SET_PROMISC_MODE))
~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/s390/net/qeth_core.h:179:41: note: expanded from macro 'qeth_adp_supported'
qeth_is_ipa_supported(&c->options.adp, f)
~~~~~~~~~~~~~~~~~~~~~ ^
Add a version of this macro that uses the correct types, and
remove the unused qeth_adp_enabled() macro that has the same
problem.
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
PHY's behave differently when being reset. Some reset registers to
defaults, some don't. Some trigger an autoneg restart, some don't.
So let's also set the autoneg restart bit when resetting. Then PHY
behavior should be more consistent. Clearing BMCR_ISOLATE serves the
same purpose and is borrowed from genphy_restart_aneg.
BMCR holds the speed / duplex settings in fixed mode. Therefore
we may have an issue if a soft reset resets BMCR to its default.
So better call genphy_setup_forced() afterwards in fixed mode.
We've seen no related complaint in the last >10 yrs, so let's
treat it as an improvement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
hns3_desc_unused() returns how many BD have been cleaned, but new
buffer has not been attached to them. The register of
HNS3_RING_RX_RING_FBDNUM_REG returns how many BD need allocating new
buffer to or need to cleaned. So the remaining BD need to be clean
is HNS3_RING_RX_RING_FBDNUM_REG - hns3_desc_unused().
Also, new buffer can not attach to the pending BD when the last BD is
not handled, because memcpy has not been done on the first pending BD.
This patch fixes by subtracting the pending BD num from unused_count
after 'HNS3_RING_RX_RING_FBDNUM_REG - unused_count' is used to calculate
the BD bum need to be clean.
Fixes: e55970950556 ("net: hns3: Add handling of GRO Pkts not fully RX'ed in NAPI poll") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
>From the DS2408 datasheet [1]:
"Resume Command function checks the status of the RC flag and, if it is set,
directly transfers control to the control functions, similar to a Skip ROM
command. The only way to set the RC flag is through successfully executing
the Match ROM, Search ROM, Conditional Search ROM, or Overdrive-Match ROM
command"
The function currently works perfectly fine in a multidrop bus, but when we
have only a single slave connected, then only a Skip ROM is used and Match
ROM is not called at all. This is leading to problems e.g. with single one
DS2408 connected, as the Resume Command is not working properly and the
device is responding with failing results after the Resume Command.
This commit is fixing this by using a Skip ROM instead in those cases.
The bandwidth / performance advantage is exactly the same.
Now CPSW ALE will set/clean Host port bit in Unregistered Multicast Flood
Mask (UNREG_MCAST_FLOOD_MASK) for every VLAN without checking if this port
belongs to VLAN or not when ALLMULTI mode flag is set for nedev. This is
working in non dual_mac mode, but in dual_mac - it causes
enabling/disabling ALLMULTI flag for both ports.
Hence fix it by adding additional parameter to cpsw_ale_set_allmulti() to
specify ALE port number for which ALLMULTI has to be enabled and check if
port belongs to VLAN before modifying UNREG_MCAST_FLOOD_MASK.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The NOHZ idle balancer runs on the lowest idle CPU. This can
interfere with isolated CPUs, so confine it to HK_FLAG_MISC
housekeeping CPUs.
HK_FLAG_SCHED is not used for this because it is not set anywhere
at the moment. This could be folded into HK_FLAG_SCHED once that
option is fixed.
The problem was observed with increased jitter on an application
running on CPU0, caused by NOHZ idle load balancing being run on
CPU1 (an SMT sibling).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190412042613.28930-1-npiggin@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
snd_hda_codec_device_new() is used by both legacy HDA and ASoC
driver. However, we will call snd_hdac_device_unregister() in
snd_hdac_ext_bus_device_remove() for ASoC device. This patch uses
the type flag in hdac_device struct to determine is it a ASoC device
or legacy HDA device and call snd_hdac_device_unregister() in
snd_hda_codec_dev_free() only if it is a legacy HDA device.
To register data for the next kernel (command line, oldmem_base, etc.) the
current kernel needs to find the ELF segment that contains head.S. This is
currently done by checking ifor 'phdr->p_paddr == 0'. This works fine for
the current kernel build but in theory the first few pages could be
skipped. Make the detection more robust by checking if the entry point lies
within the segment.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Sometimes during connection recovery when there is a failure to resolve
ARP, and offload connection was not issued, driver tries to flush pending
offload connection work which was not queued up.
The device's remove() attempts to shut down the delayed_work scheduled
on the kernel-global workqueue by calling flush_scheduled_work().
Unfortunately, flush_scheduled_work() does not prevent the delayed_work
from re-scheduling itself. The delayed_work might run after the device
has been removed, and touch the already de-allocated info structure.
This is a potential use-after-free.
Fix by calling cancel_delayed_work_sync() during remove(): this ensures
that the delayed work is properly cancelled, is no longer running, and
is not able to re-schedule itself.
This issue was detected with the help of Coccinelle.
Signed-off-by: Sven Van Asbroeck <TheSven73@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If for some reason the device gives us an RX interrupt before we're
ready for it, perhaps during device power-on with misconfigured IRQ
causes mapping or so, we can crash trying to access the queues.
Prevent that by checking that we actually have RXQs and that they
were properly allocated.
When we failed to find a root key in btrfs_update_root(), we just panic.
That's definitely not cool, fix it by outputting an unique error
message, aborting current transaction and return -EUCLEAN. This should
not normally happen as the root has been used by the callers in some
way.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The way relocation moves data extents is by creating a reloc inode and
preallocating extents in this inode and then copying the data into these
preallocated extents. Once we've done this for all of our extents,
we'll write out these dirty pages, which marks the extent written, and
goes into btrfs_reloc_cow_block(). From here we get our current
reloc_control, which _should_ match the reloc_control for the current
block group we're relocating.
However if we get an ENOSPC in this path at some point we'll bail out,
never initiating writeback on this inode. Not a huge deal, unless we
happen to be doing relocation on a different block group, and this block
group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in
btrfs_reloc_cow_block(), because we expect to be done modifying the data
inode. We are in fact done modifying the metadata for the data inode
we're currently using, but not the one from the failed block group, and
thus we BUG_ON().
(This happens when writeback finishes for extents from the previous
group, when we are at btrfs_finish_ordered_io() which updates the data
reloc tree (inode item, drops/adds extent items, etc).)
Fix this by writing out the reloc data inode always, and then breaking
out of the loop after that point to keep from tripping this BUG_ON()
later.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com>
[ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When doing fallocate, we first add the range to the reserve_list and
then reserve the quota. If quota reservation fails, we'll release all
reserved parts of reserve_list.
However, cur_offset is not updated to indicate that this range is
already been inserted into the list. Therefore, the same range is freed
twice. Once at list_for_each_entry loop, and once at the end of the
function. This will result in WARN_ON on bytes_may_use when we free the
remaining space.
At the end, under the 'out' label we have a call to:
When modules and BPF filters are loaded, there is a time window in
which some memory is both writable and executable. An attacker that has
already found another vulnerability (e.g., a dangling pointer) might be
able to exploit this behavior to overwrite kernel code. Prevent having
writable executable PTEs in this stage.
In addition, avoiding having W+X mappings can also slightly simplify the
patching of modules code on initialization (e.g., by alternatives and
static-key), as would be done in the next patch. This was actually the
main motivation for this patch.
To avoid having W+X mappings, set them initially as RW (NX) and after
they are set as RO set them as X as well. Setting them as executable is
done as a separate step to avoid one core in which the old PTE is cached
(hence writable), and another which sees the updated PTE (executable),
which would break the W^X protection.
Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Since fc_remote_port_delete() must be called with interrupts enabled, do
not disable interrupts when calling that function. Remove the lockin calls
from around the put_sess() call. This is safe because the function that is
called when the final reference is dropped, qlt_unreg_sess(), grabs the
proper locks. This patch avoids that lockdep reports the following:
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
kworker/2:1/62 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: 0000000009e679b3 (&(&k->k_lock)->rlock){+.+.}, at: klist_next+0x43/0x1d0
and this task is already holding: 00000000a033b71c (&(&ha->tgt.sess_lock)->rlock){-...}, at: qla24xx_delete_sess_fn+0x55/0xf0 [qla2xxx_scst]
which would create a new lock dependency:
(&(&ha->tgt.sess_lock)->rlock){-...} -> (&(&k->k_lock)->rlock){+.+.}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&(&ha->tgt.sess_lock)->rlock){-...}
This patch avoids that lockdep reports the following warning:
=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
5.1.0-rc1-dbg+ #11 Tainted: G W
-----------------------------------------------------
rmdir/1478 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: 00000000e7ac4607 (&(&k->k_lock)->rlock){+.+.}, at: klist_next+0x43/0x1d0
and this task is already holding: 00000000cf0baf5e (&(&ha->tgt.sess_lock)->rlock){-...}, at: tcm_qla2xxx_close_session+0x57/0xb0 [tcm_qla2xxx]
which would create a new lock dependency:
(&(&ha->tgt.sess_lock)->rlock){-...} -> (&(&k->k_lock)->rlock){+.+.}
but this new dependency connects a HARDIRQ-irq-safe lock:
(&(&ha->tgt.sess_lock)->rlock){-...}
Implementations of the .write_pending() callback functions must guarantee
that an appropriate LIO core callback function will be called immediately or
at a later time. Make sure that this guarantee is met for aborted SCSI
commands.
[mkp: typo]
Cc: Himanshu Madhani <hmadhani@marvell.com> Cc: Giridhar Malavali <gmalavali@marvell.com> Fixes: 694833ee00c4 ("scsi: tcm_qla2xxx: Do not allow aborted cmd to advance.") # v4.13. Fixes: a07100e00ac4 ("qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM") # v4.5. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>