Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX7D_CLK_SDMA results in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX7D_CLK_IPG as "ipg" clock for SDMA, to avoid detecting
incorrect clock ratio.
Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX6SLL_CLK_SDMA result in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX6SLL_CLK_IPG as "ipg" clock for SDMA, to avoid detecting
incorrect clock ratio.
Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX6SL_CLK_SDMA results in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX6SL_CLK_AHB as "ahb" clock for SDMA, to avoid detecting
incorrect clock ratio.
Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX5_CLK_SDMA results in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX5_CLK_AHB as "ahb" clock for SDMA, to avoid detecting
incorrect clock ratio.
Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX5_CLK_SDMA results in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX5_CLK_AHB as "ahb" clock for SDMA, to avoid detecting
incorrect clock ratio.
Since 25aaa75df1e6 SDMA driver uses clock rates of "ipg" and "ahb"
clock to determine if it needs to configure the IP block as operating
at 1:1 or 1:2 clock ratio (ACR bit in SDMAARM_CONFIG). Specifying both
clocks as IMX5_CLK_SDMA results in driver incorrectly thinking that
ratio is 1:1 which results in broken SDMA funtionality. Fix the code
to specify IMX5_CLK_AHB as "ahb" clock for SDMA, to avoid detecting
incorrect clock ratio.
The rk3288 SoC has two PWM implementations available, the "old"
implementation and the "new" one. You can switch between the two of
them by flipping a bit in the grf.
The "old" implementation is the default at chip power up but isn't the
one that's officially supposed to be used. ...and, in fact, the
driver that gets selected in Linux using the rk3288 device tree only
supports the "new" implementation.
Long ago I tried to get a switch to the right IP block landed in the
PWM driver (search for "rk3288: Switch to use the proper PWM IP") but
that got rejected. In the mean time the grf has grown a full-fledged
driver that already sets other random bits like this. That means we
can now get the fix landed.
For those wondering how things could have possibly worked for the last
4.5 years, folks have mostly been relying on the bootloader to set
this bit. ...but occasionally folks have pointed back to my old patch
series [1] in downstream kernels.
By default, for performance consideration, Intel IOMMU
driver won't flush IOTLB immediately after a buffer is
unmapped. It schedules a thread and flushes IOTLB in a
batched mode. This isn't suitable for untrusted device
since it still can access the memory even if it isn't
supposed to do so.
Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Xu Pengfei <pengfei.xu@intel.com> Tested-by: Mika Westerberg <mika.westerberg@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Historically the power supply management in this driver has been handled
in two separate places in parallel. Device-tree users simply defined an
appropriate regulator, while two boards with no DT support (da830-evm and
omapl138-hawk) passed functions defined in their respective board files
over platform data. These functions simply used legacy GPIO calls to
watch the oc GPIO for interrupts and disable the vbus GPIO when the irq
fires.
Commit d193abf1c913 ("usb: ohci-da8xx: add vbus and overcurrent gpios")
updated these GPIO calls to the modern API and moved them inside the
driver.
This however is not the optimal solution for the vbus GPIO which should
be modeled as a fixed regulator that can be controlled with a GPIO.
In order to keep the overcurrent protection available once we move the
board files to using fixed regulators we need to disable the enable_reg
regulator when the overcurrent indicator interrupt fires. Since we
cannot call regulator_disable() from interrupt context, we need to
switch to using a oneshot threaded interrupt.
Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Experimentally it can be seen that going into deep sleep (specifically
setting PMU_CLR_DMA and PMU_CLR_BUS in RK3288_PMU_PWRMODE_CON1)
appears to fail unless "aclk_dmac1" is on. The failure is that the
system never signals that it made it into suspend on the GLOBAL_PWROFF
pin and it just hangs.
NOTE that it's confirmed that it's the actual suspend that fails, not
one of the earlier calls to read/write registers. Specifically if you
comment out the "PMU_GLOBAL_INT_DISABLE" setting in
rk3288_slp_mode_set() and then comment out the "cpu_do_idle()" call in
rockchip_lpmode_enter() then you can exercise the whole suspend path
without any crashing.
This is currently not a problem with suspend upstream because there is
no current way to exercise the deep suspend code. However, anyone
trying to make it work will run into this issue.
This was not a problem on shipping rk3288-based Chromebooks because
those devices all ran on an old kernel based on 3.14. On that kernel
"aclk_dmac1" appears to be left on all the time.
There are several ways to skin this problem.
A) We could add "aclk_dmac1" to the list of critical clocks and that
apperas to work, but presumably that wastes power.
B) We could keep a list of "struct clk" objects to enable at suspend
time in clk-rk3288.c and use the standard clock APIs.
C) We could make the rk3288-pmu driver keep a list of clocks to enable
at suspend time. Presumably this would require a dts and bindings
change.
D) We could just whack the clock on in the existing syscore suspend
function where we whack a bunch of other clocks. This is particularly
easy because we know for sure that the clock's only parent
("aclk_cpu") is a critical clock so we don't need to do anything more
than ungate it.
In this case I have chosen D) because it seemed like the least work,
but any of the other options would presumably also work fine.
Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
When building with -Wsometimes-uninitialized, Clang warns:
drivers/soc/mediatek/mtk-pmic-wrap.c:1358:6: error: variable 'rdata' is
used uninitialized whenever '||' condition is true
[-Werror,-Wsometimes-uninitialized]
If pwrap_write returns non-zero, pwrap_read will not be called to
initialize rdata, meaning that we will use some random uninitialized
stack value in our print statement. Zero initialize rdata in case this
happens.
hook_fault_code() is an ARM32 specific API for hooking into data abort.
AM65X platforms (that integrate ARM v8 cores and select CONFIG_ARM64 as
arch) rely on pci-keystone.c but on them the enumeration of a
non-present BDF does not trigger a bus error, so the fixup exception
provided by calling hook_fault_code() is not needed and can be guarded
with CONFIG_ARM.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
[lorenzo.pieralisi@arm.com: commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
SERDES connected to the PCIe controller in AM654 requires
power on reset enable (POR_EN) to be set in the SERDES. The
SERDES driver sets POR_EN in the reset ops and it has to be
invoked before init or enable ops. In order for SERDES driver
to set POR_EN, invoke the phy_reset() API in pci-keystone driver.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As new transfer mechanisms are added to the EC codebase, they may
not support v2 of the EC protocol.
If the v3 initial handshake transfer fails, the kernel will try
and call cmd_xfer as a fallback. If v2 is not supported, cmd_xfer
will be NULL, and the code will end up causing a kernel panic.
Add a check for NULL before calling the transfer function, along
with a helpful comment explaining how one might end up in this
situation.
The accumulator sample register is signed 32-bits wide register on
droid 4. And only the earlier version of cpcap has a signed 24-bits
wide register. We're currently passing it around as unsigned, so
let's fix that and use sign_extend32() for the earlier revision.
Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Added a new local variable in the i40e_setup_tc function named
old_queue_pairs so num_queue_pairs can be restored to the correct
value in case configuring queue channels fails. Additionally, moved
the exit label in the i40e_setup_tc function so the if (need_reset)
block can be executed.
Also, fixed data packing in the i40e_setup_tc function.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 5f84bb1a4099 ("soc/tegra: pmc: Add sysfs entries for reset info")
added sysfs entries for Tegra reset source and level. However, these
sysfs are not removed on error and so if the registering of PMC device
is probe deferred, then the next time we attempt to probe the PMC device
warnings such as the following will be displayed on boot ...
Fix this by calling device_remove_file() for each sysfs entry added on
failure. Note that we call device_remove_file() unconditionally without
checking if the sysfs entry was created in the first place, but this
should be OK because kernfs_remove_by_name_ns() will fail silently.
In pcibios_irq_init(), the PCI IRQ routing table 'pirq_table' is first
found through pirq_find_routing_table(). If the table is not found and
CONFIG_PCI_BIOS is defined, the table is then allocated in
pcibios_get_irq_routing_table() using kmalloc(). Later, if the I/O APIC is
used, this table is actually not used. In that case, the allocated table
is not freed, which is a memory leak.
Free the allocated table if it is not used.
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
[bhelgaas: added Ingo's reviewed-by, since the only change since v1 was to
use the irq_routing_table local variable name he suggested] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The XDomain protocol messages may start as soon as Thunderbolt control
channel is started. This means that if the other host starts sending
ThunderboltIP packets early enough they will be passed to the network
driver which then gets confused because its resume hook is not called
yet.
Fix this by unregistering the ThunderboltIP protocol handler when
suspending and registering it back on resume.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When running application tool switchtec-user's `firmware update` and `event
wait` commands concurrently, sometimes the firmware update speed reduced
significantly.
It is because when the MRPC event happened after MRPC event occurrence
check but before the event mask loop reaches its header register in event
ISR, the MRPC event would be masked unintentionally. Since there's no
chance to enable it again except for a module reload, all the following
MRPC execution completion checks time out.
Fix this bug by skipping the mask operation for MRPC event in event ISR,
same as what we already do for LINK event.
Disabling the SMMU when probing from within a kdump kernel so that all
incoming transactions are terminated can prevent the core of the crashed
kernel from being transferred off the machine if all I/O devices are
behind the SMMU.
Instead, continue to probe the SMMU after it is disabled so that we can
reinitialise it entirely and re-attach the DMA masters as they are reset.
Since the kdump kernel may not have drivers for all of the active DMA
masters, we suppress fault reporting to avoid spamming the console and
swamping the IRQ threads.
vfio_dev_present() which is the condition to
wait_event_interruptible_timeout(), will call vfio_group_get_device
and try to acquire the mutex group->device_lock.
wait_event_interruptible_timeout() will set the state of the current
task to TASK_INTERRUPTIBLE, before doing the condition check. This
means that we will try to acquire the mutex while already in a
sleeping state. The scheduler warns us by giving the following
warning:
[ 4050.264464] ------------[ cut here ]------------
[ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
[ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
....
clang warns that 'contextlen' may be accessed without an initialization:
fs/nfsd/nfs4xdr.c:2911:9: error: variable 'contextlen' is uninitialized when used here [-Werror,-Wuninitialized]
contextlen);
^~~~~~~~~~
fs/nfsd/nfs4xdr.c:2424:16: note: initialize the variable 'contextlen' to silence this warning
int contextlen;
^
= 0
Presumably this cannot happen, as FATTR4_WORD2_SECURITY_LABEL is
set if CONFIG_NFSD_V4_SECURITY_LABEL is enabled.
Adding another #ifdef like the other two in this function
avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A fuzzer recently triggered lockdep warnings about potential sb_writers
deadlocks caused by fh_want_write().
Looks like we aren't careful to pair each fh_want_write() with an
fh_drop_write().
It's not normally a problem since fh_put() will call fh_drop_write() for
us. And was OK for NFSv3 where we'd do one operation that might call
fh_want_write(), and then put the filehandle.
But an NFSv4 protocol fuzzer can do weird things like call unlink twice
in a compound, and then we get into trouble.
I'm a little worried about this approach of just leaving everything to
fh_put(). But I think there are probably a lot of
fh_want_write()/fh_drop_write() imbalances so for now I think we need it
to be more forgiving.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header + that
max_write bytes. A filesystem server is free to set its max_write in
anywhere in the range between [1*page, fc->max_pages*page]. In particular
go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
corresponds to 128K. Libfuse also allows users to configure max_write, but
by default presets it to possible maximum.
If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract, will
be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server request
buffer. In turn the filesystem server could get stuck waiting indefinitely
for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
understood by clients as that NOTIFY_REPLY was queued and will be sent
back.
Cap requested size to negotiate max_write to avoid the problem. This
aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages. This
way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
was originally requested.
Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.
The device tree binding already lists compatible strings for these two
SoCs. They don't have the defect as seen on the H3, and the size and
register layout is the same as the A64. Furthermore, the driver does
not include nvmem cell definitions.
Add support for these two compatible strings, re-using the config for
the A64.
When the bit_offset in the cell is zero, the pointer to the msb will
not be properly initialized (ie, will still be pointing to the first
byte in the buffer).
This being the case, if there are bits to clear in the msb, those will
be left untouched while the mask will incorrectly clear bit positions
on the first byte.
This commit also makes sure that any byte unused in the cell is
cleared.
Commit 1f4fa50dd48f ("arm64: defconfig: Regenerate for v4.20") "fixed"
that by reverting to 'CONFIG_SCSI_UFS_HISI=m'.
Thing is, if the rootfs is stored in the on-board flash (which
is the "canonical" way of doing things), we either need these drivers
to be built-in, or we need to fiddle with an initramfs to access that
flash and eventually load the modules installed over there.
The former is the easiest, do that.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
When removing memory we need to remove the memory from the node
it was added to instead of looking up the node it should be in
in the device tree.
During testing we have seen scenarios where the affinity for a
LMB changes due to a partition migration or PRRN event. In these
cases the node the LMB exists in may not match the node the device
tree indicates it belongs in. This can lead to a system crash
when trying to DLPAR remove the LMB after a migration or PRRN
event. The current code looks up the node in the device tree to
remove the LMB from, the crash occurs when we try to offline this
node and it does not have any data, i.e. node_data[nid] == NULL.
To resolve this we need to track the node a LMB belongs to when
it is added to the system so we can remove it from that node instead
of the node that the device tree indicates it should belong to.
Currently the IRQ handler in HD-audio controller driver is registered
before the chip initialization. That is, we have some window opened
between the azx_acquire_irq() call and the CORB/RIRB setup. If an
interrupt is triggered in this small window, the IRQ handler may
access to the uninitialized RIRB buffer, which leads to a NULL
dereference Oops.
This is usually no big problem since most of Intel chips do register
the IRQ via MSI, and we've already fixed the order of the IRQ
enablement and the CORB/RIRB setup in the former commit b61749a89f82
("sound: enable interrupt after dma buffer initialization"), hence the
IRQ won't be triggered in that room. However, some platforms use a
shared IRQ, and this may allow the IRQ trigger by another source.
Another possibility is the kdump environment: a stale interrupt might
be present in there, the IRQ handler can be falsely triggered as well.
For covering this small race, let's move the azx_acquire_irq() call
after hda_intel_init_chip() call. Although this is a bit radical
change, it can cover more widely than checking the CORB/RIRB setup
locally in the callee side.
flow_offload_alloc() calls nf_route() to get a dst_entry. Internally,
nf_route() calls ip_route_output_key() that allocates a dst_entry and
holds it. So, a dst_entry should be released by dst_release() if
nf_route() is successful.
Otherwise, netns exit routine cannot be finished and the following
message is printed:
[ 257.490952] unregister_netdevice: waiting for lo to become free. Usage count = 1
We do not restart a controller in a deleting state for timeout errors.
When in this state, unblock potential request dispatchers with failed
completions by shutting down the controller on timeout detection.
Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Just like IO queues, the admin queue also will not be restarted after a
controller shutdown. Unquiesce this queue so that we do not block
request dispatch on a permanently disabled controller.
Reported-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Certain platforms like K2G reguires the outbound ATU window to be
aligned. The alignment size is already present in mem->page_size.
Use the alignment size present in mem->page_size to configure an
aligned ATU window. In order to raise an interrupt, CPU has to write
to address offset from the start of the window unlike before where
writes were always to the beginning of the ATU window.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 834b90519925 ("misc: pci_endpoint_test: Add support for
PCI_ENDPOINT_TEST regs to be mapped to any BAR") while adding
test_reg_bar in order to map PCI_ENDPOINT_TEST regs to be mapped to any
BAR failed to update test_reg_bar in pci_endpoint_test, resulting in
test_reg_bar having invalid value when used outside probe.
Fix it.
Fixes: 834b90519925 ("misc: pci_endpoint_test: Add support for PCI_ENDPOINT_TEST regs to be mapped to any BAR") Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The intel_iommu_gfx_mapped flag is exported by the Intel
IOMMU driver to indicate whether an IOMMU is used for the
graphic device. In a virtualized IOMMU environment (e.g.
QEMU), an include-all IOMMU is used for graphic device.
This flag is found to be clear even the IOMMU is used.
Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Reported-by: Zhenyu Wang <zhenyuw@linux.intel.com> Fixes: c0771df8d5297 ("intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
With holding queue's kobject refcount, it is safe for driver
to schedule requeue. However, blk_mq_kick_requeue_list() may
be called after blk_sync_queue() is done because of concurrent
requeue activities, then requeue work may not be completed when
freeing queue, and kernel oops is triggered.
So moving the cancel of requeue_work into blk_mq_release() for
avoiding race between requeue and freeing queue.
Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: James Smart <james.smart@broadcom.com> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: linux-scsi@vger.kernel.org, Cc: Martin K . Petersen <martin.petersen@oracle.com>, Cc: Christoph Hellwig <hch@lst.de>, Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>, Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
CONFIG_WATCHDOG_PRETIMEOUT_GOV build symbol adds watchdog_pretimeout.o
object to watchdog.o, the latter is compiled only if CONFIG_WATCHDOG_CORE
is selected, so it rightfully makes sense to add it as a dependency.
The change fixes the next compilation errors, if CONFIG_WATCHDOG_CORE=n
and CONFIG_WATCHDOG_PRETIMEOUT_GOV=y are selected:
drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_register':
drivers/watchdog/pretimeout_noop.c:35: undefined reference to `watchdog_register_governor'
drivers/watchdog/pretimeout_noop.o: In function `watchdog_gov_noop_unregister':
drivers/watchdog/pretimeout_noop.c:40: undefined reference to `watchdog_unregister_governor'
drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_register':
drivers/watchdog/pretimeout_panic.c:35: undefined reference to `watchdog_register_governor'
drivers/watchdog/pretimeout_panic.o: In function `watchdog_gov_panic_unregister':
drivers/watchdog/pretimeout_panic.c:40: undefined reference to `watchdog_unregister_governor'
The documentated behavior is: if max_hw_heartbeat_ms is implemented, the
minimum of the set_timeout argument and max_hw_heartbeat_ms should be used.
This patch implements this behavior.
Previously only the first 7bits were used and the input argument was
returned.
Following splat gets triggered when nfnetlink monitor is running while
xtables-nft selftests are running:
net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
1 lock held by xtables-nft-mul/27006:
#0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
Call Trace:
nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
nf_tables_chain_notify+0xf8/0x1a0
nf_tables_commit+0x165c/0x1740
nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
or from the transaction path if a userspace process subscribed to nftables
notifications.
In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
hold transaction mutex so the pointer can be NULLed right after the check.
Just unconditionally fetch the value, then have the helper return
immediately if its NULL.
In the notification case we don't hold the rcu read lock, but updates are
prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
aware of this.
There are situations when memory regions coming from dts may be
too big for the platform physical address space. This especially
concerns XPA-capable systems. Bootloader may determine more than 4GB
memory available and pass it to the kernel over dts memory node, while
kernel is built without XPA/64BIT support. In this case the region
may either simply be truncated by add_memory_region() method
or by u64->phys_addr_t type casting. But in worst case the method
can even drop the memory region if it exceeds PHYS_ADDR_MAX size.
So lets make sure the retrieved from dts memory regions are valid,
and if some of them aren't, just manually truncate them with a warning
printed out.
Signed-off-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de> Cc: Huacai Chen <chenhc@lemote.com> Cc: Stefan Agner <stefan@agner.ch> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Juergen Gross <jgross@suse.com> Cc: Serge Semin <Sergey.Semin@t-platforms.ru> Cc: linux-mips@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Since commit bc7d811ace4a ("netfilter: nf_ct_h323: Convert
CHECK_BOUND macro to function"), NAT traversal for H.323
doesn't work, failing to parse H323-UserInformation.
nf_h323_error_boundary() compares contents of the bitstring,
not the addresses, preventing valid H.323 packets from being
conntrack'd.
This looks like an oversight from when CHECK_BOUND macro was
converted to a function.
To fix it, stop dereferencing bs->cur and bs->end.
Fixes: bc7d811ace4a ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function") Signed-off-by: Jakub Jankowski <shasta@toxcorp.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
rhashtable_insert_fast() may return an error value when memory
allocation fails, but flow_offload_add() does not check for errors.
This patch just adds missing error checking.
The IRQ handler, mmci_irq(), loops until all status bits have been cleared.
However, the status bit signaling busy in variant->busy_detect_flag, may be
set even if busy detection isn't monitored for the current request.
This may be the case for the CMD11 when switching the I/O voltage, which
leads to that mmci_irq() busy loops in IRQ context. Fix this problem, by
clearing the status bit for busy, before continuing to validate the
condition for the loop. This is safe, because the busy status detection has
already been taken care of by mmci_cmd_irq().
Overlayfs "fake" path is used for stacked file operations on underlying
files. Operations on files with "fake" path must not generate fsnotify
events with path data, because those events have already been generated at
overlayfs layer and because the reported event->fd for fanotify marks on
underlying inode/filesystem will have the wrong path (the overlayfs path).
When the logo is currently drawn on a virtual console, and the console
loglevel is reduced to quiet, logo_shown must be left alone, so that it
the scrolling region on that virtual console is properly reset.
Fixes: 10993504d647 ("fbcon: Silence fbcon logo on 'quiet' boots") Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Yisheng Xie <ysxie@foxmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Marko Myllynen <myllynen@redhat.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
random: get_random_bytes called from init_oops_id+0x27/0x34 with crng_init=0
---[ end trace 00173d0117a88acb ]---
Calibrating delay loop... 6941.90 BogoMIPS (lpj=34709504)
Signed-off-by: Maciej Żenczykowski <maze@google.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: linux-um@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
In configfs_register_group(), if create_default_group() failed, we
forget to unlink the group. It will left a invalid item in the parent list,
which may trigger the use-after-free issue seen below:
BUG: KASAN: use-after-free in __list_add_valid+0xd4/0xe0 lib/list_debug.c:26
Read of size 8 at addr ffff8881ef61ae20 by task syz-executor.0/5996
Memory state around the buggy address: ffff8881ef61ad00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8881ef61ad80: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
>ffff8881ef61ae00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff8881ef61ae80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881ef61af00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 5cf6a51e6062 ("configfs: allow dynamic group creation") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
In free_percpu() we sometimes call pcpu_schedule_balance_work() to
queue a work item (which does a wakeup) while holding pcpu_lock.
This creates an unnecessary lock dependency between pcpu_lock and
the scheduler's pi_lock. There are other places where we call
pcpu_schedule_balance_work() without hold pcpu_lock, and this case
doesn't need to be different.
Moving the call outside the lock prevents the following lockdep splat
when running tools/testing/selftests/bpf/{test_maps,test_progs} in
sequence with lockdep enabled:
======================================================
WARNING: possible circular locking dependency detected
5.1.0-dbg-DEV #1 Not tainted
------------------------------------------------------
kworker/23:255/18872 is trying to acquire lock: 000000000bc79290 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x520
but task is already holding lock: 00000000e3e7a6aa (pcpu_lock){..-.}, at: free_percpu+0x36/0x260
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
The subsystem will free the asd memory on notifier cleanup, if the asd is
added to the notifier.
However the memory is freed using kfree.
Thus, we cannot allocate the asd using devm_*
This can lead to crashes and problems.
To test this issue, just return an error at probe, but cleanup the
notifier beforehand.
Fixes: 106267444f ("[media] atmel-isc: add the Image Sensor Controller code") Signed-off-by: Eugen Hristev <eugen.hristev@microchip.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On below paths, we can have opportunity to readahead inode page
- gc_node_segment -> f2fs_ra_node_page
- gc_data_segment -> f2fs_ra_node_page
- f2fs_fill_dentries -> f2fs_ra_node_page
Unlike synchronized read, on readahead path, we can set page uptodate
before verifying page's checksum, then read_node_page() will trigger
kernel panic once it encounters a uptodated page w/ incorrect checksum.
So considering readahead scenario, we have to do checksum each time
when loading inode page even if it is uptodated.
[ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...)
level:8, dir_level:0, pgofs:951, correct range:[900, 901]
In old kernel, inline data and directory always reserved 200 bytes in
inode layout, even if inline_xattr is disabled, then new kernel tries
to retrieve that space for non-inline xattr inode, but for inline dentry,
its layout size should be fixed, so we just keep that reserved space.
But the problem here is that, after inline dentry conversion, inline
dentry layout no longer exists, if we still reserve inline xattr space,
after dents updates, there will be a hole in inline xattr space, which
can break hierarchy hash directory structure.
This patch fixes this issue by retrieving inline xattr space after
inline dentry conversion.
Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
- Overview
When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cd test
touch t
- Messages
[ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
... (repeating)
During GC, if segment type stored in SSA and SIT is inconsistent, we just
skip migrating current segment directly, since we need to know the exact
type to decide the migration function we use.
So in foreground GC, we will easily run into a infinite loop as we may
select the same victim segment which has inconsistent type due to greedy
policy. In order to end up this, we choose to shutdown filesystem. For
backgrond GC, we need to do that as well, so that we can avoid latter
potential infinite looped foreground GC.
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_13.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
sit.vblocks and sum valid block count in sit.valid_map may be
inconsistent, segment w/ zero vblocks will be treated as free
segment, while allocating in free segment, we may allocate a
free block, if its bitmap is valid previously, it can cause
kernel crash due to bitmap verification failure.
Anyway, to avoid further serious metadata inconsistence and
corruption, it is necessary and worth to detect SIT
inconsistence. So let's enable check_block_count() to verify
vblocks and valid_map all the time rather than do it only
CONFIG_F2FS_CHECK_FS is enabled.
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
We may miss xattr data with below testcase:
- mkdir dir
- setfattr -n "user.name" -v 0 dir
- for ((i = 0; i < 190; i++)) do touch dir/$i; done
- umount
- mount
- getfattr -n "user.name" dir
user.name: No such attribute
The root cause is that we persist xattr data into reserved inline xattr
space, even if inline_xattr is not enable in inline directory inode, after
inline dentry conversion, reserved space no longer exists, so that xattr
data missed.
Let's use inline xattr space only if inline_xattr flag is set on inode
to fix this iusse.
Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
During inode loading, __recover_inline_status() can recovery inode status
and set inode dirty, once we failed in following process, it will fail
the check in f2fs_evict_inode, result in trigger BUG_ON().
Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
- Overview
When mounting the attached crafted image and unmounting it, following errors are reported.
Additionally, it hangs on sync after unmounting.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
touch test/t
umount test
sync
NAT table is corrupted, so reserved meta/node inode ids were added into
free list incorrectly, during file creation, since reserved id has cached
in inode hash, so it fails the creation and preallocated nid can not be
released later, result in kernel panic.
To fix this issue, let's do nid boundary check during free nid loading.
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.
The ADJ_TAI adjtimex mode sets the TAI-UTC offset of the system clock.
It is typically set by NTP/PTP implementations and it is automatically
updated by the kernel on leap seconds. The initial value is zero (which
applications may interpret as unknown), but this value cannot be set by
adjtimex. This limitation seems to go back to the original "nanokernel"
implementation by David Mills.
Change the ADJ_TAI check to accept zero as a valid TAI-UTC offset in
order to allow setting it back to the initial value.
Fixes: 153b5d054ac2 ("ntp: support for TAI") Suggested-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: https://lkml.kernel.org/r/20190417084833.7401-1-mlichvar@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
On failure of_irq_get() returns a negative value or zero, which is
not handled as an error in the existing implementation.
Instead of using this API, use platform_get_irq() that returns
exclusively a negative value on failure.
Also, do not output an error log in case of defer probe error.
Holding the spin-lock for all of the code in meson_pwm_apply() can
result in a "BUG: scheduling while atomic". This can happen because
clk_get_rate() (which is called from meson_pwm_calc()) may sleep.
Only hold the spin-lock when modifying registers to solve this.
The reason why we need a spin-lock in the driver is because the
REG_MISC_AB register is shared between the two channels provided by one
PWM controller. The only functions where REG_MISC_AB is modified are
meson_pwm_enable() and meson_pwm_disable() so the register reads/writes
in there need to be protected by the spin-lock.
The original code also used the spin-lock to protect the values in
struct meson_pwm_channel. This could be necessary if two consumers can
use the same PWM channel. However, PWM core doesn't allow this so we
don't need to protect the values in struct meson_pwm_channel with a
lock.
Fixes: 211ed630753d2f ("pwm: Add support for Meson PWM Controller") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
context fields") made the verifier add AND instructions to clear the
unwanted bits with a mask when doing a narrow load. The mask is
computed with
(1 << size * 8) - 1
where "size" is the size of the narrow load. When doing a 4 byte load
of a an 8 byte field the verifier shifts the literal 1 by 32 places to
the left. This results in an overflow of a signed integer, which is an
undefined behavior. Typically, the computed mask was zero, so the
result of the narrow load ended up being zero too.
Cast the literal to long long to avoid overflows. Note that narrow
load of the 4 byte fields does not have the undefined behavior,
because the load size can only be either 1 or 2 bytes, so shifting 1
by 8 or 16 places will not overflow it. And reading 4 bytes would not
be a narrow load of a 4 bytes field.
Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields") Reviewed-by: Alban Crequy <alban@kinvolk.io> Reviewed-by: Iago López Galeiras <iago@kinvolk.io> Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Cursor position updates were accidentally causing us to attempt to interlock
window with window immediate, and without a matching window immediate update,
NVDisplay could hang forever in some circumstances.
Fixes suspend/resume on (at least) Quadro RTX4000 (TU104).
Reported-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The ignore flag is set on fake jumps in order to keep
add_jump_destinations() from setting their jump_dest, since it already
got set when the fake jump was created.
But using the ignore flag is a bit of a hack. It's normally used to
skip validation of an instruction, which doesn't really make sense for
fake jumps.
Also, after the next patch, using the ignore flag for fake jumps can
trigger a false "why am I validating an ignored function?" warning.
Instead just add an explicit check in add_jump_destinations() to skip
fake jumps.
The driver currently sets register 0xfb (Low Refresh Rate) based on the
value of mode->vrefresh. Firstly, this field is specified to be in Hz,
but the magic numbers used by the code are Hz * 1000. This essentially
leads to the low refresh rate always being set to 0x01, since the
vrefresh value will always be less than 24000. Fix the magic numbers to
be in Hz.
Secondly, according to the comment in drm_modes.h, the field is not
supposed to be used in a functional way anyway. Instead, use the helper
function drm_mode_vrefresh().
Fixes: 9c8af882bf12 ("drm: Add adv7511 encoder driver") Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Matt Redfearn <matt.redfearn@thinci.com> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190424132210.26338-1-matt.redfearn@thinci.com Signed-off-by: Sasha Levin <sashal@kernel.org>
nv50_head_atomic_duplicate_state() makes a copy of nv50_head_atom
struct. This patch adds copying of struct member named "or", which
previously was left uninitialized in the duplicated structure.
Due to this bug, incorrect nhsync and nvsync values were sometimes used.
In my particular case, that lead to a mismatch between the output
resolution of the graphics device (GeForce GT 630 OEM) and the reported
input signal resolution on the display. xrandr reported 1680x1050, but
the display reported 1280x1024. As a result of this mismatch, the output
on the display looked like it was cropped (only part of the output was
actually visible on the display).
git bisect pointed to commit 2ca7fb5c1cc6 ("drm/nouveau/kms/nv50: handle
SetControlOutputResource from head"), which added the member "or" to
nv50_head_atom structure, but forgot to copy it in
nv50_head_atomic_duplicate_state().
Fixes: 2ca7fb5c1cc6 ("drm/nouveau/kms/nv50: handle SetControlOutputResource from head") Signed-off-by: Peteris Rudzusiks <peteris.rudzusiks@gmail.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
HW has error checks in place which check that pixel depth is explicitly
provided on DP, while HDMI has a "default" setting that we use.
In multi-display configurations with identical modelines, but different
protocols (HDMI + DP, in this case), it was possible for the DP head to
get swapped to the head which previously drove the HDMI output, without
updating HeadSetControlOutputResource(), triggering the error check and
hanging the core update.
Reported-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
583feb08e7f7 ("perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS")
The original patch prevented using multi-entry PEBS when wakeup_events != 0.
However given that wakeup_events is part of a union with wakeup_watermark, it
means that in watermark mode, PEBS multi-entry is also disabled which is not the
intent. This patch fixes this by checking is watermark mode is enabled.
I noticed that we can get a -EREMOTEIO errors on at least omap4 duovero:
twl6040 0-004b: Failed to write 2d = 19: -121
And then any following register access will produce errors.
There 2d offset above is register ACCCTL that gets written on twl6040
powerup. With error checking added to the related regcache_sync() call,
the -EREMOTEIO error is reproducable on twl6040 powerup at least
duovero.
To fix the error, we need to wait until twl6040 is accessible after the
powerup. Based on tests on omap4 duovero, we need to wait over 8ms after
powerup before register write will complete without failures. Let's also
make sure we warn about possible errors too.
Note that we have twl6040_patch[] reg_sequence with the ACCCTL register
configuration and regcache_sync() will write the new value to ACCCTL.
Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Where possible, we want the failsafe link configuration (one which won't
hang the OR during modeset because of not enough bandwidth for the mode)
to also be supported by the sink.
This prevents "link rate unsupported by sink" messages when link training
fails.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
To avoid interrupt storm, set the device in reset state
before bringing out the device from reset state.
Changelog v2:
- correct the subject line by adding "mfd: "
Signed-off-by: Binbin Wu <binbin.wu@intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
MODULE_DEVICE_TABLE(of, <of_match_table> should be called to complete DT
OF mathing mechanism and register it.
Before this patch:
modinfo drivers/mfd/tps65912-spi.ko | grep alias
alias: spi:tps65912
After this patch:
modinfo drivers/mfd/tps65912-spi.ko | grep alias
alias: of:N*T*Cti,tps65912C*
alias: of:N*T*Cti,tps65912
alias: spi:tps65912
Reported-by: Javier Martinez Canillas <javier@dowhile0.org> Signed-off-by: Daniel Gomez <dagmcr@gmail.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently IRQ remains enabled after .remove, later if device is probed,
IRQ is requested before .thermal_init, this may cause IRQ function be
called before device is initialized.
this patch disables interrupt in .remove, to ensure irq function
only be called after device is fully initialized.
Signed-off-by: Jiada Wang <jiada_wang@mentor.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While validating new map we require the @start_data to be strictly less
than @end_data, which is fine for regular applications (this is why this
nit didn't trigger for that long). These members are set from executable
loaders such as elf handers, still it is pretty valid to have a loadable
data section with zero size in file, in such case the start_data is equal
to end_data once kernel loader finishes.
As a result when we're trying to restore such programs the procedure fails
and the kernel returns -EINVAL. From the image dump of a program:
"cat /proc/slab_allocators" could hang forever on SMP machines with
kmemleak or object debugging enabled due to other CPUs running do_drain()
will keep making kmemleak_object or debug_objects_cache dirty and unable
to escape the first loop in leaks_show(),
do {
set_store_user_clean(cachep);
drain_cpu_caches(cachep);
...
One approach is to check cachep->name and skip both kmemleak_object and
debug_objects_cache in leaks_show(). The other is to set store_user_clean
after drain_cpu_caches() which leaves a small window between
drain_cpu_caches() and set_store_user_clean() where per-CPU caches could
be dirty again lead to slightly wrong information has been stored but
could also speed up things significantly which sounds like a good
compromise. For example,
If not find zero bit in find_next_zero_bit(), it will return the size
parameter passed in, so the start bit should be compared with bitmap_maxno
rather than cma->count. Although getting maxchunk is working fine due to
zero value of order_per_bit currently, the operation will be stuck if
order_per_bit is set as non-zero.
Link: http://lkml.kernel.org/r/20190319092734.276-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joe Perches <joe@perches.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Safonov <d.safonov@partner.samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
MADV_DONTNEED is handled with mmap_sem taken in read mode. We call
page_mkclean without holding mmap_sem.
MADV_DONTNEED implies that pages in the region are unmapped and subsequent
access to the pages in that range is handled as a new page fault. This
implies that if we don't have parallel access to the region when
MADV_DONTNEED is run we expect those range to be unallocated.
w.r.t page_mkclean() we need to make sure that we don't break the
MADV_DONTNEED semantics. MADV_DONTNEED check for pmd_none without holding
pmd_lock. This implies we skip the pmd if we temporarily mark pmd none.
Avoid doing that while marking the page clean.
Keep the sequence same for dax too even though we don't support
MADV_DONTNEED for dax mapping
The bug was noticed by code review and I didn't observe any failures w.r.t
test run. This is similar to
Currently one bit in cma bitmap represents number of pages rather than
one page, cma->count means cma size in pages. So to find available pages
via find_next_zero_bit()/find_next_bit() we should use cma size not in
pages but in bits although current free pages number is correct due to
zero value of order_per_bit. Once order_per_bit is changed the bitmap
status will be incorrect.
The size input in cma_debug_show_areas() is not correct. It will
affect the available pages at some position to debug the failure issue.
This is an example with order_per_bit = 1
Before this change:
[ 4.120060] cma: number of available pages: 1@93+4@108+7@121+7@137+7@153+7@169+7@185+7@201+3@213+3@221+3@229+3@237+3@245+3@253+3@261+3@269+3@277+3@285+3@293+3@301+3@309+3@317+3@325+19@333+15@369+512@512=> 638 free of 1024 total pages
After this change:
[ 4.143234] cma: number of available pages: 2@93+8@108+14@121+14@137+14@153+14@169+14@185+14@201+6@213+6@221+6@229+6@237+6@245+6@253+6@261+6@269+6@277+6@285+6@293+6@301+6@309+6@317+6@325+38@333+30@369=> 252 free of 1024 total pages
Obviously the bitmap status before is incorrect.
Link: http://lkml.kernel.org/r/20190320060829.9144-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In node_states_check_changes_online(), N_HIGH_MEMORY is used to substitute
ZONE_HIGHMEM directly. This is not right. N_HIGH_MEMORY is to mark the
memory state of node. Here zone index is checked, which should be
compared with 'ZONE_HIGHMEM' accordingly.
Replace it with ZONE_HIGHMEM.
This is a code cleanup - no known runtime effects.
Link: http://lkml.kernel.org/r/20190320080732.14933-1-bhe@redhat.com Fixes: 8efe33f40f3e ("mm/memory_hotplug.c: simplify node_states_check_changes_online") Signed-off-by: Baoquan He <bhe@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In a low-memory situation, cc->fast_search_fail can keep increasing as it
is unable to find an available page to isolate in
fast_isolate_freepages(). As the result, it could trigger an error below,
so just compare with the maximum bits can be shifted first.
UBSAN: Undefined behaviour in mm/compaction.c:1160:30
shift exponent 64 is too large for 64-bit type 'unsigned long'
CPU: 131 PID: 1308 Comm: kcompactd1 Kdump: loaded Tainted: G
W L 5.0.0+ #17
Call trace:
dump_backtrace+0x0/0x450
show_stack+0x20/0x2c
dump_stack+0xc8/0x14c
__ubsan_handle_shift_out_of_bounds+0x7e8/0x8c4
compaction_alloc+0x2344/0x2484
unmap_and_move+0xdc/0x1dbc
migrate_pages+0x274/0x1310
compact_zone+0x26ec/0x43bc
kcompactd+0x15b8/0x1a24
kthread+0x374/0x390
ret_from_fork+0x10/0x18
[akpm@linux-foundation.org: code cleanup] Link: http://lkml.kernel.org/r/20190320203338.53367-1-cai@lca.pw Fixes: 70b44595eafe ("mm, compaction: use free lists to quickly locate a migration source") Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
I've spent some time chasing down behavior in initramfs and found
plenty of opportunity to improve the code. A first stab on that is
contained in this series.
This patch (of 7):
We free the initrd memory for all successful or error cases except for the
case where opening /initrd.image fails, which looks like an oversight.
Steven said:
: This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
: - specifically it means that the initrd is freed (previously it was
: ignored and never freed). But that seems like reasonable behaviour and
: the previous behaviour looks like another oversight.
Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> [arm64] Cc: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
f022d8cb7ec7 ("mm: cma: Don't crash on allocation if CMA area can't be
activated") fixes the crash issue when activation fails via setting
cma->count as 0, same logic exists if bitmap allocation fails.
Link: http://lkml.kernel.org/r/20190325081309.6004-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
342332e6a925 ("mm/page_alloc.c: introduce kernelcore=mirror option") and
later patches rewrote the calculation of node spanned pages.
e506b99696a2 ("mem-hotplug: fix node spanned pages when we have a movable
node"), but the current code still has problems,
When we have a node with only zone_movable and the node id is not zero,
the size of node spanned pages is double added.
That's because we have an empty normal zone, and zone_start_pfn or
zone_end_pfn is not between arch_zone_lowest_possible_pfn and
arch_zone_highest_possible_pfn, so we need to use clamp to constrain the
range just like the commit <96e907d13602> (bootmem: Reimplement
__absent_pages_in_range() using for_each_mem_pfn_range()).
e.g.
Zone ranges:
DMA [mem 0x0000000000001000-0x0000000000ffffff]
DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000023fffffff]
Movable zone start for each node
Node 0: 0x0000000100000000
Node 1: 0x0000000140000000
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000009efff]
node 0: [mem 0x0000000000100000-0x00000000bffdffff]
node 0: [mem 0x0000000100000000-0x000000013fffffff]
node 1: [mem 0x0000000140000000-0x000000023fffffff]
Patch series "mm/memory_hotplug: Better error handling when removing
memory", v1.
Error handling when removing memory is somewhat messed up right now. Some
errors result in warnings, others are completely ignored. Memory unplug
code can essentially not deal with errors properly as of now.
remove_memory() will never fail.
We have basically two choices:
1. Allow arch_remov_memory() and friends to fail, propagating errors via
remove_memory(). Might be problematic (e.g. DIMMs consisting of multiple
pieces added/removed separately).
2. Don't allow the functions to fail, handling errors in a nicer way.
It seems like most errors that can theoretically happen are really corner
cases and mostly theoretical (e.g. "section not valid"). However e.g.
aborting removal of sections while all callers simply continue in case of
errors is not nice.
If we can gurantee that removal of memory always works (and WARN/skip in
case of theoretical errors so we can figure out what is going on), we can
go ahead and implement better error handling when adding memory.
E.g. via add_memory():
arch_add_memory()
ret = do_stuff()
if (ret) {
arch_remove_memory();
goto error;
}
Handling here that arch_remove_memory() might fail is basically
impossible. So I suggest, let's avoid reporting errors while removing
memory, warning on theoretical errors instead and continuing instead of
aborting.
This patch (of 4):
__add_pages() doesn't add the memory resource, so __remove_pages()
shouldn't remove it. Let's factor it out. Especially as it is a special
case for memory used as system memory, added via add_memory() and friends.
We now remove the resource after removing the sections instead of doing it
the other way around. I don't think this change is problematic.
When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation. When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored. PagePrivate
being set at free huge page time mostly happens on error paths.
When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem. If so,
pages are also reserved within the filesystem. The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem. However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd. Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G -4.0M 4.1G - /opt/hugepool
To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.
I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.
Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
To avoid random config build issue, select mmu notifier when HMM is
selected. In any cases when HMM get selected it will be by users that
will also wants the mmu notifier.
Link: http://lkml.kernel.org/r/20190403193318.16478-2-jglisse@redhat.com Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Patch series "compiler: allow all arches to enable
CONFIG_OPTIMIZE_INLINING", v3.
This patch (of 11):
When function tracing for IPIs is enabled, we get a warning for an
overflow of the ipi_types array with the IPI_CPU_BACKTRACE type as
triggered by raise_nmi():
arch/arm/kernel/smp.c: In function 'raise_nmi':
arch/arm/kernel/smp.c:489:2: error: array subscript is above array bounds [-Werror=array-bounds]
trace_ipi_raise(target, ipi_types[ipinr]);
This is a correct warning as we actually overflow the array here.
This patch raise_nmi() to call __smp_cross_call() instead of
smp_cross_call(), to avoid calling into ftrace. For clarification, I'm
also adding a two new code comments describing how this one is special.
The warning appears to have shown up after commit e7273ff49acf ("ARM:
8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI"), which changed the
number assignment from '15' to '8', but as far as I can tell has existed
since the IPI tracepoints were first introduced. If we decide to
backport this patch to stable kernels, we probably need to backport e7273ff49acf as well.
[yamada.masahiro@socionext.com: rebase on v5.1-rc1] Link: http://lkml.kernel.org/r/20190423034959.13525-2-yamada.masahiro@socionext.com Fixes: e7273ff49acf ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI") Fixes: 365ec7b17327 ("ARM: add IPI tracepoints") # v3.17 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Mathieu Malaterre <malat@debian.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Agner <stefan@agner.ch> Cc: Boris Brezillon <bbrezillon@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Borislav Petkov <bp@suse.de> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>