vmd_domain_reset() attempts to find whether the device may contain multiple
functions by checking 0x80 (Multi-Function Device), however, the hdr_type
variable has already been masked with PCI_HEADER_TYPE_MASK so the check can
never true.
To fix the issue, don't mask the read with PCI_HEADER_TYPE_MASK.
Fixes: 6aab5622296b ("PCI: vmd: Clean up domain before enumeration") Link: https://lore.kernel.org/r/20231003125300.5541-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Nirmal Patel <nirmal.patel@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` adds against potential integer
overflows is defeated. Fix this by hardening call to `struct_size()`
with `size_add()`.
Fixes: f9efae954905 ("ASoC: SOF: ipc4-topology: Add support for base config extension") Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/ZQSr15AYJpDpipg6@work Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Increase the size of the buffers used for composing the names used for
the transport debugfs entries and the vector name to avoid a potential
truncation.
This resolves the following errors when compiling the driver with W=1
and KCFLAGS=-Werror on GCC 12.3.1:
drivers/crypto/intel/qat/qat_common/adf_transport_debug.c: In function ‘adf_ring_debugfs_add’:
drivers/crypto/intel/qat/qat_common/adf_transport_debug.c:100:60: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
drivers/crypto/intel/qat/qat_common/adf_isr.c: In function ‘adf_isr_resource_alloc’:
drivers/crypto/intel/qat/qat_common/adf_isr.c:197:47: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 0 and 5 [-Werror=format-truncation=]
Fixes: a672a9dc872e ("crypto: qat - Intel(R) QAT transport code") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Damian Muszynski <damian.muszynski@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
nd_region_acquire_lane uses get_cpu, which disables preemption. This is
an issue on PREEMPT_RT kernels, since btt_write_pg and also
nd_region_acquire_lane itself take a spin lock, resulting in BUG:
sleeping function called from invalid context.
Fix the issue by replacing get_cpu with smp_process_id and
migrate_disable when needed. This makes BTT operations preemptible, thus
permitting the use of spin_lock.
BUG example occurring when running ndctl tests on PREEMPT_RT kernel:
Use devm_kstrdup() instead of kstrdup() and check its return value to
avoid memory leak.
Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus provider") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit 1da681e52853 ("ASoC: soc-pcm.c: Clear DAIs parameters after
stream_active is updated") tries to make sure DAI parameters can be
cleared properly through moving the cleanup to the place where stream
active status is updated. However, it will cause the cleanup only
happening in soc_pcm_close().
Suppose a case: aplay -Dhw:0 44100.wav 48000.wav. The case calls
soc_pcm_open()->soc_pcm_hw_params()->soc_pcm_hw_free()->
soc_pcm_hw_params()->soc_pcm_hw_free()->soc_pcm_close() in order. The
parameters would be remained in the system even if the playback of
44100.wav is finished.
The case requires us clearing parameters in phase of soc_pcm_hw_free().
However, moving the DAI parameters cleanup back to soc_pcm_hw_free()
has the risk that DAIs parameters never be cleared if there're more
than one stream, see commit 1da681e52853 ("ASoC: soc-pcm.c: Clear DAIs
parameters after stream_active is updated") for more details.
To meet all these requirements, in addition to do DAI parameters
cleanup in soc_pcm_hw_free(), also check it in soc_pcm_close() to make
sure DAI parameters cleared if the DAI becomes inactive.
Fixes: 1da681e52853 ("ASoC: soc-pcm.c: Clear DAIs parameters after stream_active is updated") Signed-off-by: Chancel Liu <chancel.liu@nxp.com> Link: https://lore.kernel.org/r/20230920153621.711373-2-chancel.liu@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 0217a272fe13 ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ
during cleanup") wrongly changed the busy loop check to use
rtas_busy_delay() instead of H_BUSY and H_IS_LONG_BUSY(). The busy return
codes for RTAS and hypercalls are not the same.
Fix this issue by restoring the use of H_BUSY and H_IS_LONG_BUSY().
Fixes: 0217a272fe13 ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup") Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com> Link: https://lore.kernel.org/r/20230921225435.3537728-5-tyreld@linux.ibm.com Reviewed-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function adf_dev_init(), through the subsystem qat_compression,
populates the list of list of compression instances
accel_dev->compression_list. If the list of instances is not empty,
the function adf_dev_start() will then call qat_compression_registers()
register the compression algorithms into the crypto framework.
If any of the functions in adf_dev_start() fail, the caller of such
function, in the error path calls adf_dev_down() which in turn call
adf_dev_stop() and adf_dev_shutdown(), see for example the function
state_store in adf_sriov.c.
However, if the registration of compression algorithms is not done,
adf_dev_stop() will try to unregister the algorithms regardless.
This might cause the counter active_devs in qat_compression.c to get
to a negative value.
Add a new state, ADF_STATUS_COMPRESSION_ALGS_REGISTERED, which tracks
if the compression algorithms are registered into the crypto framework.
Then use this to unregister the algorithms if such flag is set. This
ensures that the compression algorithms are only unregistered if
previously registered.
Fixes: 1198ae56c9a5 ("crypto: qat - expose deflate through acomp api for QAT GEN2") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Adam Guerin <adam.guerin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function adf_dev_init(), through the subsystem qat_crypto, populates
the list of list of crypto instances accel_dev->crypto_list.
If the list of instances is not empty, the function adf_dev_start() will
then call qat_algs_registers() and qat_asym_algs_register() to register
the crypto algorithms into the crypto framework.
If any of the functions in adf_dev_start() fail, the caller of such
function, in the error path calls adf_dev_down() which in turn call
adf_dev_stop() and adf_dev_shutdown(), see for example the function
state_store in adf_sriov.c.
However, if the registration of crypto algorithms is not done,
adf_dev_stop() will try to unregister the algorithms regardless.
This might cause the counter active_devs in qat_algs.c and
qat_asym_algs.c to get to a negative value.
Add a new state, ADF_STATUS_CRYPTO_ALGS_REGISTERED, which tracks if the
crypto algorithms are registered into the crypto framework. Then use
this to unregister the algorithms if such flag is set. This ensures that
the crypto algorithms are only unregistered if previously registered.
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Adam Guerin <adam.guerin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the device is already in the up state, a subsequent write of `up` to
the sysfs attribute /sys/bus/pci/devices/<BDF>/qat/state brings the
device down.
Fix this behaviour by ignoring subsequent `up` commands if the device is
already in the up state.
Fixes: 1bdc85550a2b ("crypto: qat - fix concurrency issue when device state changes") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Adam Guerin <adam.guerin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 1bdc85550a2b ("crypto: qat - fix concurrency issue when device
state changes") introduced the function adf_dev_down() which wraps the
functions adf_dev_stop() and adf_dev_shutdown().
In a subsequent change, the sequence adf_dev_stop() followed by
adf_dev_shutdown() was then replaced across the driver with just a call
to the function adf_dev_down().
The functions adf_dev_stop() and adf_dev_shutdown() are called in error
paths to stop the accelerator and free up resources and can be called
even if the counterparts adf_dev_init() and adf_dev_start() did not
complete successfully.
However, the implementation of adf_dev_down() prevents the stop/shutdown
sequence if the device is found already down.
For example, if adf_dev_init() fails, the device status is not set as
started and therefore a call to adf_dev_down() won't be calling
adf_dev_shutdown() to undo what adf_dev_init() did.
Do not check if a device is started in adf_dev_down() but do the
equivalent check in adf_sysfs.c when handling a DEV_DOWN command from
the user.
Fixes: 2b60f79c7b81 ("crypto: qat - replace state machine calls") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Adam Guerin <adam.guerin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
If, for any reason, the open-coded arithmetic causes a wraparound,
the protection that `struct_size()` provides against potential integer
overflows is defeated. Fix this by hardening calls to `struct_size()`
with `size_add()`, `size_sub()` and `size_mul()`.
Fixes: 467f432a521a ("RDMA/core: Split port and device counter sysfs attributes") Fixes: a4676388e2e2 ("RDMA/core: Simplify how the gid_attrs sysfs is created") Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()") Fixes: 324e227ea7c9 ("RDMA/device: Add ib_device_get_by_netdev()") Fixes: 5aad26a7eac5 ("IB/core: Use struct_size() in kzalloc()") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/ZQdt4NsJFwwOYxUR@work Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the membase and pci_dev pointer were moved to a new struct in priv,
the actual membase users were left untouched, and they started reading
out arbitrary memory behind the struct instead of registers. This
unfortunately turned the RNG into a constant number generator, depending
on the content of what was at that offset.
To fix this, update geode_rng_data_{read,present}() to also get the
membase via amd_geode_priv, and properly read from the right addresses
again.
Fixes: 9f6ec8dc574e ("hwrng: geode - Fix PCI device refcount leak") Reported-by: Timur I. Davletshin <timur.davletshin@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217882 Tested-by: Timur I. Davletshin <timur.davletshin@gmail.com> Suggested-by: Jo-Philipp Wich <jo@mein.io> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
The last RCU stall fix caused a massive throughput regression of the
hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough
and usleep_range doesn't allow scheduling. So try to restore the
best possible throughput by introducing hwrng_yield which interruptable
sleeps for one jiffy.
Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig):
Fixes: 96cb9d055445 ("hwrng: bcm2835 - use hwrng_msleep() instead of cpu_relax()") Link: https://lore.kernel.org/linux-arm-kernel/bc97ece5-44a3-4c4e-77da-2db3eb66b128@gmx.net/ Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to the documentation, drivers are responsible for undoing at
removal time all runtime PM changes done during probing.
Hence, add the missing calls to pm_runtime_dont_use_autosuspend(), which
are necessary for undoing pm_runtime_use_autosuspend().
Note this would have been handled implicitly by
devm_pm_runtime_enable(), but there is a need to continue using
pm_runtime_enable()/pm_runtime_disable() in order to ensure the runtime
PM is disabled as soon as the remove() callback is entered.
Fixes: f517ba4924ad ("ASoC: cs35l41: Add support for hibernate memory retention mode") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20230907171010.1447274-7-cristian.ciocaltea@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The interrupt handler invokes pm_runtime_get_sync() without checking the
returned error code.
Add a proper verification and switch to pm_runtime_resume_and_get(), to
avoid the need to call pm_runtime_put_noidle() for decrementing the PM
usage counter before returning from the error condition.
Fixes: f517ba4924ad ("ASoC: cs35l41: Add support for hibernate memory retention mode") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20230907171010.1447274-6-cristian.ciocaltea@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Technically, an interrupt handler can be called before probe() finishes
its execution, hence ensure the pll_lock completion object is always
initialized before being accessed in cs35l41_irq().
Fixes: f5030564938b ("ALSA: cs35l41: Add shared boost feature") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20230907171010.1447274-4-cristian.ciocaltea@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The return code of regmap_multi_reg_write() call related to "MDSYNC
down" sequence is shadowed by the subsequent
wait_for_completion_timeout() invocation, which is expected to time
timeout in case the write operation failed.
Let cs35l41_global_enable() return the correct error code instead of
-ETIMEDOUT.
Fixes: f5030564938b ("ALSA: cs35l41: Add shared boost feature") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/20230907171010.1447274-2-cristian.ciocaltea@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use a similar approach as commit a419beac4a07 ("module/decompress: use
vmalloc() for zstd decompression workspace") and replace kmalloc() with
vmalloc() also for the gzip module decompression workspace.
In this case the workspace is represented by struct inflate_workspace
that can be fairly large for kmalloc() and it can potentially lead to
allocation errors on certain systems:
Considering that there is no need to use continuous physical memory,
simply switch to vmalloc() to provide a more reliable in-kernel module
decompression.
Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We never initialize the two interval tree nodes, and zero fill is not the
same as RB_CLEAR_NODE. This can hide issues where we missed adding the
area to the trees. Factor out the allocation and clear the two nodes.
Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://lore.kernel.org/r/20231030145035.GG691768@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When redescribing ports I assumed that missing "label" (like "cpu")
means switch port isn't used. That was incorrect and I realized my
change made Linux always use the first (5) CPU port (there are 3 of
them).
While above should technically be possible it often isn't correct:
1. Non-default switch ports are often connected to Ethernet interfaces
not fully covered by vendor setup (they may miss MACs)
2. On some devices non-default ports require specifying fixed link
This fixes network connectivity for some devices. It was reported &
tested for Netgear R8000. It also affects Linksys EA9200 with its
downstream DTS.
Fixes: ba4aebce23b2 ("ARM: dts: BCM5301X: Describe switch ports in the main DTS") Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Link: https://lore.kernel.org/r/20231013103314.10306-1-zajec5@gmail.com Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
As it was pointed out by Simon Ser, the DRM_MODE_CONNECTOR_USB connector
is reserved for the GUD devices. Other drivers (i915, amdgpu) use
DRM_MODE_CONNECTOR_DisplayPort even if the DP stream is handled by the
USB-C altmode. While we are still working on implementing the proper way
to let userspace know that the DP is wrapped into USB-C, change
connector type to be DRM_MODE_CONNECTOR_DisplayPort.
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support") Cc: Simon Ser <contact@emersion.fr> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Acked-by: Simon Ser <contact@emersion.fr> Link: https://lore.kernel.org/r/20231010225229.77027-1-dmitry.baryshkov@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Benchmark command is copied into an array in the stack. The array is
BENCHMARK_ARGS items long but the command line could try to provide a
longer command. Argument size is also fixed by BENCHMARK_ARG_SIZE (63
bytes of space after fitting the terminating \0 character) and user
could have inputted argument longer than that.
Return error in case the benchmark command does not fit to the space
allocated for it.
Compiling pidfd selftest after adding a __printf() attribute to
ksft_print_msg() and ksft_test_result_pass() exposes -Wformat warnings
in error_report(), test_pidfd_poll_exec_thread(),
child_poll_exec_test(), test_pidfd_poll_leader_exit_thread(),
child_poll_leader_exit_test().
The ksft_test_result_pass() in error_report() expects a string but
doesn't provide any argument after the format string. All the other
calls to ksft_print_msg() in the functions mentioned above have format
strings that don't match with other passed arguments.
Fix format specifiers so they match the passed variables.
Add a missing variable to ksft_test_result_pass() inside
error_report() so it matches other cases in the switch statement.
Fixes: 2def297ec7fb ("pidfd: add tests for NSpid info in fdinfo") Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The shared interrupts 0-9 of the TKE are mapped to interrupts 0-9, but
shared interrupts 10-15 are mapped to 256-261. Correct the mapping for
the final 6 interrupts. This prevents the TKE from requesting the RTC
interrupt (along with several GTE and watchdog interrupts).
Set the 'TEGRA_BPMP_MESSAGE_RESET' bit in newly added 'flags' field
of 'struct tegra_bpmp_message' to request for the reset of BPMP IPC
channels. This is used along with the 'suspended' check in BPMP driver
for handling early bandwidth requests due to the hotplug of CPU's
during system resume before the driver gets resumed.
Fixes: f41e1442ac5b ("cpufreq: tegra194: add OPP support and set bandwidth") Co-developed-by: Sumit Gupta <sumitg@nvidia.com> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add suspend hook and a 'suspended' field in the 'struct tegra_bpmp'
to mark if BPMP is suspended. Also, add a 'flags' field in the
'struct tegra_bpmp_message' whose 'TEGRA_BPMP_MESSAGE_RESET' bit can be
set from the Tegra MC driver to signal that the reset of BPMP IPC
channels is required before sending MRQ to the BPMP FW. Together both
the fields allow us to handle any requests that might be sent too soon
as they can cause hang during system resume.
One case where we see BPMP requests being sent before the BPMP driver
has resumed is the memory bandwidth requests which are triggered by
onlining the CPUs during system resume. The CPUs are onlined before the
BPMP has resumed and we need to reset the BPMP IPC channels to handle
these requests.
The additional check for 'flags' is done to avoid any un-intended BPMP
IPC reset if the tegra_bpmp_transfer*() API gets called during suspend
sequence after the BPMP driver is suspended.
Fixes: f41e1442ac5b ("cpufreq: tegra194: add OPP support and set bandwidth") Co-developed-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The first compatible entry for the jpegenc should be 'nxp,imx8qm-jpgenc'.
Change it accordingly to fix the following schema warning:
imx8qm-apalis-eval.dtb: jpegenc@58450000: compatible: 'oneOf' conditional failed, one must be fixed:
'nxp,imx8qm-jpgdec' is not one of ['nxp,imx8qxp-jpgdec', 'nxp,imx8qxp-jpgenc']
'nxp,imx8qm-jpgenc' was expected
'nxp,imx8qxp-jpgdec' was expected
The pinmux for LED3 and LED4 are incorrectly attached to the
omap3_pmx_core when they should be connected to the omap3_pmx_wkup
pin mux. This was likely masked by the fact that the bootloader
used to do all the pinmuxing.
Fixes: 0dbf99542caf ("ARM: dts: am3517-evm: Add User LEDs and Pushbutton") Signed-off-by: Adam Ford <aford173@gmail.com>
Message-ID: <20231005000402.50879-1-aford173@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
An FF-A ABI could support both the SMC32 and SMC64 conventions.
A callee that runs in the AArch64 execution state and implements such
an ABI must implement both SMC32 and SMC64 conventions of the ABI.
So the FF-A drivers will need the option to choose the mode irrespective
of FF-A version and the partition execution mode flag in the partition
information.
Let us remove the check on the FF-A version for allowing the selection
of 32bit mode of messaging. The driver will continue to set the 32-bit
mode if the partition execution mode flag specified that the partition
supports only 32-bit execution.
Commit 19b8766459c4 ("firmware: arm_ffa: Fix FFA device names for logical
partitions") added an ID to the FFA device using ida_alloc() and append
the same to "arm-ffa" to make up a unique device name. However it missed
to stash the id value in ffa_dev to help freeing the ID later when the
device is destroyed.
Due to the missing/unassigned ID in FFA device, we get the following
warning when the FF-A device is unregistered.
| ida_free called for id=0 which is not allocated.
| WARNING: CPU: 7 PID: 1 at lib/idr.c:525 ida_free+0x114/0x164
| CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc4 #209
| pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
| pc : ida_free+0x114/0x164
| lr : ida_free+0x114/0x164
| Call trace:
| ida_free+0x114/0x164
| ffa_release_device+0x24/0x3c
| device_release+0x34/0x8c
| kobject_put+0x94/0xf8
| put_device+0x18/0x24
| klist_devices_put+0x14/0x20
| klist_next+0xc8/0x114
| bus_for_each_dev+0xd8/0x144
| arm_ffa_bus_exit+0x30/0x54
| ffa_init+0x68/0x330
| do_one_initcall+0xdc/0x250
| do_initcall_level+0x8c/0xac
| do_initcalls+0x54/0x94
| do_basic_setup+0x1c/0x28
| kernel_init_freeable+0x104/0x170
| kernel_init+0x20/0x1a0
| ret_from_fork+0x10/0x20
Fix the same by actually assigning the ID in the FFA device this time
for real.
The TI-SCI message protocol provides a way to communicate between
various compute processors with a central system controller entity. It
provides the fundamental device management capability and clock control
in the SOCs that it's used in.
The remove function failed to do all the necessary cleanup if
there are registered users. Some things are freed however which
likely results in an oops later on.
Ensure that the driver isn't unbound by suppressing its bind and unbind
sysfs attributes. As the driver is built-in there is no way to remove
device once bound.
We can also remove the ti_sci_remove call along with the
ti_sci_debugfs_destroy as there are no callers for it any longer.
Fixes: aa276781a64a ("firmware: Add basic support for TI System Control Interface (TI-SCI) protocol") Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Closes: https://lore.kernel.org/linux-arm-kernel/20230216083908.mvmydic5lpi3ogo7@pengutronix.de/ Suggested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Dhruva Gole <d-gole@ti.com> Link: https://lore.kernel.org/r/20230921091025.133130-1-d-gole@ti.com Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
modprobe cpumask_kunit and rmmod cpumask_kunit, kmemleak detect
a suspected memory leak as below.
If kunit_filter_suites() in kunit_module_init() succeeds, the
suite_set.start will not be NULL and the kunit_free_suite_set() in
kunit_module_exit() should free all the memory which has not
been freed. However the test_cases in suites is left out.
Usually there is only one llcc device. But if there were a second, even
a failed probe call would modify the global drv_data pointer. So check
if drv_data is valid before overwriting it.
Fixed regulator put under "regulators" node will not be populated,
unless simple-bus or something similar is used. Drop the "regulators"
wrapper node to fix this.
Add the missing regulator supplies to the ADV7533 HDMI bridge to fix
the following dtbs_check warnings. They are all also supplied by
pm8916_l6 so there is no functional difference.
apq8016-sbc.dtb: bridge@39: 'dvdd-supply' is a required property
apq8016-sbc.dtb: bridge@39: 'pvdd-supply' is a required property
apq8016-sbc.dtb: bridge@39: 'a2vdd-supply' is a required property
from schema display/bridge/adi,adv7533.yaml
A recent submission [1] from Rob has added additionalProperties: false
to the interrupt-controller child node of RISC-V cpus, highlighting that
the D1 DT has been incorrectly using #address-cells since its
introduction. It has no child nodes, so #address-cells is not needed.
Remove it.
Both the CN9130-CRB and CN9130-DB use the SPI1 interface but had the
pinctrl node labelled as "cp0_spi0_pins". Use the label "cp0_spi1_pins"
and update the node name to "cp0-spi-pins-1" to avoid confusion with the
pinctrl options for SPI0.
Fixes: 4c43a41e5b8c ("arm64: dts: cn913x: add device trees for topology B boards") Fixes: 5c0ee54723f3 ("arm64: dts: add support for Marvell cn9130-crb platform") Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Qualcomm Embedded USB Debugger (EUD) second port should point to Type-C
USB connector. Such connector was defined directly in root node of
sc7280.dtsi which is clearly wrong. SC7280 is a chip, so physically it
does not have USB Type-C port. The connector is usually accessible
through some USB switch or controller.
Doug Anderson said that he wasn't ever able to use EUD on Herobrine
boards, probably because of invalid or missing DTS description - DTS is
saying EUD is on usb_2 node, which is connected to a USB Hub, not to the
Type-C port.
Correct the EUD/USB connector topology by removing the top-level fake
USB connector and EUD port pointing to it, and disabling the incomplete
EUD device node.
This fixes also dtbs_check warnings:
sc7280-herobrine-crd.dtb: connector: ports:port@0: 'reg' is a required property
Newer RB1 board revisions have a debug UART on QUP0. Sadly, it looks
like even when ordering one in retail, customers receive prototype
boards with "Enginering Sample" written on them.
Use QUP4 for UART to make all known RB1 boards boot.
There are two entries for similar reserved memory: qseecom@cb400000 and
audio@cb400000. Keep the qseecom as it is longer.
Warning (unique_unit_address_if_enabled): /reserved-memory/audio@cb400000: duplicate unit-address (also used in node /reserved-memory/qseecom@cb400000)
The original commit hasn't been updated according to
refactoring done in sdm845.dtsi.
Fixes: a1ade6cac5a2 ("arm64: dts: qcom: sdm845: Switch PSCI cpu idle states from PC to OSI") Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: David Heidelberg <david@ixit.cz> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Abel Vesa <abel.vesa@linaro.org> Link: https://lore.kernel.org/r/20230912071205.11502-1-david@ixit.cz Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When we fail to register the uncore pmu, the pmu context may not been
allocated. The error handing will call cpuhp_state_remove_instance()
to call uncore pmu offline callback, which migrate the pmu context.
Since that's liable to lead to some kind of use-after-free.
Use cpuhp_state_remove_instance_nocalls() instead of
cpuhp_state_remove_instance() so that the notifiers don't execute after
the PMU device has been failed to register.
Fixes: a0ab25cd82ee ("drivers/perf: hisi: Add support for HiSilicon PA PMU driver")
FIxes: 3bf30882c3c7 ("drivers/perf: hisi: Add support for HiSilicon SLLC PMU driver") Signed-off-by: Junhao He <hejunhao3@huawei.com> Link: https://lore.kernel.org/r/20231024113630.13472-1-hejunhao3@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Check whether the event type matches the PMU type firstly in
pmu::event_init() before touching the event. Otherwise we'll
change the events of others and lead to incorrect results.
Since in perf_init_event() we may call every pmu's event_init()
in a certain case, we should not modify the event if it's not
ours.
Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20231024092954.42297-2-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It transpires that dtm_unit_info is another register which got shuffled
in CMN-700 without me noticing. Fix that in a way which also proactively
fixes the fragile laziness of its consumer, just in case any further
fields ever get added alongside dtc_domain.
When tearing down a 'hisi_hns3' PMU, we mistakenly run the CPU hotplug
callbacks after the device has been unregistered, leading to fireworks
when we try to execute empty function callbacks within the driver:
| Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
| CPU: 0 PID: 15 Comm: cpuhp/0 Tainted: G W O 5.12.0-rc4+ #1
| Hardware name: , BIOS KpxxxFPGA 1P B600 V143 04/22/2021
| pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
| pc : perf_pmu_migrate_context+0x98/0x38c
| lr : perf_pmu_migrate_context+0x94/0x38c
|
| Call trace:
| perf_pmu_migrate_context+0x98/0x38c
| hisi_hns3_pmu_offline_cpu+0x104/0x12c [hisi_hns3_pmu]
Use cpuhp_state_remove_instance_nocalls() instead of
cpuhp_state_remove_instance() so that the notifiers don't execute after
the PMU device has been unregistered.
Due to the initial confusion about MIPI_DSI_MODE_EOT_PACKET, properly
renamed to MIPI_DSI_MODE_NO_EOT_PACKET, reflecting its actual meaning,
both the DSI_TXRX_CON register setting for bit (HSTX_)DIS_EOT and the
later calculation for horizontal sync-active (HSA), back (HBP) and
front (HFP) porches got incorrect due to the logic being inverted.
This means that a number of settings were wrong because....:
- DSI_TXRX_CON register setting: bit (HSTX_)DIS_EOT should be
set in order to disable the End of Transmission packet;
- Horizontal Sync and Back/Front porches: The delta used to
calculate all of HSA, HBP and HFP should account for the
additional EOT packet.
Before this change...
- Bit (HSTX_)DIS_EOT was being set when EOT packet was enabled;
- For HSA/HBP/HFP delta... all three were wrong, as words were
added when EOT disabled, instead of when EOT packet enabled!
Invert the logic around flag MIPI_DSI_MODE_NO_EOT_PACKET in the
MediaTek DSI driver to fix the aforementioned issues.
Fixes: 8b2b99fd7931 ("drm/mediatek: dsi: Fine tune the line time caused by EOTp") Fixes: c87d1c4b5b9a ("drm/mediatek: dsi: Use symbolized register definition") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> Tested-by: Michael Walle <mwalle@kernel.org> Link: https://patchwork.kernel.org/project/dri-devel/patch/20230523104234.7849-1-angelogioacchino.delregno@collabora.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The AppliedMicro XGene-1 CPU has an erratum where the timer condition
would only consider TVAL, not CVAL. We currently apply a workaround when
seeing the PartNum field of MIDR_EL1 being 0x000, under the assumption
that this would match only the XGene-1 CPU model.
However even the Ampere eMAG (aka XGene-3) uses that same part number, and
only differs in the "Variant" and "Revision" fields: XGene-1's MIDR is
0x500f0000, our eMAG reports 0x503f0002. Experiments show the latter
doesn't show the faulty behaviour.
Increase the specificity of the check to only consider partnum 0x000 and
variant 0x00, to exclude the Ampere eMAG.
Fixes: 012f18850452 ("clocksource/drivers/arm_arch_timer: Work around broken CVAL implementations") Reported-by: Ross Burton <ross.burton@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20231016153127.116101-1-andre.przywara@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the drm/msm init code gets an error during output modeset
initialisation, the kernel will report an error regarding DRM memory
manager not being clean during shutdown. This is because
msm_dsi_modeset_init() allocates a piece of GEM memory for the TX
buffer, but destruction of the buffer happens only at
msm_dsi_host_destroy(), which is called during DSI driver's remove()
time, much later than the DRM MM shutdown.
To solve this issue, move the TX buffer destruction to dsi_unbind(), so
that the buffer is destructed at the correct time. Note, we also have to
store a reference to the address space, because priv->kms->aspace is
cleared before components are unbound.
Use exiting function to free the allocated GEM object instead of
open-coding it. This has a bonus of internally calling
msm_gem_put_vaddr() to compensate for msm_gem_get_vaddr() in
msm_get_kernel_new().
Fixes: 1e29dff00400 ("drm/msm: Add a common function to free kernel buffer objects") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/562239/ Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
the table is filled. Then it disables INTx just before clearing MASKALL
bit. Currently this approach is rejected by xen-pciback.
According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
applies to three places:
- checking currently enabled interrupts type,
- transition to MSI/MSI-X - where INTx would be implicitly disabled,
- clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
enabled, as device should consider INTx disabled anyway in that case
The "ret" variable is declared as ssize_t and it can hold negative error
codes but the "rk_obj->base.size" variable is type size_t. This means
that when we compare them, they are both type promoted to size_t and the
negative error code becomes a high unsigned value and is treated as
success. Add a cast to fix this.
When KPTI is in use, we cannot register a runstate region as XEN
requires that this is always a valid VA, which we cannot guarantee. Due
to this, xen_starting_cpu() must avoid registering each CPU's runstate
region, and xen_guest_init() must avoid setting up features that depend
upon it.
We tried to ensure that in commit:
f88af7229f6f22ce (" xen/arm: do not setup the runstate info page if kpti is enabled")
... where we added checks for xen_kernel_unmapped_at_usr(), which wraps
arm64_kernel_unmapped_at_el0() on arm64 and is always false on 32-bit
arm.
Unfortunately, as xen_guest_init() is an early_initcall, this happens
before secondary CPUs are booted and arm64 has finalized the
ARM64_UNMAP_KERNEL_AT_EL0 cpucap which backs
arm64_kernel_unmapped_at_el0(), and so this can subsequently be set as
secondary CPUs are onlined. On a big.LITTLE system where the boot CPU
does not require KPTI but some secondary CPUs do, this will result in
xen_guest_init() intializing features that depend on the runstate
region, and xen_starting_cpu() registering the runstate region on some
CPUs before KPTI is subsequent enabled, resulting the the problems the
aforementioned commit tried to avoid.
Handle this more robsutly by deferring the initialization of the
runstate region until secondary CPUs have been initialized and the
ARM64_UNMAP_KERNEL_AT_EL0 cpucap has been finalized. The per-cpu work is
moved into a new hotplug starting function which is registered later
when we're certain that KPTI will not be used.
Fixes: f88af7229f6f ("xen/arm: do not setup the runstate info page if kpti is enabled") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Bertrand Marquis <bertrand.marquis@arm.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If DSI host attachment fails, the LT9611UXC driver will remove the
bridge without ensuring that there is no outstanding HPD work being
done. In rare cases this can result in the warnings regarding the mutex
being incorrect. Fix this by forcebly freing IRQ and flushing the work.
Original implementation over allocates the memory size for the
contexts list. The size of memory for the contexts list is based
on the number of iommu groups specified in the device tree.
Fixes: 8aa5bcb61612 ("gpu: host1x: Add context device management code") Signed-off-by: Johnny Liu <johnliu@nvidia.com> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230901115910.701518-1-cyndis@kapsi.fi Signed-off-by: Sasha Levin <sashal@kernel.org>
snprintf() returns the "number of characters which *would* be generated for
the given input", not the size *really* generated.
In order to avoid too large values for 'str_size' (and potential negative
values for "PSOC_RAZWI_ENG_STR_SIZE - str_size") use scnprintf()
instead of snprintf().
The difference between drm_atomic_helper_commit_tail() and
drm_atomic_helper_commit_tail_rpm() is
drm_atomic_helper_commit_tail() will commit plane first and
then enable crtc, drm_atomic_helper_commit_tail_rpm() will
enable crtc first and then commit plane.
Before mediatek-drm enables crtc, the power and clk required
by OVL have not been turned on, so the commit plane cannot be
committed before crtc is enabled. That means OVL layer should
not be enabled before crtc is enabled.
Therefore, the atomic_commit_tail of mediatek-drm is hooked with
drm_atomic_helper_commit_tail_rpm().
Another reason is that the plane_state of drm_atomic_state is not
synchronized with the plane_state stored in mtk_crtc during crtc enablng,
so just set all planes to disabled.
Fixes: 119f5173628a ("drm/mediatek: Add DRM Driver for Mediatek SoC MT8173.") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20230809125722.24112-3-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
According to the comment in drm_atomic_helper_async_commit(),
we should make sure FBs have been swapped, so that cleanups in the
new_state performs a cleanup in the old FB.
So we should move swapping FBs after calling mtk_plane_update_new_state(),
to avoid using the old FB which could be freed.
Fixes: 1a64a7aff8da ("drm/mediatek: Fix cursor plane no update") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: Alexandre Mergnat <amergnat@baylibre.com> Link: https://patchwork.kernel.org/project/linux-mediatek/patch/20230809125722.24112-2-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add missing mmsys_dev_num to mt8188 vdosys0 driver data.
Fixes: 54b48080278a ("drm/mediatek: Add mediatek-drm of vdosys0 support for mt8188") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> Reviewed-by: CK Hu <ck.hu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Fei Shao <fshao@chromium.org> Tested-by: Fei Shao <fshao@chromium.org> Link: https://patchwork.kernel.org/project/dri-devel/patch/20231004024013.18956-2-jason-jh.lin@mediatek.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
nbufs tracks the number of buffers and not the last bgid. In 16-bit, we
have 2^16 valid buffers, but the check mistakenly rejects the last
bid. Let's fix it to make the interface consistent with the
documentation.
Commit 3851d25c75ed0 ("io_uring: check for rollover of buffer ID when
providing buffers") introduced a check to prevent wrapping the BID
counter when sqe->off is provided, but it's off-by-one too
restrictive, rejecting the last possible BID (65534).
If no plane was newly enabled or changed scaling, there can be no new
scaling mismatch with the cursor plane.
By not pulling non-cursor plane states into all atomic commits while
the cursor plane is enabled, this avoids synchronizing all cursor plane
changes to vertical blank, which caused the following IGT tests to fail:
Fixes: 003048ddf44b ("drm/amd/display: Check all enabled planes in dm_check_crtc_cursor") Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: bc0b79ce2050 ("drm/amd/display: Bail from dm_check_crtc_cursor if no relevant change") Signed-off-by: Sasha Levin <sashal@kernel.org>
It was only checking planes which had any state changes in the same
commit. However, it also needs to check other enabled planes.
Not doing this meant that a commit might spuriously "succeed", resulting
in the cursor plane displaying with incorrect scaling. See
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3177#note_1824263
for an example.
Fixes: d1bfbe8a3202 ("amd/display: check cursor plane matches underlying plane") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes a null pointer dereference in the error message that is
printed when the Display Core (DC) fails to initialize. The original
message includes the DC version number, which is undefined if the DC is
not initialized.
Fixes: 9788d087caff ("drm/amd/display: improve the message printed when loading DC") Signed-off-by: Cong Liu <liucong2@kylinos.cn> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If new range is splited to multiple pranges with max_svm_range_pages
alignment and added to update_list, svm validate and map should keep
going after error to make sure prange->mapped_to_gpu flag is up to date
for the whole range.
svm validate and map update set prange->mapped_to_gpu after mapping to
GPUs successfully, otherwise clear prange->mapped_to_gpu flag (for
update mapping case) instead of setting error flag, we can remove
the redundant error flag to simpliy code.
Refactor to remove goto and update prange->mapped_to_gpu flag inside
svm_range_lock, to guarant we always evict queues or unmap from GPUs if
there are invalid ranges.
After svm validate and map return error -EAGIN, the caller retry will
update the mapping for the whole range again.
Fixes: c22b04407097 ("drm/amdkfd: flag added to handle errors from svm validate and map") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: James Zhu <james.zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The validated_once flag is not used after the prefault was removed, The
prefault was needed to ensure validate all system memory pages at least
once before mapping or migrating the range to GPU.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: eb3c357bcb28 ("drm/amdkfd: Handle errors from svm validate and map") Signed-off-by: Sasha Levin <sashal@kernel.org>
if hmm_range_get_pages returns EBUSY error during
svm_range_validate_and_map, within the context of a page fault
interrupt. This should retry through svm_range_restore_pages
callback. Therefore we treat this as EAGAIN error instead, and defer
it to restore pages fallback.
Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: eb3c357bcb28 ("drm/amdkfd: Handle errors from svm validate and map") Signed-off-by: Sasha Levin <sashal@kernel.org>
This patch fixes:
1: ref number of prange's svm_bo got decreased by an async call from hmm. When
wait svm_bo of prange got released we shoul also wait prang->svm_bo become NULL,
otherwise prange->svm_bo may be set to null after allocate new vram buffer.
2: During waiting svm_bo of prange got released in a while loop should reschedule
current task to give other tasks oppotunity to run, specially the the workque
task that handles svm_bo ref release, otherwise we may enter to softlock.
Signed-off-by: Xiaogang.Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On GFX v9.4.3 dGPU, applications have random timeout failure when XNACK
on, dmesg log has "amdgpu: IH soft ring buffer overflow 0x900, 0x900",
because dGPU mode has 272 cam entries. After increasing IH soft ring
to 512 entries, no more IH soft ring overflow message and application
passed.
Fixes: bf80d34b6c58 ("drm/amdgpu: Increase soft IH ring size") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Based on grepping through the source code these drivers appear to be
missing a call to drm_atomic_helper_shutdown() at system shutdown time
and at driver remove (or unbind) time. Among other things, this means
that if a panel is in use that it won't be cleanly powered off at
system shutdown time.
The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart and at driver remove (or unbind) time comes
straight out of the kernel doc "driver instance overview" in
drm_drv.c.
A few notes about these fixes:
- I confirmed that these drivers were all DRIVER_MODESET type drivers,
which I believe makes this relevant.
- I confirmed that these drivers were all DRIVER_ATOMIC.
- When adding drm_atomic_helper_shutdown() to the remove/unbind path,
I added it after drm_kms_helper_poll_fini() when the driver had
it. This seemed to be what other drivers did. If
drm_kms_helper_poll_fini() wasn't there I added it straight after
drm_dev_unregister().
- This patch deals with drivers using the component model in similar
ways as the patch ("drm: Call drm_atomic_helper_shutdown() at
shutdown time for misc drivers")
- These fixes rely on the patch ("drm/atomic-helper:
drm_atomic_helper_shutdown(NULL) should be a noop") to simplify
shutdown.