git.ipfire.org Git - thirdparty/kernel/linux.git/log

nvme-tcp: show hostnqn when connecting to tcp target

Log hostnqn when connecting to nvme target.
As hostnqn could be changed, logging this information
in syslog at appropriate time may help in troubleshooting.

Signed-off-by: Nitin U. Yewale <nyewale@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: use RCU list iterator for assoc_list

The assoc_list is a RCU protected list, thus use the RCU flavor of list
functions.

Let's use this opportunity and refactor this code and move the lookup
into a helper and give it a descriptive name.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: take ref count on tgtport before delete assoc

We have to ensure that the tgtport is not going away
before be have remove all the associations.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: avoid deadlock on delete association path

When deleting an association the shutdown path is deadlocking because we
try to flush the nvmet_wq nested. Avoid this by deadlock by deferring
the put work into its own work item.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: abort command when there is no binding

When the target port has not active port binding, there is no point in
trying to process the command as it has to fail anyway. Instead adding
checks to all commands abort the command early.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: do not tack refs on tgtports from assoc

The association life time is tied to the life time of the target port.
That means we should not take extra a refcount when creating a
association.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: remove null hostport pointer check

An association has always a valid hostport pointer. Remove useless
null pointer check.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: hold reference on hostport match

The hostport data structure is shared between the association, this why
we keep track of the users via a refcount. So we should not decrement
the refcount on a match and free the hostport several times.

Reported by KASAN.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: free queue and assoc directly

Neither struct nvmet_fc_tgt_queue nor struct nvmet_fc_tgt_assoc are data
structure which are used in a RCU context. So there is no reason to
delay the free operation.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: defer cleanup using RCU properly

When the target executes a disconnect and the host triggers a reconnect
immediately, the reconnect command still finds an existing association.

The reconnect crashes later on because nvmet_fc_delete_target_assoc
blindly removes resources while the reconnect code wants to use it.

To address this, nvmet_fc_find_target_assoc should not be able to
lookup an association which is being removed. The association list
is already under RCU lifetime management, so let's properly use it
and remove the association from the list and wait for a grace period
before cleaning up all. This means we also can drop the RCU management
on the queues, because this is now handled via the association itself.

A second step split the execution context so that the initial disconnect
command can complete without running the reconnect code in the same
context. As usual, this is done by deferring the ->done to a workqueue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fc: release reference on target port

In case we return early out of __nvmet_fc_finish_ls_req() we still have
to release the reference on the target port.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvmet-fcloop: swap the list_add_tail arguments

The first argument of list_add_tail function is the new element which
should be added to the list which is the second argument. Swap the
arguments to allow processing more than one element at a time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme-fc: do not wait in vain when unloading module

The module exit path has race between deleting all controllers and
freeing 'left over IDs'. To prevent double free a synchronization
between nvme_delete_ctrl and ida_destroy has been added by the initial
commit.

There is some logic around trying to prevent from hanging forever in
wait_for_completion, though it does not handling all cases. E.g.
blktests is able to reproduce the situation where the module unload
hangs forever.

If we completely rely on the cleanup code executed from the
nvme_delete_ctrl path, all IDs will be freed eventually. This makes
calling ida_destroy unnecessary. We only have to ensure that all
nvme_delete_ctrl code has been executed before we leave
nvme_fc_exit_module. This is done by flushing the nvme_delete_wq
workqueue.

While at it, remove the unused nvme_fc_wq workqueue too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

eventfs: Get rid of dentry pointers without refcounts

The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer. That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.

There were two reasonms why eventfs tried to keep a pointer to a dentry:

- the creation of a 'events' directory would actually have a stable
 dentry pointer that it created with tracefs_start_creating().

 And it needed that dentry when tearing it all down again in
 eventfs_remove_events_dir().

 This use is actually ok, because the special top-level events
 directory dentries are actually stable, not just a temporary cache of
 the eventfs data structures.

- the 'eventfs_inode' (aka ei) needs to stay around as long as there
 are dentries that refer to it.

 It then used these dentry pointers as a replacement for doing
 reference counting: it would try to make sure that there was only
 ever one dentry associated with an event_inode, and keep a child
 dentry array around to see which dentries might still refer to the
 parent ei.

This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.

The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei. As
does the directory dentries. That makes the broken use case go away.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

eventfs: Clean up dentry ops and add revalidate function

In order for the dentries to stay up-to-date with the eventfs changes,
just add a 'd_revalidate' function that checks the 'is_freed' bit.

Also, clean up the dentry release to actually use d_release() rather
than the slightly odd d_iput() function. We don't care about the inode,
all we want to do is to get rid of the refcount to the eventfs data
added by dentry->d_fsdata.

It would probably be cleaner to make eventfs its own filesystem, or at
least set its own dentry ops when looking up eventfs files. But as it
is, only eventfs dentries use d_fsdata, so we don't really need to split
these things up by use.

Another thing that might be worth doing is to make all eventfs lookups
mark their dentries as not worth caching. We could do that with
d_delete(), but the DCACHE_DONTCACHE flag would likely be even better.

As it is, the dentries are all freeable, but they only tend to get freed
at memory pressure rather than more proactively. But that's a separate
issue.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.124644253@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

eventfs: Remove unused d_parent pointer field

It's never used

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185512.961772428@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

tracefs: dentry lookup crapectomy

The dentry lookup for eventfs files was very broken, and had lots of
signs of the old situation where the filesystem names were all created
statically in the dentry tree, rather than being looked up dynamically
based on the eventfs data structures.

You could see it in the naming - how it claimed to "create" dentries
rather than just look up the dentries that were given it.

You could see it in various nonsensical and very incorrect operations,
like using "simple_lookup()" on the dentries that were passed in, which
only results in those dentries becoming negative dentries. Which meant
that any other lookup would possibly return ENOENT if it saw that
negative dentry before the data was then later filled in.

You could see it in the immense amount of nonsensical code that didn't
actually just do lookups.

Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131233227.73db55e1@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

ALSA: usb-audio: Ignore clock selector errors for single connection

For devices with multiple clock sources connected to a selector, we need
to check what a clock selector control request has returned. This is
needed to ensure that a requested clock source is indeed selected and for
autoclock feature to work.

For devices with single clock source connected, if we get an error there
is nothing else we can do about it. We can't skip clock selector setup as
it is required by some devices. So lets just ignore error in this case.

This should fix various buggy Mackie devices:

[ 649.109785] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)
[ 649.111946] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)
[ 649.113822] usb 1-1.3: parse_audio_format_rates_v2v3(): unable to find clock source (clock -32)

There is also interesting info from the Windows documentation [1] (this
is probably why manufacturers dont't even test this feature):

"The USB Audio 2.0 driver doesn't support clock selection. The driver
uses the Clock Source Entity, which is selected by default and never
issues a Clock Selector Control SET CUR request."

Link: https://learn.microsoft.com/en-us/windows-hardware/drivers/audio/usb-2-0-audio-drivers
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217314
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218175
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218342
Signed-off-by: Alexander Tsoy <alexander@tsoy.me>
Link: https://lore.kernel.org/r/20240201115308.17838-1-alexander@tsoy.me
Signed-off-by: Takashi Iwai <tiwai@suse.de>

octeontx2-pf: Remove xdp queues on program detach

XDP queues are created/destroyed when a XDP program
is attached/detached. In current driver xdp_queues are not
getting destroyed on program exit due to incorrect xdp_queue
and tot_tx_queue count values.

This patch fixes the issue by setting tot_tx_queue and xdp_queue
count to correct values. It also fixes xdp.data_hard_start address.

Fixes: 06059a1a9a4a ("octeontx2-pf: Add XDP support to netdev PF")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Link: https://lore.kernel.org/r/20240130120610.16673-1-gakula@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

drm/amdgpu/pm: Use inline function for IP version check

Use existing inline function for IP version check.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

io_uring/net: fix sr->len for IORING_OP_RECV with MSG_WAITALL and buffers

If we use IORING_OP_RECV with provided buffers and pass in '0' as the
length of the request, the length is retrieved from the selected buffer.
If MSG_WAITALL is also set and we get a short receive, then we may hit
the retry path which decrements sr->len and increments the buffer for
a retry. However, the length is still zero at this point, which means
that sr->len now becomes huge and import_ubuf() will cap it to
MAX_RW_COUNT and subsequently return -EFAULT for the range as a whole.

Fix this by always assigning sr->len once the buffer has been selected.

Cc: stable@vger.kernel.org
Fixes: 7ba89d2af17a ("io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ALSA: hda/realtek: Enable headset mic on Vaio VJFE-ADL

Vaio VJFE-ADL is equipped with ALC269VC, and it needs
ALC298_FIXUP_SPK_VOLUME quirk to make its headset mic work.

Signed-off-by: Edson Juliano Drosdeck <edson.drosdeck@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20240201122114.30080-1-edson.drosdeck@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda: cs35l56: Remove unused test stub function

Remove an unused stub function that calls a non-existant function.

This function was accidentally added as part of commit
2144833e7b41 ("ALSA: hda: cirrus_scodec: Add KUnit test"). It was
a relic of an earlier version of the test that should have been
removed.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 2144833e7b41 ("ALSA: hda: cirrus_scodec: Add KUnit test")
Link: https://msgid.link/r/20240129162737.497-19-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ALSA: hda: cs35l56: Firmware file must match the version of preloaded firmware

Check whether the firmware is already patched. If so, include the
firmware version in the firmware file name.

If the firmware has already been patched by the BIOS the driver
can only replace it if it has control of hard RESET.

If the driver cannot replace the firmware, it can still load a wmfw
(for ALSA control definitions) and/or a bin (for additional tunings).
But these must match the version of firmware that is running on the
CS35L56.

The firmware is pre-patched if either:
- FIRMWARE_MISSING == 0, or
- it is a secured CS35L56 (which implies that is was already patched),

cs35l56_hw_init() will set preloaded_fw_ver to the (non-zero)
firmware version if either of these conditions is true.

Normal (unpatched or replaceable firmware):
 cs35l56-rev-dsp1-misc[-system_name].[wmfw|bin]

Preloaded firmware:
 cs35l56-rev[-s]-VVVVVV-dsp1-misc[-system_name].[wmfw|bin]

Where:
 [-s] is an optional -s added into the name for a secured CS35L56
 VVVVVV is the 24-bit firmware version in hexadecimal.

Backport note:
This won't apply to kernel versions older than v6.6.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-18-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ALSA: hda: cs35l56: Fix filename string field layout

Change the filename field layout to:
cs35l56-rev[-s]-dsp1-misc[-sub].[wmfw|bin]

This is to keep the same firmware file naming scheme as the
CS35L56 ASoC driver.

This is not a compatibility break because no firmware files have
been published.

The original field layout matched the ASoC driver, but the way the
ASoC driver used the wm_adsp driver config to form this filename
was bugged. Fixing the ASoC driver to use the correct wm_adsp config
strings means that the 's' flag (to indicate a secured part) has to
move to somewhere after the first '-'.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-17-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ALSA: hda: cs35l56: Fix order of searching for firmware files

Check for the cases of system-specific bin file without a
wmfw before falling back to looking for a generic wmfw.

All system-specific options should be tried before falling
back to loading a generic wmfw/bin. With the original code,
the presence of a fallback generic wmfw on the filesystem
would prevent using a system-specific tuning with a ROM
firmware.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-16-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Allow more time for firmware to boot

The original 50ms timeout for firmware boot is not long enough for
worst-case time to reboot after a firmware download. Increase the
timeout to 250ms.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-15-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Load tunings for the correct speaker models

If the "spk-id-gpios" property is present it points to GPIOs whose
value must be used to select the correct bin file to match the
speakers.

Some manufacturers use multiple sources of speakers, which need
different tunings for best performance. On these models the type of
speaker fitted is indicated by the values of one or more GPIOs. The
number formed by the GPIOs identifies the tuning required.

The speaker ID must be used in combination with the subsystem ID
(either from PCI SSID or cirrus,firmware-uid property), because the
GPIOs can only indicate variants of a specific model.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 1a1c3d794ef6 ("ASoC: cs35l56: Use PCI SSID as the firmware UID")
Link: https://msgid.link/r/20240129162737.497-14-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Firmware file must match the version of preloaded firmware

Check during initialization whether the firmware is already patched.
If so, include the firmware version in the wm_adsp fwf_name string.

If the firmware has already been patched by the BIOS the driver
can only replace it if it has control of hard RESET.

If the driver cannot replace the firmware, it can still load a wmfw
(for ALSA control definitions) and/or a bin (for additional tunings).
But these must match the version of firmware that is running on the
CS35L56.

The firmware is pre-patched if FIRMWARE_MISSING == 0.

Including the firmware version in the fwf_name string will
qualify the firmware file name:

Normal (unpatched or replaceable firmware):
 cs35l56-rev-dsp1-misc[-system_name].[wmfw|bin]

Preloaded firmware:
 cs35l56-rev[-s]-VVVVVV-dsp1-misc[-system_name].[wmfw|bin]

Where:
 [-s] is an optional -s added into the name for a secured CS35L56
 VVVVVV is the 24-bit firmware version in hexadecimal.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 608f1b0dbdde ("ASoC: cs35l56: Move DSP part string generation so that it is done only once")
Link: https://msgid.link/r/20240129162737.497-13-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Fix misuse of wm_adsp 'part' string for silicon revision

Put the silicon revision and secured flag in the wm_adsp fwf_name
string instead of including them in the part string.

This changes the format of the firmware name string from

cs35l56[s]-rev-misc[-system_name]

to
cs35l56-rev[-s]-misc[-system_name]

No firmware files have been published, so this doesn't cause a
compatibility break.

Silicon revision and secured flag are included in the firmware
filename to pick a firmware compatible with the part. These strings
were being added to the part string, but that is a misuse of the
string. The correct place for these is the fwf_name string, which
is specifically intended to select between multiple firmware files
for the same part.

Backport note:
This won't apply to kernels older than v6.6.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 608f1b0dbdde ("ASoC: cs35l56: Move DSP part string generation so that it is done only once")
Link: https://msgid.link/r/20240129162737.497-12-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Fix for initializing ASP1 mixer registers

Defer initializing the state of the ASP1 mixer registers until
the firmware has been downloaded and rebooted.

On a SoundWire system the ASP is free for use as a chip-to-chip
interconnect. This can be either for the firmware on multiple
CS35L56 to share reference audio; or as a bridge to another
device. If it is a firmware interconnect it is owned by the
firmware and the Linux driver should avoid writing the registers.
However, if it is a bridge then Linux may take over and handle
it as a normal codec-to-codec link. Even if the ASP is used
as a firmware-firmware interconnect it is useful to have
ALSA controls for the ASP mixer. They are at least useful for
debugging.

CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So if the
ASP is being used on a SoundWire system the firmware sets up the
ASP mixer registers. This means that we can't assume the default
state of these registers. But we don't know the initial state
that the firmware set them to until after the firmware has been
downloaded and booted, which can take several seconds when
downloading multiple amps.

DAPM normally reads the initial state of mux registers during
probe() but this would mean blocking probe() for several seconds
until the firmware has initialized them. To avoid this, the
mixer muxes are set SND_SOC_NOPM to prevent DAPM trying to read
the register state. Custom get/set callbacks are implemented for
ALSA control access, and these can safely block waiting for the
firmware download.

After the firmware download has completed, the state of the
mux registers is known so a work job is queued to call
snd_soc_dapm_mux_update_power() on each of the mux widgets.

Backport note:
This won't apply cleanly to kernels older than v6.6.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-11-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ALSA: hda: cs35l56: Initialize all ASP1 registers

Add ASP1_FRAME_CONTROL1, ASP1_FRAME_CONTROL5 and the ASP1_TX?_INPUT
registers to the sequence used to initialize the ASP configuration.
Write this sequence to the cache and directly to the registers to
ensure that they match.

A system-specific firmware can patch these registers to values that are
not the silicon default, so that the CS35L56 boots already in the
configuration used by Windows or by "driverless" Windows setups such
as factory tuning.

These may not match how Linux is configuring the HDA codec. And anyway
on Linux the ALSA controls are used to configure routing options.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 73cfbfa9caea ("ALSA: hda/cs35l56: Add driver for Cirrus Logic CS35L56 amplifier")
Link: https://msgid.link/r/20240129162737.497-10-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Fix default SDW TX mixer registers

Patch the SDW TX mixer registers to silicon defaults.

CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So the
firmware sets up the SDW TX mixer registers to whatever audio
is relevant on a specific system.

This means that the driver cannot assume the initial values
of these registers. But Linux has ALSA controls to configure
routing, so the registers can be patched to silicon default and
the ALSA controls used to select what audio to feed back to the
host capture path.

Backport note:
This won't apply to kernels older than v6.6.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-9-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Fix to ensure ASP1 registers match cache

Add a dummy SUPPLY widget connected to the ASP that forces the
chip registers to match the regmap cache when the ASP is
powered-up.

On a SoundWire system the ASP is free for use as a chip-to-chip
interconnect. This can be either for the firmware on multiple
CS35L56 to share reference audio; or as a bridge to another
device. If it is a firmware interconnect it is owned by the
firmware and the Linux driver should avoid writing the registers.
However. If it is a bridge then Linux may take over and handle
it as a normal codec-to-codec link.

CS35L56 is designed for SDCA and a generic SDCA driver would
know nothing about these chip-specific registers. So if the
ASP is being used on a SoundWire system the firmware sets up the
ASP registers. This means that we can't assume the default
state of the ASP registers. But we don't know the initial state
that the firmware set them to until after the firmware has been
downloaded and booted, which can take several seconds when
downloading multiple amps.

To avoid blocking probe() for several seconds waiting for the
firmware, the silicon defaults are assumed. This allows the machine
driver to setup the ASP configuration during probe() without being
blocked. If the ASP is hooked up and used, the SUPPLY widget
ensures that the chip registers match what was configured in the
regmap cache.

If the machine driver does not hook up the ASP, it is assumed that
it won't call any functions to configure the ASP DAI. Therefore
the regmap cache will be clean for these registers so a
regcache_sync() will not overwrite the chip registers. If the
DAI is not hooked up, the dummy SUPPLY widget will not be
invoked so it will never force-overwrite the chip registers.

Backport note:
This won't apply cleanly to kernels older than v6.6.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-8-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Remove buggy checks from cs35l56_is_fw_reload_needed()

Remove the check of fw_patched from cs35l56_is_fw_reload_needed().
Also remove the redundant check for control of the reset GPIO.

The fw_patched flag is set when cs35l56_dsp_work() has completed its
steps to download firmware and power-up wm_adsp. There was a check in
cs35l56_is_fw_reload_needed() to make a quick exit of 'false' if
!fw_patched. The original idea was that the system might be suspended
before the driver has ever made any attempt to download firmware, and
in that case the driver doesn't need to return to a patched state
because it was never in a patched state.

This check of fw_patched is buggy because it prevented ever recovering
from a failed patch. If a previous attempt to patch and reboot the
silicon had failed it would leave fw_patched==false. This would mean
the driver never attempted another download even though the fault may
have been cleared (by a hard reset, for example).

It is also a redundant check because the calling code already makes
a quick exit if cs35l56_component_probe() has not been called, which
deals with the original intent of this check but in a safer way.

The check for reset GPIO is redundant: if the silicon was hard-reset
the FIRMWARE_MISSING flag will be 1. But this check created an
expectation that the suspend/resume code toggles reset. This can't
easily be protected against accidental code breakage. The only reason
for the check was to skip runtime-resuming the driver to read the
PROTECTION_STATUS register when it already knows it reset the silicon.
But in that case the driver will have to be runtime-resumed to do
the firmware download. So it created an assumption for no benefit.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 8a731fd37f8b ("ASoC: cs35l56: Move utility functions to shared file")
Link: https://msgid.link/r/20240129162737.497-7-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: Don't add the same register patch multiple times

Move the call to cs35l56_set_patch() earlier in cs35l56_init() so
that it only adds the register patch on first-time initialization.

The call was after the post_soft_reset label, so every time this
function was run to re-initialize the hardware after a reset it would
call regmap_register_patch() and add the same reg_sequence again.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 898673b905b9 ("ASoC: cs35l56: Move shared data into a common data structure")
Link: https://msgid.link/r/20240129162737.497-6-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: cs35l56_component_remove() must clean up wm_adsp

cs35l56_component_remove() must call wm_adsp_power_down() and
wm_adsp2_component_remove().

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-5-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: cs35l56: cs35l56_component_remove() must clear cs35l56->component

The cs35l56->component pointer is used by the suspend-resume handling to
know whether the driver is fully instantiated. This is to prevent it
queuing dsp_work which would result in calling wm_adsp when the driver
is not an instantiated ASoC component. So this pointer must be cleared
by cs35l56_component_remove().

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: e49611252900 ("ASoC: cs35l56: Add driver for Cirrus Logic CS35L56")
Link: https://msgid.link/r/20240129162737.497-4-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: wm_adsp: Don't overwrite fwf_name with the default

There's no need to overwrite fwf_name with a kstrdup() of the cs_dsp part
name. It is trivial to select either fwf_name or cs_dsp.part as the string
to use when building the filename in wm_adsp_request_firmware_file().

This leaves fwf_name entirely owned by the codec driver.

It also avoids problems with freeing the pointer. With the original code
fwf_name was either a pointer owned by the codec driver, or a kstrdup()
created by wm_adsp. This meant wm_adsp must free it if it set it, but not
if the codec driver set it. The code was handling this by using
devm_kstrdup().
But there is no absolute requirement that wm_adsp_common_init() must be
called from probe(), so this was a pseudo-memory leak - each new call to
wm_adsp_common_init() would allocate another block of memory but these
would only be freed if the owning codec driver was removed.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Link: https://msgid.link/r/20240129162737.497-3-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

ASoC: wm_adsp: Fix firmware file search order

Check for the cases of system-specific bin file without a
wmfw before falling back to looking for a generic wmfw.

All system-specific options should be tried before falling
back to loading a generic wmfw/bin. With the original code,
the presence of a fallback generic wmfw on the filesystem
would prevent using a system-specific tuning with a ROM
firmware.

Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Fixes: 0e7d82cbea8b ("ASoC: wm_adsp: Add support for loading bin files without wmfw")
Link: https://msgid.link/r/20240129162737.497-2-rf@opensource.cirrus.com
Signed-off-by: Mark Brown <broonie@kernel.org>

Merge tag 'asoc-fix-v6.8-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.8

Quite a lot of fixes that came in since the merge window, a large
portion for for Qualcomm and ES8326.

The 8 DAI support for Qualcomm is just raising a constant to allow for
devies that otherwise only need DTs, and there's a few other device ID
updates for sunxi (Allwinner) and AMD platforms.

drm/tegra: Do not assume that a NULL domain means no DMA IOMMU

Previously with tegra-smmu, even with CONFIG_IOMMU_DMA, the default domain
could have been left as NULL. The NULL domain is specially recognized by
host1x_client_iommu_attach() as meaning it is not the DMA domain and
should be replaced with the special shared domain.

This happened prior to the below commit because tegra-smmu was using the
NULL domain to mean IDENTITY.

Now that the domain is properly labled the test in DRM doesn't see NULL.
Check for IDENTITY as well to enable the special domains.

Fixes: c8cc2655cc6c ("iommu/tegra-smmu: Implement an IDENTITY domain")
Reported-by: diogo.ivo@tecnico.ulisboa.pt
Closes: https://lore.kernel.org/all/bbmhcoghrprmbdibnjum6lefix2eoquxrde7wyqeulm4xabmlm@b6jy32saugqh/
Tested-by: diogo.ivo@tecnico.ulisboa.pt
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-3049f92c4812+16691-host1x_def_dom_fix_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA

The ops->default_domain flow used a 0 req_type to select the default
domain and this was enforced by iommu_group_alloc_default_domain().

When !CONFIG_IOMMU_DMA started forcing the old ARM32 drivers into IDENTITY
it also overroad the 0 req_type of the ops->default_domain drivers to
IDENTITY which ends up causing failures during device probe.

Make iommu_group_alloc_default_domain() accept a req_type that matches the
ops->default_domain and have iommu_group_alloc_default_domain() generate a
req_type that matches the default_domain.

This way the req_type always describes what kind of domain should be
attached and ops->default_domain overrides all other mechanisms to choose
the default domain.

Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Fixes: 0f6a90436a57 ("iommu: Do not use IOMMU_DOMAIN_DMA if CONFIG_IOMMU_DMA is not enabled")
Reported-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Closes: https://lore.kernel.org/linux-iommu/20240123165829.630276-1-ovidiu.panait@windriver.com/
Reported-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Closes: https://lore.kernel.org/linux-iommu/170618452753.3805.4425669653666211728.stgit@ltcd48-lp2.aus.stglab.ibm.com/
Tested-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/0-v1-755bd21c4a64+525b8-iommu_def_dom_fix_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

firewire: core: search descriptor leaf just after vendor directory entry in root directory

It appears that Sony DVMC-DA1 has a quirk that the descriptor leaf entry
locates just after the vendor directory entry in root directory. This is
not conformant to the legacy layout of configuration ROM described in
Configuration ROM for AV/C Devices 1.0 (1394 Trading Association, Dec 2000,
TA Document 1999027).

This commit changes current implementation to parse configuration ROM for
device attributes so that the descriptor leaf entry can be detected for
the vendor name.

$ config-rom-pretty-printer < Sony-DVMC-DA1.img
 ROM header and bus information block
 -----------------------------------------------------------------
1024 041ee7fb bus_info_length 4, crc_length 30, crc 59387
1028 31333934 bus_name "1394"
1032 e0644000 irmc 1, cmc 1, isc 1, bmc 0, cyc_clk_acc 100, max_rec 4 (32)
1036 08004603 company_id 080046 |
1040 0014193c device_id 12886219068 | EUI-64 576537731003586876

 root directory
 -----------------------------------------------------------------
1044 0006b681 directory_length 6, crc 46721
1048 03080046 vendor
1052 0c0083c0 node capabilities: per IEEE 1394
1056 8d00000a --> eui-64 leaf at 1096
1060 d1000003 --> unit directory at 1072
1064 c3000005 --> vendor directory at 1084
1068 8100000a --> descriptor leaf at 1108

 unit directory at 1072
 -----------------------------------------------------------------
1072 0002cdbf directory_length 2, crc 52671
1076 1200a02d specifier id
1080 13010000 version

 vendor directory at 1084
 -----------------------------------------------------------------
1084 00020cfe directory_length 2, crc 3326
1088 17fa0000 model
1092 81000008 --> descriptor leaf at 1124

 eui-64 leaf at 1096
 -----------------------------------------------------------------
1096 0002c66e leaf_length 2, crc 50798
1100 08004603 company_id 080046 |
1104 0014193c device_id 12886219068 | EUI-64 576537731003586876

 descriptor leaf at 1108
 -----------------------------------------------------------------
1108 00039e26 leaf_length 3, crc 40486
1112 00000000 textual descriptor
1116 00000000 minimal ASCII
1120 536f6e79 "Sony"

 descriptor leaf at 1124
 -----------------------------------------------------------------
1124 0005001d leaf_length 5, crc 29
1128 00000000 textual descriptor
1132 00000000 minimal ASCII
1136 44564d43 "DVMC"
1140 2d444131 "-DA1"
1144 00000000

Suggested-by: Adam Goldman <adamg@pobox.com>
Tested-by: Adam Goldman <adamg@pobox.com>
Link: https://lore.kernel.org/r/20240130100409.30128-3-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

firewire: core: correct documentation of fw_csr_string() kernel API

Against its current description, the kernel API can accepts all types of
directory entries.

This commit corrects the documentation.

Cc: stable@vger.kernel.org
Fixes: 3c2c58cb33b3 ("firewire: core: fw_csr_string addendum")
Link: https://lore.kernel.org/r/20240130100409.30128-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>

drm/hwmon: Fix abi doc warnings

This fixes warnings in xe, i915 hwmon docs:

Warning: /sys/devices/.../hwmon/hwmon/curr1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:35 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:52
Warning: /sys/devices/.../hwmon/hwmon/energy1_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:54 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:65
Warning: /sys/devices/.../hwmon/hwmon/in0_input is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:46 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:0
Warning: /sys/devices/.../hwmon/hwmon/power1_crit is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:22 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:39
Warning: /sys/devices/.../hwmon/hwmon/power1_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:0 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:8
Warning: /sys/devices/.../hwmon/hwmon/power1_max_interval is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:62 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:30
Warning: /sys/devices/.../hwmon/hwmon/power1_rated_max is defined 2 times: Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon:14 Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon:22

Use a path containing the driver name to differentiate the documentation
of each entry.

Fixes: fb1b70607f73 ("drm/xe/hwmon: Expose power attributes")
Fixes: 92d44a422d0d ("drm/xe/hwmon: Expose card reactive critical power")
Fixes: fbcdc9d3bf58 ("drm/xe/hwmon: Expose input voltage attribute")
Fixes: 71d0a32524f9 ("drm/xe/hwmon: Expose hwmon energy attribute")
Fixes: 4446fcf220ce ("drm/xe/hwmon: Expose power1_max_interval")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20240125113345.291118ff@canb.auug.org.au/
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240127165040.2348009-1-badal.nilawar@intel.com
(cherry picked from commit 20485e3a810c480cef60caf53988619f61127e7b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Make all GuC ABI shift values unsigned

All GuC ABI definitions are unsigned and not defining as unsigned is
causing build errors [1].

[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131025424.2087936-1-matthew.brost@intel.com
(cherry picked from commit d83d8ae275c6bf87506b71b8a1acd98452137dc5)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe/vm: Subclass userptr vmas

The construct allocating only parts of the vma structure when
the userptr part is not needed is very fragile. A developer could
add additional fields below the userptr part, and the code could
easily attempt to access the userptr part even if its not persent.

So introduce xe_userptr_vma which subclasses struct xe_vma the
proper way, and accordingly modify a couple of interfaces.
This should also help if adding userptr helpers to drm_gpuvm.

v2:
- Fix documentation of to_userptr_vma() (Matthew Brost)
- Fix allocation and freeing of vmas to clearer distinguish
between the types.

Closes: https://lore.kernel.org/intel-xe/0c4cc1a7-f409-4597-b110-81f9e45d1ffe@embeddedor.com/T/#u
Fixes: a4cc60a55fd9 ("drm/xe: Only alloc userptr part of xe_vma for userptrs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240131091628.12318-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit 5bd24e78829ad569fa1c3ce9a05b59bb97b91f3d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Use LRC prefix rather than CTX prefix in lrc desc defines

The sparc build fails [1] due to CTX_VALID being redefined. Fix this by
using a better naming convention of LRC_VALID as this define is used in
setting bits in the lrc descriptor. To be uniform, change other define
with LRC prefix too.

[1] https://lore.kernel.org/all/20240123111235.3097079-1-geert@linux-m68k.org/

v2:
- s/LEGACY_64B_CONTEXT/LRC_LEGACY_64B_CONTEXT (Lucas)

Fixes: 0bc519d20ffa ("drm/xe: Remove GEN[0-9]*_ prefixes")
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240123212638.1605626-1-matthew.brost@intel.com
(cherry picked from commit 152ca51d8db03f08a71c25e999812e263839fdce)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Don't use __user error pointers

The error pointer macros are not aware of __user pointers and as a
consequence sparse warns.

Have the copy_mask() function return an integer instead of a __user
pointer.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-5-thomas.hellstrom@linux.intel.com
(cherry picked from commit 78366eed6853aa6a5deccb2eb182f9334d2bd208)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Annotate mcr_[un]lock()

These functions acquire and release the gt::mcr_lock. Annotate
accordingly.
Fix the corresponding sparse warning.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Fixes: fb1d55efdfcb ("drm/xe: Cleanup OPEN_BRACE style issues")
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117134048.165425-4-thomas.hellstrom@linux.intel.com
(cherry picked from commit 97fd7a7e4e877676a2ab1a687ba958b70931abcc)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Only allow 1 ufence per exec / bind IOCTL

The way exec ufences are coded only 1 ufence per IOCTL will be signaled.
It is possible to fix this but for current use cases 1 ufence per IOCTL
is sufficient. Enforce a limit of 1 ufence per IOCTL (both exec and bind
to be uniform).

v2:
- Add fixes tag (Thomas)

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124234413.1640825-1-matthew.brost@intel.com
(cherry picked from commit d1df9bfbf68c65418f30917f406b6d5bd597714e)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Grab mem_access when disabling C6 on skip_guc_pc platforms

If skip_guc_pc is set for a platform, C6 is disabled directly without
acquiring a mem_access reference, triggering an assertion inside
xe_gt_idle_disable_c6.

Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240126220613.865939-2-matthew.d.roper@intel.com
(cherry picked from commit 9f5971bdf78e0937206556534247243ad56cd735)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

drm/xe: Fix crash in trace_dma_fence_init()

trace_dma_fence_init() uses dma_fence_ops functions
like get_driver_name() and get_timeline_name() to generate trace
information but the Xe KMD implementation of those functions makes
use of xe_hw_fence_ctx that was being set after dma_fence_init().

So here just inverting the order to fix the crash.

Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240124171830.95774-1-jose.souza@intel.com
(cherry picked from commit c6878e47431c72168da08dfbc1496c09b2d3c246)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge branch 'selftests-net-a-few-pmtu-sh-fixes'

Paolo Abeni says:

====================
selftests: net: a few pmtu.sh fixes

This series try to address CI failures for the pmtu.sh tests. It
does _not_ attempt to enable all the currently skipped cases, to
avoid adding more entropy.

Tested with:

make -C tools/testing/selftests/ TARGETS=net install
vng --build --config tools/testing/selftests/net/config
vng --run . --user root -- \
./tools/testing/selftests/kselftest_install/run_kselftest.sh \
-t net:pmtu.sh
====================

Link: https://lore.kernel.org/r/cover.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: don't access /dev/stdout in pmtu.sh

When running the pmtu.sh via the kselftest infra, accessing
/dev/stdout gives unexpected results:
 # dd: failed to open '/dev/stdout': Device or resource busy
 # TEST: IPv4, bridged vxlan4: PMTU exceptions [FAIL]

Let dd use directly the standard output to fix the above:
 # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects [ OK ]

Fixes: 136a1b434bbb ("selftests: net: test vxlan pmtu exceptions with tcp")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/23d7592c5d77d75cff9b34f15c227f92e911c2ae.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: fix available tunnels detection

The pmtu.sh test tries to detect the tunnel protocols available
in the running kernel and properly skip the unsupported cases.

In a few more complex setup, such detection is unsuccessful, as
the script currently ignores some intermediate error code at
setup time.

Before:
 # which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin)
 # TEST: vti6: PMTU exceptions (ESP-in-UDP) [FAIL]
 # PMTU exception wasn't created after creating tunnel exceeding link layer MTU
 # ./pmtu.sh: line 931: kill: (7543) - No such process
 # ./pmtu.sh: line 931: kill: (7544) - No such process

After:
 # xfrm4 not supported
 # TEST: vti4: PMTU exceptions [SKIP]

Fixes: ece1278a9b81 ("selftests: net: add ESP-in-UDP PMTU test")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/cab10e75fda618e6fff8c595b632f47db58b9309.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests: net: add missing config for pmtu.sh tests

The mentioned test uses a few Kconfig still missing the
net config, add them.

Before:
 # Error: Specified qdisc kind is unknown.
 # Error: Specified qdisc kind is unknown.
 # Error: Qdisc not classful.
 # We have an error talking to the kernel
 # Error: Qdisc not classful.
 # We have an error talking to the kernel
 # policy_routing not supported
 # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [SKIP]

After:
 # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions [ OK ]

Fixes: ec730c3e1f0e ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/8d27bf6762a5c7b3acc457d6e6872c533040f9c1.1706635101.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'pds_core-various-fixes'

Brett Creeley says:

====================
pds_core: Various fixes

This series includes the following changes:

There can be many users of the pds_core's adminq. This includes
pds_core's uses and any clients that depend on it. When the pds_core
device goes through a reset for any reason the adminq is freed
and reconfigured. There are some gaps in the current implementation
that will cause crashes during reset if any of the previously mentioned
users of the adminq attempt to use it after it's been freed.

Issues around how resets are handled, specifically regarding the driver's
error handlers.

Originally these patches were aimed at net-next, but it was requested to
push the fixes patches to net. The original patches can be found here:

https://lore.kernel.org/netdev/20240126174255.17052-1-brett.creeley@amd.com/

Also, the Reviewed-by tags were left in place from net-next reviews as the
patches didn't change.
====================

Link: https://lore.kernel.org/r/20240129234035.69802-1-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Rework teardown/setup flow to be more common

Currently the teardown/setup flow for driver probe/remove is quite
a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
One key piece that's missing are the calls to pci_alloc_irq_vectors()
and pci_free_irq_vectors(). The pcie reset case is calling
pci_free_irq_vectors() on reset_prepare, but not calling the
corresponding pci_alloc_irq_vectors() on reset_done. This is causing
unexpected/unwanted interrupt behavior due to the adminq interrupt
being accidentally put into legacy interrupt mode. Also, the
pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
called directly in probe/remove respectively.

Fix this inconsistency by making the following changes:
 1. Always call pdsc_dev_init() in pdsc_setup(), which calls
 pci_alloc_irq_vectors() and get rid of the now unused
 pds_dev_reinit().
 2. Always free/clear the pdsc->intr_info in pdsc_teardown()
 since this structure will get re-alloced in pdsc_setup().
 3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
 since pci_alloc_irq_vectors() will always be called in
 pdsc_setup()->pdsc_dev_init() for both the probe/remove and
 reset flows.
 4. Make sure to only create the debugfs "identity" entry when it
 doesn't already exist, which it will in the reset case because
 it's already been created in the initial call to pdsc_dev_init().

Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Clear BARs on reset

During reset the BARs might be accessed when they are
unmapped. This can cause unexpected issues, so fix it by
clearing the cached BAR values so they are not accessed
until they are re-mapped.

Also, make sure any places that can access the BARs
when they are NULL are prevented.

Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Prevent race issues involving the adminq

There are multiple paths that can result in using the pdsc's
adminq.

[1] pdsc_adminq_isr and the resulting work from queue_work(),
i.e. pdsc_work_thread()->pdsc_process_adminq()

[2] pdsc_adminq_post()

When the device goes through reset via PCIe reset and/or
a fw_down/fw_up cycle due to bad PCIe state or bad device
state the adminq is destroyed and recreated.

A NULL pointer dereference can happen if [1] or [2] happens
after the adminq is already destroyed.

In order to fix this, add some further state checks and
implement reference counting for adminq uses. Reference
counting was used because multiple threads can attempt to
access the adminq at the same time via [1] or [2]. Additionally,
multiple clients (i.e. pds-vfio-pci) can be using [2]
at the same time.

The adminq_refcnt is initialized to 1 when the adminq has been
allocated and is ready to use. Users/clients of the adminq
(i.e. [1] and [2]) will increment the refcnt when they are using
the adminq. When the driver goes into a fw_down cycle it will
set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
any further adminq_refcnt increments. Waiting for the
adminq_refcnt to hit 1 allows for any current users of the adminq
to finish before the driver frees the adminq. Once the
adminq_refcnt hits 1 the driver clears the refcnt to signify that
the adminq is deleted and cannot be used. On the fw_up cycle the
driver will once again initialize the adminq_refcnt to 1 allowing
the adminq to be used again.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Use struct pdsc for the pdsc_adminq_isr private data

The initial design for the adminq interrupt was done based
on client drivers having their own adminq and adminq
interrupt. So, each client driver's adminq isr would use
their specific adminqcq for the private data struct. For the
time being the design has changed to only use a single
adminq for all clients. So, instead use the struct pdsc for
the private data to simplify things a bit.

This also has the benefit of not dereferencing the adminqcq
to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
is set and the adminqcq has actually been cleared/freed.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Cancel AQ work on teardown

There is a small window where pdsc_work_thread()
calls pdsc_process_adminq() and pdsc_process_adminq()
passes the PDSC_S_STOPPING_DRIVER check and starts
to process adminq/notifyq work and then the driver
starts a fw_down cycle. This could cause some
undefined behavior if the notifyqcq/adminqcq are
free'd while pdsc_process_adminq() is running. Use
cancel_work_sync() on the adminqcq's work struct
to make sure any pending work items are cancelled
and any in progress work items are completed.

Also, make sure to not call cancel_work_sync() if
the work item has not be initialized. Without this,
traces will happen in cases where a reset fails and
teardown is called again or if reset fails and the
driver is removed.

Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pds_core: Prevent health thread from running during reset/remove

The PCIe reset handlers can run at the same time as the
health thread. This can cause the health thread to
stomp on the PCIe reset. Fix this by preventing the
health thread from running while a PCIe reset is happening.

As part of this use timer_shutdown_sync() during reset and
remove to make sure the timer doesn't ever get rearmed.

Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

af_unix: fix lockdep positive in sk_diag_dump_icons()

syzbot reported a lockdep splat [1].

Blamed commit hinted about the possible lockdep
violation, and code used unix_state_lock_nested()
in an attempt to silence lockdep.

It is not sufficient, because unix_state_lock_nested()
is already used from unix_state_double_lock().

We need to use a separate subclass.

This patch adds a distinct enumeration to make things
more explicit.

Also use swap() in unix_state_double_lock() as a clean up.

v2: add a missing inline keyword to unix_state_lock_nested()

[1]
WARNING: possible circular locking dependency detected
6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0 Not tainted

syz-executor.1/2542 is trying to acquire lock:
ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863

but task is already holding lock:
ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&u->lock/1){+.+.}-{2:2}:
 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
 _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
 sk_diag_dump_icons net/unix/diag.c:87 [inline]
 sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157
 sk_diag_dump net/unix/diag.c:196 [inline]
 unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220
 netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264
 __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370
 netlink_dump_start include/linux/netlink.h:338 [inline]
 unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319
 sock_diag_rcv_msg+0xe3/0x400
 netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543
 sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
 netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
 netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367
 netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 sock_write_iter+0x39a/0x520 net/socket.c:1160
 call_write_iter include/linux/fs.h:2085 [inline]
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0xa74/0xca0 fs/read_write.c:590
 ksys_write+0x1a0/0x2c0 fs/read_write.c:643
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

-> #0 (rlock-AF_UNIX){+.+.}-{2:2}:
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x592/0x890 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
 __do_sys_sendmmsg net/socket.c:2753 [inline]
 __se_sys_sendmmsg net/socket.c:2750 [inline]
 __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

other info that might help us debug this:

Possible unsafe locking scenario:

 CPU0 CPU1
 ---- ----
 lock(&u->lock/1);
 lock(rlock-AF_UNIX);
 lock(&u->lock/1);
 lock(rlock-AF_UNIX);

*** DEADLOCK ***

1 lock held by syz-executor.1/2542:
 #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089

stack backtrace:
CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29c690 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
<TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
 check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187
 check_prev_add kernel/locking/lockdep.c:3134 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
 skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
 unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg net/socket.c:745 [inline]
 ____sys_sendmsg+0x592/0x890 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
 __do_sys_sendmmsg net/socket.c:2753 [inline]
 __se_sys_sendmmsg net/socket.c:2750 [inline]
 __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f26d887cda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9
RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004
RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68

Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nvme-fc: log human-readable opcode on timeout

The fc transport logs the opcode and fctype on command timeout.
This is sufficient information to identify the command issued,
but not very human-readable. Use the nvme_fabrics_opcode_str()
helper to also log the name of the command, as rdma and tcp already do.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: split out fabrics version of nvme_opcode_str()

nvme_opcode_str() currently supports admin, IO, and fabrics commands.
However, fabrics commands aren't allowed for the pci transport.
Currently the pci caller passes 0 as the fctype,
which means any fabrics command would be displayed as "Property Set".

Move fabrics command support into a function nvme_fabrics_opcode_str()
and remove the fctype argument to nvme_opcode_str().
This way, a fabrics command will display as "Unknown" for pci.
Convert the rdma and tcp transports to use nvme_fabrics_opcode_str().

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

Merge branch 'selftests-net-a-couple-of-typos-fixes-in-key-management-rst-tests'

Dmitry Safonov says:

====================
selftests/net: A couple of typos fixes in key-management/rst tests

Two typo fixes, noticed by Mohammad's review.
And a fix for an issue that got uncovered.

v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com

Signed-off-by: Dmitry Safonov <dima@arista.com>
====================

Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Repair RST passive reset selftest

Currently, the test is racy and seems to not pass anymore.

In order to rectify it, aim on TCP_TW_RST.
Doesn't seem way too good with this sleep() part, but it seems as
a reasonable compromise for the test. There is a plan in-line comment on
how-to improve it, going to do it on the top, at this moment I want it
to run on netdev/patchwork selftests dashboard.

It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated
to test_client_verify() return value.

Fixes: c6df7b2361d7 ("selftests/net: Add TCP-AO RST test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-3-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Rectify key counters checks

As the names of (struct test_key) members didn't reflect whether the key
was used for TX or RX, the verification for the counters was done
incorrectly for asymmetrical selftests.

Rename these with _tx appendix and fix checks in verify_counters().
While at it, as the checks are now correct, introduce skip_counters_checks,
which is intended for tests where it's expected that a key that was set
with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might had no chance
of getting used on the wire.

Fixes the following failures, exposed by the previous commit:
> not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10
>
> not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5
> not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10

Cc: Mohammad Nassiri <mnassiri@ciena.com>
Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-2-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

selftests/net: Argument value mismatch when calling verify_counters()

The end_server() function only operates in the server thread
and always takes an accept socket instead of a listen socket as
its input argument. To align with this, invert the boolean values
used when calling verify_counters() within the end_server() function.

As a result of this typo, the test didn't correctly check for
the non-symmetrical scenario, where i.e. peer-A uses a key <100:200>
to send data, but peer-B uses another key <105:205> to send its data.
So, in simple words, different keys for TX and RX.

Fixes: 3c3ead555648 ("selftests/net: Add TCP-AO key-management test")
Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
[amended 'Fixes' tag, added the issue description and carried-over to lkml]
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-1-d190430a6c60@arista.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

nvme: take const cmd pointer in read-only helpers

nvme_is_fabrics() and nvme_is_write() only read struct nvme_command,
so take it by const pointer. This allows callers to pass a const pointer
and communicates that these functions don't modify the command.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: remove redundant status mask

In nvme_get_error_status_str(), the status code is already masked
with 0x7ff at the beginning of the function.
Don't bother masking it again when indexing nvme_statuses.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: return string as char *, not unsigned char *

The functions in drivers/nvme/host/constants.c returning human-readable
status and opcode strings currently use type "const unsigned char *".
Typically string constants use type "const char *",
so remove "unsigned" from the return types.
This is a purely cosmetic change to clarify that the functions
return text strings instead of an array of bytes, for example.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme-common: add module description

Add MODULE_DESCRIPTION() in order to remove warnings & get clean build:-

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-auth.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/common/nvme-keyring.o

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: enable retries for authentication commands

Authentication commands might trigger a lengthy computation on the
controller or even a callout to an external entity.
In these cases the controller might return a status without the DNR
bit set, indicating that the command should be retried.
This patch enables retries for authentication commands by setting
NVME_SUBMIT_RETRY for __nvme_submit_sync_cmd().

Reported-by: Martin George <marting@netapp.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme: change __nvme_submit_sync_cmd() calling conventions

Combine the two arguments 'flags' and 'at_head' from __nvme_submit_sync_cmd()
into a single 'flags' argument and use function-specific values to indicate
what should be set within the function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

nvme-auth: open-code single-use macros

No point in having macros just for a single function nvme_auth_submit().
Open-code them into the caller.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>

net: dsa: mv88e6xxx: Fix failed probe due to unsupported C45 reads

Not all mv88e6xxx device support C45 read/write operations. Those
which do not return -EOPNOTSUPP. However, when phylib scans the bus,
it considers this fatal, and the probe of the MDIO bus fails, which in
term causes the mv88e6xxx probe as a whole to fail.

When there is no device on the bus for a given address, the pull up
resistor on the data line results in the read returning 0xffff. The
phylib core code understands this when scanning for devices on the
bus. C45 allows multiple devices to be supported at one address, so
phylib will perform a few reads at each address, so although thought
not the most efficient solution, it is a way to avoid fatal
errors. Make use of this as a minimal fix for stable to fix the
probing problems.

Follow up patches will rework how C45 operates to make it similar to
C22 which considers -ENODEV as a none-fatal, and swap mv88e6xxx to
using this.

Cc: stable@vger.kernel.org
Fixes: 743a19e38d02 ("net: dsa: mv88e6xxx: Separate C22 and C45 transactions")
Reported-by: Tim Menninger <tmenninger@purestorage.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240129224948.1531452-1-andrew@lunn.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipv4: fix a memleak in ip_setup_cork

When inetdev_valid_mtu fails, cork->opt should be freed if it is
allocated in ip_setup_cork. Otherwise there could be a memleak.

Fixes: 501a90c94510 ("inet: protect against too small mtu values.")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240129091017.2938835-1-alexious@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

MAINTAINERS: Drop unreachable reviewer for Qualcomm ETHQOS ethernet driver

Bhupesh's email responds indicating they've changed employers and with
no new contact information. Let's drop the line from MAINTAINERS to
avoid getting the same response over and over.

Signed-off-by: Andrew Halaney <ahalaney@redhat.com>
Link: https://lore.kernel.org/r/20240129-remove-dwmac-qcom-ethqos-reviewer-v1-1-2645eab61451@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

cifs: make sure that channel scaling is done only once

Following a successful cifs_tree_connect, we have the code
to scale up/down the number of channels in the session.
However, it is not protected by a lock today.

As a result, this code can be executed by several processes
that select the same channel. The core functions handle this
well, as they pick chan_lock. However, we've seen cases where
smb2_reconnect throws some warnings.

To fix that, this change introduces a flags bitmap inside the
cifs_ses structure. A new flag type is used to ensure that
only one process enters this section at any time.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

drm/amdgpu: Reset IH OVERFLOW_CLEAR bit

Allows us to detect subsequent IH ring buffer overflows as well.

Cc: Joshua Ashton <joshua@froggi.es>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove asymmetrical irq disabling in vcn 4.0.5 suspend

There is no irq enabled in vcn 4.0.5 resume, causing wrong amdgpu_irq_src status.
Beside, current set function callbacks are empty with no real effect.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drm/amdgpu: remove golden setting for gfx 11.5.0

No need to set GC golden settings in driver from gfx 11.5.0 onwards.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: reserve the BO before validating it

Fix a warning.

v2: Avoid unmapping attachment repeatedly when ERESTARTSYS.

v3: Lock the BO before accessing ttm->sg to avoid race conditions.(Felix)

[ 41.708711] WARNING: CPU: 0 PID: 1463 at drivers/gpu/drm/ttm/ttm_bo.c:846 ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.708989] Call Trace:
[ 41.708992] <TASK>
[ 41.708996] ? show_regs+0x6c/0x80
[ 41.709000] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709008] ? __warn+0x93/0x190
[ 41.709014] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709024] ? report_bug+0x1f9/0x210
[ 41.709035] ? handle_bug+0x46/0x80
[ 41.709041] ? exc_invalid_op+0x1d/0x80
[ 41.709048] ? asm_exc_invalid_op+0x1f/0x30
[ 41.709057] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709185] ? ttm_bo_validate+0x146/0x1b0 [ttm]
[ 41.709197] ? amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x2c/0x80 [amdgpu]
[ 41.709337] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709346] kfd_mem_dmaunmap_attachment+0x9e/0x1e0 [amdgpu]
[ 41.709467] amdgpu_amdkfd_gpuvm_dmaunmap_mem+0x56/0x80 [amdgpu]
[ 41.709586] kfd_ioctl_unmap_memory_from_gpu+0x1b7/0x300 [amdgpu]
[ 41.709710] kfd_ioctl+0x1ec/0x650 [amdgpu]
[ 41.709822] ? __pfx_kfd_ioctl_unmap_memory_from_gpu+0x10/0x10 [amdgpu]
[ 41.709945] ? srso_alias_return_thunk+0x5/0x7f
[ 41.709949] ? tomoyo_file_ioctl+0x20/0x30
[ 41.709959] __x64_sys_ioctl+0x9c/0xd0
[ 41.709967] do_syscall_64+0x3f/0x90
[ 41.709973] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: 101b8104307e ("drm/amdkfd: Move dma unmapping after TLB flush")
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix missing error code in 'gmc_v6/7/8/9_0_hw_init()'

Return 0 for success scenairos in 'gmc_v6/7/8/9_0_hw_init()'

Fixes the below:
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c:920 gmc_v6_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c:1104 gmc_v7_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c:1224 gmc_v8_0_hw_init() warn: missing error code? 'r'
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:2347 gmc_v9_0_hw_init() warn: missing error code? 'r'

Fixes: fac4ebd79fed ("drm/amdgpu: Fix with right return code '-EIO' in 'amdgpu_gmc_vram_checking()'")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix buffer overflow in 'get_host_router_total_dp_tunnel_bw()'

The error message buffer overflow 'dc->links' 12 <= 12 suggests that the
code is trying to access an element of the dc->links array that is
beyond its bounds. In C, arrays are zero-indexed, so an array with 12
elements has valid indices from 0 to 11. Trying to access dc->links[12]
would be an attempt to access the 13th element of a 12-element array,
which is a buffer overflow.

To fix this, ensure that the loop does not go beyond the last valid
index when accessing dc->links[i + 1] by subtracting 1 from the loop
condition.

This would ensure that i + 1 is always a valid index in the array.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_dpia_bw.c:208 get_host_router_total_dp_tunnel_bw() error: buffer overflow 'dc->links' 12 <= 12

Fixes: 59f1622a5f05 ("drm/amd/display: Add dpia display mode validation logic")
Cc: PeiChen Huang <peichen.huang@amd.com>
Cc: Aric Cyr <aric.cyr@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add NULL check for kzalloc in 'amdgpu_dm_atomic_commit_tail()'

Add a NULL check for the kzalloc call that allocates memory for
dummy_updates in the amdgpu_dm_atomic_commit_tail function. Previously,
if kzalloc failed to allocate memory and returned NULL, the code would
attempt to use the NULL pointer.

The fix is to check if kzalloc returns NULL, and if so, log an error
message and skip the rest of the current loop iteration with the
continue statement. This prevents the code from attempting to use the
NULL pointer.

Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202401300629.ICnCt983-lkp@intel.com/
Fixes: 135fd1b35690 ("drm/amd/display: Reduce stack size")
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Don't init MEC2 firmware when it fails to load

The same calls are made directly above, but conditional on the firmware
loading and validating successfully.

Cc: stable@vger.kernel.org
Fixes: 9931b67690cf ("drm/amd: Load GFX10 microcode during early_init")
Signed-off-by: David McFarland <corngood@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix the warning info in mode1 reset

Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]

This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler

v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139

Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix dcn35 8k30 Underflow/Corruption Issue

[why]
odm calculation is missing for pipe split policy determination
and cause Underflow/Corruption issue.

[how]
Add the odm calculation.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Underflow workaround by increasing SR exit latency

[Why]
On 14us for exit latency time causes underflow for 8K monitor with HDR on.
Increasing the latency to 28us fixes the underflow.

[How]
Increase the latency to 28us. This workaround should be sufficient
before we figure out why SR exit so long.

Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Nicholas Susanto <nicholas.susanto@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix incorrect mpc_combine array size

[why]
MAX_SURFACES is per stream, while MAX_PLANES is per asic. The
mpc_combine is an array that records all the planes per asic. Therefore
MAX_PLANES should be used as the array size. Using MAX_SURFACES causes
array overflow when there are more than 3 planes.

[how]
Use the MAX_PLANES for the mpc_combine array size.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Reviewed-by: Nevenko Stupar <nevenko.stupar@amd.com>
Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DPSTREAM CLK on and off sequence

[Why]
Secondary DP2 display fails to light up in some instances

[How]
Clock needs to be on when DPSTREAMCLK*_EN =1. This change
moves dtbclk_p enable/disable point to make sure this is
the case

Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Daniel Miess <daniel.miess@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix USB-C flag update after enc10 feature init

[why]
BIOS's integration info table not following the original order
which is phy instance is ext_displaypath's array index.

[how]
Move them to follow the original order.

Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: increased min_dcfclk_mhz and min_fclk_mhz

[why]
Originally, PMFW said min FCLK is 300Mhz, but min DCFCLK can be increased
to 400Mhz because min FCLK is now 600Mhz so FCLK >= 1.5 * DCFCLK hardware
requirement will still be satisfied. Increasing min DCFCLK addresses
underflow issues (underflow occurs when phantom pipe is turned on for some
Sub-Viewport configs).

[how]
Increasing DCFCLK by raising the min_dcfclk_mhz

Reviewed-by: Chaitanya Dhere <chaitanya.dhere@amd.com>
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Sohaib Nadeem <sohaib.nadeem@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amd/display: initialize all the dpm level's stutter latency"

Revert commit 885c71ad791c
("drm/amd/display: initialize all the dpm level's stutter latency")

Because it causes some regression

Reviewed-by: Muhammad Ahmed <ahmed.ahmed@amd.com>
Acked-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Charlene Liu <charlene.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Use correct drm device for cgroup permission check

On GFX 9.4.3, for a given KFD node, fetch the correct drm device from
XCP manager when checking for cgroup permissions.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>