]> git.ipfire.org Git - people/ms/linux.git/log
people/ms/linux.git
3 years agohabanalabs/gaudi2: reset device upon critical ECC event
Ofir Bitton [Tue, 28 Jun 2022 05:34:28 +0000 (08:34 +0300)] 
habanalabs/gaudi2: reset device upon critical ECC event

Correctable ECC events are not fatal, but as they accumulate, the f/w
can decide that a hard-rest is required. This indication is
propagated to the host using the existing ECC event interface.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: enable gaudi2 code in driver
Oded Gabbay [Mon, 27 Jun 2022 19:32:07 +0000 (22:32 +0300)] 
habanalabs: enable gaudi2 code in driver

Enable the Gaudi2 ASIC code in the pci probe callback of the driver so
the driver will handle Gaudi2 ASICs.

Add the PCI ID to the PCI table and add the ASIC enum value to all
relevant places.

Fixup the device parameters initialization for Gaudi2.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add gaudi2 MMU support
Moti Haimovski [Mon, 27 Jun 2022 14:21:39 +0000 (17:21 +0300)] 
habanalabs: add gaudi2 MMU support

Gaudi2 has new MMU units. A PMMU for device->host accesses, and HMMU
for HBM accesses.

The page tables of both MMUs are located in the host's memory (referred
to in the code as host-resident pgt).

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add gaudi2 wait-for-CS support
Oded Gabbay [Mon, 27 Jun 2022 12:05:28 +0000 (15:05 +0300)] 
habanalabs: add gaudi2 wait-for-CS support

In Gaudi2 we moved to a different wait for command submission
completion model. Instead of receiving interrupt only on external
queues, we use the device's sync manager to notify us when the
entire command submission finishes.

This enables us to remove the categorization of queues to external
and internal, and treat each queue equally, without the need to parse
and patch any command buffer.

This change also requires refactoring to the IRQ handling of
CS completions.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi2: add gaudi2 profiler module
Benjamin Dotan [Sun, 26 Jun 2022 18:35:07 +0000 (21:35 +0300)] 
habanalabs/gaudi2: add gaudi2 profiler module

Add the Gaudi2 code to initialize the ASIC's profiler. The profile
receives its initialization values from the user, same as in Gaudi2,
but the code to initialize is in the driver because the configuration
space of the device is not directly exposed to the user.

Signed-off-by: Benjamin Dotan <bdotan@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi2: add gaudi2 security module
Ofir Bitton [Sun, 26 Jun 2022 18:30:25 +0000 (21:30 +0300)] 
habanalabs/gaudi2: add gaudi2 security module

Use the generic security module to block all registers in the ASIC and
then open only those that are needed to be accessed by the user.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add generic security module
Ofir Bitton [Sun, 26 Jun 2022 18:24:50 +0000 (21:24 +0300)] 
habanalabs: add generic security module

As the ASICs become more complex and have many more registers, we need
a better way to configure the security properties.

As a reminder, we have two dedicated mechanisms for security:
Range Registers and Protection bits. Those mechanisms protect sensitive
memory and configuration areas inside the device.

The generic module handles the low-level part of the configuration,
because the configuration mechanism is identical in all ASICs. The
difference is the address ranges and register names.

Any ASIC that use this block should first block all the register
blocks in the ASIC. Then, it should open only the registers that
need to be accessed by the user (This is opposed to Goya and Gaudi,
where we blocked only what should not be accesses by the user).

The module contains several functions, to unblock single register,
multiple registers, entire blocks, ranges, ranges with mask.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: remove obsolete device variables used for testing
Oded Gabbay [Tue, 28 Jun 2022 07:53:17 +0000 (10:53 +0300)] 
habanalabs: remove obsolete device variables used for testing

There are a couple of device variables that are used for testing
purposes and they are set to fixed values.

Remove the variables that are not relevant anymore and document the
remaining variables.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: initialize new asic properties
Oded Gabbay [Fri, 24 Jun 2022 13:58:23 +0000 (16:58 +0300)] 
habanalabs: initialize new asic properties

New asic properties were added for Gaudi2. We want to initialize
and use them, when relevant, also for Goya and Gaudi.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add unsupported functions
Oded Gabbay [Fri, 24 Jun 2022 13:47:13 +0000 (16:47 +0300)] 
habanalabs: add unsupported functions

There are a number of new ASIC-specific functions that were added
for Gaudi2. To make the common code work, we need to define empty
implementations of those functions for Goya and Gaudi.

Some functions will return error if called with Goya/Gaudi.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add gaudi2 asic-specific code
Oded Gabbay [Sun, 26 Jun 2022 15:20:03 +0000 (18:20 +0300)] 
habanalabs: add gaudi2 asic-specific code

Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code need to initialize,
finalize and submit workloads to the Gaudi2 ASIC.

It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.

It contains new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agouapi: habanalabs: add gaudi2 defines
Oded Gabbay [Fri, 24 Jun 2022 10:38:57 +0000 (13:38 +0300)] 
uapi: habanalabs: add gaudi2 defines

Add the new defines for GAUDI2 uapi interface.

It includes the following:
1. Enums of engines and PLLs.
2. New information in the info IOCTL that is retrieved by the driver.
3. Update comments regarding the CB/CS/wait for CS ioctls.
4. New fields in the debug IOCTL for configuring the profiler for
   Gaudi2.

There is no new IOCTL.

Some of the changes are also relevant for Greco (which will be
upstreamed later this year). When ever it says "Greco and onwards",
it means it is also for Gaudi2.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi2: add asic registers header files
Oded Gabbay [Fri, 24 Jun 2022 15:56:42 +0000 (18:56 +0300)] 
habanalabs/gaudi2: add asic registers header files

Add the relevant GAUDI2 ASIC registers header files. These files are
generated automatically from a tool maintained by the VLSI engineers.

There are more files which are not upstreamed because only very few
defines from those files are used in the driver. For those files, I
copied the relevant defines into gaudi2_regs.h and gaudi2_masks.h, to
reduce the size of this patch.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: remove redundant argument in access_dev_mem APIs
Ofir Bitton [Mon, 27 Jun 2022 13:59:02 +0000 (16:59 +0300)] 
habanalabs: remove redundant argument in access_dev_mem APIs

Region structure is derived from region type, hence no need to pass
it as an argument.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: use %pa to print pci bar size
Oded Gabbay [Mon, 27 Jun 2022 08:30:43 +0000 (11:30 +0300)] 
habanalabs: use %pa to print pci bar size

PCI bar size is resource_size_t so we should use %pa to make it work
correctly on all architectures.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: replace hl_poll_timeout with while loop
Dafna Hirschfeld [Sun, 26 Jun 2022 13:18:40 +0000 (16:18 +0300)] 
habanalabs/gaudi: replace hl_poll_timeout with while loop

in gaudi_scrub_device_mem, replace call to hl_poll_timeout
with a while loop to avoid using dummy variables.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: communicate supported page sizes to user
Ohad Sharabi [Sat, 25 Jun 2022 20:36:13 +0000 (23:36 +0300)] 
habanalabs: communicate supported page sizes to user

Because in future ASICs the driver will allow the user to set the
page size we need to make sure this data is propagated in all APIs.

In addition, since this is already an ASIC property we no longer need
ASIC function for it.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: remove dead code from free_device_memory()
Tomer Tayar [Fri, 24 Jun 2022 10:05:23 +0000 (13:05 +0300)] 
habanalabs: remove dead code from free_device_memory()

free_device_memory() ends with if and else, each has a return statement,
followed by another return statement that can never be reached.
Restructure the function and remove this dead code.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: enable error interrupt on ARB WDT
Oded Gabbay [Fri, 24 Jun 2022 16:11:38 +0000 (19:11 +0300)] 
habanalabs/gaudi: enable error interrupt on ARB WDT

We want to receive an error interrupt in case the watchdog timer
expires on arbitration event in the queues.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: page size can only be a power of 2
Ohad Sharabi [Wed, 22 Jun 2022 12:38:56 +0000 (15:38 +0300)] 
habanalabs: page size can only be a power of 2

We dropped support for page sizes that are not power of 2.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: refactor dma asic-specific functions
Ohad Sharabi [Sun, 12 Jun 2022 12:00:29 +0000 (15:00 +0300)] 
habanalabs: refactor dma asic-specific functions

This is a pre-requisite patch for adding tracepoints to the DMA memory
operations (allocation/free) in the driver.

The main purpose is to be able to cross data with the map operations and
determine whether memory violation occurred, for example free DMA
allocation before unmapping it from device memory.

To achieve this the DMA alloc/free code flows were refactored so that a
single DMA tracepoint will catch many flows.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: remove unused enum
Oded Gabbay [Fri, 24 Jun 2022 13:49:26 +0000 (16:49 +0300)] 
habanalabs/gaudi: remove unused enum

Also beautify code by preferring single line wherever possible.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: mask constant value before cast
Oded Gabbay [Fri, 24 Jun 2022 13:45:02 +0000 (16:45 +0300)] 
habanalabs/gaudi: mask constant value before cast

This fixes a sparse warning of
"cast truncates bits from constant value"

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: use correct type in assignment
Oded Gabbay [Fri, 24 Jun 2022 13:05:59 +0000 (16:05 +0300)] 
habanalabs/gaudi: use correct type in assignment

packets are defined as LE so we need to convert before assigning
values to them.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix function name in comment
Oded Gabbay [Fri, 24 Jun 2022 13:04:30 +0000 (16:04 +0300)] 
habanalabs/gaudi: fix function name in comment

function name in comment didn't match actual function name.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/goya: move dma direction enum to uapi file
Oded Gabbay [Fri, 24 Jun 2022 10:36:10 +0000 (13:36 +0300)] 
habanalabs/goya: move dma direction enum to uapi file

The values in this enum are not used by h/w but are a contract
between userspace and the kernel driver so they must be defined
in the uapi file.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: set default value for memory_scrub
Dafna Hirschfeld [Thu, 12 May 2022 13:16:25 +0000 (16:16 +0300)] 
habanalabs: set default value for memory_scrub

Set a default value for memory scrubbing

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: move call to scrub_device_mem after ctx_fini
Dafna Hirschfeld [Thu, 12 May 2022 12:20:55 +0000 (15:20 +0300)] 
habanalabs: move call to scrub_device_mem after ctx_fini

In future ASICs, it would be possible to have a non-idle
device when context is released. We thus need to postpone the
scrubbing. Postpone it to hpriv release if reset is not executed
or to device late init if reset is executed.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: use memory_scrub_val from debugfs
Dafna Hirschfeld [Thu, 23 Jun 2022 04:47:48 +0000 (07:47 +0300)] 
habanalabs/gaudi: use memory_scrub_val from debugfs

In the callback scrub_device_mem, use 'memory_scrub_val'
from debugfs for the scrubbing value.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: don't send addr and size to scrub_device_mem cb
Dafna Hirschfeld [Thu, 23 Jun 2022 04:28:17 +0000 (07:28 +0300)] 
habanalabs: don't send addr and size to scrub_device_mem cb

We use scrub_device_mem only to scrub the entire SRAM and entire
DRAM. Therefore there is no need to send addr and size
args to the callback.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: don't do memory scrubbing when unmapping
Dafna Hirschfeld [Tue, 10 May 2022 13:36:02 +0000 (16:36 +0300)] 
habanalabs: don't do memory scrubbing when unmapping

There is no need to do memory scrub when unmapping anymore as it is
an overhead as long as we have a single user at any given time.

Remove that code and change return value of free_phys_pg_pack to void

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: print if firmware is secured during load
Ofir Bitton [Wed, 22 Jun 2022 12:10:08 +0000 (15:10 +0300)] 
habanalabs: print if firmware is secured during load

For easier debug, it is desirable to have a simple way
to know whether the device is secured or not, hence we dump this
indication during boot.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix a race condition causing DMAR error
Yuri Nudelman [Wed, 22 Jun 2022 09:52:34 +0000 (12:52 +0300)] 
habanalabs/gaudi: fix a race condition causing DMAR error

There is a rare race condition in CB completion mechanism, that can
occur under a very high pressure of command submissions.
The preconditions for this to happen are:

 1. There should be enough command submissions for the pre-allocated
    patched CB pool to run out of commands. At this stage we start
    allocating new patched CBs as they arrive.
 2. CB size has to be exactly (128*n + 104)B for some n, i.e. 24B below
    a cache line end.

The flow:

 1. Two command buffers being completed on different streams, at the
    same time. Denote those CB1 and CB2.
 2. Each command buffer is injected with two messages, 16B each - one
    for a HBW update of the completion queue, another to raise
    interrupt.
 3. Assume CB1 updated the completion queue and raise the interrupt.
 4. Assume CB2 updated the completion queue but did not raise the
    interrupt yet.
 5. The host receives the interrupt. It goes over the completion queue
    and sees two completions - CB1 and CB2. Release them both.
 6. CB2 performs the last command. The problem is that the last command
    is split between 2 cache lines. So to read the last 8B of the last
    command, it has to access the host again. Problem is - CB2 is
    already released. This causes a DMAR error.

The solution to this problem is simply to make sure the last two
commands in the CB are always in the same cache line, using NOP padding.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix warning: var might be used uninitialized
Koby Elbaz [Wed, 22 Jun 2022 07:42:42 +0000 (10:42 +0300)] 
habanalabs/gaudi: fix warning: var might be used uninitialized

kernel test robot:
"warning: variable 'index' is used uninitialized whenever 'if' condition
is false"

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: move memory_scrub_val to hdev struct
Dafna Hirschfeld [Thu, 2 Jun 2022 08:55:00 +0000 (11:55 +0300)] 
habanalabs: move memory_scrub_val to hdev struct

move the field memory_scrub_val from struct hl_dbg_device_entry
to struct hl_device. This is because we want to use
this field also if debugfs is off.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: fix comment style
Oded Gabbay [Sun, 19 Jun 2022 09:41:19 +0000 (12:41 +0300)] 
habanalabs: fix comment style

function name should not be preceded with @

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: use kvcalloc when possible
Oded Gabbay [Sun, 19 Jun 2022 09:40:19 +0000 (12:40 +0300)] 
habanalabs: use kvcalloc when possible

kvcalloc is same as kvmalloc_array with GFP_ZERO.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: print pointer with correct modifier
Oded Gabbay [Sun, 19 Jun 2022 09:37:01 +0000 (12:37 +0300)] 
habanalabs: print pointer with correct modifier

Use %p instead of %llx for printing pointers.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: check fence pointer before use
Oded Gabbay [Sun, 19 Jun 2022 09:35:06 +0000 (12:35 +0300)] 
habanalabs: check fence pointer before use

fence pointer can be NULL in this path, as shown by an earlier check.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add critical indication in sram ecc
ran shalit [Wed, 15 Jun 2022 18:24:38 +0000 (21:24 +0300)] 
habanalabs: add critical indication in sram ecc

Multiple SRAM SERR events are treated as critical events,
and host should be notified about it. Thus, adding is_critical
indication as part of SRAM ECC failure packet.

Signed-off-by: ran shalit <rshalit@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: notify user process on device unavailable
Tal Cohen [Thu, 9 Jun 2022 15:08:31 +0000 (18:08 +0300)] 
habanalabs/gaudi: notify user process on device unavailable

When a device error occurs, user process would like to get some
indication on the error by reading some device HW info. If the
device is unavailable, user process can't perform any HW device
reading.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: remove unused get_dma_desc_list_size
Oded Gabbay [Sat, 18 Jun 2022 18:27:07 +0000 (21:27 +0300)] 
habanalabs: remove unused get_dma_desc_list_size

This asic callback function is not called anymore from the common code.
The asic-specific function itself is called but from within the
asic-specific code.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: fix NULL dereference on cs timeout
Yuri Nudelman [Tue, 14 Jun 2022 12:14:20 +0000 (15:14 +0300)] 
habanalabs: fix NULL dereference on cs timeout

Device descriptor is accessed before an assignment

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix shift out of bounds
Ofir Bitton [Wed, 15 Jun 2022 13:11:31 +0000 (16:11 +0300)] 
habanalabs/gaudi: fix shift out of bounds

When validating NIC queues, queue offset calculation must be
performed only for NIC queues.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add validity check for cq counter offset
farah kassabri [Mon, 13 Jun 2022 13:22:20 +0000 (16:22 +0300)] 
habanalabs: add validity check for cq counter offset

Driver performs no validity check for the user cq counter offset
used in both wait_for_interrupt and register_for_timestamp APIs.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix incorrect MME offset calculation
Koby Elbaz [Sun, 12 Jun 2022 19:16:25 +0000 (22:16 +0300)] 
habanalabs/gaudi: fix incorrect MME offset calculation

Once FW raised an event following a MME2 QMAN error, the driver should
have gone to the corresponding status registers, trying to gather more
info on the error, yet it was accidentally accessing MME1 QMAN address
space.

Generally, we have x4 MMEs, while 0 & 2 are marked MASTER, and
1 & 3 are marked SLAVE. The former can be addressed, yet addressing
the latter is considered an access violation, and will result in a
hung system, which is what unintentionally happened above.
Note that this cannot happen in a secured system, since these registers
are protected with range registers.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: avoid unnecessary error print
Dani Liberman [Thu, 2 Jun 2022 13:15:03 +0000 (16:15 +0300)] 
habanalabs: avoid unnecessary error print

When sending a packet to FW right after it made reset, we will get
packet timeout. Since it is expected behavior, we don't need to
print an error in such case.
Hence, when driver is in hard reset it will avoid from printing error
messages about packet timeout.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: send an event notification when CS timeout occurs
Tal Cohen [Thu, 19 May 2022 15:00:55 +0000 (18:00 +0300)] 
habanalabs: send an event notification when CS timeout occurs

The Driver needs to inform the User process whenever one of its
CS is timed out. The Driver shall recognize the CS timeout and shall
send an eventfd notification, towards user space, whenever a timeout
is expired on a CS.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: send device reset notification
Tal Cohen [Wed, 8 Jun 2022 14:34:54 +0000 (17:34 +0300)] 
habanalabs/gaudi: send device reset notification

Device reset event, indicates that the device shall be reset -
after a short delay. In such case, the driver sends a notification
towards the User process. This allows the User process
to be able to take several debug actions for system
diagnostic purposes.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: invoke device reset from one code block
Tal Cohen [Wed, 8 Jun 2022 13:02:09 +0000 (16:02 +0300)] 
habanalabs/gaudi: invoke device reset from one code block

In order to prepare the driver code for device reset event
notification, change the event handler function flow to call
device reset from one code block.

In addition, the commit fixes an issue that reset was performed
w/o checking the 'hard_reset_on_fw_event' state and w/o setting
the HL_DRV_RESET_DELAY flag.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: expose undefined opcode status via info ioctl
Tal Cohen [Thu, 12 May 2022 08:59:41 +0000 (11:59 +0300)] 
habanalabs: expose undefined opcode status via info ioctl

The info ioctl retrieves information on the last undefined opcode
occurred.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: collect undefined opcode error info
Tal Cohen [Wed, 11 May 2022 15:02:39 +0000 (18:02 +0300)] 
habanalabs/gaudi: collect undefined opcode error info

when an undefined opcode error occurres, the driver collects
the relevant information from the Qman and stores it inside
the hdev data structure. An event fd indication is sent towards the
user space.

Note: another commit shall be followed which will add support to
read the error info by an ioctl.

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: fix race between hl_get_compute_ctx() and hl_ctx_put()
Tomer Tayar [Sun, 22 May 2022 06:43:54 +0000 (09:43 +0300)] 
habanalabs: fix race between hl_get_compute_ctx() and hl_ctx_put()

hl_get_compute_ctx() is used to get the pointer to the compute context
from the hpriv object.
The function is called in code paths that are not necessarily initiated
by user, so it is possible that a context release process will happen in
parallel.
This can lead to a race condition in which hl_get_compute_ctx()
retrieves the context pointer, and just before it increments the context
refcount, the context object is released and a freed memory is accessed.

To avoid this race, add a mutex to protect the context pointer in hpriv.
With this lock, hl_get_compute_ctx() will be able to detect if the
context has been released or is about to be released.

struct hl_ctx_mgr has a mutex for contexts IDR with a similar "ctx_lock"
name, so rename it to just "lock" to avoid a confusion with the new
lock.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: keep a record of completed CS outcomes
Yuri Nudelman [Tue, 24 May 2022 13:29:03 +0000 (16:29 +0300)] 
habanalabs: keep a record of completed CS outcomes

Often, the user is not interested in the completion timestamp of all
command submissions.
A common situation is, for example, when the user submits a burst of,
possibly, several thousands of commands, then request the completion
timestamp of only couple of specific key commands from all the burst.
The problem is that currently, the outcome of the early commands may be
lost, due to a large amount of later commands, that the user does not
really care about.

This patch creates a separate store with the outcomes of commands the
user has mark explicitly as interested in. This store does not mix the
marked commands with the unmarked ones, hence the data there will
survive for much longer.

Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: fix comment to reflect current code
Oded Gabbay [Sun, 5 Jun 2022 09:56:36 +0000 (12:56 +0300)] 
habanalabs/gaudi: fix comment to reflect current code

Due to code changes in the past few years, the original comment of
how parser->user_cb_size is checked was not correct anymore.

Fix it to reflect current code and add more explanation as the code
is more complex now.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: change the write flag name of error info structs
Tal Cohen [Wed, 1 Jun 2022 08:32:35 +0000 (11:32 +0300)] 
habanalabs: change the write flag name of error info structs

positive flags naming will make more clear code while adding
more 'error info' structures

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs/gaudi: move tpc assert raise into internal func
Tal Cohen [Wed, 1 Jun 2022 20:14:16 +0000 (23:14 +0300)] 
habanalabs/gaudi: move tpc assert raise into internal func

raising the tpc assert event in an internal function will make
the code cleaner as we are going to be adding more events

Signed-off-by: Tal Cohen <talcohen@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: align ioctl uapi structures to 64-bit
Dan Rapaport [Mon, 30 May 2022 11:11:45 +0000 (14:11 +0300)] 
habanalabs: align ioctl uapi structures to 64-bit

The compiler is padding the members of the struct to be aligned to
64-bit. The content of the padded bytes is and not zeroed explicitly,
hence might copy undefined data. We add a padding member to the struct
to get a zeroed 64-bit align struct.

Signed-off-by: Dan Rapaport <drapaport@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: add terminating NULL to attrs arrays
Dafna Hirschfeld [Thu, 19 May 2022 07:54:30 +0000 (10:54 +0300)] 
habanalabs: add terminating NULL to attrs arrays

Arrays of struct attribute are expected to be NULL terminated.
This is required by API methods such as device_add_groups.
This fixes a crash when loading the driver for Goya device.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: Fix kernel-doc
Jiapeng Chong [Thu, 2 Jun 2022 03:58:27 +0000 (11:58 +0800)] 
habanalabs: Fix kernel-doc

Fix the following W=1 kernel warnings:

drivers/misc/habanalabs/common/mmu/mmu_v1.c:425: warning: expecting
prototype for hl_mmu_fini(). Prototype was for hl_mmu_v1_fini() instead.

drivers/misc/habanalabs/common/mmu/mmu_v1.c:449: warning: expecting
prototype for hl_mmu_ctx_init(). Prototype was for hl_mmu_v1_ctx_init()
instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: Fix kernel-doc
Jiapeng Chong [Thu, 2 Jun 2022 03:58:26 +0000 (11:58 +0800)] 
habanalabs: Fix kernel-doc

Fix the following W=1 kernel warnings:

drivers/misc/habanalabs/common/pci/pci.c:454: warning: expecting
prototype for hl_fw_fini(). Prototype was for hl_pci_fini() instead.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agohabanalabs: fix double unlock on error in map_device_va()
Dan Carpenter [Wed, 25 May 2022 12:25:06 +0000 (15:25 +0300)] 
habanalabs: fix double unlock on error in map_device_va()

If hl_mmu_prefetch_cache_range() fails then this code calls
mutex_unlock(&ctx->mmu_lock) when it's no longer holding the mutex.

Fixes: 9e495e24003e ("habanalabs: do MMU prefetch as deferred work")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
3 years agodrm/exynos/exynos7_drm_decon: free resources when clk_set_parent() failed.
Jian Zhang [Tue, 12 Jul 2022 04:56:11 +0000 (13:56 +0900)] 
drm/exynos/exynos7_drm_decon: free resources when clk_set_parent() failed.

In exynos7_decon_resume, When it fails, we must use clk_disable_unprepare()
to free resource that have been used.

Fixes: 6f83d20838c09 ("drm/exynos: use DRM_DEV_ERROR to print out error
message")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jian Zhang <zhangjian210@huawei.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
3 years agodt-bindings: remove Joonyoung Shim from maintainers
Krzysztof Kozlowski [Fri, 1 Jul 2022 00:31:23 +0000 (09:31 +0900)] 
dt-bindings: remove Joonyoung Shim from maintainers

Emails to Joonyoung Shim bounce ("550 5.1.1 Recipient address rejected:
User unknown"), so remove him from maintainers of DT bindings (display,
phy).

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
3 years agodrm/exynos: MAINTAINERS: move Joonyoung Shim to credits
Krzysztof Kozlowski [Thu, 30 Jun 2022 04:12:17 +0000 (13:12 +0900)] 
drm/exynos: MAINTAINERS: move Joonyoung Shim to credits

Emails to Joonyoung Shim bounce ("550 5.1.1 Recipient address rejected:
User unknown"), so move him to credits file.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
3 years agobpf: Fix 'dubious one-bit signed bitfield' warnings
Matthieu Baerts [Mon, 11 Jul 2022 08:12:00 +0000 (10:12 +0200)] 
bpf: Fix 'dubious one-bit signed bitfield' warnings

Our CI[1] reported these warnings when using Sparse:

  $ touch net/mptcp/bpf.c
  $ make C=1 net/mptcp/bpf.o
  net/mptcp/bpf.c: note: in included file:
  include/linux/bpf_verifier.h:348:26: error: dubious one-bit signed bitfield
  include/linux/bpf_verifier.h:349:29: error: dubious one-bit signed bitfield

Set them as 'unsigned' to avoid warnings.

[1] https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2643588487

Fixes: 1ade23711971 ("bpf: Inline calls to bpf_loop when callback is known")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220711081200.2081262-1-matthieu.baerts@tessares.net
3 years agosamples/bpf: Fix xdp_redirect_map egress devmap prog
Jesper Dangaard Brouer [Mon, 11 Jul 2022 14:04:22 +0000 (16:04 +0200)] 
samples/bpf: Fix xdp_redirect_map egress devmap prog

LLVM compiler optimized out the memcpy in xdp_redirect_map_egress,
which caused the Ethernet source MAC-addr to always be zero
when enabling the devmap egress prog via cmdline --load-egress.

Issue observed with LLVM version 14.0.0
 - Shipped with Fedora 36 on target: x86_64-redhat-linux-gnu.

In verbose mode print the source MAC-addr in case xdp_devmap_attached
mode is used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/165754826292.575614.5636444052787717159.stgit@firesoul
3 years agoMerge tag 'drm-misc-next-2022-07-07' of git://anongit.freedesktop.org/drm/drm-misc...
Dave Airlie [Tue, 12 Jul 2022 03:27:47 +0000 (13:27 +1000)] 
Merge tag 'drm-misc-next-2022-07-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

drm-misc-next for $kernel-version:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

 * crtc: Remove unnessary include statements from drm_crtc.h, plus
   fallout in drivers

 * edid: More use of struct drm_edid; implement HF-EEODB extension

Driver Changes:

 * bridge:
   * anx7625: Implement HDP timeout via callback; Cleanups
   * fsl-ldb: Drop DE flip; Modesetting fixes
   * imx: Depend on ARCH_MXC
   * sil8620: Fix off-by-one
   * ti-sn65dsi86: Convert to atomic modesetting

 * ingenic: Fix display at maximum resolution

 * panel:
   * simple: Add support for HannStar HSD101PWW2, plus DT bindings; Add
     support for ETML0700Y5DHA, plus DT bindings

 * rockchip: Fixes

 * vc4: Cleanups

 * vmwgfx: Cleanups

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YsaHq1pvE699NtOM@linux-uq9g
3 years agoMerge tag 'drm-intel-next-2022-07-06' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Tue, 12 Jul 2022 02:54:42 +0000 (12:54 +1000)] 
Merge tag 'drm-intel-next-2022-07-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-next

- Suspend fixes for Display (Jose)
- Properly block D3Cold for now (Anshuman)
- Eliminate PIPECONF RMWs from .color_commit()(Ville)
- Display info clean-up (Ville)
- Fix error code (Dan)
- Fix possible refcount leak on DP MST (Hangyu)
- Other general display clean-ups (Jani, Tom)
- Add bios debug logs (Jani)
- PCH type clean-up (Ville)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YsZNJUVh0iHOtORz@intel.com
3 years agoamd-xgbe: fix clang -Wformat warnings
Justin Stitt [Fri, 8 Jul 2022 23:26:53 +0000 (16:26 -0700)] 
amd-xgbe: fix clang -Wformat warnings

When building with Clang we encounter the following warning:
| drivers/net/ethernet/amd/xgbe/xgbe-dcb.c:234:42: error: format specifies
| type 'unsigned char' but the argument has type '__u16' (aka 'unsigned
| short') [-Werror,-Wformat] pfc->pfc_cap, pfc->pfc_en, pfc->mbc,
| pfc->delay);

pfc->pfc_cap , pfc->pfc_cn, pfc->mbc are all of type `u8` while pfc->delay is
of type `u16`. The correct format specifiers `%hh[u|x]` were used for
the first three but not for pfc->delay, which is causing the warning
above.

Variadic functions (printf-like) undergo default argument promotion.
Documentation/core-api/printk-formats.rst specifically recommends using
the promoted-to-type's format flag. In this case `%d` (or `%x` to
maintain hex representation) should be used since both u8's and u16's
are fully representable by an int.

Moreover, C11 6.3.1.1 states:
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) `If an int
can represent all values of the original type ..., the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions.`

Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20220708232653.556488-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoatm: he: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 15:05:45 +0000 (17:05 +0200)] 
atm: he: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/7f795bd6d5b2a00f581175b7069b229c2e5a4192.1657379127.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet/fq_impl: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 14:37:53 +0000 (16:37 +0200)] 
net/fq_impl: Use the bitmap API to allocate bitmaps

Use bitmap_zalloc()/bitmap_free() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/c7bf099af07eb497b02d195906ee8c11fea3b3bd.1657377335.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agonet: dsa: hellcreek: Use the bitmap API to allocate bitmaps
Christophe JAILLET [Sat, 9 Jul 2022 14:24:45 +0000 (16:24 +0200)] 
net: dsa: hellcreek: Use the bitmap API to allocate bitmaps

Use devm_bitmap_zalloc() instead of hand-writing them.

It is less verbose and it improves the semantic.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/8306e2ae69a5d8553691f5d10a86a4390daf594b.1657376651.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge branch 'tls-rx-follow-ups-to-nopad'
Jakub Kicinski [Tue, 12 Jul 2022 02:48:36 +0000 (19:48 -0700)] 
Merge branch 'tls-rx-follow-ups-to-nopad'

Jakub Kicinski says:

====================
tls: rx: follow-ups to NoPad

A few fixes for issues spotted by Maxim.
====================

Link: https://lore.kernel.org/r/20220709025255.323864-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftests: tls: add test for NoPad getsockopt
Jakub Kicinski [Sat, 9 Jul 2022 02:52:55 +0000 (19:52 -0700)] 
selftests: tls: add test for NoPad getsockopt

Make sure setsockopt / getsockopt behave as expected.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotls: rx: fix the NoPad getsockopt
Jakub Kicinski [Sat, 9 Jul 2022 02:52:54 +0000 (19:52 -0700)] 
tls: rx: fix the NoPad getsockopt

Maxim reports do_tls_getsockopt_no_pad() will
always return an error. Indeed looks like refactoring
gone wrong - remove err and use value.

Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Fixes: 88527790c079 ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotls: rx: add counter for NoPad violations
Jakub Kicinski [Sat, 9 Jul 2022 02:52:53 +0000 (19:52 -0700)] 
tls: rx: add counter for NoPad violations

As discussed with Maxim add a counter for true NoPad violations.
This should help deployments catch unexpected padded records vs
just control records which always need re-encryption.

https: //lore.kernel.org/all/b111828e6ac34baad9f4e783127eba8344ac252d.camel@nvidia.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agotls: fix spelling of MIB
Jakub Kicinski [Sat, 9 Jul 2022 02:52:52 +0000 (19:52 -0700)] 
tls: fix spelling of MIB

MIN -> MIB

Fixes: 88527790c079 ("tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoselftest: net: add tun to .gitignore
Jakub Kicinski [Sat, 9 Jul 2022 02:41:41 +0000 (19:41 -0700)] 
selftest: net: add tun to .gitignore

Add missing .gitignore entry.

Fixes: 839b92fede7b ("selftest: tun: add test for NAPI dismantle")
Link: https://lore.kernel.org/r/20220709024141.321683-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Tue, 12 Jul 2022 01:15:25 +0000 (18:15 -0700)] 
Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 retbleed fixes from Borislav Petkov:
 "Just when you thought that all the speculation bugs were addressed and
  solved and the nightmare is complete, here's the next one: speculating
  after RET instructions and leaking privileged information using the
  now pretty much classical covert channels.

  It is called RETBleed and the mitigation effort and controlling
  functionality has been modelled similar to what already existing
  mitigations provide"

* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  x86/speculation: Disable RRSBA behavior
  x86/kexec: Disable RET on kexec
  x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
  x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
  x86/bugs: Add Cannon lake to RETBleed affected CPU list
  x86/retbleed: Add fine grained Kconfig knobs
  x86/cpu/amd: Enumerate BTC_NO
  x86/common: Stamp out the stepping madness
  KVM: VMX: Prevent RSB underflow before vmenter
  x86/speculation: Fill RSB on vmexit for IBRS
  KVM: VMX: Fix IBRS handling after vmexit
  KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
  KVM: VMX: Convert launched argument to flags
  KVM: VMX: Flatten __vmx_vcpu_run()
  objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
  x86/speculation: Remove x86_spec_ctrl_mask
  x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
  x86/speculation: Fix SPEC_CTRL write on SMT state change
  x86/speculation: Fix firmware entry SPEC_CTRL handling
  x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
  ...

3 years agortla/utils: Use calloc and check the potential memory allocation failure
jianchunfu [Wed, 15 Jun 2022 07:33:48 +0000 (15:33 +0800)] 
rtla/utils: Use calloc and check the potential memory allocation failure

Replace malloc with calloc and add memory allocating check
of mon_cpus before used.

Link: https://lkml.kernel.org/r/20220615073348.6891-1-jianchunfu@cmss.chinamobile.com
Fixes: 7d0dc9576dc3 ("rtla/timerlat: Add --dma-latency option")
Signed-off-by: jianchunfu <jianchunfu@cmss.chinamobile.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
3 years agoMerge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Tue, 12 Jul 2022 01:07:30 +0000 (11:07 +1000)] 
Merge tag 'amd-drm-next-5.20-2022-07-05' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-5.20-2022-07-05:

amdgpu:
- Various spelling and grammer fixes
- Various eDP fixes
- Various DMCUB fixes
- VCN fixes
- GMC 11 fixes
- RAS fixes
- TMZ support for GC 10.3.7
- GPUVM TLB flush fixes
- SMU 13.0.x updates
- DCN 3.2 Support
- DCN 3.2.1 Support
- MES updates
- GFX11 modifiers support
- USB-C fixes
- MMHUB 3.0.1 support
- SDMA 6.0 doorbell fixes
- Initial devcoredump support
- Enable high priority gfx queue on asics which support it
- Enable GPU reset for SMU 13.0.4
- OLED display fixes
- MPO fixes
- DC frame size fixes
- ASPM support for PCIE 7.4/7.6
- GPU reset support for SMU 13.0.0
- GFX11 updates
- VCN JPEG fix
- BACO support for SMU 13.0.7
- VCN instance handling fix
- GFX8 GPUVM TLB flush fix
- GPU reset rework
- VCN 4.0.2 support
- GTT size fixes
- DP link training fixes
- LSDMA 6.0.1 support
- Various backlight fixes
- Color encoding fixes
- Backlight config cleanup
- VCN 4.x unified queue cleanup

amdkfd:
- MMU notifier fixes
- Updates for GC 10.3.6 and 10.3.7
- P2P DMA support using dma-buf
- Add available memory IOCTL
- SDMA 6.0.1 fix
- MES fixes
- HMM profiler support

radeon:
- License fix
- Backlight config cleanup

UAPI:
- Add available memory IOCTL to amdkfd
  Proposed userspace: https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg75743.html
- HMM profiler support for amdkfd
  Proposed userspace: https://lists.freedesktop.org/archives/amd-gfx/2022-June/080805.html

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220705212633.6037-1-alexander.deucher@amd.com
3 years agoMerge tag 'drm-misc-fixes-2022-07-07-1' of ssh://git.freedesktop.org/git/drm/drm...
Dave Airlie [Tue, 12 Jul 2022 00:43:49 +0000 (10:43 +1000)] 
Merge tag 'drm-misc-fixes-2022-07-07-1' of ssh://git.freedesktop.org/git/drm/drm-misc into drm-fixes

Three mode setting fixes for fsl-ldb, a fbdev removal use-after-free fix,
a dma-buf fence use-after-free fix, a DMA setup fix for rockchip, an error
path fix and memory corruption fix for panfrost and one more orientation
quirk

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220708054306.wr6jcfdunuypftbq@houat
3 years agoMerge tag 'drm-intel-fixes-2022-07-07' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Tue, 12 Jul 2022 00:40:24 +0000 (10:40 +1000)] 
Merge tag 'drm-intel-fixes-2022-07-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix a possible refcount leak in DP MST connector (Hangyu)
- Fix on loading guc on ADL-N (Daniele)
- Fix vm use-after-free in vma destruction (Thomas)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YsbbgWnLTR8fr4lj@intel.com
3 years agoMerge tag 'amd-drm-fixes-5.19-2022-07-06' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Tue, 12 Jul 2022 00:34:42 +0000 (10:34 +1000)] 
Merge tag 'amd-drm-fixes-5.19-2022-07-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.19-2022-07-06:

amdgpu:
- Hibernation fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220707024421.5773-1-alexander.deucher@amd.com
3 years agoMakefile: add headers_install to kselftest targets
Guillaume Tucker [Fri, 8 Jul 2022 16:23:30 +0000 (17:23 +0100)] 
Makefile: add headers_install to kselftest targets

Add headers_install as a dependency to kselftest targets so that they
can be run directly from the top of the tree.  The kselftest Makefile
used to try to call headers_install "backwards" but failed due to the
relative path not being consistent.

Now we can either run this directly:

  $ make O=build kselftest-all

or this:

  $ make O=build headers_install
  $ make O=build -C tools/testing/selftest all

The same commands work as well when building directly in the source
tree (no O=) or any arbitrary path (relative or absolute).

Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoselftests: drop KSFT_KHDR_INSTALL make target
Guillaume Tucker [Fri, 8 Jul 2022 16:23:29 +0000 (17:23 +0100)] 
selftests: drop KSFT_KHDR_INSTALL make target

Drop the KSFT_KHDR_INSTALL make target now that all use-cases have
been removed from the other kselftest Makefiles.

Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoselftests: stop using KSFT_KHDR_INSTALL
Guillaume Tucker [Fri, 8 Jul 2022 16:23:28 +0000 (17:23 +0100)] 
selftests: stop using KSFT_KHDR_INSTALL

Stop using the KSFT_KHDR_INSTALL flag as installing the kernel headers
from the kselftest Makefile is causing some issues.  Instead, rely on
the headers to be installed directly by the top-level Makefile
"headers_install" make target prior to building kselftest.

Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoselftests: drop khdr make target
Guillaume Tucker [Fri, 8 Jul 2022 16:23:27 +0000 (17:23 +0100)] 
selftests: drop khdr make target

Drop the "khdr" make target as it fails when the build directory is a
sub-directory of the source tree.  Rely on the "headers_install"
target to have been run first instead.

For example, here's a typical error this patch is addressing:

  $ make O=build -j32 kselftest-gen_tar
  make[1]: Entering directory '/home/kernelci/linux/build'
  make --no-builtin-rules INSTALL_HDR_PATH=/home/kernelci/linux/build/usr \
          ARCH=x86 -C ../../.. headers_install
  make[3]: Entering directory '/home/kernelci/linux'
  Makefile:1022: ../scripts/Makefile.extrawarn: No such file or directory

The source directory is determined in the top-level Makefile as ".."
relatively to the "build" directory, but then the kselftest Makefile
switches to "-C ../../.." so "../scripts" then points one level higher
than the source tree e.g. "linux/../scripts" - which fails obviously.
There is no other use-case in the kernel tree where a sub-directory
Makefile tries to call a top-level make target, and it appears this
isn't really a valid thing to do.

Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoPCI/doc: Convert examples to generic power management
Bjorn Helgaas [Tue, 7 Jun 2022 23:29:46 +0000 (18:29 -0500)] 
PCI/doc: Convert examples to generic power management

PCI-specific power management (pci_driver.suspend and pci_driver.resume) is
deprecated.  Convert sample code to the generic power management framework.

Link: https://lore.kernel.org/r/20220607232946.355987-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
3 years agommc: sdhci-of-aspeed: test: Use kunit_test_suite() macro
David Gow [Sat, 9 Jul 2022 03:20:01 +0000 (11:20 +0800)] 
mmc: sdhci-of-aspeed: test: Use kunit_test_suite() macro

The kunit_test_suite() macro is no-longer incompatible with module_add,
so its use can be reinstated.

Since this fixes parsing with builtins and kunit_tool, also enable the
test by default when KUNIT_ALL_TESTS is enabled.

The test can now be run via kunit_tool with:
./tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_OF=y --kconfig_add CONFIG_OF_ADDRESS=y \
--kconfig_add CONFIG_MMC=y --kconfig_add CONFIG_MMC_SDHCI=y \
--kconfig_add CONFIG_MMC_SDHCI_PLTFM=y \
--kconfig_add CONFIG_MMC_SDHCI_OF_ASPEED=y \
'sdhci-of-aspeed'

(It may be worth adding a .kunitconfig at some point, as there are
enough dependencies to make that command scarily long.)

Acked-by: Daniel Latypov <dlatypov@google.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agonitro_enclaves: test: Use kunit_test_suite() macro
David Gow [Sat, 9 Jul 2022 03:20:00 +0000 (11:20 +0800)] 
nitro_enclaves: test: Use kunit_test_suite() macro

The kunit_test_suite() macro previously conflicted with module_init,
making it unsuitable for use in the nitro_enclaves test. Now that it's
fixed, we can use it instead of a custom call into internal KUnit
functions to run the test.

As a side-effect, this means that the test results are properly included
with other suites when built-in. To celebrate, enable the test by
default when KUNIT_ALL_TESTS is set (and NITRO_ENCLAVES enabled).

The nitro_enclave tests can now be run via kunit_tool with:
./tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_PCI=y --kconfig_add CONFIG_SMP=y \
--kconfig_add CONFIG_HOTPLUG_CPU=y \
--kconfig_add CONFIG_VIRT_DRIVERS=y \
--kconfig_add CONFIG_NITRO_ENCLAVES=y \
'ne_misc_dev_test'

(This is a pretty long command, so it may be worth adding a .kunitconfig
file at some point, instead.)

Reviewed-by: Andra Paraschiv <andraprs@amazon.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agothunderbolt: test: Use kunit_test_suite() macro
David Gow [Sat, 9 Jul 2022 03:19:59 +0000 (11:19 +0800)] 
thunderbolt: test: Use kunit_test_suite() macro

The new implementation of kunit_test_suite() for modules no longer
conflicts with module_init, so can now be used by the thunderbolt tests.

Also update the Kconfig entry to enable the test when KUNIT_ALL_TESTS is
enabled.

This means that kunit_tool can now successfully run and parse the test
results with, for example:
./tools/testing/kunit/kunit.py run --arch=x86_64 \
--kconfig_add CONFIG_PCI=y --kconfig_add CONFIG_USB4=y \
'thunderbolt'

Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Daniel Latypov <dlatypov@google.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agokunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites
Daniel Latypov [Sat, 9 Jul 2022 03:19:58 +0000 (11:19 +0800)] 
kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites

We currently store kunit suites in the .kunit_test_suites ELF section as
a `struct kunit_suite***` (modulo some `const`s).
For every test file, we store a struct kunit_suite** NULL-terminated array.

This adds quite a bit of complexity to the test filtering code in the
executor.

Instead, let's just make the .kunit_test_suites section contain a single
giant array of struct kunit_suite pointers, which can then be directly
manipulated. This array is not NULL-terminated, and so none of the test
filtering code needs to NULL-terminate anything.

Tested-by: Maíra Canal <maira.canal@usp.br>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Co-developed-by: David Gow <davidgow@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agokunit: unify module and builtin suite definitions
Jeremy Kerr [Sat, 9 Jul 2022 03:19:57 +0000 (11:19 +0800)] 
kunit: unify module and builtin suite definitions

Currently, KUnit runs built-in tests and tests loaded from modules
differently. For built-in tests, the kunit_test_suite{,s}() macro adds a
list of suites in the .kunit_test_suites linker section. However, for
kernel modules, a module_init() function is used to run the test suites.

This causes problems if tests are included in a module which already
defines module_init/exit_module functions, as they'll conflict with the
kunit-provided ones.

This change removes the kunit-defined module inits, and instead parses
the kunit tests from their own section in the module. After module init,
we call __kunit_test_suites_init() on the contents of that section,
which prepares and runs the suite.

This essentially unifies the module- and non-module kunit init formats.

Tested-by: Maíra Canal <maira.canal@usp.br>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoof: unittest: make unittest_gpio_remove() consistent with unittest_gpio_probe()
Andy Shevchenko [Fri, 8 Jul 2022 21:45:39 +0000 (00:45 +0300)] 
of: unittest: make unittest_gpio_remove() consistent with unittest_gpio_probe()

On the ->remove() stage the callback uses physical device node instead of one
from GPIO chip and the variable name which is different to one used in
unittest_gpio_probe(). Make these consistent with unittest_gpio_probe().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220708214539.7254-2-andriy.shevchenko@linux.intel.com
3 years agoof: unittest: Switch to use fwnode instead of of_node
Andy Shevchenko [Fri, 8 Jul 2022 21:45:38 +0000 (00:45 +0300)] 
of: unittest: Switch to use fwnode instead of of_node

The OF node in the GPIO library is deprecated and soon will be removed.
GPIO library now accepts fwnode as a firmware node, so switch the module
to use it instead.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bartosz Golaszewski <brgl@bgdev.pl>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220708214539.7254-1-andriy.shevchenko@linux.intel.com
3 years agoselftest: Taint kernel when test module loaded
David Gow [Fri, 8 Jul 2022 04:48:47 +0000 (12:48 +0800)] 
selftest: Taint kernel when test module loaded

Make any kselftest test module (using the kselftest_module framework)
taint the kernel with TAINT_TEST on module load.

Also mark the module as a test module using MODULE_INFO(test, "Y") so
that other tools can tell this is a test module. We can't rely solely
on this, though, as these test modules are also often built-in.

Finally, update the kselftest documentation to mention that the kernel
should be tainted, and how to do so manually (as below).

Note that several selftests use kernel modules which are not based on
the kselftest_module framework, and so will not automatically taint the
kernel.

This can be done in two ways:
- Moving the module to the tools/testing directory. All modules under
  this directory will taint the kernel.
- Adding the 'test' module property with:
  MODULE_INFO(test, "Y")

Similarly, selftests which do not load modules into the kernel generally
should not taint the kernel (or possibly should only do so on failure),
as it's assumed that testing from user-space should be safe. Regardless,
they can write to /proc/sys/kernel/tainted if required.

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agomodule: panic: Taint the kernel when selftest modules load
David Gow [Fri, 8 Jul 2022 04:48:45 +0000 (12:48 +0800)] 
module: panic: Taint the kernel when selftest modules load

Taint the kernel with TAINT_TEST whenever a test module loads, by adding
a new "TEST" module property, and setting it for all modules in the
tools/testing directory. This property can also be set manually, for
tests which live outside the tools/testing directory with:
MODULE_INFO(test, "Y");

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
3 years agoDocumentation: kunit: fix example run_kunit func to allow spaces in args
Daniel Latypov [Wed, 18 May 2022 17:01:22 +0000 (10:01 -0700)] 
Documentation: kunit: fix example run_kunit func to allow spaces in args

Without the quoting, the example will mess up invocations like
$ run_kunit "Something with spaces"

Note: this example isn't valid, but if ever a usecase arises where a
flag argument might have spaces in it, it'll break.

Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>