From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 15 Apr 2025 00:29:18 +0000 (-0700)
Subject: Merge branch 'net-mlx5-hws-refactor-action-ste-handling'
X-Git-Tag: v6.16-rc1~132^2~309
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=a4cba7e98e35e618b3b4e1fce9746caad67cb308;p=thirdparty%2Flinux.git

Merge branch 'net-mlx5-hws-refactor-action-ste-handling'

Tariq Toukan says:

====================
net/mlx5: HWS, Refactor action STE handling

This patch series by Vlad refactors how action STEs are handled for
hardware steering.

Definitions
----------

* STE (Steering Table Entry): a building block for steering rules.
  Simple rules consist of a single STE that specifies both the match
  value and what actions to do. For more complex rules we have one or
  more match STEs that point to one or more action STEs. It is these
  action STEs which this patch series is primarily concerned with.

* RTC (Rule Table Context): a table that contains STEs. A matcher
  currently consists of a match RTC and, if necessary, an action RTC.
  This patch series decouples action RTCs from matchers and moves action
  RTCs to a central pool.

* Matcher: a logical container for steering rules. While the items above
  describe hardware concepts, a matcher is purely a software construct.

Current situation
-----------------

As mentioned above, a matcher currently consists of a match RTC (or
more, in case of complex matchers) and zero or one action STCs. An
action STC is only allocated if the matcher contains sufficiently
complicated action templates, or many actions.

When adding a rule, we decide based on its action template whether it
requires action STEs. If yes, we allocate the required number of action
STEs from the matcher's action STE.

When updating a rule, we need to prevent the rule ever being in an
invalid state. So we need to allocate and write new action STEs first,
then update the match STE to point to them, and finally release the old
action STEs. So there is a state when a rule needs double the action
STEs it normally uses.

Thus, for a given matcher of log_sz=N, log_action_ste_sz=A, the action
STC log_size is (N + A + 1). We need enough space to hold all the rules'
action STEs, and effectively double that space to account for the not
very common case of rules being updated. We could manage with much fewer
extra action STEs, but RTCs are allocated in powers of two. This results
in effective utilization of action RTCs of 50%, outside rule update
cases.

This is further complicated when resizing matchers. To avoid updating
all the rules to point to new match STEs, we keep existing action RTCs
around as resize_data, and only free them when the matcher is freed.

Action STE pool
---------------

This patch series decouples action RTCs from matchers by creating a
per-queue pool. When a rule needs to allocate action STEs it does so
from the pool, creating a new RTC if needed. During update two sets of
action STEs are in use, possibly from different RTCs.

The pool is sharded per-queue to avoid lock contention. Each per-queue
pool consists of 3 elements, corresponding to rx-only, tx-only and
rx-and-tx use cases. The series takes this approach because rules that
are bidirectional require that their action STEs have the same index in
the rx- and tx-RTCs, and using a single RTC would result in
unidirectional rules wasting the STEs for the unused direction.

Pool elements, in turn, consist of a list of RTCs. The driver
progressively allocates larger RTCs as they are needed to amortize the
cost of allocation.

Allocation of elements (STEs) inside RTCs is modelled by an existing
mechanism, somewhat confusingly also known as a pool. The first few
patches in the series refactor this abstraction to simplify it and adapt
it to the new schema.

Finally, this series implements periodic cleanup of unused action RTCs
as a new feature. Previously, once a matcher allocated an action RTC, it
would only be freed when the matcher is freed. This resulted in a lot of
wasted memory for matchers that had previously grown, but were now
mostly unused.

Conversely, action STE pools have a timestamp of when they were last
used. A cleanup routine periodically checks all pools. If a pool's last
usage was too far in the past, it is destroyed.

Benchmarks
----------

The test module creates a batch of (1 << 18) rules per queue and then
deletes them, in a loop. The rules are complex enough to require two
action STEs per rule.

Each queue is manipulated from a separate kernel workqueue, so there is
a 1:1 correspondence between threads and queues.

There are sleep statements between insert and delete batches so that
memory usage can be evaluated using `free -m`. The numbers below are the
diff between base memory usage (without the mlx5 module inserted) and
peak usage while running a test. The values are rounded to the nearest
hundred megabytes. The `queues` column lists how many queues the test
used.

queues          mem_before      mem_after
1               1300M            800M
4               4000M           2300M
8               7300M           3300M

Across all of the tests, insertion and deletion rates are the same
before and after these patches.

Summary of the patches
----------------------

* Patch 1: Fix matcher action template attach to avoid overrunning the
  buffer and correctly report errors.
* Patches 2-7: Cleanup the existing pool abstraction. Clarify semantics,
  and use cases, simplify API and callers.
* Patch 8: Implement the new action STE pool structure.
* Patch 9: Use the action STE pool when manipulating rules.
* Patch 10: Remove action RTC from matcher.
* Patch 11: Add logic to periodically check and free unused action RTCs.
* Patch 12: Export action STE tables in debugfs for our dump tool.
====================

Link: https://patch.msgid.link/1744312662-356571-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---

a4cba7e98e35e618b3b4e1fce9746caad67cb308