AArch64: Enable dispatch scheduling for Neoverse V2.
This patch adds dispatch constraints for Neoverse V2 and illustrates the steps
necessary to enable dispatch scheduling for an AArch64 core.
The dispatch constraints are based on section 4.1 of the Neoverse V2 SWOG.
Note that the values used here deviate slightly from those in the current
SWOG version, but they are based on correct numbers. Arm will publish an
official Neoverse V2 SWOG release with the updated values in due time.
We implemented the dispatch constraints for Neoverse V2 in the following
steps:
1. We used instruction attributes to group instructions into dispatch groups,
corresponding to operations that utilize a certain pipeline type. For that,
we added a new attribute (neoversev2_dispatch) with values for the
different dispatch groups. The values of neoversev2_dispatch are determined
using expressions of other instruction attributes.
For example, the SWOG describes a constraint of "Up to 4 uOPs utilizing the
M pipelines". Thus, one of the values of neoversev2_dispatch is "m" and it
groups instructions that use the M pipelines such as integer multiplication.
Note that we made some minor simplifications compared to the information
in the SWOG, because the instruction annotation does not allow for a fully
accurate mapping of instructions to utilized pipelines. To give one example,
the instructions IRG and LDG are both tagged with "memtag", but IRG uses
the M pipelines, while LDG uses the L pipelines.
2. In the Neoverse V2 tuning model, we added an array of available slots per
dispatch constraint and a callback function that takes an insn as
input and returns a vector of pairs (a, b) where a is an index in the
array of slots and b is the number of occupied slots. The callback
function calls get_attr_neoversev2_dispatch(insn) and switches over the
result values to create a vector of occupied slots.
Thus, the new attribute neoversev2_dispatch provides a compact way to define
the dispatch constraints.
The array of available slots, its length, and a pointer to the
callback function are collected in a struct dispatch_constraint_info
which is referenced in the tune_params.
3. We enabled dispatch scheduling for Neoverse V2 by adding the
AARCH64_EXTRA_TUNE_DISPATCH_SCHED tune flag.
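The shape of the data described in step 2 can be illustrated with the
following standalone C++ sketch. It is not the code in neoversev2.h: the
enum dispatch_group, the table max_slots, and the functions occupied_slots
and fits_in_window are hypothetical names chosen for illustration, and the
callback takes a dispatch group directly as a stand-in for calling
get_attr_neoversev2_dispatch (insn). Only the limit of 4 for the M pipelines
is taken from the text above; the other slot counts are placeholders.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical dispatch groups; the real values of neoversev2_dispatch are
// defined in neoversev2.md and are more numerous.
enum class dispatch_group { none, m, l };

// Each pair is (index into the slot array, number of occupied slots).
using slot_use = std::vector<std::pair<int, int>>;

// Hypothetical slot table.  Index 0: all uOPs, index 1: M pipelines,
// index 2: L pipelines.  Only the 4 for the M pipelines comes from the
// commit message; the other two values are placeholders.
constexpr int NUM_CONSTRAINTS = 3;
const int max_slots[NUM_CONSTRAINTS] = { 8, 4, 4 };

// Stand-in for the callback: switch over the dispatch group and report
// which constraints the instruction counts against.
slot_use
occupied_slots (dispatch_group group)
{
  slot_use slots = { { 0, 1 } };  // assumed: every insn uses one total slot
  switch (group)
    {
    case dispatch_group::m:
      slots.push_back ({ 1, 1 });
      break;
    case dispatch_group::l:
      slots.push_back ({ 2, 1 });
      break;
    case dispatch_group::none:
      break;
    }
  return slots;
}

// Slot array, its length, and the callback bundled together, mirroring the
// three members the commit message lists.
struct constraint_info
{
  const int *max_slots;
  int num_constraints;
  slot_use (*callback) (dispatch_group);
};

const constraint_info example_info
  = { max_slots, NUM_CONSTRAINTS, occupied_slots };

// Illustrative consumer: does an instruction of GROUP still fit when USED
// slots are already taken in the current dispatch window?
bool
fits_in_window (const constraint_info &info, const std::vector<int> &used,
                dispatch_group group)
{
  for (const auto &use : info.callback (group))
    {
      assert (use.first >= 0 && use.first < info.num_constraints);
      if (used[use.first] + use.second > info.max_slots[use.first])
        return false;
    }
  return true;
}
```

In this sketch, adding a new dispatch constraint means adding one entry to
the slot table and one case to the switch, which is the compactness the
attribute-based approach is meant to provide.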
Performance evaluation showed no regressions across several workloads,
including SPEC2017 and GROMACS2024.
Thank you, Tamar, for helping with performance evaluation.
The patch was bootstrapped and tested on aarch64-linux-gnu with no
regressions.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:
* config/aarch64/aarch64.md: Include neoversev2.md.
* config/aarch64/tuning_models/neoversev2.h: Enable dispatch
scheduling and add dispatch constraints.
* config/aarch64/neoversev2.md: New file and new instruction attribute
neoversev2_dispatch.