match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
assert(op != NULL /* Precondition */);
assert(oend_w < oend /* No underflow */);
+
+#if defined(__aarch64__)
+ /* prefetch sequence starting from match that will be used for copy later */
+ PREFETCH_L1(match);
+#endif
/* Handle edge cases in a slow path:
* - Read beyond end of literals
* - Match end is within WILDCOPY_OVERLIMIT of oend