crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac
This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac.
Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 300 mode of
tcrypt, and compared the performance before and after this patch (the driver
used before this patch is XXXmac(sm4-ce)). The abscissas are blocks of
different lengths. The data is tabulated and the unit is Mb/s: