[AArch64] Optimize AES with fat build support
This patch optimizes AES encrypt/decrypt functions with each key size has its own implementation to load the key expansion just once at function prologue which yields a considerable performance increase over loading the key expansion for every block iteration. The patch also adds fat build support for the AES functions.
`make check` passes all tests. Benchmark of executing `examples/nettle-benchmark`:
| Algorithm | mode | C (Mbyte/s) | OpenSSL (Mbyte/s) | This patch (Mbyte/s) |
| ------ | ------ | ------ | ------ | ------ |
| aes128 | ECB encrypt | 95.01 | 1037.85 | 2579.62 |
| aes128 | ECB decrypt | 93.47 | 1005.15 | 2577.53 |
| aes192 | ECB encrypt | 79.60 | 893.34 | 2205.53 |
| aes192 | ECB decrypt | 78.34 | 889.17 | 2204.41 |
| aes256 | ECB encrypt | 66.64 | 782.21 | 1925.73 |
| aes256 | ECB decrypt | 65.81 | 781.37 | 1925.79 |
See merge request nettle/nettle!34