The Realtek RTL838x devices have a MIPS 4KEc core with a very simple pipeline.
OpenWrt uses CPU_TYPE:=4kec to honour this, which adds a dedicated toolchain that
takes up some GB of extra space. That would be no problem if the toolchain did what
it is expected to do. Looking at the build process, however, one can see:
During kernel builds:
  ps -ef | grep mtune
  ... -march=mips32r2 -mtune=34kc ...
During package builds:
  ps -ef | grep mtune
  ... -mips32r2 -mtune=4kec ...

So the kernel is tuned for the wrong CPU type while the applications are tuned
correctly. The reason is generic/308-mips32r2_tune.patch, which forces kernel
builds to -mtune=34kc. Nevertheless, everything has been running fine on the
RTL838x targets for years.

It does not make sense to provide a dedicated 4kec toolchain for this mess. So
change the setup as follows:
- Switch the CPU type to mips24kc for RTL838x -> This drops one toolchain and saves space
- Add an RTL838x specific mtune=4kec patch -> This builds the kernel with the proper setting

The downside is that packages will be built with -mtune=24kc. A look at a simple
benchmark should show whether this really has a big impact. See the numbers below.
To sum it up in two sentences:
- All non-RSA benchmarks are within expectation
- RSA benchmarks show large deviations (before and after)
The normal use case for these switches is definitely not a CPU-intensive workload,
so this is OK for now.
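
To put the two summary points into numbers, the relative change between both runs
can be computed for a few rows. The following is only a sketch for illustration and
not part of the change itself. The values are copied from the two logs below; the
selection of rows and the helper names pct_change and report are assumptions made
for this example. Note that each RSA key generation row is based on a single
operation ("1 ops"), so those figures are rough.

```python
# Values copied from the "Before" and "After" wolfssl-benchmark logs in this message.
# Rows marked [MiB/s] are throughput (higher is better).
# Rows marked [ms] are the average time per operation (lower is better).
# The choice of rows is arbitrary and only meant as an illustration.
SAMPLES = {
    "AES-128-CBC-enc [MiB/s]": (4.243, 4.258),
    "SHA-256 [MiB/s]": (9.056, 8.664),
    "CHACHA [MiB/s]": (14.982, 14.880),
    "RSA 2048 key gen [ms]": (15547.322, 27136.046),
    "RSA 3072 key gen [ms]": (66131.134, 39922.464),
    "RSA 2048 private [ms]": (390.991, 404.121),
}


def pct_change(before, after):
    """Return the relative change from before to after in percent."""
    return (after - before) / before * 100.0


def report(samples):
    """Map each row name to its relative change, rounded to one decimal."""
    return {name: round(pct_change(b, a), 1) for name, (b, a) in samples.items()}


if __name__ == "__main__":
    for name, delta in report(SAMPLES).items():
        print(f"{name:26s} {delta:+6.1f} %")
```

The sign has to be read together with the unit: a positive change is an improvement
for the MiB/s rows but a slowdown for the ms rows.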
Before: kernel 6.12 (mtune=34kc) + apps (mtune=4kec)
root@OpenWrt:/usr/bin# ./wolfssl-benchmark
------------------------------------------------------------------------------
wolfSSL version 5.7.6
------------------------------------------------------------------------------
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG 5 MiB took 1.426 seconds, 3.507 MiB/s
AES-128-CBC-enc 5 MiB took 1.178 seconds, 4.243 MiB/s
AES-128-CBC-dec 5 MiB took 1.171 seconds, 4.270 MiB/s
AES-192-CBC-enc 5 MiB took 1.307 seconds, 3.824 MiB/s
AES-192-CBC-dec 5 MiB took 1.311 seconds, 3.815 MiB/s
AES-256-CBC-enc 5 MiB took 1.447 seconds, 3.455 MiB/s
AES-256-CBC-dec 5 MiB took 1.421 seconds, 3.519 MiB/s
AES-128-GCM-enc 5 MiB took 3.772 seconds, 1.325 MiB/s
AES-128-GCM-dec 5 MiB took 3.756 seconds, 1.331 MiB/s
AES-192-GCM-enc 5 MiB took 3.939 seconds, 1.269 MiB/s
AES-192-GCM-dec 5 MiB took 3.932 seconds, 1.272 MiB/s
AES-256-GCM-enc 5 MiB took 4.043 seconds, 1.237 MiB/s
AES-256-GCM-dec 5 MiB took 4.033 seconds, 1.240 MiB/s
GMAC Default 2 MiB took 1.056 seconds, 1.895 MiB/s
AES-128-CTR 5 MiB took 1.195 seconds, 4.185 MiB/s
AES-192-CTR 5 MiB took 1.319 seconds, 3.791 MiB/s
AES-256-CTR 5 MiB took 1.460 seconds, 3.425 MiB/s
AES-CCM-enc 5 MiB took 2.279 seconds, 2.194 MiB/s
AES-CCM-dec 5 MiB took 2.273 seconds, 2.200 MiB/s
ARC4 20 MiB took 1.226 seconds, 16.315 MiB/s
CHACHA 15 MiB took 1.001 seconds, 14.982 MiB/s
CHA-POLY 15 MiB took 1.440 seconds, 10.416 MiB/s
3DES 5 MiB took 4.364 seconds, 1.146 MiB/s
MD5 25 MiB took 1.034 seconds, 24.173 MiB/s
POLY1305 35 MiB took 1.015 seconds, 34.467 MiB/s
SHA 25 MiB took 1.127 seconds, 22.183 MiB/s
SHA-256 10 MiB took 1.104 seconds, 9.056 MiB/s
SHA-384 5 MiB took 1.324 seconds, 3.775 MiB/s
SHA-512 5 MiB took 1.325 seconds, 3.774 MiB/s
SHA-512/224 5 MiB took 1.319 seconds, 3.791 MiB/s
SHA-512/256 5 MiB took 1.333 seconds, 3.751 MiB/s
AES-128-CMAC 5 MiB took 1.145 seconds, 4.366 MiB/s
AES-256-CMAC 5 MiB took 1.413 seconds, 3.539 MiB/s
HMAC-MD5 25 MiB took 1.034 seconds, 24.186 MiB/s
HMAC-SHA 25 MiB took 1.122 seconds, 22.272 MiB/s
HMAC-SHA256 10 MiB took 1.104 seconds, 9.059 MiB/s
HMAC-SHA384 5 MiB took 1.329 seconds, 3.762 MiB/s
HMAC-SHA512 5 MiB took 1.323 seconds, 3.778 MiB/s
PBKDF2 1 KiB took 1.018 seconds, 1.136 KiB/s
RSA 2048 key gen 1 ops took 15.547 sec, avg 15547.322 ms, 0.064 ops/sec
RSA 3072 key gen 1 ops took 66.131 sec, avg 66131.134 ms, 0.015 ops/sec
RSA 4096 key gen 1 ops took 563.611 sec, avg 563611.230 ms, 0.002 ops/sec
RSA 2048 public 200 ops took 1.403 sec, avg 7.015 ms, 142.542 ops/sec
RSA 2048 private 100 ops took 39.099 sec, avg 390.991 ms, 2.558 ops/sec
DH 2048 key gen 14 ops took 1.009 sec, avg 72.094 ms, 13.871 ops/sec
DH 2048 agree 100 ops took 15.714 sec, avg 157.139 ms, 6.364 ops/sec
ECC [ SECP256R1] 256 key gen 100 ops took 5.590 sec, avg 55.901 ms, 17.889 ops/sec
ECDHE [ SECP256R1] 256 agree 100 ops took 5.555 sec, avg 55.554 ms, 18.001 ops/sec
ECDSA [ SECP256R1] 256 sign 100 ops took 5.705 sec, avg 57.048 ms, 17.529 ops/sec
ECDSA [ SECP256R1] 256 verify 100 ops took 4.396 sec, avg 43.963 ms, 22.746 ops/sec
CURVE 25519 key gen 320 ops took 1.000 sec, avg 3.127 ms, 319.841 ops/sec
CURVE 25519 agree 400 ops took 1.214 sec, avg 3.034 ms, 329.546 ops/sec
Benchmark complete
After: kernel 6.12 (mtune=4kec) + apps (mtune=24kc)
root@OpenWrt:~# wolfssl-benchmark
------------------------------------------------------------------------------
wolfSSL version 5.7.6
------------------------------------------------------------------------------
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG 5 MiB took 1.428 seconds, 3.501 MiB/s
AES-128-CBC-enc 5 MiB took 1.174 seconds, 4.258 MiB/s
AES-128-CBC-dec 5 MiB took 1.162 seconds, 4.301 MiB/s
AES-192-CBC-enc 5 MiB took 1.307 seconds, 3.826 MiB/s
AES-192-CBC-dec 5 MiB took 1.313 seconds, 3.809 MiB/s
AES-256-CBC-enc 5 MiB took 1.432 seconds, 3.491 MiB/s
AES-256-CBC-dec 5 MiB took 1.426 seconds, 3.506 MiB/s
AES-128-GCM-enc 5 MiB took 3.761 seconds, 1.329 MiB/s
AES-128-GCM-dec 5 MiB took 3.748 seconds, 1.334 MiB/s
AES-192-GCM-enc 5 MiB took 3.918 seconds, 1.276 MiB/s
AES-192-GCM-dec 5 MiB took 3.922 seconds, 1.275 MiB/s
AES-256-GCM-enc 5 MiB took 4.019 seconds, 1.244 MiB/s
AES-256-GCM-dec 5 MiB took 4.014 seconds, 1.246 MiB/s
GMAC Default 2 MiB took 1.052 seconds, 1.900 MiB/s
AES-128-CTR 5 MiB took 1.189 seconds, 4.205 MiB/s
AES-192-CTR 5 MiB took 1.315 seconds, 3.804 MiB/s
AES-256-CTR 5 MiB took 1.455 seconds, 3.436 MiB/s
AES-CCM-enc 5 MiB took 2.257 seconds, 2.215 MiB/s
AES-CCM-dec 5 MiB took 2.269 seconds, 2.204 MiB/s
ARC4 15 MiB took 1.062 seconds, 14.124 MiB/s
CHACHA 15 MiB took 1.008 seconds, 14.880 MiB/s
CHA-POLY 15 MiB took 1.461 seconds, 10.266 MiB/s
3DES 5 MiB took 4.347 seconds, 1.150 MiB/s
MD5 25 MiB took 1.029 seconds, 24.291 MiB/s
POLY1305 35 MiB took 1.024 seconds, 34.181 MiB/s
SHA 25 MiB took 1.115 seconds, 22.418 MiB/s
SHA-256 10 MiB took 1.154 seconds, 8.664 MiB/s
SHA-384 5 MiB took 1.345 seconds, 3.718 MiB/s
SHA-512 5 MiB took 1.343 seconds, 3.723 MiB/s
SHA-512/224 5 MiB took 1.350 seconds, 3.703 MiB/s
SHA-512/256 5 MiB took 1.345 seconds, 3.718 MiB/s
AES-128-CMAC 5 MiB took 1.143 seconds, 4.376 MiB/s
AES-256-CMAC 5 MiB took 1.405 seconds, 3.559 MiB/s
HMAC-MD5 25 MiB took 1.027 seconds, 24.334 MiB/s
HMAC-SHA 25 MiB took 1.112 seconds, 22.490 MiB/s
HMAC-SHA256 10 MiB took 1.096 seconds, 9.125 MiB/s
HMAC-SHA384 5 MiB took 1.344 seconds, 3.721 MiB/s
HMAC-SHA512 5 MiB took 1.347 seconds, 3.712 MiB/s
PBKDF2 1 KiB took 1.012 seconds, 1.142 KiB/s
RSA 2048 key gen 1 ops took 27.136 sec, avg 27136.046 ms, 0.037 ops/sec
RSA 3072 key gen 1 ops took 39.922 sec, avg 39922.464 ms, 0.025 ops/sec
RSA 4096 key gen 1 ops took 519.483 sec, avg 519482.959 ms, 0.002 ops/sec
RSA 2048 public 200 ops took 1.398 sec, avg 6.989 ms, 143.073 ops/sec
RSA 2048 private 100 ops took 40.412 sec, avg 404.121 ms, 2.475 ops/sec
DH 2048 key gen 14 ops took 1.033 sec, avg 73.764 ms, 13.557 ops/sec
DH 2048 agree 100 ops took 16.401 sec, avg 164.009 ms, 6.097 ops/sec
ECC [ SECP256R1] 256 key gen 100 ops took 5.583 sec, avg 55.830 ms, 17.912 ops/sec
ECDHE [ SECP256R1] 256 agree 100 ops took 5.555 sec, avg 55.549 ms, 18.002 ops/sec
ECDSA [ SECP256R1] 256 sign 100 ops took 5.703 sec, avg 57.032 ms, 17.534 ops/sec
ECDSA [ SECP256R1] 256 verify 100 ops took 4.203 sec, avg 42.030 ms, 23.792 ops/sec
CURVE 25519 key gen 315 ops took 1.001 sec, avg 3.176 ms, 314.822 ops/sec
CURVE 25519 agree 400 ops took 1.244 sec, avg 3.110 ms, 321.579 ops/sec
Benchmark complete
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://github.com/openwrt/openwrt/pull/19117
Signed-off-by: Robert Marko <robimarko@gmail.com>