From: Michael Weiser
Date: Mon, 25 Jan 2021 18:05:47 +0000 (+0100)
Subject: aarch64: Add README
X-Git-Tag: nettle_3.8_release_20220602~141^2~8
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=0c5429d338103987e95de6ec25ff859adbf9a869;p=thirdparty%2Fnettle.git

aarch64: Add README
---

diff --git a/arm64/README b/arm64/README
new file mode 100644
index 00000000..139a3cc1
--- /dev/null
+++ b/arm64/README
@@ -0,0 +1,45 @@
+Endianness
+
+Similar to arm, aarch64 can run with little-endian or big-endian memory
+accesses. Endianness is handled exclusively on load and store operations.
+Register layout and operation behaviour are identical in both modes.
+
+When writing SIMD code, endianness interaction with vector loads and stores may
+exhibit seemingly unintuitive behaviour, particularly when mixing normal and
+vector load/store operations.
+
+See https://llvm.org/docs/BigEndianNEON.html for a good overview, particularly
+of the pitfalls of using ldr/str vs. ld1/st1.
+
+For example, ld1 {v1.2d,v2.2d},[x0] will load v1 and v2 with elements of a
+one-dimensional vector from consecutive memory locations. So v1.d[0] will be
+read from x0+0, v1.d[1] from x0+8 (bytes) and v2.d[0] from x0+16 and v2.d[1]
+from x0+24. That'll be the same in LE and BE mode because it is the structure
+of the vector prescribed by the load operation. Endianness will be applied to
+the individual doublewords but the order in which they're loaded from memory
+and in which they're put into d[0] and d[1] won't change.
+
+Another way is to explicitly load a vector of bytes using ld1 {v1.16b,
+v2.16b},[x0]. This will load x0+0 into v1.b[0], x0+1 (byte) into v1.b[1] and so
+forth. This load (or store) is endianness-neutral and behaves identically in LE
+and BE mode.
+
+Care must however be taken when switching views onto the registers: d[0] is
+mapped onto b[0] through b[7] and b[0] will be the least significant byte in
+d[0] and b[7] will be the MSB. This layout is also the same in both memory
+endianness modes. ld1 {v1.16b},[x0], however, will always load consecutive
+bytes from memory into b[0] through b[15], the first eight into b[0] through
+b[7]. When accessed through d[0] they will only appear as the expected
+doubleword-sized number if it was indeed stored little-endian in memory.
+Something similar happens when loading a vector of doublewords (ld1
+{v1.2d},[x0]) and then accessing individual bytes of it. Bytes will only be at
+the expected indices if the doublewords are indeed stored in memory in the
+current memory endianness. Therefore it is most intuitive to use the
+appropriate vector element width for the data being loaded or stored to apply
+the necessary endianness correction.
+
+Finally, ldr/str are not vector operations. When used to load a 128-bit
+quadword, they will apply endianness to the whole quadword. Therefore
+particular care must be taken if the loaded data is then to be regarded as
+elements of e.g. a doubleword vector. Indices may appear reversed on
+big-endian systems (because they are).
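
The three load flavours described in the README can be compared with a small
portable model. The following is only a sketch in plain C, not code from
nettle and not real NEON: the vreg type and the model_ld1_16b, model_ld1_2d,
model_ldr_q and view_d helpers are hypothetical names made up for
illustration, and memory endianness is passed in as a flag rather than taken
from the CPU. It assumes the register layout stated above, with b[0] being
the least significant byte of d[0].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register. As in the README,
 * b[0] is the least significant byte of d[0] and b[8] that of d[1]. */
typedef struct { uint8_t b[16]; } vreg;

/* Models ld1 {v.16b},[x0]: memory byte i always lands in b[i],
 * regardless of memory endianness. */
static void model_ld1_16b(vreg *v, const uint8_t mem[16])
{
    for (int i = 0; i < 16; i++)
        v->b[i] = mem[i];
}

/* Models ld1 {v.2d},[x0]: element order is fixed (d[0] from offset 0,
 * d[1] from offset 8); endianness applies inside each doubleword only. */
static void model_ld1_2d(vreg *v, const uint8_t mem[16], bool big_endian)
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++)
            v->b[8 * lane + i] = mem[8 * lane + (big_endian ? 7 - i : i)];
}

/* Models ldr q,[x0]: not a vector load, so endianness applies to the
 * whole 128-bit quadword. */
static void model_ldr_q(vreg *v, const uint8_t mem[16], bool big_endian)
{
    for (int i = 0; i < 16; i++)
        v->b[i] = mem[big_endian ? 15 - i : i];
}

/* Reads the register through its doubleword view d[lane]. */
static uint64_t view_d(const vreg *v, int lane)
{
    assert(lane == 0 || lane == 1);
    uint64_t d = 0;
    for (int i = 7; i >= 0; i--)
        d = (d << 8) | v->b[8 * lane + i];
    return d;
}
```

With memory holding the bytes 0x00 through 0x0f, the model yields
d[0] = 0x0706050403020100 after model_ld1_16b in either mode. In big-endian
mode, model_ld1_2d gives d[0] = 0x0001020304050607 and
d[1] = 0x08090a0b0c0d0e0f, whereas model_ldr_q gives the same two values with
the lanes swapped, which is the index reversal the README warns about.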