Document arm endianness considerations

author Michael Weiser <michael.weiser@gmx.de>

Tue, 13 Feb 2018 21:13:14 +0000 (22:13 +0100)

committer Niels Möller <nisse@lysator.liu.se>

Sun, 25 Mar 2018 09:27:44 +0000 (11:27 +0200)
diff --git a/arm/README b/arm/README

index 9bacd97b030e83f6ff91eea9660b47abd2d643d4..1ba54e0d0abf39cb416b451c3c3de7269d251868 100644
--- a/arm/README
+++ b/arm/README
@@ -44,4 +44,71 @@ q12 (d24, d25)       Y
  q13 (d26, d27) Y
  q14 (d28, d29) Y
  q15 (d30, d31) Y
-                   
+
+Endianness
+
+ARM supports big- and little-endian memory access modes. The representation in
+registers stays the same, but loads and stores swap bytes. This has to be
+taken into account in the following cases.
+
+Two m4 macros are provided to handle these special cases in assembly source:
+IF_LE(<if-true>,<if-false>)
+IF_BE(<if-true>,<if-false>)
+expand to <if-true> if the target system is little-endian or big-endian,
+respectively, and to <if-false> otherwise.
+
+1. ldr/str
+
+Loading and storing 32-bit words reverses the words' bytes in little-endian
+mode. If the handled data is actually a byte sequence or data in network byte
+order (big-endian), each loaded word needs to be byte-reversed to get it back
+into the correct order. For an example, see the LOAD macro in v6/sha1-compress.asm.
+
+2. shifts
+
+If data is to be processed with bit operations only, endianness can be ignored
+because the byte swaps on load and store cancel each other out. Shifts, however,
+have to be reversed in direction. See arm/memxor.asm for an example.
+
+3. vld1.8
+
+NEON's vld1 instruction can be used to produce endianness-neutral code: vld1.8
+loads a byte sequence into a register in memory order regardless of endianness,
+so it suits processing byte sequences. See arm/neon/umac-nh.asm for an example.
+
+4. vldm/vstm
+
+Care has to be taken when using vldm/vstm because they have two non-obvious
+characteristics:
+
+a. vldm/vstm do the normal byte-swapping on each value they load. When loading
+   into d (doubleword) registers, this means that bytes, halfwords and words of
+   the doubleword get swapped. When the loaded data actually represents, e.g.,
+   vectors of 32-bit words, this will swap columns.
+b. vldm/vstm on q (quadword) registers get translated into vldm/vstm on the
+   equivalent number of d (doubleword) registers: instead of one 128-bit load,
+   two 64-bit loads are done. When again handling vectors of 32-bit words, this
+   still swaps adjacent columns but does not reverse all four columns.
+
+memory adr0: w0 w1 w2 w3
+register q0: w1 w0 w3 w2
+
+See arm/neon/chacha-core-internal.asm for an example.
+
+5. simple byte store
+
+Sometimes it is necessary to store remaining single bytes to memory. A simple
+approach stores the lowest byte of a register, shifts the register right by
+eight bits and repeats until all bytes are stored. Since this constitutes a
+least-significant-byte-first store, the data to be stored needs to be
+byte-reversed first on a big-endian system. See Lmemxor_leftover in
+arm/memxor.asm for an example.
+
+6. Function parameters/return values
+
+AAPCS requires 64-bit parameters to be passed to and returned from functions
+"in two consecutive registers [...] as if the value had been loaded from memory
+representation with a single LDM instruction." Since loading a big-endian
+doubleword using ldm transposes its words (the most significant word lands in
+the first register), the same has to be done when, e.g., returning a 64-bit
+value from an assembler routine. See arm/neon/umac-nh.asm for an example.
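The IF_LE/IF_BE m4 macros described in the README select one of two arguments depending on the target's endianness. As a rough illustration only, here is a hypothetical C-preprocessor analogue; it is not how nettle defines the m4 macros. It assumes the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`, and falls back to little-endian when they are missing (that fallback is an assumption of this sketch).

```c
/* Hypothetical C analogue of the IF_LE/IF_BE m4 macros (not nettle's code).
   __BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ are GCC/Clang predefined macros;
   without them this sketch assumes a little-endian target. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define IF_LE(if_true, if_false) if_false
#define IF_BE(if_true, if_false) if_true
#else
#define IF_LE(if_true, if_false) if_true
#define IF_BE(if_true, if_false) if_false
#endif

/* Exactly one of the two macros selects its first argument. */
static const char *target_endianness(void) {
  return IF_LE("little-endian", "big-endian");
}
```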
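Item 1 (ldr/str) of the README text above can be modelled in C by assembling a word from four bytes the way each memory mode does. This is a minimal sketch; the helper names `ldr_le`, `ldr_be`, `rev32` and `load_be_word` are hypothetical, and `rev32` only mimics what the ARMv6 `rev` instruction computes.

```c
#include <stdint.h>

/* Model of ldr in little-endian mode: the lowest-addressed byte ends up in
   the least significant byte of the register. */
static uint32_t ldr_le(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Model of ldr in big-endian mode: the lowest-addressed byte ends up in the
   most significant byte. */
static uint32_t ldr_be(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Reverse the four bytes of a word (what the ARMv6 rev instruction does). */
static uint32_t rev32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Load a word stored in network byte order: no fix-up is needed in
   big-endian mode, one byte reversal after the load in little-endian mode. */
static uint32_t load_be_word(const uint8_t *p, int little_endian_mode) {
  return little_endian_mode ? rev32(ldr_le(p)) : ldr_be(p);
}
```

For the bytes 11 22 33 44, `ldr_le` yields 0x44332211 and `ldr_be` yields 0x11223344; after `rev32`, both paths agree.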
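Item 2 (shifts) makes two claims: pure bit operations are endianness-neutral, while shifts change direction. The following C sketch checks both. It models word loads and stores in each mode and merges two aligned words into the word starting 1 to 3 bytes into the first one. This resembles the merge an unaligned XOR loop needs, but it is not a transcription of arm/memxor.asm. All helper names are hypothetical.

```c
#include <stdint.h>

/* Models of 32-bit word loads and stores in each memory mode. */
static uint32_t ldr_le(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint32_t ldr_be(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
static void str_le(uint8_t *p, uint32_t w) {
  p[0] = (uint8_t)w;         p[1] = (uint8_t)(w >> 8);
  p[2] = (uint8_t)(w >> 16); p[3] = (uint8_t)(w >> 24);
}
static void str_be(uint8_t *p, uint32_t w) {
  p[0] = (uint8_t)(w >> 24); p[1] = (uint8_t)(w >> 16);
  p[2] = (uint8_t)(w >> 8);  p[3] = (uint8_t)w;
}

/* Bit operations only: load, XOR, store. The byte swaps on load and store
   cancel, so the bytes written are identical in either mode. */
static void xor_word(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     int little_endian_mode) {
  if (little_endian_mode)
    str_le(dst, ldr_le(a) ^ ldr_le(b));
  else
    str_be(dst, ldr_be(a) ^ ldr_be(b));
}

/* Shifts: build the word that starts `offset` bytes (1..3) into aligned
   word w0, taking the remaining bytes from the next aligned word w1.
   The shift directions are swapped between the two modes. */
static uint32_t merge_le(uint32_t w0, uint32_t w1, unsigned offset) {
  unsigned s = 8 * offset;
  return (w0 >> s) | (w1 << (32 - s));
}
static uint32_t merge_be(uint32_t w0, uint32_t w1, unsigned offset) {
  unsigned s = 8 * offset;
  return (w0 << s) | (w1 >> (32 - s));
}
```

`offset` is restricted to 1..3 so that no shift count reaches 32, which would be undefined behaviour in C.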
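Item 3 (vld1.8) can be illustrated by modelling a 64-bit d register as an integer whose byte lane i occupies bits 8*i to 8*i+7. In this sketch, `vld1_8` fills lane i from address p + i in both modes. For contrast, `load64` treats the same 8 bytes as a single 64-bit element, which is endianness-dependent. The lane model and both helper names are assumptions made for illustration.

```c
#include <stdint.h>

/* vld1.8 model: byte lane i comes from address p + i. The result is the
   same in little- and big-endian mode, so no mode parameter is needed. */
static uint64_t vld1_8(const uint8_t *p) {
  uint64_t d = 0;
  for (int i = 0; i < 8; i++)
    d |= (uint64_t)p[i] << (8 * i);
  return d;
}

/* For contrast: the same 8 bytes loaded as one 64-bit element, whose byte
   order depends on the memory mode. */
static uint64_t load64(const uint8_t *p, int little_endian_mode) {
  uint64_t d = 0;
  for (int i = 0; i < 8; i++)
    d |= (uint64_t)p[i] << (8 * (little_endian_mode ? i : 7 - i));
  return d;
}
```

The two loads only coincide in little-endian mode. This is why the byte-lane load is the one that keeps code endianness-neutral.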
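The memory/register diagram in item 4 (vldm/vstm) can be reproduced with a small C model of big-endian mode. It makes the following assumptions:

- A q register is two consecutive d registers, each filled by a 64-bit big-endian load, as item b describes.
- Lane 0 is the least significant word of d0.
- The words w0..w3 were written to memory by big-endian word stores.

The helper names are hypothetical.

```c
#include <stdint.h>

/* Model of a 32-bit word store in big-endian mode. */
static void str_be(uint8_t *p, uint32_t w) {
  p[0] = (uint8_t)(w >> 24); p[1] = (uint8_t)(w >> 16);
  p[2] = (uint8_t)(w >> 8);  p[3] = (uint8_t)w;
}

/* 64-bit big-endian load, standing in for what vldm does for one
   d register in big-endian mode. */
static uint64_t load64_be(const uint8_t *p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}

/* vldm into one q register = two 64-bit loads into d0 and d1.
   lanes[0] is the low word of d0, lanes[3] the high word of d1. */
static void vldm_q_be(const uint8_t *p, uint32_t lanes[4]) {
  for (int d = 0; d < 2; d++) {
    uint64_t v = load64_be(p + 8 * d);
    lanes[2 * d] = (uint32_t)v;             /* low half of the d register */
    lanes[2 * d + 1] = (uint32_t)(v >> 32); /* high half */
  }
}
```

With w0..w3 stored at consecutive addresses, the lanes come out as w1 w0 w3 w2. Adjacent columns are swapped, but the order is not fully reversed to w3 w2 w1 w0 as a single 128-bit load would produce.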
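Item 5 (simple byte store) describes a store loop that writes the lowest byte and then shifts right. The C sketch below shows why a word loaded in big-endian mode must be byte-reversed before that loop runs. It is a model of the technique, not a rendering of Lmemxor_leftover, and `store_leftover` and the other helper names are hypothetical.

```c
#include <stdint.h>

static uint32_t ldr_le(const uint8_t *p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static uint32_t ldr_be(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
static uint32_t rev32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Naive leftover store: write the n (<= 4) lowest-order bytes of w,
   least significant byte first, shifting right after each byte. */
static void store_leftover(uint8_t *dst, uint32_t w, unsigned n) {
  for (unsigned i = 0; i < n; i++) {
    dst[i] = (uint8_t)w;
    w >>= 8;
  }
}
```

In little-endian mode `store_leftover(dst, ldr_le(src), n)` writes the first n source bytes in order. In big-endian mode the same loop needs `rev32(ldr_be(src))`; without the reversal it would start from the last byte of the word.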
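Item 6 quotes the AAPCS rule that a 64-bit value is placed in two consecutive registers "as if the value had been loaded from memory representation with a single LDM instruction". The C sketch below models that rule literally. It writes the value's memory representation for the chosen mode, then loads the lower-addressed word into r0 and the next word into r1. `struct reg_pair` and the helper names are hypothetical, and r0/r1 stand for the first register pair.

```c
#include <stdint.h>

struct reg_pair { uint32_t r0, r1; };

/* Memory representation of a 64-bit value in each mode. */
static void store64(uint8_t *p, uint64_t v, int little_endian_mode) {
  for (int i = 0; i < 8; i++)
    p[i] = (uint8_t)(v >> (8 * (little_endian_mode ? i : 7 - i)));
}

/* 32-bit word load in each mode. */
static uint32_t load32(const uint8_t *p, int little_endian_mode) {
  uint32_t w = 0;
  for (int i = 0; i < 4; i++)
    w |= (uint32_t)p[i] << (8 * (little_endian_mode ? i : 3 - i));
  return w;
}

/* "As if loaded with a single LDM": r0 gets the word at the lower address,
   r1 the word above it. */
static struct reg_pair aapcs_u64(uint64_t v, int little_endian_mode) {
  uint8_t mem[8];
  struct reg_pair rp;
  store64(mem, v, little_endian_mode);
  rp.r0 = load32(mem, little_endian_mode);
  rp.r1 = load32(mem + 4, little_endian_mode);
  return rp;
}
```

In little-endian mode r0 holds the low half and r1 the high half; in big-endian mode the halves are transposed. An assembler routine that computes a 64-bit result as (low, high) therefore has to swap the register assignment on big-endian targets, for instance with IF_BE.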