From: Michael Weiser
Date: Tue, 13 Feb 2018 21:13:14 +0000 (+0100)
Subject: Document arm endianness considerations
X-Git-Tag: nettle_3.5rc1~74
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=70135c70863eedfd9b300614f4a5535b8b93066c;p=thirdparty%2Fnettle.git

Document arm endianness considerations

Extend arm/README to provide some background on considerations to be taken
into account when writing assembly routines that are supposed to work with
both big- and little-endian memory access.
---
diff --git a/arm/README b/arm/README
index 9bacd97b..1ba54e0d 100644
--- a/arm/README
+++ b/arm/README
@@ -44,4 +44,71 @@ q12 (d24, d25) Y
 q13 (d26, d27) Y
 q14 (d28, d29) Y
 q15 (d30, d31) Y
-
+
+Endianness
+
+ARM supports big- and little-endian memory access modes. The representation
+of values in registers is the same in both modes, but loads and stores map
+the bytes in memory to register positions in opposite order. This has to be
+taken into account in several cases.
+
+Two m4 macros are provided to handle these special cases in assembly source:
+
+IF_LE(<if-true>,<else>)
+IF_BE(<if-true>,<else>)
+
+expand to <if-true> if the target system's endianness is little-endian or
+big-endian, respectively. Otherwise they expand to <else>.
+
+1. ldr/str
+
+In little-endian mode, loading or storing a 32-bit word reverses the order of
+its bytes relative to memory: the byte at the lowest address becomes the least
+significant byte of the register. If the data being handled is actually a
+byte sequence or data in network byte order (big-endian), the loaded word
+needs to be byte-reversed after the load to get it back into the correct
+order. See the LOAD macro in arm/v6/sha1-compress.asm for an example.
+
+2. shifts
+
+If data is processed with bit operations only, endianness can be ignored,
+because the byte swapping on load and on store cancel each other out. Shifts,
+however, have to be inverted: moving data toward lower memory addresses means
+shifting right in little-endian mode but shifting left in big-endian mode.
+See arm/memxor.asm for an example.
+
+3. vld1.8
+
+NEON's vld1 instruction can be used to produce endianness-neutral code:
+vld1.8 loads a byte sequence into a register in the same order regardless of
+memory endianness. This can be used to process byte sequences. See
+arm/neon/umac-nh.asm for an example.
+
+4. vldm/vstm
+
+Care has to be taken when using vldm/vstm because they have two non-obvious
+characteristics:
+
+a. vldm/vstm do normal byte swapping on each value they load or store. In
+   big-endian mode, when loading into d (doubleword) registers, this means
+   that the bytes, halfwords and words within each doubleword all end up
+   swapped. When the data actually represents, e.g., vectors of 32-bit
+   words, this swaps columns.
+b. vldm/vstm on q (quadword) registers are translated into vldm/vstm on the
+   equivalent number of d (doubleword) registers. Instead of one 128-bit
+   load, two 64-bit loads are done. When again handling vectors of 32-bit
+   words, this still swaps adjacent columns but does not reverse all four
+   columns. On a big-endian system:
+
+   memory adr0: w0 w1 w2 w3
+   register q0: w1 w0 w3 w2
+
+See arm/neon/chacha-core-internal.asm for an example.
+
+5. simple byte store
+
+Sometimes it is necessary to store remaining single bytes to memory. A simple
+approach stores the lowest byte of a register, shifts the register right by
+eight bits and repeats until all bytes are stored. Since this is a
+least-significant-byte-first store, the data needs to be byte-reversed first
+on a big-endian system, where the byte belonging at the lowest address is held
+in the most significant position. See Lmemxor_leftover in arm/memxor.asm for
+an example.
+
+6. Function parameters/return values
+
+AAPCS requires 64-bit parameters to be passed to and returned from functions
+"in two consecutive registers [...] as if the value had been loaded from memory
+representation with a single LDM instruction." Since loading a big-endian
+doubleword using ldm transposes its words (the most significant word ends up
+in the lower-numbered register), the same has to be done when, e.g.,
+returning a 64-bit value from an assembler routine. See arm/neon/umac-nh.asm
+for an example.
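Item 1 (ldr/str) of the README text above can be illustrated with a small C model rather than ARM assembly. This is a sketch under stated assumptions. `ldr32`, `rev32` and `load_net32` are hypothetical helpers that simulate the ldr and rev instructions and a network-order load for either memory mode on any host. They are not nettle functions, and the LOAD macro in arm/v6/sha1-compress.asm is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "ldr": read the 32-bit word at p using the given
 * memory endianness. The returned value is what the register holds. */
static uint32_t ldr32(const uint8_t *p, bool big_endian)
{
    assert(p != NULL);
    if (big_endian) /* byte at the lowest address is most significant */
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    /* little-endian: byte at the lowest address is least significant */
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Hypothetical model of "rev": reverse the four bytes of a word. */
static uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Load a word stored in network (big-endian) byte order, e.g. a SHA-1
 * message word: ldr, then byte-reverse on little-endian targets only. */
static uint32_t load_net32(const uint8_t *p, bool big_endian)
{
    uint32_t w = ldr32(p, big_endian);
    return big_endian ? w : rev32(w);
}
```

For the bytes 01 02 03 04, `ldr32` yields 0x04030201 in little-endian mode and 0x01020304 in big-endian mode. `load_net32` yields 0x01020304 in both modes. Only the little-endian path pays for the reversal, which is the kind of case the IF_LE macro is meant to express in assembly.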
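Item 2 (shifts) of the README text above covers two claims: bit operations are endianness-neutral, but shifts must be inverted. Both can be checked with a minimal C sketch that simulates both memory modes. `ldr32`, `str32`, `xor32` and `merge_unaligned` are hypothetical names. The unaligned merge is a generic technique chosen to illustrate the shift direction, not a transcription of arm/memxor.asm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "ldr" model: byte at the lowest address is the most
 * significant on big-endian, the least significant on little-endian. */
static uint32_t ldr32(const uint8_t *p, bool big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)p[i] << (big_endian ? 8 * (3 - i) : 8 * i);
    return w;
}

/* Matching hypothetical "str" model. */
static void str32(uint8_t *p, uint32_t w, bool big_endian)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(w >> (big_endian ? 8 * (3 - i) : 8 * i));
}

/* Bit operations only: ldr, ldr, eor, str yields the same bytes in either
 * mode, because the swap on load and the swap on store cancel out. */
static void xor32(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                  bool big_endian)
{
    str32(dst, ldr32(a, big_endian) ^ ldr32(b, big_endian), big_endian);
}

/* Shifts do depend on endianness: build the word starting at byte offset
 * off (1..3) from two aligned words w0 (address a) and w1 (address a + 4). */
static uint32_t merge_unaligned(uint32_t w0, uint32_t w1, unsigned off,
                                bool big_endian)
{
    assert(off > 0 && off < 4);
    unsigned sh = 8 * off;
    if (big_endian) /* lower addresses sit in the high-order bits */
        return (w0 << sh) | (w1 >> (32 - sh));
    /* little-endian: lower addresses sit in the low-order bits */
    return (w0 >> sh) | (w1 << (32 - sh));
}
```

Using the little-endian shift directions on big-endian data produces a wrong word. That is why the two cases need separate code paths.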
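Item 3 (vld1.8) of the README text above can be sketched in C by treating a q register as 16 byte lanes, with lane 0 as bits 7..0. The `qreg` type and the `vld1_8`, `vld1_32` and `qreg_equal` functions are hypothetical. The contrast case assumes that a vld1.32 element load reads each 32-bit element in the memory endianness. That is our reading of the architecture, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit q register as 16 byte lanes. The register
 * layout itself does not depend on memory endianness. */
typedef struct { uint8_t b[16]; } qreg;

/* vld1.8 {d0, d1}: byte element i comes from address + i in either mode. */
static qreg vld1_8(const uint8_t *mem, bool big_endian)
{
    (void)big_endian; /* intentionally ignored: the load is endian-neutral */
    qreg q;
    for (int i = 0; i < 16; i++)
        q.b[i] = mem[i];
    return q;
}

/* Contrast (assumed semantics): vld1.32 {d0, d1} reads element j as the
 * 32-bit word at address + 4*j in the memory endianness, so on big-endian
 * the bytes inside each element come out reversed compared with vld1.8. */
static qreg vld1_32(const uint8_t *mem, bool big_endian)
{
    qreg q;
    for (int j = 0; j < 4; j++)
        for (int k = 0; k < 4; k++) /* k = byte significance in element j */
            q.b[4 * j + k] = big_endian ? mem[4 * j + 3 - k] : mem[4 * j + k];
    return q;
}

static bool qreg_equal(const qreg *x, const qreg *y)
{
    for (int i = 0; i < 16; i++)
        if (x->b[i] != y->b[i])
            return false;
    return true;
}
```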
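The column swap described in item 4 (vldm/vstm) of the README text above can be reproduced in C by modelling vldm on q0 as two 64-bit loads into d0 and d1. `ld64`, `st32`, `vldm_q` and `swap_pairs` are hypothetical names. `swap_pairs` models one possible fix-up, swapping adjacent 32-bit lanes as NEON's vrev64.32 does. That fix-up is an assumption of this sketch and not necessarily what arm/neon/chacha-core-internal.asm does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit load into a d register. */
static uint64_t ld64(const uint8_t *p, bool big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (big_endian ? 8 * (7 - i) : 8 * i);
    return v;
}

/* Store a 32-bit word in the given endianness (used to lay out w0..w3). */
static void st32(uint8_t *p, uint32_t w, bool big_endian)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(w >> (big_endian ? 8 * (3 - i) : 8 * i));
}

/* vldm of q0 is really vldm of {d0, d1}: two 64-bit loads. The four 32-bit
 * lanes of q0 go to lanes[0..3]; lane 0 is the low half of d0. */
static void vldm_q(const uint8_t *mem, bool big_endian, uint32_t lanes[4])
{
    for (int d = 0; d < 2; d++) {
        uint64_t v = ld64(mem + 8 * d, big_endian);
        lanes[2 * d] = (uint32_t)v;             /* low half */
        lanes[2 * d + 1] = (uint32_t)(v >> 32); /* high half */
    }
}

/* Swap adjacent 32-bit lanes within each doubleword (vrev64.32 style). */
static void swap_pairs(uint32_t lanes[4])
{
    for (int i = 0; i < 4; i += 2) {
        uint32_t t = lanes[i];
        lanes[i] = lanes[i + 1];
        lanes[i + 1] = t;
    }
}
```

With w0..w3 stored big-endian, `vldm_q` yields lanes w1 w0 w3 w2, matching the memory/register table. In little-endian mode it yields w0 w1 w2 w3.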
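Item 5 (simple byte store) of the README text above describes a strb/shift loop. A C sketch of it follows, assuming the word was obtained with an ldr-style load. `ldr32`, `rev32` and `store_leftover` are hypothetical helpers, not code taken from Lmemxor_leftover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "ldr" model (see item 1). */
static uint32_t ldr32(const uint8_t *p, bool big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)p[i] << (big_endian ? 8 * (3 - i) : 8 * i);
    return w;
}

/* Hypothetical "rev" model: reverse the four bytes of a word. */
static uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Store the first n bytes (in memory order) of a word obtained with ldr:
 * store the low byte, shift right by 8, repeat. */
static void store_leftover(uint8_t *dst, uint32_t w, unsigned n,
                           bool big_endian)
{
    assert(n <= 4);
    if (big_endian)
        w = rev32(w); /* byte for the lowest address was in bits 31..24 */
    for (unsigned i = 0; i < n; i++) {
        dst[i] = (uint8_t)w; /* strb: lowest byte */
        w >>= 8;             /* lsr #8 */
    }
}
```

Without the reversal, the big-endian path would write the last bytes of the word first (dd cc bb for the input aa bb cc dd) instead of aa bb cc.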
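Item 6 (function parameters/return values) of the README text above can be checked with a C model of `ldm rX, {r0, r1}`. `ldr32`, `st64`, `ldm2` and `return_u64` are hypothetical names. The sketch only shows which register must receive which half of a 64-bit value under the AAPCS rule quoted there. It is not the code in arm/neon/umac-nh.asm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "ldr" model (see item 1). */
static uint32_t ldr32(const uint8_t *p, bool big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)p[i] << (big_endian ? 8 * (3 - i) : 8 * i);
    return w;
}

/* Store a 64-bit value in the given memory endianness. */
static void st64(uint8_t *p, uint64_t v, bool big_endian)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (big_endian ? 8 * (7 - i) : 8 * i));
}

/* Model of "ldm rX, {r0, r1}": r0 from the lower address, r1 from +4. */
static void ldm2(const uint8_t *p, bool big_endian, uint32_t *r0, uint32_t *r1)
{
    *r0 = ldr32(p, big_endian);
    *r1 = ldr32(p + 4, big_endian);
}

/* Put a 64-bit result, computed as hi/lo halves, into r0/r1 the way AAPCS
 * expects: as if ldm had loaded it from its memory representation. */
static void return_u64(uint32_t hi, uint32_t lo, bool big_endian,
                       uint32_t *r0, uint32_t *r1)
{
    *r0 = big_endian ? hi : lo; /* big-endian: most significant word first */
    *r1 = big_endian ? lo : hi;
}
```

For 0x1122334455667788, r0/r1 hold 0x11223344/0x55667788 on big-endian and 0x55667788/0x11223344 on little-endian. `return_u64` reproduces both layouts.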