rs6000: __builtin_mma_disassemble_acc() doesn't store elements correctly in LE mode
PR96236 shows a problem where we don't correctly store our 512-bit accumulators
correctly in little-endian mode. The patch below detects when we're doing a
little-endian memory access and stores to the correct memory locations.