Commit | Line | Data |
---|---|---|
3f6d2112 | 1 | |
aa7f3955 JS |
2 | Status |
3 | ~~~~~~ | |
3f6d2112 | 4 | |
aa7f3955 JS |
5 | As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely, |
6 | the 64-bit ARM architecture. Currently it supports integer and FP | |
13efd27d JS |
7 | instructions and can run anything generated by gcc-4.8.2 -O3. The |
8 | port is under active development. | |
3f6d2112 | 9 | |
f0cbcd63 | 10 | Current limitations, as of mid-May 2014. |
3f6d2112 | 11 | |
68eb4397 | 12 | * limited support of vector (SIMD) instructions. Initial target is |
13efd27d JS |
13 | support for instructions created by gcc-4.8.2 -O3 |
14 | (via autovectorisation). This is complete. | |
3f6d2112 | 15 | |
e6f86f08 | 16 | * Integration with the built in GDB server: |
f0cbcd63 | 17 | - works ok (breakpoint, attach to a process blocked in a syscall, ...) |
e6f86f08 | 18 | - still to do: |
067c4c21 PW |
19 | arm64 xml register description files (allowing shadow registers |
20 | to be looked at). | |
e6f86f08 | 21 | cpsr transfer to/from gdb to be looked at (see also arm equivalent code) |
3f6d2112 | 22 | |
68eb4397 JS |
23 | * limited syscall support |
24 | ||
aa7f3955 JS |
25 | There has been extensive testing of the baseline simulation of integer |
26 | and FP instructions. Memcheck is also believed to work, at least for | |
27 | small examples. Other tools appear to at least not crash when running | |
28 | /bin/date. | |
29 | ||
8cb7b38c JS |
30 | Enough syscalls and instructions are supported for substantial |
31 | programs to work. Firefox 26 is able to start up and quit. The noise | |
32 | level from Memcheck is low enough to make it practical to use for real | |
33 | debugging. | |
68eb4397 | 34 | |
aa7f3955 JS |
35 | |
36 | Building | |
37 | ~~~~~~~~ | |
38 | ||
39 | You could probably build it directly on a target OS, using the normal | |
40 | non-cross scheme: | |
41 | ||
42 | ./autogen.sh ; ./configure --prefix=.. ; make ; make install | |
43 | ||
44 | Development so far has, however, been done by cross-compiling, viz: | |
45 | ||
46 | export CC=aarch64-linux-gnu-gcc | |
47 | export LD=aarch64-linux-gnu-ld | |
48 | export AR=aarch64-linux-gnu-ar | |
49 | ||
50 | ./autogen.sh | |
51 | ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \ | |
52 | --enable-only64bit | |
53 | make -j4 | |
54 | make -j4 install | |
55 | ||
56 | Doing this assumes that the install path (`pwd`/Inst) is valid on | |
57 | both host and target, which isn't normally the case. To avoid | |
58 | this limitation, do instead: | |
59 | ||
60 | ./configure --prefix=/install/path/on/target \ | |
61 | --host=aarch64-unknown-linux \ | |
62 | --enable-only64bit | |
63 | make -j4 | |
64 | make -j4 install DESTDIR=/a/temp/dir/on/host | |
65 | # and then copy the contents of DESTDIR to the target. | |
66 | ||
67 | See README.android for more examples of cross-compile building. | |
3f6d2112 | 68 | |
3f6d2112 | 69 | |
aa7f3955 JS |
70 | Implementation tidying-up/TODO notes |
71 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
72 | ||
73 | UnwindStartRegs -- what should that contain? | |
74 | ||
75 | ||
76 | vki-arm64-linux.h: vki_sigaction_base | |
3f6d2112 JS |
77 | I really don't think that __vki_sigrestore_t sa_restorer |
78 | should be present. Adding it surely puts sa_mask at a wrong | |
79 | offset compared to (kernel) reality. But not having it causes | |
80 | compilation of m_signals.c to fail in hard to understand ways, | |
81 | so adding it temporarily. | |
82 | ||
83 | ||
84 | m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF | |
85 | is there at the moment, but 0x00000000 is probably what it should be. | |
86 | Also, fix indentation/tab-vs-space stuff | |
87 | ||
88 | ||
89 | ./include/vki/vki-arm64-linux.h: uses __uint128_t. Should change | |
90 | it to __vki_uint128_t, but what's the defn of that? | |
91 | ||
92 | ||
3f6d2112 JS |
93 | m_debuginfo/priv_storage.h: need proper defn of DiCfSI |
94 | ||
95 | ||
96 | readdwarf.c: is this correct? | |
97 | #elif defined(VGP_arm64_linux) | |
98 | # define FP_REG 29 //??? | |
99 | # define SP_REG 31 //??? | |
100 | # define RA_REG_DEFAULT 30 //??? | |
101 | ||
102 | ||
103 | vki-arm64-linux.h: | |
104 | re linux-3.10.5/include/uapi/asm-generic/sembuf.h | |
105 | I'd say the amd64 version has padding it shouldn't have. Check? | |
106 | ||
107 | ||
3f6d2112 JS |
108 | syswrap-linux.c run_a_thread_NORETURN assembly sections |
109 | it seems that tst->os_state.exitcode has word type, | |
110 | in which case the ppc64_linux use of lwz to read it is wrong | |
111 | ||
112 | ||
3f6d2112 JS |
113 | syswrap-linux.c ML_(do_fork_clone) |
114 | assuming that VGP_arm64_linux is the same as VGP_arm_linux here | |
115 | ||
116 | ||
3f6d2112 JS |
117 | dispatch-arm64-linux.S: FIXME: set up FP control state before |
118 | entering generated code. Also fix screwy indentation. | |
119 | ||
aa7f3955 | 120 | |
3f6d2112 JS |
121 | dispatcher-ery general: what's a good (predictor-friendly) way to |
122 | branch to a register? | |
123 | ||
124 | ||
3f6d2112 JS |
125 | in vki-arm64-scnums.h |
126 | //#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT) | |
127 | Probably want to reenable that and clean up accordingly | |
128 | ||
129 | ||
3f6d2112 JS |
130 | putIRegXXorZR: figure out a way that the computed value is actually |
131 | used, so as to keep any memory reads that might generate it, alive. | |
132 | (else the simulation can lose exceptions). At least, for writes to | |
133 | the zero register generated by loads .. or .. can any other | |
134 | integer instructions that write to a register cause exceptions? | |
135 | ||
136 | ||
3f6d2112 JS |
137 | loads/stores: generate stack alignment checks as necessary |
138 | ||
139 | ||
3f6d2112 JS |
140 | fix barrier insns: ISB, DMB |
141 | ||
142 | ||
3f6d2112 JS |
143 | fix atomic loads/stores |
144 | ||
145 | ||
3f6d2112 JS |
146 | FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused |
147 | IROps so as to avoid double rounding | |
148 | ||
149 | ||
3f6d2112 JS |
150 | ARM64Instr_Call getRegUsage: re-check relative to what |
151 | getAllocableRegs_ARM64 makes available | |
152 | ||
153 | ||
3f6d2112 JS |
154 | Make dispatch-arm64-linux.S save any callee-saved Q regs |
155 | I think what is required is to save D8-D15 and nothing more than that. | |
156 | ||
157 | ||
3f6d2112 JS |
158 | wrapper for __NR3264_fstat -- correct? |
159 | ||
160 | ||
aa7f3955 JS |
161 | PRE(sys_clone): get rid of references to vki_modify_ldt_t and the |
162 | definition of it in vki-arm64-linux.h. Ditto for 32-bit arm. | |
3f6d2112 JS |
163 | |
164 | ||
165 | sigframe-arm64-linux.c: build_sigframe: references to nonexistent | |
166 | siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been | |
167 | replaced by zero. Also in synth_ucontext. | |
168 | ||
169 | ||
3f6d2112 JS |
170 | m_debugger.c: |
171 | uregs.pstate = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */ | |
172 | Is that remotely correct? | |
173 | ||
174 | ||
3f6d2112 JS |
175 | host_arm64_defs.c: emit_ARM64Instr: |
176 | ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing | |
177 | MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false | |
178 | dependencies on the top half of the register. (Or at least check | |
aa7f3955 | 179 | the semantics of INS Vd.D[0] to see if it zeroes out the top.) |
3f6d2112 JS |
180 | |
181 | ||
182 | preferredVectorSubTypeFromSize: review perf effects and decide | |
183 | on a types-for-subparts policy | |
184 | ||
185 | ||
3f6d2112 JS |
186 | fold_IRExpr_Unop: add a reduction rule for this |
187 | 1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) )) | |
188 | viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x) | |
189 | ||
190 | ||
3f6d2112 JS |
191 | check insn selection for memcheck-only primops: |
192 | Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32 | |
193 | widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8 | |
194 | ||
195 | ||
3f6d2112 JS |
196 | isel: get rid of various cases where zero is put into a register |
197 | and just use xzr instead. Especially for CmpNEZ64/32. And for | |
198 | writing zeroes into the CC thunk fields. | |
199 | ||
200 | ||
3f6d2112 JS |
201 | /* Keep this list in sync with that in iselNext below */ |
202 | /* Keep this list in sync with that for Ist_Exit above */ | |
203 | uh .. they are not in sync | |
204 | ||
205 | ||
3f6d2112 JS |
206 | very stupid: |
207 | imm64 x23, 0xFFFFFFFFFFFFFFA0 | |
208 | 17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2 | |
209 | ||
210 | ||
3f6d2112 JS |
211 | valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK, |
212 | also add CFI annotations | |
0a13c57c JS |
213 | |
214 | ||
0a13c57c JS |
215 | could possibly bring r29 into use, which would be useful as it is |
216 | callee saved | |
aa7f3955 JS |
217 | |
218 | ||
219 | ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt | |
220 | can't always simplify the general-case IR to a shift in such cases. | |
fad45c7b JS |
221 | |
222 | ||
223 | LDP,STP (immediate, simm7) (FP&VEC) | |
224 | should zero out hi parts of dst registers in the LDP case | |
225 | ||
226 | ||
227 | DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4 | |
228 | rather than doing it "by hand" | |
229 | ||
230 | ||
231 | Any place where ZeroHI64ofV128 is used in conjunction with | |
232 | FP vector IROps: find a way to make sure that arithmetic on | |
233 | the upper half of the values is "harmless." | |
234 | ||
235 | ||
236 | math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than | |
237 | inline scalar code | |
68eb4397 JS |
238 | |
239 | ||
240 | chainXDirect_ARM64: use direct jump forms when possible | |
5bdb86cc PF |
241 | |
242 | ||
243 | Raspberry Pi | |
244 | ~~~~~~~~~~~~ | |
245 | ||
246 | The Raspberry Pi since version 3 has had 64-bit hardware (aarch64). However, | |
247 | Raspberry Pi OS (formerly Raspbian) has a 32-bit userland. You can check | |
248 | this using commands like file, ldd or readelf. For instance, | |
249 | ||
250 | $ file -L `which gcc` | |
251 | /usr/bin/gcc: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=6cfb4b75e1e265eb5a05ef0a1915bca9bae34674, for GNU/Linux 3.2.0, stripped | |
252 | ||
253 | As a consequence, if you try to run just "configure" it will detect aarch64 and | |
254 | select the "arm64" target, which is incorrect for the 32-bit userland. | |
255 | ||
256 | Instead you should run | |
257 | ||
258 | configure --host=armv8-unknown-linux | |
259 | ||
260 | That will override the aarch64 detection and result in a 32-bit build of | |
261 | Valgrind for the "arm" target. | |
262 |
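
The Raspberry Pi check described above can be scripted. The following is a minimal sketch, not part of the Valgrind build system: pick_host is a hypothetical helper name, and it assumes that the output of file(1) contains "ELF 32-bit" and "ARM" for a 32-bit ARM userland binary, as in the sample output shown earlier. It prints the --host override for a 32-bit ARM userland and prints nothing otherwise, leaving configure to auto-detect.

```shell
# pick_host: hypothetical helper, not provided by Valgrind.
# $1 is the output of `file -L` on some userland binary.
pick_host() {
    case "$1" in
        # 32-bit ARM userland: override configure's aarch64 detection.
        *"ELF 32-bit"*ARM*) echo "--host=armv8-unknown-linux" ;;
        # Anything else: print nothing and let configure auto-detect.
        *) : ;;
    esac
}

# Possible usage on the target (assumes file(1) and gcc are installed):
#   ./configure $(pick_host "$(file -L "$(command -v gcc)")")
```

Keying the decision on a userland binary rather than on the kernel's reported machine type is the point of the sketch: on these systems the kernel is 64-bit while the userland is 32-bit.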