Status
~~~~~~

As of Jan 2014 the trunk contains a port to AArch64 ARMv8 -- loosely,
the 64-bit ARM architecture.  Currently it supports integer and FP
instructions and can run anything generated by gcc-4.8.2 -O3.  The
port is under active development.

Current limitations, as of mid-May 2014.

* limited support of vector (SIMD) instructions.  Initial target is
  support for instructions created by gcc-4.8.2 -O3
  (via autovectorisation).  This is complete.

* Integration with the built in GDB server:
   - works ok (breakpoint, attach to a process blocked in a syscall, ...)
   - still to do:
      arm64 xml register description files (allowing shadow registers
                                            to be looked at).
      cpsr transfer to/from gdb to be looked at (see also arm equivalent code)

* limited syscall support

There has been extensive testing of the baseline simulation of integer
and FP instructions.  Memcheck is also believed to work, at least for
small examples.  Other tools appear to at least not crash when running
/bin/date.

Enough syscalls and instructions are supported for substantial
programs to work.  Firefox 26 is able to start up and quit.  The noise
level from Memcheck is low enough to make it practical to use for real
debugging.


Building
~~~~~~~~

You could probably build it directly on a target OS, using the normal
non-cross scheme:

  ./autogen.sh ; ./configure --prefix=.. ; make ; make install

Development so far has, however, been done by cross-compiling, viz:

  export CC=aarch64-linux-gnu-gcc
  export LD=aarch64-linux-gnu-ld
  export AR=aarch64-linux-gnu-ar

  ./autogen.sh
  ./configure --prefix=`pwd`/Inst --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install

Doing this assumes that the install path (`pwd`/Inst) is valid on
both host and target, which isn't normally the case.  To avoid
this limitation, do instead:

  ./configure --prefix=/install/path/on/target \
              --host=aarch64-unknown-linux \
              --enable-only64bit
  make -j4
  make -j4 install DESTDIR=/a/temp/dir/on/host
  # and then copy the contents of DESTDIR to the target.

See README.android for more examples of cross-compile building.


Implementation tidying-up/TODO notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

UnwindStartRegs -- what should that contain?


vki-arm64-linux.h: vki_sigaction_base
I really don't think that __vki_sigrestore_t sa_restorer
should be present.  Adding it surely puts sa_mask at a wrong
offset compared to (kernel) reality.  But not having it causes
compilation of m_signals.c to fail in hard to understand ways,
so adding it temporarily.


m_trampoline.S: what's the unexecutable-insn value? 0xFFFFFFFF 
is there at the moment, but 0x00000000 is probably what it should be.
Also, fix indentation/tab-vs-space stuff


./include/vki/vki-arm64-linux.h: uses __uint128_t.  Should change
it to __vki_uint128_t, but what's the defn of that?


m_debuginfo/priv_storage.h: need proper defn of DiCfSI


readdwarf.c: is this correct?
#elif defined(VGP_arm64_linux)
#  define FP_REG         29    //???
#  define SP_REG         31    //???
#  define RA_REG_DEFAULT 30    //???


vki-arm64-linux.h:
re linux-3.10.5/include/uapi/asm-generic/sembuf.h
I'd say the amd64 version has padding it shouldn't have.  Check?


syswrap-linux.c run_a_thread_NORETURN assembly sections
seems like tst->os_state.exitcode has word type,
in which case the ppc64_linux use of lwz to read it is wrong


syswrap-linux.c ML_(do_fork_clone)
assuming that VGP_arm64_linux is the same as VGP_arm_linux here


dispatch-arm64-linux.S: FIXME: set up FP control state before
entering generated code.  Also fix screwy indentation.


dispatcher-ery general: what's a good (predictor-friendly) way to
branch to a register?


in vki-arm64-scnums.h
//#if __BITS_PER_LONG == 64 && !defined(__SYSCALL_COMPAT)
Probably want to reenable that and clean up accordingly


putIRegXXorZR: figure out a way that the computed value is actually
used, so as to keep any memory reads that might generate it, alive.
(else the simulation can lose exceptions).  At least, for writes to
the zero register generated by loads .. or .. can any other
integer instructions, that write to a register, cause exceptions?


loads/stores: generate stack alignment checks as necessary


fix barrier insns: ISB, DMB


fix atomic loads/stores


FMADD/FMSUB/FNMADD/FNMSUB: generate and use the relevant fused
IROps so as to avoid double rounding
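
The double-rounding concern can be illustrated with a small C sketch.
This is not Valgrind code: the function names are invented for this
note, and exact integer arithmetic stands in for the fused operation
so that nothing beyond the standard C headers is needed.  The inputs
are chosen so that the product 2^54 - 1 is not representable as a
double and rounds up to 2^54 before the addition happens.

```c
#include <assert.h>
#include <stdint.h>

/* Unfused multiply-add: the product is rounded to double, and the sum
   is rounded again.  The volatile store stops the compiler from
   contracting the two operations into one fused instruction. */
double unfused_mul_add(double a, double b, double c)
{
    volatile double product = a * b;   /* first rounding  */
    return product + c;                /* second rounding */
}

/* Exact reference for integer-valued inputs whose product fits in 63
   bits.  A fused multiply-add rounds only once, so it returns this
   value whenever the value is representable as a double. */
int64_t exact_mul_add(int64_t a, int64_t b, int64_t c)
{
    return a * b + c;
}

/* Self-check: (2^27 + 1) * (2^27 - 1) - 2^54 is exactly -1, but the
   unfused path loses the low bit of the product and yields 0. */
void check_double_rounding(void)
{
    const int64_t a = (INT64_C(1) << 27) + 1;
    const int64_t b = (INT64_C(1) << 27) - 1;
    const int64_t c = -(INT64_C(1) << 54);

    assert(exact_mul_add(a, b, c) == -1);
    assert(unfused_mul_add((double)a, (double)b, (double)c) == 0.0);
}
```

Simulating FMADD as a separate multiply and add would return 0 here,
whereas the hardware instruction would return -1.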


ARM64Instr_Call getRegUsage: re-check relative to what
getAllocableRegs_ARM64 makes available


Make dispatch-arm64-linux.S save any callee-saved Q regs
I think what is required is to save D8-D15 and nothing more than that.


wrapper for __NR3264_fstat -- correct?


PRE(sys_clone): get rid of references to vki_modify_ldt_t and the
definition of it in vki-arm64-linux.h.  Ditto for 32 bit arm.


sigframe-arm64-linux.c: build_sigframe: references to nonexistent
siguc->uc_mcontext.trap_no, siguc->uc_mcontext.error_code have been
replaced by zero.  Also in synth_ucontext.


m_debugger.c:
uregs.pstate   = LibVEX_GuestARM64_get_nzcv(vex); /* is this correct? */
Is that remotely correct?


host_arm64_defs.c: emit_ARM64Instr:
ARM64in_VDfromX and ARM64in_VQfromXX: use simple top-half zeroing
MOVs to vector registers instead of INS Vd.D[0], Xreg, to avoid false
dependencies on the top half of the register.  (Or at least check
the semantics of INS Vd.D[0] to see if it zeroes out the top.)


preferredVectorSubTypeFromSize: review perf effects and decide
on a types-for-subparts policy


fold_IRExpr_Unop: add a reduction rule for this
1Sto64(CmpNEZ64( Or64(GET:I64(1192),GET:I64(1184)) ))
viz 1Sto64(CmpNEZ64(x)) --> CmpwNEZ64(x)
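
Why the rule is safe can be checked with a plain C model of the three
IROps.  The model_ functions below are illustrative stand-ins written
for this note, not VEX functions; they assume the usual meanings
(CmpNEZ64 yields a 1-bit "is nonzero" value, 1Sto64 sign-extends one
bit to 64, CmpwNEZ64 maps zero to zero and anything else to all ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CmpNEZ64: I64 -> I1, true iff the argument is nonzero. */
static int model_CmpNEZ64(uint64_t x)
{
    return x != 0;
}

/* 1Sto64: sign-extend a single bit to 64 bits. */
static uint64_t model_1Sto64(int bit)
{
    return bit ? ~UINT64_C(0) : UINT64_C(0);
}

/* CmpwNEZ64: all zeroes -> all zeroes, anything else -> all ones. */
static uint64_t model_CmpwNEZ64(uint64_t x)
{
    return x != 0 ? ~UINT64_C(0) : UINT64_C(0);
}

/* The expression as it appears before and after the proposed fold. */
uint64_t before_fold(uint64_t x) { return model_1Sto64(model_CmpNEZ64(x)); }
uint64_t after_fold(uint64_t x)  { return model_CmpwNEZ64(x); }

/* Self-check over a few representative values. */
void check_fold_rule(void)
{
    const uint64_t samples[] = {
        0, 1, 2, UINT64_C(0x8000000000000000), ~UINT64_C(0)
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(before_fold(samples[i]) == after_fold(samples[i]));
}
```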


check insn selection for memcheck-only primops:
Left64 CmpwNEZ64 V128to64 V128HIto64 1Sto64 CmpNEZ64 CmpNEZ32
widen_z_8_to_64 1Sto32 Left32 32HLto64 CmpwNEZ32 CmpNEZ8


isel: get rid of various cases where zero is put into a register
and just use xzr instead.  Especially for CmpNEZ64/32.  And for
writing zeroes into the CC thunk fields.


/* Keep this list in sync with that in iselNext below */
/* Keep this list in sync with that for Ist_Exit above */
uh .. they are not in sync


very stupid:
imm64  x23, 0xFFFFFFFFFFFFFFA0
17 F4 9F D2 F7 FF BF F2 F7 FF DF F2 F7 FF FF F2 
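
The bytes above decode (little-endian) to a MOVZ followed by three
MOVKs, 16 bytes in all, whereas a single MOVN x23, #0x5F yields the
same value.  The sketch below counts how many instructions a
MOVZ/MOVN-plus-MOVK scheme needs for a given constant.  It is a
hypothetical helper written for this note, not the code in
host_arm64_defs.c, and it ignores logical (bitmask) immediates, which
can do better still for some constants.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16-bit chunks of value that differ from fill, where fill
   is 0x0000 (what MOVZ leaves behind) or 0xFFFF (what MOVN leaves). */
static int chunks_differing_from(uint64_t value, unsigned fill)
{
    int n = 0;
    for (int shift = 0; shift < 64; shift += 16) {
        if (((value >> shift) & 0xFFFF) != fill)
            n++;
    }
    return n;
}

/* Instructions needed when starting with MOVZ or MOVN, whichever is
   cheaper, and patching each remaining chunk with one MOVK. */
int imm64_insn_count(uint64_t value)
{
    int via_movz = chunks_differing_from(value, 0x0000);
    int via_movn = chunks_differing_from(value, 0xFFFF);
    int best = via_movz < via_movn ? via_movz : via_movn;
    return best > 0 ? best : 1;   /* 0 and ~0 still need one insn */
}

/* Self-check against the constant quoted in the note above. */
void check_imm64_example(void)
{
    assert(imm64_insn_count(UINT64_C(0xFFFFFFFFFFFFFFA0)) == 1);
}
```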


valgrind.h: fix VALGRIND_ALIGN_STACK/VALGRIND_RESTORE_STACK,
also add CFI annotations


could possibly bring r29 into use, which would be useful as it is
callee saved


ubfm/sbfm etc: special-case the cases that are simple shifts, as iropt
can't always simplify the general-case IR to a shift in such cases.
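
A sketch of how the simple-shift cases could be recognised from the
64-bit immr/imms fields, following the architectural aliases
(LSR and ASR are UBFM and SBFM with imms == 63; LSL #n is UBFM with
immr == (64 - n) mod 64 and imms == 63 - n).  The type and function
names are invented for this note and do not exist in the front end.

```c
#include <assert.h>

typedef enum { BFM_GENERAL, BFM_LSL, BFM_LSR, BFM_ASR } BfmKind;

typedef struct {
    BfmKind  kind;    /* which plain shift this is, if any */
    unsigned shift;   /* shift amount; 0 for BFM_GENERAL */
} BfmShift;

/* Classify a 64-bit UBFM (is_signed == 0) or SBFM (is_signed != 0). */
BfmShift classify_bfm64(int is_signed, unsigned immr, unsigned imms)
{
    BfmShift r = { BFM_GENERAL, 0 };
    assert(immr < 64 && imms < 64);

    if (imms == 63) {
        /* Field runs up to bit 63: a right shift by immr. */
        r.kind  = is_signed ? BFM_ASR : BFM_LSR;
        r.shift = immr;
    } else if (!is_signed && imms + 1 == immr) {
        /* UBFM alias for LSL #(63 - imms). */
        r.kind  = BFM_LSL;
        r.shift = 63 - imms;
    }
    return r;
}

/* Self-check: LSL x0, x1, #4 is UBFM x0, x1, #60, #59. */
void check_bfm_aliases(void)
{
    BfmShift s = classify_bfm64(0, 60, 59);
    assert(s.kind == BFM_LSL && s.shift == 4);
}
```

Anything classified as BFM_GENERAL would still go through the existing
general-case IR.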


LDP,STP (immediate, simm7) (FP&VEC)
should zero out hi parts of dst registers in the LDP case


DUP insns: use Iop_Dup8x16, Iop_Dup16x8, Iop_Dup32x4
rather than doing it "by hand"


Any place where ZeroHI64ofV128 is used in conjunction with
FP vector IROps: find a way to make sure that arithmetic on
the upper half of the values is "harmless."


math_MINMAXV: use real Iop_Cat{Odd,Even}Lanes ops rather than
inline scalar code


chainXDirect_ARM64: use direct jump forms when possible


Raspberry Pi
~~~~~~~~~~~~

The Raspberry Pi since version 3 has had 64-bit hardware (aarch64). However,
Raspberry Pi OS (formerly Raspbian) has a 32-bit userland. You can check
this using commands like file, ldd or readelf. For instance,

$ file -L `which gcc`
/usr/bin/gcc: ELF 32-bit LSB executable, ARM, EABI5 version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=6cfb4b75e1e265eb5a05ef0a1915bca9bae34674, for GNU/Linux 3.2.0, stripped

As a consequence, if you try to run just "configure" it will detect aarch64 and
select the "arm64" target, which is incorrect for the 32-bit userland.

Instead you should run

configure --host=armv8-unknown-linux

That will override the aarch64 detection and result in a 32-bit build of
Valgrind for the "arm" target.
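
The same check can be expressed as a small C sketch: compare the
machine name the kernel reports with the pointer width of programs
the native compiler produces.  The function names are made up for
this note, and the assumption that "uname -m" prints aarch64 on such
a system is exactly the mismatch described above.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Pointer width, in bits, of programs built by this compiler.  On a
   32-bit userland this is 32 even when the kernel is 64-bit. */
int userland_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* machine is the string printed by "uname -m".  Returns 1 when the
   kernel is aarch64 but the userland is 32-bit, i.e. when configure
   needs the --host override shown above. */
int needs_arm_host_override(const char *machine, int bits)
{
    return strcmp(machine, "aarch64") == 0 && bits == 32;
}

/* Self-check for the two Raspberry Pi configurations. */
void check_host_override(void)
{
    assert(needs_arm_host_override("aarch64", 32) == 1);
    assert(needs_arm_host_override("aarch64", 64) == 0);
}
```

On the target, needs_arm_host_override would be called with the
output of uname -m and userland_bits().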