Fix AArch64 encodings for by element instructions.
Some instructions in Armv8-a place a limitation on FP16 registers that can be
used as the register from which to select an element from.
e.g. fmla restricts Rm to 4 bits when using an FP16 register. This restriction
does not apply for all instructions, e.g. fcmla does not have this restriction
as it gets an extra bit from the M field.
Unfortunately, this restriction to S_H was added for all _Em operands before,
meaning for a large number of instructions you couldn't use the full register
file.
This fixes the issue by introducing a new operand _Em16 which applies this
restriction only when paired with S_H and leaves the _Em and the other
qualifiers for _Em16 unbounded (i.e. using the full 5 bit range).
Also the patch updates all instructions that should be affected by this.