This causes CSE to incorrectly think that the two predicates are equal because
some of the significant bits get ignored due to the subreg.
The attached patch instead makes it so it always looks at all 16-bits of the
predicate, but in turn means we need to generate a TRN that matches the expected
result mode. In effect in RTL we keep the mode as VNx16BI but during codegen
re-interpret them as the mode the predicate instruction wanted:
Which needed correction to the TRN pattern. A new TRN1_CONV unspec is
introduced which allows one to keep the arguments as VNx16BI but encode the
instruction as a type of the last operand.