We agreed with LLVM that we shouldn't enforce the architectural
dependencies between fp8 muliplication features, so remove them.
Additionally, fix a typo in the gating for FEAT_SME_F8F16 instructions,
which were mistakenly gated by +sme-f8f32 instead. Until now this
mistake had been masked by the dependency between the features.