Assembly acceleration secp384r1 opts to not use any callee-save VSRs, as
VSX enabled systems make extensive use of renaming, and so writebacks in
felem_{mul,square}() can be reordered for best cache effects.
Remove stack allocations. This in turn fixes unmatched push/pops in
felem_{mul,square}().
Signed-off-by: Rohan McLure <rohan.mclure@linux.ibm.com> Reviewed-by: Tomas Mraz <tomas@openssl.org> Reviewed-by: Shane Lontis <shane.lontis@oracle.com> Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21749)