+2014-04-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ Little Endian Vector Support
+ Backport from mainline r205333
+ 2013-11-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Correct
+ for little endian.
+
+ Backport from mainline r205241
+ 2013-11-21 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (vec_pack_trunc_v2df): Revert previous
+ little endian change.
+ (vec_pack_sfix_trunc_v2df): Likewise.
+ (vec_pack_ufix_trunc_v2df): Likewise.
+ * config/rs6000/rs6000.c (rs6000_expand_interleave): Correct
+ double checking of endianness.
+
+ Backport from mainline r205146
+ 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vsx.md (vsx_set_<mode>): Adjust for little endian.
+ (vsx_extract_<mode>): Likewise.
+ (*vsx_extract_<mode>_one_le): New LE variant on
+ *vsx_extract_<mode>_zero.
+ (vsx_extract_v4sf): Adjust for little endian.
+
+ Backport from mainline r205080
+ 2013-11-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Adjust
+ V16QI vector splat case for little endian.
+
+ Backport from mainline r205045:
+
+ 2013-11-19 Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
+
+ * config/rs6000/vector.md ("mov<mode>"): Do not call
+ rs6000_emit_le_vsx_move to move into or out of GPRs.
+ * config/rs6000/rs6000.c (rs6000_emit_le_vsx_move): Assert
+ source and destination are not GPR hard regs.
+
+ Backport from mainline r204920
+ 2013-11-17 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (rs6000_frame_related): Add split_reg
+ parameter and use it in REG_FRAME_RELATED_EXPR note.
+ (emit_frame_save): Call rs6000_frame_related with extra NULL_RTX
+ parameter.
+ (rs6000_emit_prologue): Likewise, but for little endian VSX
+ stores, pass the source register of the store instead.
+
+ Backport from mainline r204862
+ 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (UNSPEC_VPERM_X, UNSPEC_VPERM_UNS_X):
+ Remove.
+ (altivec_vperm_<mode>): Revert earlier little endian change.
+ (*altivec_vperm_<mode>_internal): Remove.
+ (altivec_vperm_<mode>_uns): Revert earlier little endian change.
+ (*altivec_vperm_<mode>_uns_internal): Remove.
+ * config/rs6000/vector.md (vec_realign_load_<mode>): Revise
+ commentary.
+
+ Backport from mainline r204441
+ 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (rs6000_option_override_internal):
+ Remove restriction against use of VSX instructions when generating
+ code for little endian mode.
+
+ Backport from mainline r204440
+ 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (mulv4si3): Ensure we generate vmulouh
+ for both big and little endian.
+ (mulv8hi3): Swap input operands for merge high and merge low
+ instructions for little endian.
+
+ Backport from mainline r204439
+ 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Change
+ define_insn to define_expand that uses even patterns for big
+ endian and odd patterns for little endian.
+ (vec_widen_smult_even_v16qi): Likewise.
+ (vec_widen_umult_even_v8hi): Likewise.
+ (vec_widen_smult_even_v8hi): Likewise.
+ (vec_widen_umult_odd_v16qi): Likewise.
+ (vec_widen_smult_odd_v16qi): Likewise.
+ (vec_widen_umult_odd_v8hi): Likewise.
+ (vec_widen_smult_odd_v8hi): Likewise.
+ (altivec_vmuleub): New define_insn.
+ (altivec_vmuloub): Likewise.
+ (altivec_vmulesb): Likewise.
+ (altivec_vmulosb): Likewise.
+ (altivec_vmuleuh): Likewise.
+ (altivec_vmulouh): Likewise.
+ (altivec_vmulesh): Likewise.
+ (altivec_vmulosh): Likewise.
+
+ Backport from mainline r204395
+ 2013-11-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (vec_pack_sfix_trunc_v2df): Adjust for
+ little endian.
+ (vec_pack_ufix_trunc_v2df): Likewise.
+
+ Backport from mainline r204363
+ 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (vec_widen_umult_hi_v16qi): Swap
+ arguments to merge instruction for little endian.
+ (vec_widen_umult_lo_v16qi): Likewise.
+ (vec_widen_smult_hi_v16qi): Likewise.
+ (vec_widen_smult_lo_v16qi): Likewise.
+ (vec_widen_umult_hi_v8hi): Likewise.
+ (vec_widen_umult_lo_v8hi): Likewise.
+ (vec_widen_smult_hi_v8hi): Likewise.
+ (vec_widen_smult_lo_v8hi): Likewise.
+
+ Backport from mainline r204350
+ 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vsx.md (*vsx_le_perm_store_<mode> for VSX_D):
+ Replace the define_insn_and_split with a define_insn and two
+ define_splits, with the split after reload re-permuting the source
+ register to its original value.
+ (*vsx_le_perm_store_<mode> for VSX_W): Likewise.
+ (*vsx_le_perm_store_v8hi): Likewise.
+ (*vsx_le_perm_store_v16qi): Likewise.
+
+ Backport from mainline r204321
+ 2013-11-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (vec_pack_trunc_v2df): Adjust for
+ little endian.
+
+ Backport from mainline r204321
+ 2013-11-02 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (rs6000_expand_vector_set): Adjust for
+ little endian.
+
+ Backport from mainline r203980
+ 2013-10-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (mulv8hi3): Adjust for little endian.
+
+ Backport from mainline r203930
+ 2013-10-22 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
+ meaning of merge-high and merge-low masks for little endian; avoid
+ use of vector-pack masks for little endian for mismatched modes.
+
+ Backport from mainline r203877
+ 2013-10-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Adjust for
+ little endian.
+ (vec_unpacku_hi_v8hi): Likewise.
+ (vec_unpacku_lo_v16qi): Likewise.
+ (vec_unpacku_lo_v8hi): Likewise.
+
+ Backport from mainline r203863
+ 2013-10-19 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (vspltis_constant): Make sure we check
+ all elements for both endian flavors.
+
+ Backport from mainline r203714
+ 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for
+ endianness.
+ (vec_unpacks_lo_v4sf): Likewise.
+ (vec_unpacks_float_hi_v4si): Likewise.
+ (vec_unpacks_float_lo_v4si): Likewise.
+ (vec_unpacku_float_hi_v4si): Likewise.
+ (vec_unpacku_float_lo_v4si): Likewise.
+
+ Backport from mainline r203713
+ 2013-10-16 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vsx.md (vsx_concat_<mode>): Adjust output for LE.
+ (vsx_concat_v2sf): Likewise.
+
+ Backport from mainline r203458
+ 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): Generalize to
+ handle vector float as well.
+ (*vsx_le_perm_load_v4si): Likewise.
+ (*vsx_le_perm_store_v2di): Likewise.
+ (*vsx_le_perm_store_v4si): Likewise.
+
+ Backport from mainline r203457
+ 2013-10-11 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (vec_realign_load<mode>): Generate vperm
+ directly to circumvent subtract from splat{31} workaround.
+ * config/rs6000/rs6000-protos.h (altivec_expand_vec_perm_le): New
+ prototype.
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_le): New.
+ * config/rs6000/altivec.md (define_c_enum "unspec"): Add
+ UNSPEC_VPERM_X and UNSPEC_VPERM_UNS_X.
+ (altivec_vperm_<mode>): Convert to define_insn_and_split to
+ separate big and little endian logic.
+ (*altivec_vperm_<mode>_internal): New define_insn.
+ (altivec_vperm_<mode>_uns): Convert to define_insn_and_split to
+ separate big and little endian logic.
+ (*altivec_vperm_<mode>_uns_internal): New define_insn.
+ (vec_permv16qi): Add little endian logic.
+
+ Backport from mainline r203247
+ 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_const_le): New.
+ (altivec_expand_vec_perm_const): Call it.
+
+ Backport from mainline r203246
+ 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * config/rs6000/vector.md (mov<mode>): Emit permuted move
+ sequences for LE VSX loads and stores at expand time.
+ * config/rs6000/rs6000-protos.h (rs6000_emit_le_vsx_move): New
+ prototype.
+ * config/rs6000/rs6000.c (rs6000_const_vec): New.
+ (rs6000_gen_le_vsx_permute): New.
+ (rs6000_gen_le_vsx_load): New.
+ (rs6000_gen_le_vsx_store): New.
+ (rs6000_gen_le_vsx_move): New.
+ * config/rs6000/vsx.md (*vsx_le_perm_load_v2di): New.
+ (*vsx_le_perm_load_v4si): New.
+ (*vsx_le_perm_load_v8hi): New.
+ (*vsx_le_perm_load_v16qi): New.
+ (*vsx_le_perm_store_v2di): New.
+ (*vsx_le_perm_store_v4si): New.
+ (*vsx_le_perm_store_v8hi): New.
+ (*vsx_le_perm_store_v16qi): New.
+ (*vsx_xxpermdi2_le_<mode>): New.
+ (*vsx_xxpermdi4_le_<mode>): New.
+ (*vsx_xxpermdi8_le_V8HI): New.
+ (*vsx_xxpermdi16_le_V16QI): New.
+ (*vsx_lxvd2x2_le_<mode>): New.
+ (*vsx_lxvd2x4_le_<mode>): New.
+ (*vsx_lxvd2x8_le_V8HI): New.
+ (*vsx_lxvd2x16_le_V16QI): New.
+ (*vsx_stxvd2x2_le_<mode>): New.
+ (*vsx_stxvd2x4_le_<mode>): New.
+ (*vsx_stxvd2x8_le_V8HI): New.
+ (*vsx_stxvd2x16_le_V16QI): New.
+
+ Backport from mainline r201235
+ 2013-07-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+ Anton Blanchard <anton@au1.ibm.com>
+
+ * config/rs6000/altivec.md (altivec_vpkpx): Handle little endian.
+ (altivec_vpks<VI_char>ss): Likewise.
+ (altivec_vpks<VI_char>us): Likewise.
+ (altivec_vpku<VI_char>us): Likewise.
+ (altivec_vpku<VI_char>um): Likewise.
+
+ Backport from mainline r201208
+ 2013-07-24 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+ Anton Blanchard <anton@au1.ibm.com>
+
+ * config/rs6000/vector.md (vec_realign_load_<mode>): Reorder input
+ operands to vperm for little endian.
+ * config/rs6000/rs6000.c (rs6000_expand_builtin): Use lvsr instead
+ of lvsl to create the control mask for a vperm for little endian.
+
+ Backport from mainline r201195
+ 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+ Anton Blanchard <anton@au1.ibm.com>
+
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse
+ two operands for little-endian.
+
+ Backport from mainline r201193
+ 2013-07-23 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+ Anton Blanchard <anton@au1.ibm.com>
+
+ * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Correct
+ selection of field for vector splat in little endian mode.
+
+ Backport from mainline r201149
+ 2013-07-22 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+ Anton Blanchard <anton@au1.ibm.com>
+
+ * config/rs6000/rs6000.c (rs6000_expand_vector_init): Fix
+ endianness when selecting field to splat.
+
2014-04-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>

Backport from mainline r205123:
convert_move (small_swap, swap, 0);
low_product = gen_reg_rtx (V4SImode);
- emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
+ emit_insn (gen_altivec_vmulouh (low_product, one, two));
high_product = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw (high, even, odd));
- emit_insn (gen_altivec_vmrglw (low, even, odd));
-
- emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+ if (BYTES_BIG_ENDIAN)
+ {
+ emit_insn (gen_altivec_vmrghw (high, even, odd));
+ emit_insn (gen_altivec_vmrglw (low, even, odd));
+ emit_insn (gen_altivec_vpkuwum (operands[0], high, low));
+ }
+ else
+ {
+ emit_insn (gen_altivec_vmrghw (high, odd, even));
+ emit_insn (gen_altivec_vmrglw (low, odd, even));
+ emit_insn (gen_altivec_vpkuwum (operands[0], low, high));
+ }
DONE;
}")
"vmrgow %0,%1,%2"
[(set_attr "type" "vecperm")])
-(define_insn "vec_widen_umult_even_v16qi"
+(define_expand "vec_widen_umult_even_v16qi"
+ [(use (match_operand:V8HI 0 "register_operand" ""))
+ (use (match_operand:V16QI 1 "register_operand" ""))
+ (use (match_operand:V16QI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_even_v16qi"
+ [(use (match_operand:V8HI 0 "register_operand" ""))
+ (use (match_operand:V16QI 1 "register_operand" ""))
+ (use (match_operand:V16QI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_even_v8hi"
+ [(use (match_operand:V4SI 0 "register_operand" ""))
+ (use (match_operand:V8HI 1 "register_operand" ""))
+ (use (match_operand:V8HI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_even_v8hi"
+ [(use (match_operand:V4SI 0 "register_operand" ""))
+ (use (match_operand:V8HI 1 "register_operand" ""))
+ (use (match_operand:V8HI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v16qi"
+ [(use (match_operand:V8HI 0 "register_operand" ""))
+ (use (match_operand:V16QI 1 "register_operand" ""))
+ (use (match_operand:V16QI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmuloub (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmuleub (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_odd_v16qi"
+ [(use (match_operand:V8HI 0 "register_operand" ""))
+ (use (match_operand:V16QI 1 "register_operand" ""))
+ (use (match_operand:V16QI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmulosb (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmulesb (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_umult_odd_v8hi"
+ [(use (match_operand:V4SI 0 "register_operand" ""))
+ (use (match_operand:V8HI 1 "register_operand" ""))
+ (use (match_operand:V8HI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmulouh (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmuleuh (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_expand "vec_widen_smult_odd_v8hi"
+ [(use (match_operand:V4SI 0 "register_operand" ""))
+ (use (match_operand:V8HI 1 "register_operand" ""))
+ (use (match_operand:V8HI 2 "register_operand" ""))]
+ "TARGET_ALTIVEC"
+{
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmulosh (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (gen_altivec_vmulesh (operands[0], operands[1], operands[2]));
+ DONE;
+})
+
+(define_insn "altivec_vmuleub"
[(set (match_operand:V8HI 0 "register_operand" "=v")
(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v")]
"vmuleub %0,%1,%2"
[(set_attr "type" "veccomplex")])
-(define_insn "vec_widen_smult_even_v16qi"
+(define_insn "altivec_vmuloub"
[(set (match_operand:V8HI 0 "register_operand" "=v")
(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v")]
- UNSPEC_VMULESB))]
- "TARGET_ALTIVEC"
- "vmulesb %0,%1,%2"
- [(set_attr "type" "veccomplex")])
-
-(define_insn "vec_widen_umult_even_v8hi"
- [(set (match_operand:V4SI 0 "register_operand" "=v")
- (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
- (match_operand:V8HI 2 "register_operand" "v")]
- UNSPEC_VMULEUH))]
- "TARGET_ALTIVEC"
- "vmuleuh %0,%1,%2"
- [(set_attr "type" "veccomplex")])
-
-(define_insn "vec_widen_smult_even_v8hi"
- [(set (match_operand:V4SI 0 "register_operand" "=v")
- (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
- (match_operand:V8HI 2 "register_operand" "v")]
- UNSPEC_VMULESH))]
+ UNSPEC_VMULOUB))]
"TARGET_ALTIVEC"
- "vmulesh %0,%1,%2"
+ "vmuloub %0,%1,%2"
[(set_attr "type" "veccomplex")])
-(define_insn "vec_widen_umult_odd_v16qi"
+(define_insn "altivec_vmulesb"
[(set (match_operand:V8HI 0 "register_operand" "=v")
(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v")]
- UNSPEC_VMULOUB))]
+ UNSPEC_VMULESB))]
"TARGET_ALTIVEC"
- "vmuloub %0,%1,%2"
+ "vmulesb %0,%1,%2"
[(set_attr "type" "veccomplex")])
-(define_insn "vec_widen_smult_odd_v16qi"
+(define_insn "altivec_vmulosb"
[(set (match_operand:V8HI 0 "register_operand" "=v")
(unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
(match_operand:V16QI 2 "register_operand" "v")]
"vmulosb %0,%1,%2"
[(set_attr "type" "veccomplex")])
-(define_insn "vec_widen_umult_odd_v8hi"
+(define_insn "altivec_vmuleuh"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULEUH))]
+ "TARGET_ALTIVEC"
+ "vmuleuh %0,%1,%2"
+ [(set_attr "type" "veccomplex")])
+
+(define_insn "altivec_vmulouh"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
(match_operand:V8HI 2 "register_operand" "v")]
"vmulouh %0,%1,%2"
[(set_attr "type" "veccomplex")])
-(define_insn "vec_widen_smult_odd_v8hi"
+(define_insn "altivec_vmulesh"
+ [(set (match_operand:V4SI 0 "register_operand" "=v")
+ (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v")]
+ UNSPEC_VMULESH))]
+ "TARGET_ALTIVEC"
+ "vmulesh %0,%1,%2"
+ [(set_attr "type" "veccomplex")])
+
+(define_insn "altivec_vmulosh"
[(set (match_operand:V4SI 0 "register_operand" "=v")
(unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
(match_operand:V8HI 2 "register_operand" "v")]
(match_operand:V4SI 2 "register_operand" "v")]
UNSPEC_VPKPX))]
"TARGET_ALTIVEC"
- "vpkpx %0,%1,%2"
+ "*
+ {
+ if (BYTES_BIG_ENDIAN)
+ return \"vpkpx %0,%1,%2\";
+ else
+ return \"vpkpx %0,%2,%1\";
+ }"
[(set_attr "type" "vecperm")])
(define_insn "altivec_vpks<VI_char>ss"
(match_operand:VP 2 "register_operand" "v")]
UNSPEC_VPACK_SIGN_SIGN_SAT))]
"<VI_unit>"
- "vpks<VI_char>ss %0,%1,%2"
+ "*
+ {
+ if (BYTES_BIG_ENDIAN)
+ return \"vpks<VI_char>ss %0,%1,%2\";
+ else
+ return \"vpks<VI_char>ss %0,%2,%1\";
+ }"
[(set_attr "type" "vecperm")])
(define_insn "altivec_vpks<VI_char>us"
(match_operand:VP 2 "register_operand" "v")]
UNSPEC_VPACK_SIGN_UNS_SAT))]
"<VI_unit>"
- "vpks<VI_char>us %0,%1,%2"
+ "*
+ {
+ if (BYTES_BIG_ENDIAN)
+ return \"vpks<VI_char>us %0,%1,%2\";
+ else
+ return \"vpks<VI_char>us %0,%2,%1\";
+ }"
[(set_attr "type" "vecperm")])
(define_insn "altivec_vpku<VI_char>us"
(match_operand:VP 2 "register_operand" "v")]
UNSPEC_VPACK_UNS_UNS_SAT))]
"<VI_unit>"
- "vpku<VI_char>us %0,%1,%2"
+ "*
+ {
+ if (BYTES_BIG_ENDIAN)
+ return \"vpku<VI_char>us %0,%1,%2\";
+ else
+ return \"vpku<VI_char>us %0,%2,%1\";
+ }"
[(set_attr "type" "vecperm")])
(define_insn "altivec_vpku<VI_char>um"
(match_operand:VP 2 "register_operand" "v")]
UNSPEC_VPACK_UNS_UNS_MOD))]
"<VI_unit>"
- "vpku<VI_char>um %0,%1,%2"
+ "*
+ {
+ if (BYTES_BIG_ENDIAN)
+ return \"vpku<VI_char>um %0,%1,%2\";
+ else
+ return \"vpku<VI_char>um %0,%2,%1\";
+ }"
[(set_attr "type" "vecperm")])
(define_insn "*altivec_vrl<VI_char>"
(match_operand:V16QI 3 "register_operand" "")]
UNSPEC_VPERM))]
"TARGET_ALTIVEC"
- "")
+{
+ if (!BYTES_BIG_ENDIAN) {
+ altivec_expand_vec_perm_le (operands);
+ DONE;
+ }
+})
(define_expand "vec_perm_constv16qi"
[(match_operand:V16QI 0 "register_operand" "")
rtx vzero = gen_reg_rtx (V8HImode);
rtx mask = gen_reg_rtx (V16QImode);
rtvec v = rtvec_alloc (16);
+ bool be = BYTES_BIG_ENDIAN;
emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 0);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 2);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 4);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 6);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
rtx vzero = gen_reg_rtx (V4SImode);
rtx mask = gen_reg_rtx (V16QImode);
rtvec v = rtvec_alloc (16);
+ bool be = BYTES_BIG_ENDIAN;
emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 0);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 1);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 2);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 3);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 4);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 5);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 6);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 7);
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
rtx vzero = gen_reg_rtx (V8HImode);
rtx mask = gen_reg_rtx (V16QImode);
rtvec v = rtvec_alloc (16);
+ bool be = BYTES_BIG_ENDIAN;
emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 8);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 10);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 12);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 14);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
rtx vzero = gen_reg_rtx (V4SImode);
rtx mask = gen_reg_rtx (V16QImode);
rtvec v = rtvec_alloc (16);
+ bool be = BYTES_BIG_ENDIAN;
emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
- RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, 8);
- RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, 9);
- RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, 10);
- RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, 11);
- RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, 12);
- RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, 13);
- RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, 16);
- RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, 17);
- RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, 14);
- RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, 15);
+ RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
+ RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
+ RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17);
+ RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
+ RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
+ RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
+ RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
+ RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
+ RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
+ RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
+ RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
+ RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
+ RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
+ RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8);
+ RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
+ RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_umult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v16qi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghh (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrghh (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_smult_even_v16qi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v16qi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglh (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrglh (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_umult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_umult_odd_v8hi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghw (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrghw (operands[0], vo, ve));
DONE;
}")
emit_insn (gen_vec_widen_smult_even_v8hi (ve, operands[1], operands[2]));
emit_insn (gen_vec_widen_smult_odd_v8hi (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglw (operands[0], ve, vo));
+ else
+ emit_insn (gen_altivec_vmrglw (operands[0], vo, ve));
DONE;
}")
extern void rs6000_expand_vector_set (rtx, rtx, int);
extern void rs6000_expand_vector_extract (rtx, rtx, int);
extern bool altivec_expand_vec_perm_const (rtx op[4]);
+extern void altivec_expand_vec_perm_le (rtx op[4]);
extern bool rs6000_expand_vec_perm_const (rtx op[4]);
extern void rs6000_expand_extract_even (rtx, rtx, rtx);
extern void rs6000_expand_interleave (rtx, rtx, rtx, bool);
extern void rs6000_fatal_bad_address (rtx);
extern rtx create_TOC_reference (rtx, rtx);
extern void rs6000_split_multireg_move (rtx, rtx);
+extern void rs6000_emit_le_vsx_move (rtx, rtx, enum machine_mode);
extern void rs6000_emit_move (rtx, rtx, enum machine_mode);
extern rtx rs6000_secondary_memory_needed_rtx (enum machine_mode);
extern rtx (*rs6000_legitimize_reload_address_ptr) (rtx, enum machine_mode,
}
else if (TARGET_PAIRED_FLOAT)
msg = N_("-mvsx and -mpaired are incompatible");
- /* The hardware will allow VSX and little endian, but until we make sure
- things like vector select, etc. work don't allow VSX on little endian
- systems at this point. */
- else if (!BYTES_BIG_ENDIAN)
- msg = N_("-mvsx used with little endian code");
else if (TARGET_AVOID_XFORM > 0)
msg = N_("-mvsx needs indexed addressing");
else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
/* Check if VAL is present in every STEP-th element, and the
other elements are filled with its most significant bit. */
- for (i = 0; i < nunits - 1; ++i)
+ for (i = 1; i < nunits; ++i)
{
HOST_WIDE_INT desired_val;
- if (((BYTES_BIG_ENDIAN ? i + 1 : i) & (step - 1)) == 0)
+ unsigned elt = BYTES_BIG_ENDIAN ? nunits - 1 - i : i;
+ if ((i & (step - 1)) == 0)
desired_val = val;
else
desired_val = msb_val;
- if (desired_val != const_vector_elt_as_int (op, i))
+ if (desired_val != const_vector_elt_as_int (op, elt))
return false;
}
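
(The corrected scan walks elements 1..nunits-1 in little-endian order
and maps each index back to storage order on big-endian targets.  A
host-side restatement, where check_splat is a hypothetical helper
mirroring the patched loop in vspltis_constant:

    #include <stdbool.h>

    static bool
    check_splat (const long *elts, int nunits, int step,
                 long val, long msb_val, bool big_endian)
    {
      for (int i = 1; i < nunits; ++i)
        {
          int elt = big_endian ? nunits - 1 - i : i;
          long want = ((i & (step - 1)) == 0) ? val : msb_val;
          if (elts[elt] != want)
            return false;
        }
      return true;
    }

    int
    main (void)
    {
      /* VAL in every second LE element, sign fill elsewhere.  */
      long v[4] = { 5, 0, 5, 0 };
      return check_splat (v, 4, 2, 5, 0, false) ? 0 : 1;
    }

Unlike the old loop, every element other than element 0 is inspected,
for either endianness.)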
of 64-bit items is not supported on Altivec. */
if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
{
+ rtx field;
mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode));
emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
XVECEXP (vals, 0, 0));
gen_rtx_SET (VOIDmode,
target, mem),
x)));
+ field = (BYTES_BIG_ENDIAN ? const0_rtx
+ : GEN_INT (GET_MODE_NUNITS (mode) - 1));
x = gen_rtx_VEC_SELECT (inner_mode, target,
gen_rtx_PARALLEL (VOIDmode,
- gen_rtvec (1, const0_rtx)));
+ gen_rtvec (1, field)));
emit_insn (gen_rtx_SET (VOIDmode, target,
gen_rtx_VEC_DUPLICATE (mode, x)));
return;
XVECEXP (mask, 0, elt*width + i)
= GEN_INT (i + 0x10);
x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
- x = gen_rtx_UNSPEC (mode,
- gen_rtvec (3, target, reg,
- force_reg (V16QImode, x)),
- UNSPEC_VPERM);
+
+ if (BYTES_BIG_ENDIAN)
+ x = gen_rtx_UNSPEC (mode,
+ gen_rtvec (3, target, reg,
+ force_reg (V16QImode, x)),
+ UNSPEC_VPERM);
+ else
+ {
+ /* Invert selector. */
+ rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+ gen_rtx_CONST_INT (QImode, -1));
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_move_insn (tmp, splat);
+ x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x));
+ emit_move_insn (tmp, x);
+
+ /* Permute with operands reversed and adjusted selector. */
+ x = gen_rtx_UNSPEC (mode, gen_rtvec (3, reg, target, tmp),
+ UNSPEC_VPERM);
+ }
+
emit_insn (gen_rtx_SET (VOIDmode, target, x));
}
copy_addr_to_reg (XEXP (operands[1], 0)));
}
+/* Generate a vector of constants to permute MODE for a little-endian
+ storage operation by swapping the two halves of a vector. */
+static rtvec
+rs6000_const_vec (enum machine_mode mode)
+{
+ int i, subparts;
+ rtvec v;
+
+ switch (mode)
+ {
+ case V2DFmode:
+ case V2DImode:
+ subparts = 2;
+ break;
+ case V4SFmode:
+ case V4SImode:
+ subparts = 4;
+ break;
+ case V8HImode:
+ subparts = 8;
+ break;
+ case V16QImode:
+ subparts = 16;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+
+ v = rtvec_alloc (subparts);
+
+ for (i = 0; i < subparts / 2; ++i)
+ RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i + subparts / 2);
+ for (i = subparts / 2; i < subparts; ++i)
+ RTVEC_ELT (v, i) = gen_rtx_CONST_INT (DImode, i - subparts / 2);
+
+ return v;
+}
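
(For reference, the selector this function builds just exchanges the
two halves of the element array.  A quick print-out of the four
supported cases, illustrative only:

    #include <stdio.h>

    int
    main (void)
    {
      static const int counts[] = { 2, 4, 8, 16 };
      for (int k = 0; k < 4; k++)
        {
          int n = counts[k];
          printf ("%2d elements:", n);
          for (int i = 0; i < n; i++)
            printf (" %d", i < n / 2 ? i + n / 2 : i - n / 2);
          printf ("\n");
        }
      return 0;
    }

Applying the selector twice is the identity, which is what lets the
load and store sequences below pair two permutes and still move the
data unchanged.)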
+
+/* Generate a permute rtx that represents an lxvd2x, stxvd2x, or xxpermdi
+ for a VSX load or store operation. */
+rtx
+rs6000_gen_le_vsx_permute (rtx source, enum machine_mode mode)
+{
+ rtx par = gen_rtx_PARALLEL (VOIDmode, rs6000_const_vec (mode));
+ return gen_rtx_VEC_SELECT (mode, source, par);
+}
+
+/* Emit a little-endian load from vector memory location SOURCE to VSX
+ register DEST in mode MODE. The load is done with two permuting
+ insns that represent an lxvd2x and xxpermdi. */
+void
+rs6000_emit_le_vsx_load (rtx dest, rtx source, enum machine_mode mode)
+{
+ rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (dest) : dest;
+ rtx permute_mem = rs6000_gen_le_vsx_permute (source, mode);
+ rtx permute_reg = rs6000_gen_le_vsx_permute (tmp, mode);
+ emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_mem));
+ emit_insn (gen_rtx_SET (VOIDmode, dest, permute_reg));
+}
+
+/* Emit a little-endian store to vector memory location DEST from VSX
+ register SOURCE in mode MODE. The store is done with two permuting
+ insns that represent an xxpermdi and an stxvd2x. */
+void
+rs6000_emit_le_vsx_store (rtx dest, rtx source, enum machine_mode mode)
+{
+ rtx tmp = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (source) : source;
+ rtx permute_src = rs6000_gen_le_vsx_permute (source, mode);
+ rtx permute_tmp = rs6000_gen_le_vsx_permute (tmp, mode);
+ emit_insn (gen_rtx_SET (VOIDmode, tmp, permute_src));
+ emit_insn (gen_rtx_SET (VOIDmode, dest, permute_tmp));
+}
+
+/* Emit a sequence representing a little-endian VSX load or store,
+ moving data from SOURCE to DEST in mode MODE. This is done
+ separately from rs6000_emit_move to ensure it is called only
+ during expand. LE VSX loads and stores introduced later are
+ handled with a split. The expand-time RTL generation allows
+ us to optimize away redundant pairs of register-permutes. */
+void
+rs6000_emit_le_vsx_move (rtx dest, rtx source, enum machine_mode mode)
+{
+ gcc_assert (!BYTES_BIG_ENDIAN
+ && VECTOR_MEM_VSX_P (mode)
+ && mode != TImode
+ && !gpr_or_gpr_p (dest, source)
+ && (MEM_P (source) ^ MEM_P (dest)));
+
+ if (MEM_P (source))
+ {
+ gcc_assert (REG_P (dest));
+ rs6000_emit_le_vsx_load (dest, source, mode);
+ }
+ else
+ {
+ if (!REG_P (source))
+ source = force_reg (mode, source);
+ rs6000_emit_le_vsx_store (dest, source, mode);
+ }
+}
+
/* Emit a move from SOURCE to DEST in mode MODE. */
void
rs6000_emit_move (rtx dest, rtx source, enum machine_mode mode)
case ALTIVEC_BUILTIN_MASK_FOR_LOAD:
case ALTIVEC_BUILTIN_MASK_FOR_STORE:
{
- int icode = (int) CODE_FOR_altivec_lvsr;
+ int icode = (BYTES_BIG_ENDIAN ? (int) CODE_FOR_altivec_lvsr
+ : (int) CODE_FOR_altivec_lvsl);
enum machine_mode tmode = insn_data[icode].operand[0].mode;
enum machine_mode mode = insn_data[icode].operand[1].mode;
tree arg;
static rtx
rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
- rtx reg2, rtx rreg)
+ rtx reg2, rtx rreg, rtx split_reg)
{
rtx real, temp;
}
}
+ /* If a store insn has been split into multiple insns, the
+ true source register is given by split_reg. */
+ if (split_reg != NULL_RTX)
+ real = gen_rtx_SET (VOIDmode, SET_DEST (real), split_reg);
+
RTX_FRAME_RELATED_P (insn) = 1;
add_reg_note (insn, REG_FRAME_RELATED_EXPR, real);
reg = gen_rtx_REG (mode, regno);
insn = emit_insn (gen_frame_store (reg, frame_reg, offset));
return rs6000_frame_related (insn, frame_reg, frame_reg_to_sp,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
}
/* Emit an offset memory reference suitable for a frame store, while
insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
- treg, GEN_INT (-info->total_size));
+ treg, GEN_INT (-info->total_size), NULL_RTX);
sp_off = frame_off = info->total_size;
}
insn = emit_move_insn (mem, reg);
rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
END_USE (0);
}
}
info->lr_save_offset,
DFmode, sel);
rs6000_frame_related (insn, ptr_reg, sp_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
if (lr)
END_USE (0);
}
SAVRES_SAVE | SAVRES_GPR);
rs6000_frame_related (insn, spe_save_area_ptr, sp_off - save_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
}
/* Move the static chain pointer back. */
info->lr_save_offset + ptr_off,
reg_mode, sel);
rs6000_frame_related (insn, ptr_reg, sp_off - ptr_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
if (lr)
END_USE (0);
}
info->gp_save_offset + frame_off + reg_size * i);
insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
}
else if (!WORLD_SAVE_P (info))
{
info->altivec_save_offset + ptr_off,
0, V4SImode, SAVRES_SAVE | SAVRES_VR);
rs6000_frame_related (insn, scratch_reg, sp_off - ptr_off,
- NULL_RTX, NULL_RTX);
+ NULL_RTX, NULL_RTX, NULL_RTX);
if (REGNO (frame_reg_rtx) == REGNO (scratch_reg))
{
/* The oddity mentioned above clobbered our frame reg. */
for (i = info->first_altivec_reg_save; i <= LAST_ALTIVEC_REGNO; ++i)
if (info->vrsave_mask & ALTIVEC_REG_BIT (i))
{
- rtx areg, savereg, mem;
+ rtx areg, savereg, mem, split_reg;
int offset;
offset = (info->altivec_save_offset + frame_off
insn = emit_move_insn (mem, savereg);
+ /* When we split a VSX store into two insns, we need to make
+ sure the DWARF info knows which register we are storing.
+ Pass it in to be used on the appropriate note. */
+ if (!BYTES_BIG_ENDIAN
+ && GET_CODE (PATTERN (insn)) == SET
+ && GET_CODE (SET_SRC (PATTERN (insn))) == VEC_SELECT)
+ split_reg = savereg;
+ else
+ split_reg = NULL_RTX;
+
rs6000_frame_related (insn, frame_reg_rtx, sp_off - frame_off,
- areg, GEN_INT (offset));
+ areg, GEN_INT (offset), split_reg);
}
}
}
}
+/* Expand an Altivec constant permutation for little endian mode.
+ There are two issues: First, the two input operands must be
+ swapped so that together they form a double-wide array in LE
+ order. Second, the vperm instruction has surprising behavior
+ in LE mode: it interprets the elements of the source vectors
+ in BE mode ("left to right") and interprets the elements of
+ the destination vector in LE mode ("right to left"). To
+ correct for this, we must subtract each element of the permute
+ control vector from 31.
+
+ For example, suppose we want to concatenate vr10 = {0, 1, 2, 3}
+ with vr11 = {4, 5, 6, 7} and extract {0, 2, 4, 6} using a vperm.
+ We place {0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27} in vr12 to
+ serve as the permute control vector. Then, in BE mode,
+
+ vperm 9,10,11,12
+
+ places the desired result in vr9. However, in LE mode the
+ vector contents will be
+
+ vr10 = 00000003 00000002 00000001 00000000
+ vr11 = 00000007 00000006 00000005 00000004
+
+ The result of the vperm using the same permute control vector is
+
+ vr9 = 05000000 07000000 01000000 03000000
+
+ That is, the leftmost 4 bytes of vr10 are interpreted as the
+ source for the rightmost 4 bytes of vr9, and so on.
+
+ If we change the permute control vector to
+
+ vr12 = {31,30,29,28,23,22,21,20,15,14,13,12,7,6,5,4}
+
+ and issue
+
+ vperm 9,11,10,12
+
+ we get the desired
+
+ vr9 = 00000006 00000004 00000002 00000000. */
+
+void
+altivec_expand_vec_perm_const_le (rtx operands[4])
+{
+ unsigned int i;
+ rtx perm[16];
+ rtx constv, unspec;
+ rtx target = operands[0];
+ rtx op0 = operands[1];
+ rtx op1 = operands[2];
+ rtx sel = operands[3];
+
+ /* Unpack and adjust the constant selector. */
+ for (i = 0; i < 16; ++i)
+ {
+ rtx e = XVECEXP (sel, 0, i);
+ unsigned int elt = 31 - (INTVAL (e) & 31);
+ perm[i] = GEN_INT (elt);
+ }
+
+ /* Expand to a permute, swapping the inputs and using the
+ adjusted selector. */
+ if (!REG_P (op0))
+ op0 = force_reg (V16QImode, op0);
+ if (!REG_P (op1))
+ op1 = force_reg (V16QImode, op1);
+
+ constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+ constv = force_reg (V16QImode, constv);
+ unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, constv),
+ UNSPEC_VPERM);
+ if (!REG_P (target))
+ {
+ rtx tmp = gen_reg_rtx (V16QImode);
+ emit_move_insn (tmp, unspec);
+ unspec = tmp;
+ }
+
+ emit_move_insn (target, unspec);
+}
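
(The worked example in the comment can be checked mechanically.  The
sketch below, illustrative only, models vperm in big-endian register
lane space -- lane 0 is the most significant byte -- stores logical
byte arrays into little-endian lanes, and verifies that swapping the
operands and using 31 - sel reproduces the direct permute:

    #include <assert.h>
    #include <string.h>

    /* Hardware vperm: lane i of T takes byte (C[i] & 31) of A:B.  */
    static void
    hw_vperm (const unsigned char *a, const unsigned char *b,
              const unsigned char *c, unsigned char *t)
    {
      for (int i = 0; i < 16; i++)
        {
          int s = c[i] & 31;
          t[i] = s < 16 ? a[s] : b[s - 16];
        }
    }

    /* On LE, logical byte j of a vector sits in hardware lane 15-j.  */
    static void
    to_le_lanes (const unsigned char *logical, unsigned char *lanes)
    {
      for (int j = 0; j < 16; j++)
        lanes[15 - j] = logical[j];
    }

    int
    main (void)
    {
      unsigned char a[16], b[16], sel[16], adj[16], want[16];
      unsigned char ra[16], rb[16], rc[16], rt[16], got[16];

      for (int j = 0; j < 16; j++)
        {
          a[j] = j;
          b[j] = 16 + j;
          sel[j] = 2 * j;          /* Extract the even bytes of a:b.  */
          adj[j] = 31 - sel[j];    /* Selector subtracted from 31.  */
          want[j] = sel[j] < 16 ? a[sel[j]] : b[sel[j] - 16];
        }

      to_le_lanes (a, ra);
      to_le_lanes (b, rb);
      to_le_lanes (adj, rc);

      hw_vperm (rb, ra, rc, rt);   /* Operands swapped: vperm t,b,a,c.  */
      for (int j = 0; j < 16; j++)
        got[j] = rt[15 - j];       /* Read the result back logically.  */

      assert (memcmp (got, want, 16) == 0);
      return 0;
    }

The assertion holds for any selector, matching the derivation in the
comment above.)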
+
+/* Similarly to altivec_expand_vec_perm_const_le, we must adjust the
+ permute control vector. But here it's not a constant, so we must
+ generate a vector splat/subtract to do the adjustment. */
+
+void
+altivec_expand_vec_perm_le (rtx operands[4])
+{
+ rtx splat, unspec;
+ rtx target = operands[0];
+ rtx op0 = operands[1];
+ rtx op1 = operands[2];
+ rtx sel = operands[3];
+ rtx tmp = target;
+
+ /* Get everything in regs so the pattern matches. */
+ if (!REG_P (op0))
+ op0 = force_reg (V16QImode, op0);
+ if (!REG_P (op1))
+ op1 = force_reg (V16QImode, op1);
+ if (!REG_P (sel))
+ sel = force_reg (V16QImode, sel);
+ if (!REG_P (target))
+ tmp = gen_reg_rtx (V16QImode);
+
+ /* SEL = splat(31) - SEL. */
+ /* We want to subtract from 31, but we can't vspltisb 31 since
+ it's out of range. -1 works as well because only the low-order
+ five bits of the permute control vector elements are used. */
+ splat = gen_rtx_VEC_DUPLICATE (V16QImode,
+ gen_rtx_CONST_INT (QImode, -1));
+ emit_move_insn (tmp, splat);
+ sel = gen_rtx_MINUS (V16QImode, tmp, sel);
+ emit_move_insn (tmp, sel);
+
+ /* Permute with operands reversed and adjusted selector. */
+ unspec = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, op1, op0, tmp),
+ UNSPEC_VPERM);
+
+ /* Copy into target, possibly by way of a register. */
+ if (!REG_P (target))
+ {
+ emit_move_insn (tmp, unspec);
+ unspec = tmp;
+ }
+
+ emit_move_insn (target, unspec);
+}
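
(The splat of -1 stands in for a splat of 31 because vperm reads only
the low-order five bits of each control byte: as a byte, -1 - s is
255 - s, and 255 is congruent to 31 mod 32.  A one-loop check of the
identity, illustrative only:

    #include <assert.h>

    int
    main (void)
    {
      for (int s = 0; s < 256; s++)
        assert ((((0xff - s) & 0xff) & 31) == 31 - (s & 31));
      return 0;
    }

The same splat-and-subtract trick appears in the LE branch of
rs6000_expand_vector_set above.)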
+
/* Expand an Altivec constant permutation. Return true if we match
an efficient implementation; false to fall back to VPERM. */
{ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } },
{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum,
{ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb,
{ 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh,
{ 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw,
{ 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb,
{ 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh,
{ 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } },
- { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw,
+ { OPTION_MASK_ALTIVEC,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw,
{ 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } },
{ OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew,
{ 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } },
break;
if (i == 16)
{
+ if (!BYTES_BIG_ENDIAN)
+ elt = 15 - elt;
emit_insn (gen_altivec_vspltb (target, op0, GEN_INT (elt)));
return true;
}
break;
if (i == 16)
{
+ int field = BYTES_BIG_ENDIAN ? elt / 2 : 7 - elt / 2;
x = gen_reg_rtx (V8HImode);
emit_insn (gen_altivec_vsplth (x, gen_lowpart (V8HImode, op0),
- GEN_INT (elt / 2)));
+ GEN_INT (field)));
emit_move_insn (target, gen_lowpart (V16QImode, x));
return true;
}
break;
if (i == 16)
{
+ int field = BYTES_BIG_ENDIAN ? elt / 4 : 3 - elt / 4;
x = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vspltw (x, gen_lowpart (V4SImode, op0),
- GEN_INT (elt / 4)));
+ GEN_INT (field)));
emit_move_insn (target, gen_lowpart (V16QImode, x));
return true;
}
enum machine_mode omode = insn_data[icode].operand[0].mode;
enum machine_mode imode = insn_data[icode].operand[1].mode;
- if (swapped)
+ /* For little-endian, don't use vpkuwum and vpkuhum if the
+ underlying vector type is not V4SI and V8HI, respectively.
+ For example, using vpkuwum with a V8HI picks up the even
+ halfwords (BE numbering) when the even halfwords (LE
+ numbering) are what we need. */
+ if (!BYTES_BIG_ENDIAN
+ && icode == CODE_FOR_altivec_vpkuwum
+ && ((GET_CODE (op0) == REG
+ && GET_MODE (op0) != V4SImode)
+ || (GET_CODE (op0) == SUBREG
+ && GET_MODE (XEXP (op0, 0)) != V4SImode)))
+ continue;
+ if (!BYTES_BIG_ENDIAN
+ && icode == CODE_FOR_altivec_vpkuhum
+ && ((GET_CODE (op0) == REG
+ && GET_MODE (op0) != V8HImode)
+ || (GET_CODE (op0) == SUBREG
+ && GET_MODE (XEXP (op0, 0)) != V8HImode)))
+ continue;
+
+ /* For little-endian, the two input operands must be swapped
+ (or swapped back) to ensure proper right-to-left numbering
+ from 0 to 2N-1. */
+ if (swapped ^ !BYTES_BIG_ENDIAN)
x = op0, op0 = op1, op1 = x;
if (imode != V16QImode)
{
}
}
+ if (!BYTES_BIG_ENDIAN)
+ {
+ altivec_expand_vec_perm_const_le (operands);
+ return true;
+ }
+
return false;
}
gcc_assert (GET_MODE_NUNITS (vmode) == 2);
dmode = mode_for_vector (GET_MODE_INNER (vmode), 4);
+ /* For little endian, swap operands and invert/swap selectors
+ to get the correct xxpermdi. The operand swap sets up the
+ inputs as a little endian array. The selectors are swapped
+ because they are defined to use big endian ordering. The
+ selectors are inverted to get the correct doublewords for
+ little endian ordering. */
+ if (!BYTES_BIG_ENDIAN)
+ {
+ int n;
+ perm0 = 3 - perm0;
+ perm1 = 3 - perm1;
+ n = perm0, perm0 = perm1, perm1 = n;
+ x = op0, op0 = op1, op1 = x;
+ }
+
x = gen_rtx_VEC_CONCAT (dmode, op0, op1);
v = gen_rtvec (2, GEN_INT (perm0), GEN_INT (perm1));
x = gen_rtx_VEC_SELECT (vmode, x, gen_rtx_PARALLEL (VOIDmode, v));
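
(The double inversion plus swap can be verified exhaustively.  This
sketch models the doubleword select with big-endian lane numbering --
BE index k of a register holding little-endian data is logical index
1-k -- and checks all sixteen (perm0, perm1) pairs; hw_xxpermdi is a
hypothetical stand-in for the machine operation, not GCC code:

    #include <assert.h>

    static void
    hw_xxpermdi (const int *x, const int *y, int s0, int s1, int *t)
    {
      /* The concatenation as the hardware numbers it (BE order).  */
      int cat_be[4] = { x[1], x[0], y[1], y[0] };
      t[1] = cat_be[s0];    /* BE dw0 is logical dw1, and ...  */
      t[0] = cat_be[s1];    /* ... BE dw1 is logical dw0.  */
    }

    int
    main (void)
    {
      int a[2] = { 100, 101 }, b[2] = { 200, 201 };
      int cat[4] = { a[0], a[1], b[0], b[1] };
      for (int p0 = 0; p0 < 4; p0++)
        for (int p1 = 0; p1 < 4; p1++)
          {
            int t[2];
            /* Invert both selectors, swap them, swap the operands.  */
            hw_xxpermdi (b, a, 3 - p1, 3 - p0, t);
            assert (t[0] == cat[p0] && t[1] == cat[p1]);
          }
      return 0;
    }

)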
unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
rtx perm[16];
- high = (highp == BYTES_BIG_ENDIAN ? 0 : nelt / 2);
+ high = (highp ? 0 : nelt / 2);
for (i = 0; i < nelt / 2; i++)
{
perm[i * 2] = GEN_INT (i + high);
(smax "smax")])
\f
-;; Vector move instructions.
+;; Vector move instructions. Little-endian VSX loads and stores require
+;; special handling to circumvent "element endianness."
(define_expand "mov<mode>"
[(set (match_operand:VEC_M 0 "nonimmediate_operand" "")
(match_operand:VEC_M 1 "any_operand" ""))]
&& !vlogical_operand (operands[1], <MODE>mode))
operands[1] = force_reg (<MODE>mode, operands[1]);
}
+ if (!BYTES_BIG_ENDIAN
+ && VECTOR_MEM_VSX_P (<MODE>mode)
+ && <MODE>mode != TImode
+ && !gpr_or_gpr_p (operands[0], operands[1])
+ && (memory_operand (operands[0], <MODE>mode)
+ ^ memory_operand (operands[1], <MODE>mode)))
+ {
+ rs6000_emit_le_vsx_move (operands[0], operands[1], <MODE>mode);
+ DONE;
+ }
})
;; Generic vector floating point load/store instructions. These will match
{
rtx reg = gen_reg_rtx (V4SFmode);
- rs6000_expand_interleave (reg, operands[1], operands[1], true);
+ rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
DONE;
})
{
rtx reg = gen_reg_rtx (V4SFmode);
- rs6000_expand_interleave (reg, operands[1], operands[1], false);
+ rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvspdp (operands[0], reg));
DONE;
})
{
rtx reg = gen_reg_rtx (V4SImode);
- rs6000_expand_interleave (reg, operands[1], operands[1], true);
+ rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
DONE;
})
{
rtx reg = gen_reg_rtx (V4SImode);
- rs6000_expand_interleave (reg, operands[1], operands[1], false);
+ rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg));
DONE;
})
{
rtx reg = gen_reg_rtx (V4SImode);
- rs6000_expand_interleave (reg, operands[1], operands[1], true);
+ rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
DONE;
})
{
rtx reg = gen_reg_rtx (V4SImode);
- rs6000_expand_interleave (reg, operands[1], operands[1], false);
+ rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN);
emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg));
DONE;
})
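
(After the rs6000_expand_interleave change above, HIGHP refers to
vector element order rather than register order, so these expanders
pass BYTES_BIG_ENDIAN or its negation to name the half they mean.  A
sketch of the selector built for V4SI, using a hypothetical helper
that mirrors the index arithmetic; the odd selector entries,
i + nelt + high, are cut off in the hunk shown earlier:

    #include <stdio.h>

    static void
    interleave_sel (int nelt, int highp)
    {
      int high = highp ? 0 : nelt / 2;
      printf ("nelt=%d highp=%d:", nelt, highp);
      for (int i = 0; i < nelt / 2; i++)
        printf (" %d %d", i + high, i + nelt + high);
      printf ("\n");
    }

    int
    main (void)
    {
      interleave_sel (4, 1);    /* 0 4 1 5 */
      interleave_sel (4, 0);    /* 2 6 3 7 */
      return 0;
    }

)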
(match_operand:V16QI 3 "vlogical_operand" "")]
"VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
{
- emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1], operands[2],
- operands[3]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
+ operands[2], operands[3]));
+ else
+ {
+ /* We have changed lvsr to lvsl, so to complete the transformation
+ of vperm for LE, we must swap the inputs. */
+ rtx unspec = gen_rtx_UNSPEC (<MODE>mode,
+ gen_rtvec (3, operands[2],
+ operands[1], operands[3]),
+ UNSPEC_VPERM);
+ emit_move_insn (operands[0], unspec);
+ }
DONE;
})
])
;; VSX moves
+
+;; The patterns for LE permuted loads and stores come before the general
+;; VSX moves so they match first.
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (match_operand:VSX_D 1 "memory_operand" "Z"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ [(set (match_dup 2)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 1) (const_int 0)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 2)
+ (parallel [(const_int 1) (const_int 0)])))]
+ "
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+ : operands[0];
+}
+ "
+ [(set_attr "type" "vecload")
+ (set_attr "length" "8")])
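
(Each of these load patterns is a pair of half-swapping permutes --
an lxvd2x followed by an xxpermdi -- and two half-swaps compose to
the identity, so the pair as a whole copies the memory image into the
register in element order.  A two-step check, illustrative only:

    #include <assert.h>
    #include <string.h>

    /* Exchange the two halves of an N-element array, as the
       vec_select selectors in these patterns do (e.g. [1 0] for
       V2DI, [2 3 0 1] for V4SI).  */
    static void
    half_swap (const int *src, int *dst, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = src[(i + n / 2) % n];
    }

    int
    main (void)
    {
      int v[4] = { 1, 2, 3, 4 }, once[4], twice[4];
      half_swap (v, once, 4);       /* 3 4 1 2 */
      half_swap (once, twice, 4);   /* 1 2 3 4 again */
      assert (memcmp (v, twice, sizeof v) == 0);
      return 0;
    }

Because the permutes are exposed at expand time via
rs6000_emit_le_vsx_move above, a store followed by a reload of the
same value puts two such register permutes back to back, where the
optimizers can delete them.)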
+
+(define_insn_and_split "*vsx_le_perm_load_<mode>"
+ [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+ (match_operand:VSX_W 1 "memory_operand" "Z"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ [(set (match_dup 2)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+ "
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+ : operands[0];
+}
+ "
+ [(set_attr "type" "vecload")
+ (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v8hi"
+ [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+ (match_operand:V8HI 1 "memory_operand" "Z"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ [(set (match_dup 2)
+ (vec_select:V8HI
+ (match_dup 1)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))
+ (set (match_dup 0)
+ (vec_select:V8HI
+ (match_dup 2)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+ "
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+ : operands[0];
+}
+ "
+ [(set_attr "type" "vecload")
+ (set_attr "length" "8")])
+
+(define_insn_and_split "*vsx_le_perm_load_v16qi"
+ [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+ (match_operand:V16QI 1 "memory_operand" "Z"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ [(set (match_dup 2)
+ (vec_select:V16QI
+ (match_dup 1)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))
+ (set (match_dup 0)
+ (vec_select:V16QI
+ (match_dup 2)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+ "
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[0])
+ : operands[0];
+}
+ "
+ [(set_attr "type" "vecload")
+ (set_attr "length" "8")])
+
+(define_insn "*vsx_le_perm_store_<mode>"
+ [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+ (match_operand:VSX_D 1 "vsx_register_operand" "+wa"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ [(set_attr "type" "vecstore")
+ (set_attr "length" "12")])
+
+(define_split
+ [(set (match_operand:VSX_D 0 "memory_operand" "")
+ (match_operand:VSX_D 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+ [(set (match_dup 2)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 1) (const_int 0)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 2)
+ (parallel [(const_int 1) (const_int 0)])))]
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+ : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+ [(set (match_operand:VSX_D 0 "memory_operand" "")
+ (match_operand:VSX_D 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+ [(set (match_dup 1)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 1) (const_int 0)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 1) (const_int 0)])))
+ (set (match_dup 1)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 1) (const_int 0)])))]
+ "")
+
+(define_insn "*vsx_le_perm_store_<mode>"
+ [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+ (match_operand:VSX_W 1 "vsx_register_operand" "+wa"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ [(set_attr "type" "vecstore")
+ (set_attr "length" "12")])
+
+(define_split
+ [(set (match_operand:VSX_W 0 "memory_operand" "")
+ (match_operand:VSX_W 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+ [(set (match_dup 2)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 2)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+ : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+ [(set (match_operand:VSX_W 0 "memory_operand" "")
+ (match_operand:VSX_W 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+ [(set (match_dup 1)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))
+ (set (match_dup 0)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))
+ (set (match_dup 1)
+ (vec_select:<MODE>
+ (match_dup 1)
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+ "")
+
+(define_insn "*vsx_le_perm_store_v8hi"
+ [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+ (match_operand:V8HI 1 "vsx_register_operand" "+wa"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ [(set_attr "type" "vecstore")
+ (set_attr "length" "12")])
+
+(define_split
+ [(set (match_operand:V8HI 0 "memory_operand" "")
+ (match_operand:V8HI 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+ [(set (match_dup 2)
+ (vec_select:V8HI
+ (match_dup 1)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))
+ (set (match_dup 0)
+ (vec_select:V8HI
+ (match_dup 2)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+ : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+ [(set (match_operand:V8HI 0 "memory_operand" "")
+ (match_operand:V8HI 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+ [(set (match_dup 1)
+ (vec_select:V8HI
+ (match_dup 1)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))
+ (set (match_dup 0)
+ (vec_select:V8HI
+ (match_dup 1)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))
+ (set (match_dup 1)
+ (vec_select:V8HI
+ (match_dup 1)
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+ "")
+
+(define_insn "*vsx_le_perm_store_v16qi"
+ [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+ (match_operand:V16QI 1 "vsx_register_operand" "+wa"))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX"
+ "#"
+ [(set_attr "type" "vecstore")
+ (set_attr "length" "12")])
+
+(define_split
+ [(set (match_operand:V16QI 0 "memory_operand" "")
+ (match_operand:V16QI 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && !reload_completed"
+ [(set (match_dup 2)
+ (vec_select:V16QI
+ (match_dup 1)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))
+ (set (match_dup 0)
+ (vec_select:V16QI
+ (match_dup 2)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+{
+ operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
+ : operands[1];
+})
+
+;; The post-reload split requires that we re-permute the source
+;; register in case it is still live.
+(define_split
+ [(set (match_operand:V16QI 0 "memory_operand" "")
+ (match_operand:V16QI 1 "vsx_register_operand" ""))]
+ "!BYTES_BIG_ENDIAN && TARGET_VSX && reload_completed"
+ [(set (match_dup 1)
+ (vec_select:V16QI
+ (match_dup 1)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))
+ (set (match_dup 0)
+ (vec_select:V16QI
+ (match_dup 1)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))
+ (set (match_dup 1)
+ (vec_select:V16QI
+ (match_dup 1)
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+ "")
+
+
(define_insn "*vsx_mov<mode>"
[(set (match_operand:VSX_M 0 "nonimmediate_operand" "=Z,<VSr>,<VSr>,?Z,?wa,?wa,wQ,?&r,??Y,??r,??r,<VSr>,?wa,*r,v,wZ, v")
(match_operand:VSX_M 1 "input_operand" "<VSr>,Z,<VSr>,wa,Z,wa,r,wQ,r,Y,r,j,j,j,W,v,wZ"))]
(match_operand:<VS_scalar> 1 "vsx_register_operand" "ws,wa")
(match_operand:<VS_scalar> 2 "vsx_register_operand" "ws,wa")))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
- "xxpermdi %x0,%x1,%x2,0"
+{
+ if (BYTES_BIG_ENDIAN)
+ return "xxpermdi %x0,%x1,%x2,0";
+ else
+ return "xxpermdi %x0,%x2,%x1,0";
+}
[(set_attr "type" "vecperm")])
;; Special purpose concat using xxpermdi to glue two single precision values
(match_operand:SF 2 "vsx_register_operand" "f,f")]
UNSPEC_VSX_CONCAT))]
"VECTOR_MEM_VSX_P (V2DFmode)"
- "xxpermdi %x0,%x1,%x2,0"
+{
+ if (BYTES_BIG_ENDIAN)
+ return "xxpermdi %x0,%x1,%x2,0";
+ else
+ return "xxpermdi %x0,%x2,%x1,0";
+}
+ [(set_attr "type" "vecperm")])
+
+;; xxpermdi for little endian loads and stores. We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_xxpermdi2_le_<mode>"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_select:VSX_D
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 1) (const_int 0)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "xxpermdi %x0,%x1,%x1,2"
[(set_attr "type" "vecperm")])
+(define_insn "*vsx_xxpermdi4_le_<mode>"
+ [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+ (vec_select:VSX_W
+ (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "xxpermdi %x0,%x1,%x1,2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi8_le_V8HI"
+ [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+ (vec_select:V8HI
+ (match_operand:V8HI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+ "xxpermdi %x0,%x1,%x1,2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "*vsx_xxpermdi16_le_V16QI"
+ [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+ (vec_select:V16QI
+ (match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+ "xxpermdi %x0,%x1,%x1,2"
+ [(set_attr "type" "vecperm")])
+
+;; lxvd2x for little endian loads. We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_lxvd2x2_le_<mode>"
+ [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+ (vec_select:VSX_D
+ (match_operand:VSX_D 1 "memory_operand" "Z")
+ (parallel [(const_int 1) (const_int 0)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "lxvd2x %x0,%y1"
+ [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x4_le_<mode>"
+ [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
+ (vec_select:VSX_W
+ (match_operand:VSX_W 1 "memory_operand" "Z")
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "lxvd2x %x0,%y1"
+ [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x8_le_V8HI"
+ [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+ (vec_select:V8HI
+ (match_operand:V8HI 1 "memory_operand" "Z")
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+ "lxvd2x %x0,%y1"
+ [(set_attr "type" "vecload")])
+
+(define_insn "*vsx_lxvd2x16_le_V16QI"
+ [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+ (vec_select:V16QI
+ (match_operand:V16QI 1 "memory_operand" "Z")
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+ "lxvd2x %x0,%y1"
+ [(set_attr "type" "vecload")])
+
+;; stxvd2x for little endian stores. We need several of
+;; these since the form of the PARALLEL differs by mode.
+(define_insn "*vsx_stxvd2x2_le_<mode>"
+ [(set (match_operand:VSX_D 0 "memory_operand" "=Z")
+ (vec_select:VSX_D
+ (match_operand:VSX_D 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 1) (const_int 0)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "stxvd2x %x1,%y0"
+ [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x4_le_<mode>"
+ [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
+ (vec_select:VSX_W
+ (match_operand:VSX_W 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 2) (const_int 3)
+ (const_int 0) (const_int 1)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (<MODE>mode)"
+ "stxvd2x %x1,%y0"
+ [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x8_le_V8HI"
+ [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+ (vec_select:V8HI
+ (match_operand:V8HI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V8HImode)"
+ "stxvd2x %x1,%y0"
+ [(set_attr "type" "vecstore")])
+
+(define_insn "*vsx_stxvd2x16_le_V16QI"
+ [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+ (vec_select:V16QI
+ (match_operand:V16QI 1 "vsx_register_operand" "wa")
+ (parallel [(const_int 8) (const_int 9)
+ (const_int 10) (const_int 11)
+ (const_int 12) (const_int 13)
+ (const_int 14) (const_int 15)
+ (const_int 0) (const_int 1)
+ (const_int 2) (const_int 3)
+ (const_int 4) (const_int 5)
+ (const_int 6) (const_int 7)])))]
+ "!BYTES_BIG_ENDIAN && VECTOR_MEM_VSX_P (V16QImode)"
+ "stxvd2x %x1,%y0"
+ [(set_attr "type" "vecstore")])
+
;; Set the element of a V2DI/V2DF mode
(define_insn "vsx_set_<mode>"
[(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?wa")
UNSPEC_VSX_SET))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- if (INTVAL (operands[3]) == 0)
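+  /* The two doublewords trade places for little endian, so element
+     numbers 0 and 1 swap roles here.  */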
+ int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
+ if (INTVAL (operands[3]) == idx_first)
return \"xxpermdi %x0,%x2,%x1,1\";
- else if (INTVAL (operands[3]) == 1)
+ else if (INTVAL (operands[3]) == 1 - idx_first)
return \"xxpermdi %x0,%x1,%x2,0\";
else
gcc_unreachable ();
[(match_operand:QI 2 "u5bit_cint_operand" "i,i,i")])))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
+ int fldDM;
gcc_assert (UINTVAL (operands[2]) <= 1);
- operands[3] = GEN_INT (INTVAL (operands[2]) << 1);
+ fldDM = INTVAL (operands[2]) << 1;
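+  /* For little endian, flip the DM field (0 <-> 3, 2 <-> 1) so that
+     the other doubleword is selected.  */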
+ if (!BYTES_BIG_ENDIAN)
+ fldDM = 3 - fldDM;
+ operands[3] = GEN_INT (fldDM);
return \"xxpermdi %x0,%x1,%x1,%3\";
}
[(set_attr "type" "vecperm")])
(const_string "fpload")))
(set_attr "length" "4")])
+;; Optimize extracting element 1 from memory for little endian
+(define_insn "*vsx_extract_<mode>_one_le"
+ [(set (match_operand:<VS_scalar> 0 "vsx_register_operand" "=ws,d,?wa")
+ (vec_select:<VS_scalar>
+ (match_operand:VSX_D 1 "indexed_or_indirect_operand" "Z,Z,Z")
+ (parallel [(const_int 1)])))]
+ "VECTOR_MEM_VSX_P (<MODE>mode) && !WORDS_BIG_ENDIAN"
+ "lxsd%U1x %x0,%y1"
+ [(set (attr "type")
+ (if_then_else
+ (match_test "update_indexed_address_mem (operands[1], VOIDmode)")
+ (const_string "fpload_ux")
+ (const_string "fpload")))
+ (set_attr "length" "4")])
+
;; Extract a SF element from V4SF
(define_insn_and_split "vsx_extract_v4sf"
[(set (match_operand:SF 0 "vsx_register_operand" "=f,f")
rtx op2 = operands[2];
rtx op3 = operands[3];
rtx tmp;
- HOST_WIDE_INT ele = INTVAL (op2);
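+  /* The element number is reversed for little endian.  */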
+ HOST_WIDE_INT ele = BYTES_BIG_ENDIAN ? INTVAL (op2) : 3 - INTVAL (op2);
if (ele == 0)
tmp = op1;
+2014-04-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ Little Endian Vector Support
+ Backport from mainline r205638
+ 2013-12-03 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.dg/vect/costmodel/ppc/costmodel-slp-34.c: Skip for little
+ endian.
+
+ Backport from mainline r205146
+ 2013-11-20 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.target/powerpc/pr48258-1.c: Skip for little endian.
+
+ Backport from mainline r204862
+ 2013-11-15 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.dg/vmx/3b-15.c: Revise for little endian.
+
+ Backport from mainline r204321
+ 2013-11-02 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.dg/vmx/vec-set.c: New.
+
+ Backport from mainline r204138
+ 2013-10-28 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.dg/vmx/gcc-bug-i.c: Add little endian variant.
+ * gcc.dg/vmx/eg-5.c: Likewise.
+
+ Backport from mainline r203930
+ 2013-10-22 Bill Schmidt <wschmidt@vnet.ibm.com>
+
+ * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack
+ tests into...
+ * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is
+ restricted to big-endian targets.
+
+ Backport from mainline r203246
+ 2013-10-07 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * gcc.target/powerpc/pr43154.c: Skip for ppc64 little endian.
+ * gcc.target/powerpc/fusion.c: Likewise.
+
2014-04-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Backport from mainline
/* { dg-require-effective-target vect_int } */
+/* { dg-skip-if "cost too high" { powerpc*le-*-* } { "*" } { "" } } */
#include <stdarg.h>
#include "../../tree-vect.h"
vector unsigned char
f (vector unsigned char a, vector unsigned char b, vector unsigned char c)
{
+#ifdef __BIG_ENDIAN__
return vec_perm(a,b,c);
+#else
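+  /* With little endian element numbering, getting the same bytes
+     requires swapping the operands here and complementing each
+     selector element (0x1f - n) in test() below.  */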
+ return vec_perm(b,a,c);
+#endif
}
static void test()
8,9,10,11,12,13,14,15}),
((vector unsigned char){70,71,72,73,74,75,76,77,
78,79,80,81,82,83,84,85}),
+#ifdef __BIG_ENDIAN__
((vector unsigned char){0x1,0x14,0x18,0x10,0x16,0x15,0x19,0x1a,
0x1c,0x1c,0x1c,0x12,0x8,0x1d,0x1b,0xe})),
+#else
+ ((vector unsigned char){0x1e,0xb,0x7,0xf,0x9,0xa,0x6,0x5,
+ 0x3,0x3,0x3,0xd,0x17,0x2,0x4,0x11})),
+#endif
((vector unsigned char){1,74,78,70,76,75,79,80,82,82,82,72,8,83,81,14})),
"f");
}
/* Set result to a vector of f32 0's */
vector float result = ((vector float){0.,0.,0.,0.});
+#ifdef __LITTLE_ENDIAN__
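+  /* Vector element numbering is reversed on little endian, so the
+     splat indices run in the opposite order from the big endian
+     version below.  */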
+ result = vec_madd (c0, vec_splat (v, 3), result);
+ result = vec_madd (c1, vec_splat (v, 2), result);
+ result = vec_madd (c2, vec_splat (v, 1), result);
+ result = vec_madd (c3, vec_splat (v, 0), result);
+#else
result = vec_madd (c0, vec_splat (v, 0), result);
result = vec_madd (c1, vec_splat (v, 1), result);
result = vec_madd (c2, vec_splat (v, 2), result);
result = vec_madd (c3, vec_splat (v, 3), result);
+#endif
return result;
}
#define DO_INLINE __attribute__ ((always_inline))
#define DONT_INLINE __attribute__ ((noinline))
+#ifdef __LITTLE_ENDIAN__
+static inline DO_INLINE int inline_me(vector signed short data)
+{
+ union {vector signed short v; signed short s[8];} u;
+ signed short x;
+ unsigned char x1, x2;
+
+ u.v = data;
+ x = u.s[7];
+ x1 = (x >> 8) & 0xff;
+ x2 = x & 0xff;
+ return ((x2 << 8) | x1);
+}
+#else
static inline DO_INLINE int inline_me(vector signed short data)
{
union {vector signed short v; signed short s[8];} u;
u.v = data;
return u.s[7];
}
+#endif
static DONT_INLINE int foo(vector signed short data)
{
--- /dev/null
+#include "harness.h"
+
+vector short
+vec_set (short m)
+{
+ return (vector short){m, 0, 0, 0, 0, 0, 0, 0};
+}
+
+static void test()
+{
+ check (vec_all_eq (vec_set (7),
+ ((vector short){7, 0, 0, 0, 0, 0, 0, 0})),
+ "vec_set");
+}
return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, });
}
-V p2(V x, V y)
-{
- return __builtin_shuffle(x, y,
- (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
-
-}
-
-V p4(V x, V y)
-{
- return __builtin_shuffle(x, y,
- (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
-}
-
V h1(V x, V y)
{
return __builtin_shuffle(x, y,
/* { dg-final { scan-assembler "vspltb" } } */
/* { dg-final { scan-assembler "vsplth" } } */
/* { dg-final { scan-assembler "vspltw" } } */
-/* { dg-final { scan-assembler "vpkuhum" } } */
-/* { dg-final { scan-assembler "vpkuwum" } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
+/* { dg-options "-O -maltivec -mno-vsx" } */
+
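+/* On big endian these shuffles are expected to map directly to the
+   vpkuhum/vpkuwum pack instructions checked for below; little endian
+   code generation differs, hence the skip above.  */
+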
+typedef unsigned char V __attribute__((vector_size(16)));
+
+V p2(V x, V y)
+{
+ return __builtin_shuffle(x, y,
+ (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 });
+
+}
+
+V p4(V x, V y)
+{
+ return __builtin_shuffle(x, y,
+ (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 });
+}
+
+/* { dg-final { scan-assembler-not "vperm" } } */
+/* { dg-final { scan-assembler "vpkuhum" } } */
+/* { dg-final { scan-assembler "vpkuwum" } } */
/* { dg-do compile { target { powerpc*-*-* } } } */
/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
/* { dg-require-effective-target powerpc_p8vector_ok } */
/* { dg-options "-mcpu=power7 -mtune=power8 -O3" } */
/* { dg-do compile { target { powerpc*-*-* } } } */
/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
/* { dg-require-effective-target powerpc_vsx_ok } */
/* { dg-options "-O2 -mcpu=power7" } */
/* { dg-do compile } */
/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "" { powerpc*le-*-* } { "*" } { "" } } */
/* { dg-require-effective-target powerpc_vsx_ok } */
/* { dg-options "-O3 -mcpu=power7 -mabi=altivec -ffast-math -fno-unroll-loops" } */
/* { dg-final { scan-assembler-times "xvaddsp" 3 } } */
+2014-04-04 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ Backport from mainline
+ 2013-11-18 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
+
+ * lex.c (search_line_fast): Correct for little endian.
+
2014-03-06 Jakub Jelinek <jakub@redhat.com>
Backport from mainline
beginning with all ones and shifting in zeros according to the
mis-alignment. The LVSR instruction pulls the exact shift we
want from the address. */
+#ifdef __BIG_ENDIAN__
mask = __builtin_vec_lvsr(0, s);
mask = __builtin_vec_perm(zero, ones, mask);
+#else
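+  /* For little endian, lvsl supplies the complementary permute
+     vector, and the operand order to vec_perm is reversed.  */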
+ mask = __builtin_vec_lvsl(0, s);
+ mask = __builtin_vec_perm(ones, zero, mask);
+#endif
data &= mask;
/* While altivec loads mask addresses, we still need to align S so
/* L now contains 0xff in bytes for which we matched one of the
relevant characters. We can find the byte index by finding
its bit index and dividing by 8. */
+#ifdef __BIG_ENDIAN__
l = __builtin_clzl(l) >> 3;
+#else
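+  /* On little endian the first match sits in the least significant
+     byte, so count trailing rather than leading zeros.  */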
+ l = __builtin_ctzl(l) >> 3;
+#endif
return s + l;
#undef N