[rs6000] Add 128-bit support for vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be() builtins.

Message ID 1516291577.6075.2.camel@us.ibm.com
State New
Headers show
Series
  • [rs6000] Add 128-bit support for vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be() builtins.
Related show

Commit Message

Carl Love Jan. 18, 2018, 4:06 p.m.
GCC maintainers:

The following patch adds missing 128-bit support for the builtins
vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be().  It also includes a bug
fix required for the new 128-bit arguments for the vec_xst_be() and
vec_xl_be() builtins.  New test cases are also included.  This patch
completes the tests of all the various load/store builtins that I have
been working on.  This patch adds a torture test for the various load
store tests to check that they work for -O0, -O1 and -O2 testing.  This
was done as the work on the load/store builtins found numerous issues
that were optimization level dependent which were fixed by the previous
commits as well as this commit.

The patch has been tested on 

  powerpc64le-unknown-linux-gnu (Power 8 LE)
  powerpc64le-unknown-linux-gnu (Power 8 BE)
  powerpc64le-unknown-linux-gnu (Power 9 LE)

Let me know if the patch looks OK or not.  Let me know if you want to
include it in stage 4 or wait for the next release.  Thanks.

                   Carl Love
--------------------------------------------------------------------


gcc/ChangeLog:

2018-01-18 Carl Love <cel@us.ibm.com>

	* config/rs6000/rs6000-builtin.def (ST_ELEMREV_V1TI, LD_ELEMREV_V1TI,
	LVX_V1TI): Add macro expansion.
	* config/rs6000/rs6000-c.c (altivec_builtin_types): Add argument
	definitions for VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_VEC_ST,
	VSX_BUILTIN_VEC_XL, LD_ELEMREV_V1TI builtins.
	* config/rs6000/rs6000-p8swap.c (insn_is_swappable_p);
	Change check to determine if the instruction is a byte reversing
	entry.  Fix typo in comment.
	* config/rs6000/rs6000.c (altivec_expand_builtin): Add case entry
	for VSX_BUILTIN_ST_ELEMREV_V1TI and VSX_BUILTIN_LD_ELEMREV_V1TI.
	Add def_builtin calls for new builtins.
	* config/rs6000/vsx.md (vsx_st_elemrev_v1ti, vsx_ld_elemrev_v1ti):
	Add define_insn expansion.

gcc/testsuite/ChangeLog:

2018-01-18  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/powerpc.exp: Add torture tests for
	 builtins-4-runnable.c, builtins-6-runnable.c,
	 builtins-5-p9-runnable.c, builtins-6-p9-runnable.c.
	* gcc.target/powerpc/builtins-6-runnable.c: New test file.
	* gcc.target/powerpc/builtins-4-runnable.c: Add additional tests
	for signed/unsigned 128-bit and long long int loads.
---
 gcc/config/rs6000/rs6000-builtin.def               |    3 +
 gcc/config/rs6000/rs6000-c.c                       |   39 +
 gcc/config/rs6000/rs6000-p8swap.c                  |    5 +-
 gcc/config/rs6000/rs6000.c                         |   20 +
 gcc/config/rs6000/vsx.md                           |   43 +-
 .../gcc.target/powerpc/builtins-4-runnable.c       |  494 +++++++++-
 .../gcc.target/powerpc/builtins-6-runnable.c       | 1001 ++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/powerpc.exp       |   12 +
 8 files changed, 1582 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c

-- 
2.11.0

Comments

Segher Boessenkool Jan. 18, 2018, 10:51 p.m. | #1
Hi Carl,

On Thu, Jan 18, 2018 at 08:06:17AM -0800, Carl Love wrote:
> The following patch adds missing 128-bit support for the builtins

> vec_xl(), vec_xl_be(), vec_xst(), vec_xst_be().  It also includes a bug

> fix required for the new 128-bit arguments for the vec_xst_be() and

> vec_xl_be() builtins.  New test cases are also included.  This patch

> completes the tests of all the various load/store builtins that I have

> been working on.  This patch adds a torture test for the various load

> store tests to check that they work for -O0, -O1 and -O2 testing.  This

> was done as the work on the load/store builtins found numerous issues

> that were optimization level dependent which were fixed by the previous

> commits as well as this commit.


> Let me know if the patch looks OK or not.  Let me know if you want to

> include it in stage 4 or wait for the next release.  Thanks.


Since this is last I think we should include it now; it isn't likely
to regress anything, either.

But, some comments, and a real bug:

> 2018-01-18  Carl Love  <cel@us.ibm.com>

> 	* gcc.target/powerpc/powerpc.exp: Add torture tests for

> 	 builtins-4-runnable.c, builtins-6-runnable.c,

> 	 builtins-5-p9-runnable.c, builtins-6-p9-runnable.c.


The indent here is wrong (stray space).

> --- a/gcc/config/rs6000/rs6000.c

> +++ b/gcc/config/rs6000/rs6000.c

> @@ -15572,6 +15572,12 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)

>         unaligned-supporting store, so use a generic expander.  For

>         little-endian, the exact element-reversing instruction must

>         be used.  */

> +   case VSX_BUILTIN_ST_ELEMREV_V1TI:

> +     {

> +       enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v1ti

> +			      : CODE_FOR_vsx_st_elemrev_v1ti);

> +       return altivec_expand_stv_builtin (code, exp);

> +      }


Last line has a space too many.  Or, actually, all the rest have one
too few?

> --- a/gcc/config/rs6000/vsx.md

> +++ b/gcc/config/rs6000/vsx.md

> @@ -1093,6 +1093,18 @@ (define_insn "vsx_ld_elemrev_v2di"

>    "lxvd2x %x0,%y1"

>    [(set_attr "type" "vecload")])

>  

> +(define_insn "vsx_ld_elemrev_v1ti"

> +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")

> +        (vec_select:V1TI

> +	  (match_operand:V1TI 1 "memory_operand" "Z")

> +	  (parallel [(const_int 0)])))]

> +  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"

> +{

> +  return "lxvd2x %x0,%y1; xxpermdi %x0,%x0,%x0,2";

> +}


Not ; but \; please.  No space after it.

We currently have exactly as many xxpermdi,2 as xxswapdi (147 each)
but the latter is more readable, please prefer that.

> +(define_insn "vsx_st_elemrev_v1ti"

> +  [(set (match_operand:V1TI 0 "memory_operand" "=Z")

> +        (vec_select:V1TI

> +          (match_operand:V1TI 1 "vsx_register_operand" "wa")

> +          (parallel [(const_int 0)])))]

> +  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"

> +{

> +  return "xxpermdi %x1,%x1,%x1,2; stxvd2x %x1,%y0";

> +}


This is wrong: it changes operand 1 but the RTL does not mention that.

> --- a/gcc/testsuite/gcc.target/powerpc/powerpc.exp

> +++ b/gcc/testsuite/gcc.target/powerpc/powerpc.exp

> @@ -49,4 +49,16 @@ gcc-dg-runtest [list $srcdir/$subdir/savres.c] "" $alti

>  

>  # All done.

>  torture-finish

> +

> +torture-init 

> +# Test load/store builtins at all optimizations

> +set-torture-options [list -O0 -O1 -O2]


All?  -O3 and -Os as well?  (And maybe more, but those are important).

> +gcc-dg-runtest [list $srcdir/$subdir/builtins-4-runnable.c \

> +		     $srcdir/$subdir/builtins-6-runnable.c \

> +		     $srcdir/$subdir/builtins-5-p9-runnable.c \

> +	       	     $srcdir/$subdir/builtins-6-p9-runnable.c] "" $DEFAULT_CFLAGS


Weird indent on this line.

Rest looks fine :-)


Segher
Segher Boessenkool Jan. 19, 2018, 4:13 p.m. | #2
On Thu, Jan 18, 2018 at 04:51:47PM -0600, Segher Boessenkool wrote:
> > +(define_insn "vsx_ld_elemrev_v1ti"

> > +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")

> > +        (vec_select:V1TI

> > +	  (match_operand:V1TI 1 "memory_operand" "Z")

> > +	  (parallel [(const_int 0)])))]

> > +  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"

> > +{

> > +  return "lxvd2x %x0,%y1; xxpermdi %x0,%x0,%x0,2";

> > +}


> We currently have exactly as many xxpermdi,2 as xxswapdi (147 each)

> but the latter is more readable, please prefer that.


Ignore this part; I managed to fumble my grep commands.  We have *no*
xxswapd in the source currently (well, one in comments, and 11 xxswapdi
but that is a misspelling); stage 4 is not the time to start using it
(do all supported assemblers implement it, implement it correctly, etc.)

So your xxpermdi is the best for now.


Segher
Carl Love Jan. 19, 2018, 6:07 p.m. | #3
On Fri, 2018-01-19 at 10:13 -0600, Segher Boessenkool wrote:
> On Thu, Jan 18, 2018 at 04:51:47PM -0600, Segher Boessenkool wrote:

> > > +(define_insn "vsx_ld_elemrev_v1ti"

> > > +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")

> > > +        (vec_select:V1TI

> > > +	  (match_operand:V1TI 1 "memory_operand" "Z")

> > > +	  (parallel [(const_int 0)])))]

> > > +  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"

> > > +{

> > > +  return "lxvd2x %x0,%y1; xxpermdi %x0,%x0,%x0,2";

> > > +}

> > We currently have exactly as many xxpermdi,2 as xxswapdi (147 each)

> > but the latter is more readable, please prefer that.

> 

> Ignore this part; I managed to fumble my grep commands.  We have *no*

> xxswapd in the source currently (well, one in comments, and 11

> xxswapdi

> but that is a misspelling); stage 4 is not the time to start using it

> (do all supported assemblers implement it, implement it correctly,

> etc.)

> 

> So your xxpermdi is the best for now.

> 


I was going to ask you about that again.  I seem to be getting
regressions with it for gcc -O0 builtins-4-runnable.c.  Will revert and
retest.  

                    Carl
Carl Love Jan. 19, 2018, 6:25 p.m. | #4
On Fri, 2018-01-19 at 10:13 -0600, Segher Boessenkool wrote:
> On Thu, Jan 18, 2018 at 04:51:47PM -0600, Segher Boessenkool wrote:

> > > +(define_insn "vsx_ld_elemrev_v1ti"

> > > +  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")

> > > +        (vec_select:V1TI

> > > +	  (match_operand:V1TI 1 "memory_operand" "Z")

> > > +	  (parallel [(const_int 0)])))]

> > > +  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"

> > > +{

> > > +  return "lxvd2x %x0,%y1; xxpermdi %x0,%x0,%x0,2";

> > > +}

> > We currently have exactly as many xxpermdi,2 as xxswapdi (147 each)

> > but the latter is more readable, please prefer that.

> 

> Ignore this part; I managed to fumble my grep commands.  We have *no*

> xxswapd in the source currently (well, one in comments, and 11

> xxswapdi

> but that is a misspelling); stage 4 is not the time to start using it

> (do all supported assemblers implement it, implement it correctly,

> etc.)

> 

> So your xxpermdi is the best for now.



Segher:

Here are the key changes that I am testing now for vsx.md
and powerpc.exp.  Just making sure we are on the same page here.


diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 03f8ec2d6..6ea05e46e 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1093,6 +1093,18 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_ld_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
+        (vec_select:V1TI
+	  (match_operand:V1TI 1 "memory_operand" "Z")
+	  (parallel [(const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"
+{
+   return "lxvd2x %x0,%y1\;xxpermdi %x0,%x0,%x0,2";               <<----- Reverted change
+}
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
 (define_insn "vsx_ld_elemrev_v2df"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
         (vec_select:V2DF
@@ -1222,6 +1234,18 @@ (define_insn "*vsx_ld_elemrev_v16qi_internal"
   "lxvb16x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_st_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "memory_operand" "=Z")
+        (vec_select:V1TI
+          (match_operand:V1TI 1 "vsx_register_operand" "+wa")         <<---  Fix RTL to mention
+          (parallel [(const_int 0)])))]                                      operand 1 change
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"
+{
+  return "xxpermdi %x1,%x1,%x1,2\;stxvd2x %x1,%y0";                    <<------ Reverted change
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "vsx_st_elemrev_v2df"
   [(set (match_operand:V2DF 0 "memory_operand" "=Z")
         (vec_select:V2DF
@@ -1272,7 +1296,7 @@ (define_expand "vsx_st_elemrev_v8hi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V8HImode);
       /* 2 is leftmost element in register */
       unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
@@ -1287,11 +1311,21 @@ (define_expand "vsx_st_elemrev_v8hi"
       emit_insn (gen_altivec_vperm_v8hi_direct (tmp, operands[1],
                                                 operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V8HImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V8HImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })
 
+(define_insn "*vsx_st_elemrev_v2di_internal"
+  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
+        (vec_select:V2DI
+          (match_operand:V2DI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 (define_insn "*vsx_st_elemrev_v8hi_internal"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vediff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.cindex ed37424ca..de9b916de 100644--- a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c@@ -1,10 +1,13 @@ /* { dg-do run } */ /* { dg-require-effective-target vsx_hw } */-/* { dg-options "-maltivec -mvsx" } */  +/* { dg-options "-maltivec -mvsx" } */  #include <inttypes.h> #include <altivec.h> // vector++#ifdef DEBUG #include <stdio.h>+#endif  void abort (void); @@ -24,9 +27,11 @@ int main() {    float data_f[100];   double data_d[100];-   +  __uint128_t data_u128[100];+  __int128_t data_128[100];+   signed long long disp;-   +   vector signed char vec_c_expected1, vec_c_expected2, vec_c_result1, vec_c_result2;   vector unsigned char vec_uc_expected1, vec_uc_expected2,     vec_uc_result1, vec_uc_result2;@@ -42,11 +47,13 @@ int main() {     vec_sll_result1, vec_sll_result2;   vector unsigned long long vec_ull_expected1, vec_ull_expected2,     vec_ull_result1, vec_ull_result2;+  vector __int128_t vec_128_expected1, vec_128_result1;+  vector __uint128_t vec_u128_expected1, vec_u128_result1;   vector float vec_f_expected1, vec_f_expected2, vec_f_result1, vec_f_result2;   vector double vec_d_expected1, vec_d_expected2, vec_d_result1, vec_d_result2;   char buf[20];   signed long long zero = (signed long long) 0;-  +   for (i = 0; i < 100; i++)     {       data_c[i] = i;@@ -59,21 +66,304 @@ int main() {       data_ull[i] = i+1001;       data_f[i] = i+100000.0;       data_d[i] = i+1000000.0;+      data_128[i] = i + 12800000;+      data_u128[i] = i + 12800001;     }-  -  disp = 0;++  // vec_xl() tests+  disp = 1;++  vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7,+					 8, 9, 10, 11, 12, 13, 14, 15};+  vec_c_result1 = vec_xl (0, data_c);++  vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8, 9,+					 10, 11, 12, 13, 14, 15, 16};+  vec_c_result2 = vec_xl (disp, data_c);++  vec_uc_expected1 = (vector unsigned char){1, 2, 3, 4, 5, 6, 7, 8, 9,+					    10, 11, 12, 13, 14, 15, 16};+  vec_uc_result1 = vec_xl (0, data_uc);++  vec_uc_expected2 = (vector unsigned char){2, 3, 4, 5, 6, 7, 8, 9, 10,+					    11, 12, 13, 14, 15, 16, 17};+  vec_uc_result2 = vec_xl (disp, data_uc);++  for (i = 0; i < 16; i++)+    {+      if (vec_c_result1[i] != vec_c_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);+#else+	abort ();+#endif+      if (vec_c_result2[i] != vec_c_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);+#else+	abort ();+#endif++      if (vec_uc_result1[i] != vec_uc_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);+#else+	abort ();+#endif++      if (vec_uc_result2[i] != vec_uc_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",+	       i,  vec_uc_result2[i], i, vec_uc_expected2[i]);+#else+	abort ();+#endif+    }++  disp = 2;+  vec_ssi_expected1 = (vector signed short){10, 11, 12, 13, 14, 15, 16, 17};++  vec_ssi_result1 = vec_xl (0, data_ssi);++  vec_ssi_expected2 = (vector signed short){11, 12, 13, 14, 15, 16, 17, 18};+  vec_ssi_result2 = vec_xl (disp, data_ssi);++  vec_usi_expected1 = (vector unsigned short){11, 12, 13, 14, 15, 16, 17, 18};+  vec_usi_result1 = vec_xl (0, data_usi);++  vec_usi_expected2 = (vector unsigned short){12, 13, 14, 15, 16, 17, 18, 19};+  vec_usi_result2 = vec_xl (disp, data_usi);+++  for (i = 0; i < 8; i++)+    {+      if (vec_ssi_result1[i] != vec_ssi_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);+#else+	abort ();+#endif+      if (vec_ssi_result2[i] != vec_ssi_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);+#else+	abort ();+#endif++      if (vec_usi_result1[i] != vec_usi_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);+#else+	abort ();+#endif++      if (vec_usi_result2[i] != vec_usi_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);+#else+	abort ();+#endif+    }++  disp = 4;+  vec_si_result1 = vec_xl (zero, data_si);+  vec_si_expected1 = (vector int){100, 101, 102, 103};++  vec_si_result2 = vec_xl (disp, data_si);+  vec_si_expected2 = (vector int){101, 102, 103, 104};++  vec_ui_result1 = vec_xl (zero, data_ui);+  vec_ui_expected1 = (vector unsigned int){101, 102, 103, 104};++  vec_ui_result2 = vec_xl (disp, data_ui);+  vec_ui_expected2 = (vector unsigned int){102, 103, 104, 105};++  for (i = 0; i < 4; i++)+    {+      if (vec_si_result1[i] != vec_si_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);+#else+	abort ();+#endif+      if (vec_si_result2[i] != vec_si_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);+#else+	abort ();+#endif++      if (vec_ui_result1[i] != vec_ui_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);+#else+	abort ();+#endif++      if (vec_ui_result2[i] != vec_ui_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ui_result2[%d] = %d; vec_ui_expected1[%d] = %d\n",+	       i,  vec_si_result2[i], i, vec_ui_expected1[i]);+#else+	abort ();+#endif+    }++  disp = 8;+  vec_sll_result1 = vec_xl (zero, data_sll);+  vec_sll_expected1 = (vector signed long long){1000, 1001};++  vec_sll_result2 = vec_xl (disp, data_sll);+  vec_sll_expected2 = (vector signed long long){1001, 1002};++  vec_ull_result1 = vec_xl (zero, data_ull);+  vec_ull_expected1 = (vector unsigned long long){1001, 1002};++  vec_ull_result2 = vec_xl (disp, data_ull);+  vec_ull_expected2 = (vector unsigned long long){1002, 1003};++  for (i = 0; i < 2; i++)+    {+      if (vec_sll_result1[i] != vec_sll_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %lld\n",+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);+#else+	abort ();+#endif++      if (vec_sll_result2[i] != vec_sll_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %lld\n",+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);+#else+	abort ();+#endif++      if (vec_ull_result1[i] != vec_ull_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %lld\n",+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);+#else+	abort ();+#endif++      if (vec_ull_result2[i] != vec_ull_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %lld\n",+	       i,  vec_ull_result2[i], i, vec_ull_expected2[i]);+#else+	abort ();+#endif+    }++  disp = 4;+  vec_f_result1 = vec_xl (zero, data_f);+  vec_f_expected1 = (vector float){100000.0, 100001.0, 100002.0, 100003.0};++  vec_f_result2 = vec_xl (disp, data_f);+  vec_f_expected2 = (vector float){100001.0, 100002.0, 100003.0, 100004.0};++  for (i = 0; i < 4; i++)+    {+      if (vec_f_result1[i] != vec_f_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);+#else+	abort ();+#endif++      if (vec_f_result2[i] != vec_f_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);+#else+	abort ();+#endif+    }++  disp = 8;+  vec_d_result1 = vec_xl (zero, data_d);+  vec_d_expected1 = (vector double){1000000.0, 1000001.0};++  vec_d_result2 = vec_xl (disp, data_d);+  vec_d_expected2 = (vector double){1000001.0, 1000002.0};++  for (i = 0; i < 2; i++)+    {+      if (vec_d_result1[i] != vec_d_expected1[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_d_result1[%d] = %f; vec_f_expected1[%d] = %f\n",+	       i,  vec_d_result1[i], i, vec_d_expected1[i]);+#else+	abort ();+#endif++      if (vec_d_result2[i] != vec_d_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl(), vec_d_result2[%d] = %f; vec_f_expected2[%d] = %f\n",+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);+#else+	abort ();+#endif+    }++  vec_128_expected1 = (vector __int128_t){12800000};+  vec_128_result1 = vec_xl (zero, data_128);++  if (vec_128_expected1[0] != vec_128_result1[0])+    {+#ifdef DEBUG+	printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ",+	       vec_128_result1[0] >> 64,+	       vec_128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);+	printf("vec_128_expected1[0] = %lld %llu\n",+	       vec_128_expected1[0] >> 64,+	       vec_128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);+#else+	abort ();+#endif+    }++  vec_u128_result1 = vec_xl (zero, data_u128);+  vec_u128_expected1 = (vector __uint128_t){12800001};+  if (vec_u128_expected1[0] != vec_u128_result1[0])+    {+#ifdef DEBUG+	printf("Error: vec_xl(), vec_u128_result1[0] = %lld; ",+	       vec_u128_result1[0] >> 64,+	       vec_u128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);+	printf("vec_u128_expected1[0] = %lld\n",+	       vec_u128_expected1[0] >> 64,+	       vec_u128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);+#else+	abort ();+#endif+    }++  // vec_xl_be() tests+  disp = 1; #ifdef __BIG_ENDIAN__-  printf("BIG ENDIAN\n");   vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7, 					 8, 9, 10, 11, 12, 13, 14, 15}; #else-  printf("LITTLE ENDIAN\n");   vec_c_expected1 = (vector signed char){15, 14, 13, 12, 11, 10, 9, 8, 					 7, 6, 5, 4, 3, 2, 1, 0}; #endif   vec_c_result1 = vec_xl_be (0, data_c); -  disp = 1;+  #ifdef __BIG_ENDIAN__   vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,@@ -108,16 +398,36 @@ int main() {   for (i = 0; i < 16; i++)     {       if (vec_c_result1[i] != vec_c_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);+#else+	abort ();+#endif        if (vec_c_result2[i] != vec_c_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);+#else+	abort ();+#endif        if (vec_uc_result1[i] != vec_uc_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);+#else+	abort ();+#endif        if (vec_uc_result2[i] != vec_uc_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);+#else+	abort ();+#endif     }    vec_ssi_result1 = vec_xl_be (zero, data_ssi);@@ -144,7 +454,7 @@ int main() { #else   vec_usi_expected1 = (vector unsigned short){18, 17, 16, 15, 14, 13, 12, 11}; #endif-   +   disp = 2;   vec_usi_result2 = vec_xl_be (disp, data_usi); @@ -157,16 +467,36 @@ int main() {   for (i = 0; i < 8; i++)     {       if (vec_ssi_result1[i] != vec_ssi_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);+#else+	abort ();+#endif        if (vec_ssi_result2[i] != vec_ssi_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);+#else+	abort ();+#endif        if (vec_usi_result1[i] != vec_usi_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);+#else+	abort ();+#endif        if (vec_usi_result2[i] != vec_usi_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);+#else+	abort ();+#endif     }    vec_si_result1 = vec_xl_be (zero, data_si);@@ -207,16 +537,36 @@ int main() {   for (i = 0; i < 4; i++)     {       if (vec_si_result1[i] != vec_si_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);+#else+	abort ();+#endif        if (vec_si_result2[i] != vec_si_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);+#else+	abort ();+#endif        if (vec_ui_result1[i] != vec_ui_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);+#else+	abort ();+#endif        if (vec_ui_result2[i] != vec_ui_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ui_result2[%d] = %d; vec_ui_expected2[%d] = %d\n",+	       i,  vec_ui_result2[i], i, vec_ui_expected2[i]);+#else+	abort ();+#endif     }    vec_sll_result1 = vec_xl_be (zero, data_sll);@@ -257,16 +607,36 @@ int main() {   for (i = 0; i < 2; i++)     {       if (vec_sll_result1[i] != vec_sll_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %d\n",+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);+#else+	abort ();+#endif        if (vec_sll_result2[i] != vec_sll_expected2[i])+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %d\n",+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);+#else 	abort ();+#endif        if (vec_ull_result1[i] != vec_ull_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %d\n",+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);+#else+	abort ();+#endif        if (vec_ull_result2[i] != vec_ull_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %d\n",+	       i,  vec_ull_result2[i], i, vec_sll_expected2[i]);+#else+	abort ();+#endif     }    vec_f_result1 = vec_xl_be (zero, data_f);@@ -289,9 +659,20 @@ int main() {   for (i = 0; i < 4; i++)     {       if (vec_f_result1[i] != vec_f_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);+#else+	abort ();+#endif+       if (vec_f_result2[i] != vec_f_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);+#else+	abort ();+#endif     }    vec_d_result1 = vec_xl_be (zero, data_d);@@ -314,8 +695,63 @@ int main() {   for (i = 0; i < 2; i++)     {       if (vec_d_result1[i] != vec_d_expected1[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);+#else+	abort ();+#endif+       if (vec_d_result2[i] != vec_d_expected2[i])-        abort ();+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);+#else+	abort ();+#endif+    }++  disp = 0;+  vec_128_result1 = vec_xl_be (zero, data_128);+#ifdef __BIG_ENDIAN__+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };+#else+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };+#endif++  if (vec_128_expected1[0] != vec_128_result1[0])+    {+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_128_result1[0] = %llu %llu;",+	       vec_128_result1[0] >> 64,+	       vec_128_result1[0] & 0xFFFFFFFFFFFFFFFF);+	printf(" vec_128_expected1[0] = %llu %llu\n",+	       vec_128_expected1[0] >> 64,+	       vec_128_expected1[0] & 0xFFFFFFFFFFFFFFFF);+#else+      abort ();+#endif+    }++#ifdef __BIG_ENDIAN__+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };+#else+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };+#endif++  vec_u128_result1 = vec_xl_be (zero, data_u128);++  if (vec_u128_expected1[0] != vec_u128_result1[0])+    {+#ifdef DEBUG+	printf("Error: vec_xl_be(), vec_u128_result1[0] = %llu %llu;",+	       vec_u128_result1[0] >> 64,+	       vec_u128_result1[0] & 0xFFFFFFFFFFFFFFFF);+	printf(" vec_u128_expected1[0] = %llu %llu\n",+	       vec_u128_expected1[0] >> 64,+	       vec_u128_expected1[0] & 0xFFFFFFFFFFFFFFFF);+#else+      abort ();+#endif     } }diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.cnew file mode 100644index 000000000..5d313124b--- /dev/null+++ b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c@@ -0,0 +1,1001 @@+/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */+/* { dg-options "-mcpu=power8 -O3" } */++#include <stdint.h>+#include <stdio.h>+#include <inttypes.h>+#include <altivec.h>++#define TRUE 1+#define FALSE 0++#ifdef DEBUG+#include <stdio.h>+#endif++void abort (void);++int result_wrong_sc (vector signed char vec_expected,+		     vector signed char vec_actual)+{+  int i;++  for (i = 0; i < 16; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_sc (vector signed char vec_expected,+	       vector signed char vec_actual)+{+  int i;++  printf("expected signed char data\n");+  for (i = 0; i < 16; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual signed char data\n");+  for (i = 0; i < 16; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_uc (vector unsigned char vec_expected,+		     vector unsigned char vec_actual)+{+  int i;++  for (i = 0; i < 16; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_uc (vector unsigned char vec_expected,+	       vector unsigned char vec_actual)+{+  int i;++  printf("expected signed char data\n");+  for (i = 0; i < 16; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual signed char data\n");+  for (i = 0; i < 16; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_us (vector unsigned short vec_expected,+		     vector unsigned short vec_actual)+{+  int i;++  for (i = 0; i < 8; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_us (vector unsigned short vec_expected,+	       vector unsigned short vec_actual)+{+  int i;++  printf("expected unsigned short data\n");+  for (i = 0; i < 8; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual unsigned short data\n");+  for (i = 0; i < 8; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_ss (vector signed short vec_expected,+		     vector signed short vec_actual)+{+  int i;++  for (i = 0; i < 8; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_ss (vector signed short vec_expected,+	       vector signed short vec_actual)+{+  int i;++  printf("expected signed short data\n");+  for (i = 0; i < 8; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual signed short data\n");+  for (i = 0; i < 8; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_ui (vector unsigned int vec_expected,+		     vector unsigned int vec_actual)+{+  int i;++  for (i = 0; i < 4; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_ui (vector unsigned int vec_expected,+	       vector unsigned int vec_actual)+{+  int i;++  printf("expected unsigned int data\n");+  for (i = 0; i < 4; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual unsigned int data\n");+  for (i = 0; i < 4; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_si (vector signed int vec_expected,+		     vector signed int vec_actual)+{+  int i;++  for (i = 0; i < 4; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_si (vector signed int vec_expected,+	       vector signed int vec_actual)+{+  int i;++  printf("expected signed int data\n");+  for (i = 0; i < 4; i++)+    printf(" %d,", vec_expected[i]);++  printf("\nactual signed int data\n");+  for (i = 0; i < 4; i++)+    printf(" %d,", vec_actual[i]);+  printf("\n");+}++int result_wrong_ull (vector unsigned long long vec_expected,+		      vector unsigned long long vec_actual)+{+  int i;++  for (i = 0; i < 2; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_ull (vector unsigned long long vec_expected,+		vector unsigned long long vec_actual)+{+  int i;++  printf("expected unsigned long long data\n");+  for (i = 0; i < 2; i++)+	  //    printf(" %llu,", vec_expected[i]);+    printf(" 0x%llx,", vec_expected[i]);++  printf("\nactual unsigned long long data\n");+  for (i = 0; i < 2; i++)+	  //    printf(" %llu,", vec_actual[i]);+    printf("0x %llx,", vec_actual[i]);+  printf("\n");+}++int result_wrong_sll (vector signed long long vec_expected,+		      vector signed long long vec_actual)+{+  int i;++  for (i = 0; i < 2; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_sll (vector signed long long vec_expected,+		vector signed long long vec_actual)+{+  int i;++  printf("expected signed long long data\n");+  for (i = 0; i < 2; i++)+    printf(" %lld,", vec_expected[i]);++  printf("\nactual signed long long data\n");+  for (i = 0; i < 2; i++)+    printf(" %lld,", vec_actual[i]);+  printf("\n");+}++int result_wrong_u128 (vector __uint128_t vec_expected,+		       vector __uint128_t vec_actual)+{+  int i;++    if (vec_expected[0] != vec_actual[0])+      return TRUE;++  return FALSE;+}++void print_u128 (vector __uint128_t vec_expected,+		 vector __uint128_t vec_actual)+{+  printf("expected uint128 data\n");+  printf(" %llu%llu\n", (unsigned long long)(vec_expected[0] >> 64),+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));++  printf("\nactual uint128 data\n");+  printf(" %llu%llu\n", (unsigned long long)(vec_actual[0] >> 64),+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));+}+++int result_wrong_s128 (vector __int128_t vec_expected,+		       vector __int128_t vec_actual)+{+  int i;++    if (vec_expected[0] != vec_actual[0])+      return TRUE;++  return FALSE;+}++void print_s128 (vector __int128 vec_expected,+		 vector __int128 vec_actual)+{+  printf("expected int128 data\n");+  printf(" %lld%llu\n", (signed long long)(vec_expected[0] >> 64),+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));++  printf("\nactual int128 data\n");+  printf(" %lld%llu\n", (signed long long)(vec_actual[0] >> 64),+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));+}++int result_wrong_d (vector double vec_expected,+		    vector double vec_actual)+{+  int i;++  for (i = 0; i < 2; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_d (vector double vec_expected,+	      vector double vec_actual)+{+  int i;++  printf("expected double data\n");+  for (i = 0; i < 2; i++)+    printf(" %f,", vec_expected[i]);++  printf("\nactual double data\n");+  for (i = 0; i < 2; i++)+    printf(" %f,", vec_actual[i]);+  printf("\n");+}++int result_wrong_f (vector float vec_expected,+		    vector float vec_actual)+{+  int i;++  for (i = 0; i < 4; i++)+    if (vec_expected[i] != vec_actual[i])+      return TRUE;++  return FALSE;+}++void print_f (vector float vec_expected,+	      vector float vec_actual)+{+  int i;++  printf("expected float data\n");+  for (i = 0; i < 4; i++)+    printf(" %f,", vec_expected[i]);++  printf("\nactual float data\n");+  for (i = 0; i < 4; i++)+    printf(" %f,", vec_actual[i]);+  printf("\n");+}++int main() {+   int i, j;+   size_t len;+   vector signed char store_data_sc;+   vector unsigned char store_data_uc;+   vector signed int store_data_si;+   vector unsigned int store_data_ui;+   vector __int128_t store_data_s128;+   vector __uint128_t store_data_u128;+   vector signed long long int store_data_sll;+   vector unsigned long long int store_data_ull;+   vector signed short store_data_ss;+   vector unsigned short store_data_us;+   vector double store_data_d;+   vector float store_data_f;++   signed char *address_sc;+   unsigned char *address_uc;+   signed int *address_si;+   unsigned int *address_ui;+   __int128_t *address_s128;+   __uint128_t *address_u128;+   signed long long int *address_sll;+   unsigned long long int *address_ull;+   signed short int *address_ss;+   unsigned short int *address_us;+   double *address_d;+   float *address_f;++   vector unsigned char *datap;++   vector unsigned char vec_uc_expected1, vec_uc_result1;+   vector signed char vec_sc_expected1, vec_sc_result1;+   vector signed int vec_si_expected1, vec_si_result1;+   vector unsigned int vec_ui_expected1, vec_ui_result1;+   vector __int128_t vec_s128_expected1, vec_s128_result1;+   vector __uint128_t vec_u128_expected1, vec_u128_result1;+   vector signed long long int vec_sll_expected1, vec_sll_result1;+   vector unsigned long long int vec_ull_expected1, vec_ull_result1;+   vector signed short int vec_ss_expected1, vec_ss_result1;+   vector unsigned short int vec_us_expected1, vec_us_result1;+   vector double vec_d_expected1, vec_d_result1;+   vector float vec_f_expected1, vec_f_result1;++   signed long long disp;++   /* VEC_XST */+   disp = 0;+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,+					    1, 2, 3, 4, 5, 6, 7, 8 };+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,+					  1, 2, 3, 4, 5, 6, 7, 8 };++   for (i=0; i<16; i++)+     vec_sc_result1[i] = 0;++   address_sc = &vec_sc_result1[0];++   vec_xst (store_data_sc, disp, address_sc);++   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, sc disp = 0, result does not match expected result\n");+       print_sc (vec_sc_expected1, vec_sc_result1);+#else+       abort();+#endif+     }++   disp = 2;+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,+					     -1, 0, 1, 2, 3, 4, 5, 6 };+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,+					 1, 2, 3, 4, 5, 6, 7, 8 };++   for (i=0; i<16; i++)+     vec_sc_result1[i] = 0;++   address_sc = &vec_sc_result1[0];++   vec_xst (store_data_sc, disp, address_sc);++   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, sc disp = 2, result does not match expected result\n");+       print_sc (vec_sc_expected1, vec_sc_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,+					      8, 9, 10, 11, 12, 13, 14, 15 };+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,+					   8, 9, 10, 11, 12, 13, 14, 15 };++   for (i=0; i<16; i++)+     vec_uc_result1[i] = 0;++   address_uc = &vec_uc_result1[0];++   vec_xst (store_data_uc, disp, address_uc);++   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, uc disp = 0, result does not match expected result\n");+       print_uc (vec_uc_expected1, vec_uc_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };++   for (i=0; i<8; i++)+     vec_ss_result1[i] = 0;++   address_ss = &vec_ss_result1[0];++   vec_xst (store_data_ss, disp, address_ss);++   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, ss disp = 0, result does not match expected result\n");+       print_ss (vec_ss_expected1, vec_ss_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };++   for (i=0; i<8; i++)+     vec_us_result1[i] = 0;++   address_us = &vec_us_result1[0];++   vec_xst (store_data_us, disp, address_us);++   if (result_wrong_us (vec_us_expected1, vec_us_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, us disp = 0, result does not match expected result\n");+       print_us (vec_us_expected1, vec_us_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };+   store_data_si = (vector signed int){ -2, -1, 0, 1 };++   for (i=0; i<4; i++)+     vec_si_result1[i] = 0;++   address_si = &vec_si_result1[0];++   vec_xst (store_data_si, disp, address_si);++   if (result_wrong_si (vec_si_expected1, vec_si_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, si disp = 0, result does not match expected result\n");+       print_si (vec_si_expected1, vec_si_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };++   for (i=0; i<4; i++)+     vec_ui_result1[i] = 0;++   address_ui = &vec_ui_result1[0];++   vec_xst (store_data_ui, disp, address_ui);++   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, ui disp = 0, result does not match expected result\n");+       print_ui (vec_ui_expected1, vec_ui_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_sll_expected1 = (vector signed long long){ -1, 0 };+   store_data_sll = (vector signed long long ){ -1, 0 };++   for (i=0; i<2; i++)+     vec_sll_result1[i] = 0;++   address_sll = (signed long long *)(&vec_sll_result1[0]);++   vec_xst (store_data_sll, disp, address_sll);++   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, sll disp = 0, result does not match expected result\n");+       print_sll (vec_sll_expected1, vec_sll_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_ull_expected1 = (vector unsigned long long){ 0, 1 };+   store_data_ull = (vector unsigned long long){  0, 1 };++   for (i=0; i<2; i++)+     vec_ull_result1[i] = 0;++   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);++   vec_xst (store_data_ull, disp, address_ull);++   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, ull disp = 0, result does not match expected result\n");+       print_ull (vec_ull_expected1, vec_ull_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_s128_expected1 = (vector __int128_t){ 12345 };+   store_data_s128 = (vector __int128_t){  12345 };++   vec_s128_result1[0] = 0;++   address_s128 = (__int128_t *)(&vec_s128_result1[0]);++   vec_xst (store_data_s128, disp, address_s128);++   if (result_wrong_s128 (vec_s128_expected1, vec_s128_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, s128 disp = 0, result does not match expected result\n");+       print_s128 (vec_s128_expected1, vec_s128_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_u128_expected1 = (vector __uint128_t){ 12345 };+   store_data_u128 = (vector __uint128_t){  12345 };++   vec_u128_result1[0] = 0;++   address_u128 = (__int128_t *)(&vec_u128_result1[0]);++   vec_xst (store_data_u128, disp, address_u128);++   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, u128 disp = 0, result does not match expected result\n");+       print_u128 (vec_u128_expected1, vec_u128_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_d_expected1 = (vector double){ 0, 1 };+   store_data_d = (vector double){  0, 1 };++   for (i=0; i<2; i++)+     vec_d_result1[i] = 0;++   address_d = (double *)(&vec_d_result1[0]);++   vec_xst (store_data_d, disp, address_d);++   if (result_wrong_d (vec_d_expected1, vec_d_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, double disp = 0, result does not match expected result\n");+       print_d (vec_d_expected1, vec_d_result1);+#else+       abort();+#endif+     }++   disp = 0;+   vec_f_expected1 = (vector float){ 0, 1 };+   store_data_f = (vector float){  0, 1 };++   for (i=0; i<4; i++)+     vec_f_result1[i] = 0;++   address_f = (float *)(&vec_f_result1[0]);++   vec_xst (store_data_f, disp, address_f);++   if (result_wrong_f (vec_f_expected1, vec_f_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst, float disp = 0, result does not match expected result\n");+       print_f (vec_f_expected1, vec_f_result1);+#else+       abort();+#endif+     }++   /* VEC_XST_BE, these always load in BE order regardless of+      machine endianess.  */+   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,+					    1, 2, 3, 4, 5, 6, 7, 8 };+#else+   vec_sc_expected1 = (vector signed char){ 8, 7, 6, 5, 4, 3, 2, 1,+					    0, -1, -2, -3, -4, -5, -6, -7 };+#endif+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,+					  1, 2, 3, 4, 5, 6, 7, 8 };++   for (i=0; i<16; i++)+     vec_sc_result1[i] = 0;++   address_sc = &vec_sc_result1[0];++   vec_xst_be (store_data_sc, disp, address_sc);++   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, sc disp = 0, result does not match expected result\n");+       print_sc (vec_sc_expected1, vec_sc_result1);+#else+       abort();+#endif+     }++   disp = 2;+#ifdef __BIG_ENDIAN__+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,+					     -1, 0, 1, 2, 3, 4, 5, 6 };+#else+   vec_sc_expected1 = (vector signed char){  0, 0, 8, 7, 6, 5, 4, 3,+					     2, 1, 0, -1, -2, -3, -4, -5 };+#endif+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,+					 1, 2, 3, 4, 5, 6, 7, 8 };++   for (i=0; i<16; i++)+     vec_sc_result1[i] = 0;++   address_sc = &vec_sc_result1[0];++   vec_xst_be (store_data_sc, disp, address_sc);++   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, sc disp = 2, result does not match expected result\n");+       print_sc (vec_sc_expected1, vec_sc_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,+					      8, 9, 10, 11, 12, 13, 14, 15 };+#else+   vec_uc_expected1 = (vector unsigned char){ 15, 14, 13, 12, 11, 10, 9, 8,+					      7, 6, 5, 4, 3, 2, 1 };+#endif+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,+					   8, 9, 10, 11, 12, 13, 14, 15 };++   for (i=0; i<16; i++)+     vec_uc_result1[i] = 0;++   address_uc = &vec_uc_result1[0];++   vec_xst_be (store_data_uc, disp, address_uc);++   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, uc disp = 0, result does not match expected result\n");+       print_uc (vec_uc_expected1, vec_uc_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };+#else+   vec_ss_expected1 = (vector signed short int){ 3, 2, 1, 0, -1, -2, -3, -4 };+#endif+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };++   for (i=0; i<8; i++)+     vec_ss_result1[i] = 0;++   address_ss = &vec_ss_result1[0];++   vec_xst_be (store_data_ss, disp, address_ss);++   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, ss disp = 0, result does not match expected result\n");+       print_ss (vec_ss_expected1, vec_ss_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };+#else+   vec_us_expected1 = (vector unsigned short int){ 7, 6, 5, 4, 3, 2, 1, 0 };+#endif+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };++   for (i=0; i<8; i++)+     vec_us_result1[i] = 0;++   address_us = &vec_us_result1[0];++   vec_xst_be (store_data_us, disp, address_us);++   if (result_wrong_us (vec_us_expected1, vec_us_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, us disp = 0, result does not match expected result\n");+       print_us (vec_us_expected1, vec_us_result1);+#else+       abort();+#endif+     }++#if 0+   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };+#else+   vec_si_expected1 = (vector signed int){ 1, 0, -1, -2 };+#endif+   store_data_si = (vector signed int){ -2, -1, 0, 1 };++   for (i=0; i<4; i++)+     vec_si_result1[i] = 0;++   address_si = &vec_si_result1[0];++   vec_xst_be (store_data_si, disp, address_si);+   if (result_wrong_si (vec_si_expected1, vec_si_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, si disp = 0, result does not match expected result\n");+       print_si (vec_si_expected1, vec_si_result1);+#else+       abort();+#endif+     }+#endif++#if 0+   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };+#else+   vec_ui_expected1 = (vector unsigned int){ 1, 0, -1, -2 };+#endif+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };++   for (i=0; i<4; i++)+     vec_ui_result1[i] = 0;++   address_ui = &vec_ui_result1[0];++   vec_xst_be (store_data_ui, disp, address_ui);++   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, ui disp = 0, result does not match expected result\n");+       print_ui (vec_ui_expected1, vec_ui_result1);+#else+       abort();+#endif+     }+#endif+   +   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_sll_expected1 = (vector signed long long){ -1, 0 };+#else+   vec_sll_expected1 = (vector signed long long){ 0, -1 };+#endif+   store_data_sll = (vector signed long long ){ -1, 0 };++   for (i=0; i<2; i++)+     vec_sll_result1[i] = 0;++   address_sll = (signed long long *)(&vec_sll_result1[0]);++   vec_xst_be (store_data_sll, disp, address_sll);++   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, sll disp = 0, result does not match expected result\n");+       print_sll (vec_sll_expected1, vec_sll_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_ull_expected1 = (vector unsigned long long){ 0, 1234567890123456 };+#else+   vec_ull_expected1 = (vector unsigned long long){1234567890123456, 0 };+#endif   +   store_data_ull = (vector unsigned long long){  0, 1234567890123456 };++   for (i=0; i<2; i++)+     vec_ull_result1[i] = 0;++   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);++   vec_xst_be (store_data_ull, disp, address_ull);++   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, ull disp = 0, result does not match expected result\n");+       print_ull (vec_ull_expected1, vec_ull_result1);+#else+       abort();+#endif+     }++   disp = 0;++#ifdef __BIG_ENDIAN__+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };+#else+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };+#endif+   store_data_s128 = (vector __int128_t)(__uint128_t){  12345678911121314 };++   vec_s128_result1[0] = 0;++   address_s128 = (__int128_t *)(&vec_s128_result1[0]);++   vec_xst_be (store_data_s128, disp, address_s128);++   if (res0001-Add-tests-for-vec_xl-vec_xl_be-vec_xst-vec_xst_be-bu.patchult_wrong_s128 (vec_s128_expected1, vec_s128_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, s128 disp = 0, result does not match expected result\n");+       print_s128 (vec_s128_expected1, vec_s128_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };+#else+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };+#endif+   store_data_u128 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };++   vec_u128_result1[0] = 0;++   address_u128 = (__int128_t *)(&vec_u128_result1[0]);++   vec_xst_be (store_data_u128, disp, address_u128);++   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, u128 disp = 0, result does not match expected result\n");+       print_u128 (vec_u128_expected1, vec_u128_result1);+#else+       abort();+#endif+     }++   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_d_expected1 = (vector double){ 0.0, 1.1 };+#else+   vec_d_expected1 = (vector double){ 1.1, 0.0 };+#endif+   store_data_d = (vector double){  0.0, 1.1 };++   for (i=0; i<2; i++)+     vec_d_result1[i] = 0;++   address_d = (double *)(&vec_d_result1[0]);++   vec_xst_be (store_data_d, disp, address_d);++   if (result_wrong_d (vec_d_expected1, vec_d_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, double disp = 0, result does not match expected result\n");+       print_d (vec_d_expected1, vec_d_result1);+#else+       abort();+#endif+     }++#if 0+   disp = 0;+#ifdef __BIG_ENDIAN__+   vec_f_expected1 = (vector float){ 0.0, 1.2, 2.3, 3.4 };+#else+   vec_f_expected1 = (vector float){ 3.4, 2.3, 1.2, 0.0 };+#endif+   store_data_f = (vector float){ 0.0, 1.2, 2.3, 3.4 };++   for (i=0; i<4; i++)+     vec_f_result1[i] = 0;++   address0001-Add-tests-for-vec_xl-vec_xl_be-vec_xst-vec_xst_be-bu.patch_f = (float *)(&vec_f_result1[0]);++   vec_xst_be (store_data_f, disp, address_f);++   if (result_wrong_f (vec_f_expected1, vec_f_result1))+     {+#ifdef DEBUG+       printf("Error: vec_xst_be, float disp = 0, result does not match expected result\n");+       print_f (vec_f_expected1, vec_f_result1);+#else+       abort();+#endif+     }+#endif+}c_select:V8HI
@@ -1320,7 +1354,7 @@ (define_expand "vsx_st_elemrev_v16qi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V16QImode);
       /* 3 is leftmost element in register */
       unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
@@ -1335,7 +1369,8 @@ (define_expand "vsx_st_elemrev_v16qi"
       emit_insn (gen_altivec_vperm_v16qi_direct (tmp, operands[1],
                                                  operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V16QImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V16QImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })


diff --git a/gcc/testsuite/gcc.target/powerpc/powerpc.exp b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
index 93b3239b3..148acb1a1 100644
--- a/gcc/testsuite/gcc.target/powerpc/powerpc.exp
+++ b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
@@ -49,4 +49,16 @@ gcc-dg-runtest [list $srcdir/$subdir/savres.c] "" $alti
 
 # All done.
 torture-finish
+
+torture-init 
+# Test load/store builtins at multiple optimizations                    <<  Comment change
+set-torture-options [list -O0 -Os -O1 -O2 -O3]                          <<  Optimization levels
+gcc-dg-runtest [list $srcdir/$subdir/builtins-4-runnable.c \
+		$srcdir/$subdir/builtins-6-runnable.c \                      
+		$srcdir/$subdir/builtins-5-p9-runnable.c \
+	       	$srcdir/$subdir/builtins-6-p9-runnable.c] "" $DEFAULT_CFLAGS
+
+# All done.
+torture-finish
+
 dg-finish
-- 
2.11.0
Carl Love Jan. 22, 2018, 5:06 p.m. | #5
Segher:

I put back the xxpermdi,2 stuff.  Per our private discussion about the
parallel [(const_int 0)], I found that I could get GCC to compile
without parallel.  GCC worked with a -O0 on the test case but I got and
IRC when using -O1.  So, I had to put the parallel back in.  The patch
is now works for -O0, -O1, -O2 and -O3.  

I have completed the regression testing again for 

  powerpc64le-unknown-linux-gnu (Power 8 LE)
  powerpc64le-unknown-linux-gnu (Power 8 BE)
  powerpc64le-unknown-linux-gnu (Power 9 LE)

and all looks good.

So, just for the record, here is the final patch that will get
committed.

                     Carl Love
--------------------------------------------------------------------------------

gcc/ChangeLog:

2018-01-22 Carl Love <cel@us.ibm.com>

	* config/rs6000/rs6000-builtin.def (ST_ELEMREV_V1TI, LD_ELEMREV_V1TI,
	LVX_V1TI): Add macro expansion.
	* config/rs6000/rs6000-c.c (altivec_builtin_types): Add argument
	definitions for VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_VEC_ST,
	VSX_BUILTIN_VEC_XL, LD_ELEMREV_V1TI builtins.
	* config/rs6000/rs6000-p8swap.c (insn_is_swappable_p);
	Change check to determine if the instruction is a byte reversing
	entry.  Fix typo in comment.
	* config/rs6000/rs6000.c (altivec_expand_builtin): Add case entry
	for VSX_BUILTIN_ST_ELEMREV_V1TI and VSX_BUILTIN_LD_ELEMREV_V1TI.
	Add def_builtin calls for new builtins.
	* config/rs6000/vsx.md (vsx_st_elemrev_v1ti, vsx_ld_elemrev_v1ti):
	Add define_insn expansion.

gcc/testsuite/ChangeLog:

2018-01-22  Carl Love  <cel@us.ibm.com>
	* gcc.target/powerpc/powerpc.exp: Add torture tests for
	builtins-4-runnable.c, builtins-6-runnable.c,
	builtins-5-p9-runnable.c, builtins-6-p9-runnable.c.
	* gcc.target/powerpc/builtins-6-runnable.c: New test file.
	* gcc.target/powerpc/builtins-4-runnable.c: Add additional tests
	for signed/unsigned 128-bit and long long int loads.
---
 gcc/config/rs6000/rs6000-builtin.def               |    3 +
 gcc/config/rs6000/rs6000-c.c                       |   39 +
 gcc/config/rs6000/rs6000-p8swap.c                  |    5 +-
 gcc/config/rs6000/rs6000.c                         |   20 +
 gcc/config/rs6000/vsx.md                           |   42 +-
 .../gcc.target/powerpc/builtins-4-runnable.c       |  494 +++++++++-
 .../gcc.target/powerpc/builtins-6-runnable.c       | 1001 ++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/powerpc.exp       |   12 +
 8 files changed, 1581 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index b17036c5a..757fd6d50 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1234,6 +1234,7 @@ BU_ALTIVEC_X (LVXL_V8HI,	"lvxl_v8hi",	    MEM)
 BU_ALTIVEC_X (LVXL_V16QI,	"lvxl_v16qi",	    MEM)
 BU_ALTIVEC_X (LVX,		"lvx",		    MEM)
 BU_ALTIVEC_X (LVX_V2DF,		"lvx_v2df",	    MEM)
+BU_ALTIVEC_X (LVX_V1TI,		"lvx_v1ti",	    MEM)
 BU_ALTIVEC_X (LVX_V2DI,		"lvx_v2di",	    MEM)
 BU_ALTIVEC_X (LVX_V4SF,		"lvx_v4sf",	    MEM)
 BU_ALTIVEC_X (LVX_V4SI,		"lvx_v4si",	    MEM)
@@ -1783,12 +1784,14 @@ BU_VSX_X (STXVW4X_V4SF,	      "stxvw4x_v4sf",	MEM)
 BU_VSX_X (STXVW4X_V4SI,	      "stxvw4x_v4si",	MEM)
 BU_VSX_X (STXVW4X_V8HI,	      "stxvw4x_v8hi",	MEM)
 BU_VSX_X (STXVW4X_V16QI,      "stxvw4x_v16qi",	MEM)
+BU_VSX_X (LD_ELEMREV_V1TI,    "ld_elemrev_v1ti",  MEM)
 BU_VSX_X (LD_ELEMREV_V2DF,    "ld_elemrev_v2df",  MEM)
 BU_VSX_X (LD_ELEMREV_V2DI,    "ld_elemrev_v2di",  MEM)
 BU_VSX_X (LD_ELEMREV_V4SF,    "ld_elemrev_v4sf",  MEM)
 BU_VSX_X (LD_ELEMREV_V4SI,    "ld_elemrev_v4si",  MEM)
 BU_VSX_X (LD_ELEMREV_V8HI,    "ld_elemrev_v8hi",  MEM)
 BU_VSX_X (LD_ELEMREV_V16QI,   "ld_elemrev_v16qi", MEM)
+BU_VSX_X (ST_ELEMREV_V1TI,    "st_elemrev_v1ti",  MEM)
 BU_VSX_X (ST_ELEMREV_V2DF,    "st_elemrev_v2df",  MEM)
 BU_VSX_X (ST_ELEMREV_V2DI,    "st_elemrev_v2di",  MEM)
 BU_VSX_X (ST_ELEMREV_V4SF,    "st_elemrev_v4sf",  MEM)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 123e46aa1..a0f790d39 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -3149,16 +3149,27 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
@@ -3193,6 +3204,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
@@ -4076,6 +4091,10 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V1TI,
+    RS6000_BTI_void, RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V1TI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
@@ -4177,9 +4196,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
@@ -4231,6 +4260,16 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTDI,
+    ~RS6000_BTI_long_long },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTDI,
+    ~RS6000_BTI_unsigned_long_long },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V1TI,
+    RS6000_BTI_void, RS6000_BTI_V1TI, RS6000_BTI_INTDI, ~RS6000_BTI_INTTI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V1TI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTDI, ~RS6000_BTI_UINTTI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
     RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c
index cb88ffbb5..f5f046720 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -734,10 +734,11 @@ insn_is_swappable_p (swap_web_entry *insn_entry, rtx insn,
   if (insn_entry[i].is_store)
     {
       if (GET_CODE (body) == SET
-	  && GET_CODE (SET_SRC (body)) != UNSPEC)
+	  && GET_CODE (SET_SRC (body)) != UNSPEC
+	  && GET_CODE (SET_SRC (body)) != VEC_SELECT)
 	{
 	  rtx lhs = SET_DEST (body);
-	  /* Even without a swap, the LHS might be a vec_select for, say,
+	  /* Even without a swap, the RHS might be a vec_select for, say,
 	     a byte-reversing store.  */
 	  if (GET_CODE (lhs) != MEM)
 	    return 0;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 46e00dd9a..b96f5ea45 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15572,6 +15572,12 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
        unaligned-supporting store, so use a generic expander.  For
        little-endian, the exact element-reversing instruction must
        be used.  */
+   case VSX_BUILTIN_ST_ELEMREV_V1TI:
+     {
+        enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v1ti
+			       : CODE_FOR_vsx_st_elemrev_v1ti);
+        return altivec_expand_stv_builtin (code, exp);
+      }
     case VSX_BUILTIN_ST_ELEMREV_V2DF:
       {
 	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2df
@@ -15846,6 +15852,12 @@ altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
 			       : CODE_FOR_vsx_ld_elemrev_v2df);
 	return altivec_expand_lv_builtin (code, exp, target, false);
       }
+    case VSX_BUILTIN_LD_ELEMREV_V1TI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v1ti
+			       : CODE_FOR_vsx_ld_elemrev_v1ti);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
     case VSX_BUILTIN_LD_ELEMREV_V2DI:
       {
 	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2di
@@ -17383,6 +17395,10 @@ altivec_init_builtins (void)
     = build_function_type_list (void_type_node,
 				V2DF_type_node, long_integer_type_node,
 				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v1ti_long_pvoid
+    = build_function_type_list (void_type_node,
+				V1TI_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
   tree void_ftype_v2di_long_pvoid
     = build_function_type_list (void_type_node,
 				V2DI_type_node, long_integer_type_node,
@@ -17538,6 +17554,8 @@ altivec_init_builtins (void)
 	       VSX_BUILTIN_LD_ELEMREV_V16QI);
   def_builtin ("__builtin_vsx_st_elemrev_v2df", void_ftype_v2df_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V2DF);
+  def_builtin ("__builtin_vsx_st_elemrev_v1ti", void_ftype_v1ti_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V1TI);
   def_builtin ("__builtin_vsx_st_elemrev_v2di", void_ftype_v2di_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V2DI);
   def_builtin ("__builtin_vsx_st_elemrev_v4sf", void_ftype_v4sf_long_pvoid,
@@ -17861,6 +17879,8 @@ altivec_init_builtins (void)
 	= build_function_type_list (void_type_node,
 				    V1TI_type_node, long_integer_type_node,
 				    pvoid_type_node, NULL_TREE);
+      def_builtin ("__builtin_vsx_ld_elemrev_v1ti", v1ti_ftype_long_pcvoid,
+		   VSX_BUILTIN_LD_ELEMREV_V1TI);
       def_builtin ("__builtin_vsx_lxvd2x_v1ti", v1ti_ftype_long_pcvoid,
 		   VSX_BUILTIN_LXVD2X_V1TI);
       def_builtin ("__builtin_vsx_stxvd2x_v1ti", void_ftype_v1ti_long_pvoid,
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 03f8ec2d6..97add65a4 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1093,6 +1093,17 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_ld_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
+        (vec_select:V1TI
+	  (match_operand:V1TI 1 "memory_operand" "Z")
+	  (parallel [(const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"
+{
+   return "lxvd2x %x0,%y1\;xxpermdi %x0,%x0,%x0,2";
+}
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v2df"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
         (vec_select:V2DF
@@ -1222,6 +1233,18 @@ (define_insn "*vsx_ld_elemrev_v16qi_internal"
   "lxvb16x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_st_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "memory_operand" "=Z")
+        (vec_select:V1TI
+          (match_operand:V1TI 1 "vsx_register_operand" "+wa")
+          (parallel [(const_int 0)])))
+   (clobber (match_dup 1))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"
+{
+  return "xxpermdi %x1,%x1,%x1,2\;stxvd2x %x1,%y0";
+}
+  [(set_attr "type" "vecstore")])
+
 (define_insn "vsx_st_elemrev_v2df"
   [(set (match_operand:V2DF 0 "memory_operand" "=Z")
         (vec_select:V2DF
@@ -1272,7 +1295,7 @@ (define_expand "vsx_st_elemrev_v8hi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V8HImode);
       /* 2 is leftmost element in register */
       unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
@@ -1287,11 +1310,21 @@ (define_expand "vsx_st_elemrev_v8hi"
       emit_insn (gen_altivec_vperm_v8hi_direct (tmp, operands[1],
                                                 operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V8HImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V8HImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })
 
+(define_insn "*vsx_st_elemrev_v2di_internal"
+  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
+        (vec_select:V2DI
+          (match_operand:V2DI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 (define_insn "*vsx_st_elemrev_v8hi_internal"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vec_select:V8HI
@@ -1320,7 +1353,7 @@ (define_expand "vsx_st_elemrev_v16qi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V16QImode);
       /* 3 is leftmost element in register */
       unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
@@ -1335,7 +1368,8 @@ (define_expand "vsx_st_elemrev_v16qi"
       emit_insn (gen_altivec_vperm_v16qi_direct (tmp, operands[1],
                                                  operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V16QImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V16QImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
index ed37424ca..de9b916de 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
@@ -1,10 +1,13 @@
 /* { dg-do run } */
 /* { dg-require-effective-target vsx_hw } */
-/* { dg-options "-maltivec -mvsx" } */  
+/* { dg-options "-maltivec -mvsx" } */
 
 #include <inttypes.h>
 #include <altivec.h> // vector
+
+#ifdef DEBUG
 #include <stdio.h>
+#endif
 
 void abort (void);
 
@@ -24,9 +27,11 @@ int main() {
 
   float data_f[100];
   double data_d[100];
-   
+  __uint128_t data_u128[100];
+  __int128_t data_128[100];
+
   signed long long disp;
-   
+
   vector signed char vec_c_expected1, vec_c_expected2, vec_c_result1, vec_c_result2;
   vector unsigned char vec_uc_expected1, vec_uc_expected2,
     vec_uc_result1, vec_uc_result2;
@@ -42,11 +47,13 @@ int main() {
     vec_sll_result1, vec_sll_result2;
   vector unsigned long long vec_ull_expected1, vec_ull_expected2,
     vec_ull_result1, vec_ull_result2;
+  vector __int128_t vec_128_expected1, vec_128_result1;
+  vector __uint128_t vec_u128_expected1, vec_u128_result1;
   vector float vec_f_expected1, vec_f_expected2, vec_f_result1, vec_f_result2;
   vector double vec_d_expected1, vec_d_expected2, vec_d_result1, vec_d_result2;
   char buf[20];
   signed long long zero = (signed long long) 0;
-  
+
   for (i = 0; i < 100; i++)
     {
       data_c[i] = i;
@@ -59,21 +66,304 @@ int main() {
       data_ull[i] = i+1001;
       data_f[i] = i+100000.0;
       data_d[i] = i+1000000.0;
+      data_128[i] = i + 12800000;
+      data_u128[i] = i + 12800001;
     }
-  
-  disp = 0;
+
+  // vec_xl() tests
+  disp = 1;
+
+  vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7,
+					 8, 9, 10, 11, 12, 13, 14, 15};
+  vec_c_result1 = vec_xl (0, data_c);
+
+  vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8, 9,
+					 10, 11, 12, 13, 14, 15, 16};
+  vec_c_result2 = vec_xl (disp, data_c);
+
+  vec_uc_expected1 = (vector unsigned char){1, 2, 3, 4, 5, 6, 7, 8, 9,
+					    10, 11, 12, 13, 14, 15, 16};
+  vec_uc_result1 = vec_xl (0, data_uc);
+
+  vec_uc_expected2 = (vector unsigned char){2, 3, 4, 5, 6, 7, 8, 9, 10,
+					    11, 12, 13, 14, 15, 16, 17};
+  vec_uc_result2 = vec_xl (disp, data_uc);
+
+  for (i = 0; i < 16; i++)
+    {
+      if (vec_c_result1[i] != vec_c_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",
+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_c_result2[i] != vec_c_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",
+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_uc_result1[i] != vec_uc_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_uc_result2[i] != vec_uc_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",
+	       i,  vec_uc_result2[i], i, vec_uc_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 2;
+  vec_ssi_expected1 = (vector signed short){10, 11, 12, 13, 14, 15, 16, 17};
+
+  vec_ssi_result1 = vec_xl (0, data_ssi);
+
+  vec_ssi_expected2 = (vector signed short){11, 12, 13, 14, 15, 16, 17, 18};
+  vec_ssi_result2 = vec_xl (disp, data_ssi);
+
+  vec_usi_expected1 = (vector unsigned short){11, 12, 13, 14, 15, 16, 17, 18};
+  vec_usi_result1 = vec_xl (0, data_usi);
+
+  vec_usi_expected2 = (vector unsigned short){12, 13, 14, 15, 16, 17, 18, 19};
+  vec_usi_result2 = vec_xl (disp, data_usi);
+
+
+  for (i = 0; i < 8; i++)
+    {
+      if (vec_ssi_result1[i] != vec_ssi_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",
+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_ssi_result2[i] != vec_ssi_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",
+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_usi_result1[i] != vec_usi_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",
+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_usi_result2[i] != vec_usi_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",
+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 4;
+  vec_si_result1 = vec_xl (zero, data_si);
+  vec_si_expected1 = (vector int){100, 101, 102, 103};
+
+  vec_si_result2 = vec_xl (disp, data_si);
+  vec_si_expected2 = (vector int){101, 102, 103, 104};
+
+  vec_ui_result1 = vec_xl (zero, data_ui);
+  vec_ui_expected1 = (vector unsigned int){101, 102, 103, 104};
+
+  vec_ui_result2 = vec_xl (disp, data_ui);
+  vec_ui_expected2 = (vector unsigned int){102, 103, 104, 105};
+
+  for (i = 0; i < 4; i++)
+    {
+      if (vec_si_result1[i] != vec_si_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",
+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_si_result2[i] != vec_si_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ui_result1[i] != vec_ui_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ui_result2[i] != vec_ui_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ui_result2[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 8;
+  vec_sll_result1 = vec_xl (zero, data_sll);
+  vec_sll_expected1 = (vector signed long long){1000, 1001};
+
+  vec_sll_result2 = vec_xl (disp, data_sll);
+  vec_sll_expected2 = (vector signed long long){1001, 1002};
+
+  vec_ull_result1 = vec_xl (zero, data_ull);
+  vec_ull_expected1 = (vector unsigned long long){1001, 1002};
+
+  vec_ull_result2 = vec_xl (disp, data_ull);
+  vec_ull_expected2 = (vector unsigned long long){1002, 1003};
+
+  for (i = 0; i < 2; i++)
+    {
+      if (vec_sll_result1[i] != vec_sll_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %lld\n",
+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_sll_result2[i] != vec_sll_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %lld\n",
+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ull_result1[i] != vec_ull_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %lld\n",
+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ull_result2[i] != vec_ull_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %lld\n",
+	       i,  vec_ull_result2[i], i, vec_ull_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 4;
+  vec_f_result1 = vec_xl (zero, data_f);
+  vec_f_expected1 = (vector float){100000.0, 100001.0, 100002.0, 100003.0};
+
+  vec_f_result2 = vec_xl (disp, data_f);
+  vec_f_expected2 = (vector float){100001.0, 100002.0, 100003.0, 100004.0};
+
+  for (i = 0; i < 4; i++)
+    {
+      if (vec_f_result1[i] != vec_f_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_f_result2[i] != vec_f_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 8;
+  vec_d_result1 = vec_xl (zero, data_d);
+  vec_d_expected1 = (vector double){1000000.0, 1000001.0};
+
+  vec_d_result2 = vec_xl (disp, data_d);
+  vec_d_expected2 = (vector double){1000001.0, 1000002.0};
+
+  for (i = 0; i < 2; i++)
+    {
+      if (vec_d_result1[i] != vec_d_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_d_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_d_result1[i], i, vec_d_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_d_result2[i] != vec_d_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_d_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  vec_128_expected1 = (vector __int128_t){12800000};
+  vec_128_result1 = vec_xl (zero, data_128);
+
+  if (vec_128_expected1[0] != vec_128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ",
+	       vec_128_result1[0] >> 64,
+	       vec_128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+	printf("vec_128_expected1[0] = %lld %llu\n",
+	       vec_128_expected1[0] >> 64,
+	       vec_128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+#else
+	abort ();
+#endif
+    }
+
+  vec_u128_result1 = vec_xl (zero, data_u128);
+  vec_u128_expected1 = (vector __uint128_t){12800001};
+  if (vec_u128_expected1[0] != vec_u128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_u128_result1[0] = %lld; ",
+	       vec_u128_result1[0] >> 64,
+	       vec_u128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+	printf("vec_u128_expected1[0] = %lld\n",
+	       vec_u128_expected1[0] >> 64,
+	       vec_u128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+#else
+	abort ();
+#endif
+    }
+
+  // vec_xl_be() tests
+  disp = 1;
 #ifdef __BIG_ENDIAN__
-  printf("BIG ENDIAN\n");
   vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7,
 					 8, 9, 10, 11, 12, 13, 14, 15};
 #else
-  printf("LITTLE ENDIAN\n");
   vec_c_expected1 = (vector signed char){15, 14, 13, 12, 11, 10, 9, 8,
 					 7, 6, 5, 4, 3, 2, 1, 0};
 #endif
   vec_c_result1 = vec_xl_be (0, data_c);
 
-  disp = 1;
+
 
 #ifdef __BIG_ENDIAN__
   vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
@@ -108,16 +398,36 @@ int main() {
   for (i = 0; i < 16; i++)
     {
       if (vec_c_result1[i] != vec_c_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",
+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_c_result2[i] != vec_c_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",
+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_uc_result1[i] != vec_uc_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_uc_result2[i] != vec_uc_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_ssi_result1 = vec_xl_be (zero, data_ssi);
@@ -144,7 +454,7 @@ int main() {
 #else
   vec_usi_expected1 = (vector unsigned short){18, 17, 16, 15, 14, 13, 12, 11};
 #endif
-   
+
   disp = 2;
   vec_usi_result2 = vec_xl_be (disp, data_usi);
 
@@ -157,16 +467,36 @@ int main() {
   for (i = 0; i < 8; i++)
     {
       if (vec_ssi_result1[i] != vec_ssi_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",
+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ssi_result2[i] != vec_ssi_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",
+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_usi_result1[i] != vec_usi_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",
+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_usi_result2[i] != vec_usi_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",
+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_si_result1 = vec_xl_be (zero, data_si);
@@ -207,16 +537,36 @@ int main() {
   for (i = 0; i < 4; i++)
     {
       if (vec_si_result1[i] != vec_si_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",
+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_si_result2[i] != vec_si_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ui_result1[i] != vec_ui_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ui_result2[i] != vec_ui_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ui_result2[%d] = %d; vec_ui_expected2[%d] = %d\n",
+	       i,  vec_ui_result2[i], i, vec_ui_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_sll_result1 = vec_xl_be (zero, data_sll);
@@ -257,16 +607,36 @@ int main() {
   for (i = 0; i < 2; i++)
     {
       if (vec_sll_result1[i] != vec_sll_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %d\n",
+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_sll_result2[i] != vec_sll_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %d\n",
+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);
+#else
 	abort ();
+#endif
 
       if (vec_ull_result1[i] != vec_ull_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %d\n",
+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ull_result2[i] != vec_ull_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %d\n",
+	       i,  vec_ull_result2[i], i, vec_sll_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_f_result1 = vec_xl_be (zero, data_f);
@@ -289,9 +659,20 @@ int main() {
   for (i = 0; i < 4; i++)
     {
       if (vec_f_result1[i] != vec_f_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);
+#else
+	abort ();
+#endif
+
       if (vec_f_result2[i] != vec_f_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_d_result1 = vec_xl_be (zero, data_d);
@@ -314,8 +695,63 @@ int main() {
   for (i = 0; i < 2; i++)
     {
       if (vec_d_result1[i] != vec_d_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+
       if (vec_d_result2[i] != vec_d_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 0;
+  vec_128_result1 = vec_xl_be (zero, data_128);
+#ifdef __BIG_ENDIAN__
+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };
+#else
+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };
+#endif
+
+  if (vec_128_expected1[0] != vec_128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_128_result1[0] = %llu %llu;",
+	       vec_128_result1[0] >> 64,
+	       vec_128_result1[0] & 0xFFFFFFFFFFFFFFFF);
+	printf(" vec_128_expected1[0] = %llu %llu\n",
+	       vec_128_expected1[0] >> 64,
+	       vec_128_expected1[0] & 0xFFFFFFFFFFFFFFFF);
+#else
+      abort ();
+#endif
+    }
+
+#ifdef __BIG_ENDIAN__
+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };
+#else
+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };
+#endif
+
+  vec_u128_result1 = vec_xl_be (zero, data_u128);
+
+  if (vec_u128_expected1[0] != vec_u128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_u128_result1[0] = %llu %llu;",
+	       vec_u128_result1[0] >> 64,
+	       vec_u128_result1[0] & 0xFFFFFFFFFFFFFFFF);
+	printf(" vec_u128_expected1[0] = %llu %llu\n",
+	       vec_u128_expected1[0] >> 64,
+	       vec_u128_expected1[0] & 0xFFFFFFFFFFFFFFFF);
+#else
+      abort ();
+#endif
     }
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c
new file mode 100644
index 000000000..5d313124b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c
@@ -0,0 +1,1001 @@
+/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <inttypes.h>
+#include <altivec.h>
+
+#define TRUE 1
+#define FALSE 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int result_wrong_sc (vector signed char vec_expected,
+		     vector signed char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_sc (vector signed char vec_expected,
+	       vector signed char vec_actual)
+{
+  int i;
+
+  printf("expected signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_uc (vector unsigned char vec_expected,
+		     vector unsigned char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_uc (vector unsigned char vec_expected,
+	       vector unsigned char vec_actual)
+{
+  int i;
+
+  printf("expected signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_us (vector unsigned short vec_expected,
+		     vector unsigned short vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_us (vector unsigned short vec_expected,
+	       vector unsigned short vec_actual)
+{
+  int i;
+
+  printf("expected unsigned short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ss (vector signed short vec_expected,
+		     vector signed short vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ss (vector signed short vec_expected,
+	       vector signed short vec_actual)
+{
+  int i;
+
+  printf("expected signed short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ui (vector unsigned int vec_expected,
+		     vector unsigned int vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ui (vector unsigned int vec_expected,
+	       vector unsigned int vec_actual)
+{
+  int i;
+
+  printf("expected unsigned int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_si (vector signed int vec_expected,
+		     vector signed int vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_si (vector signed int vec_expected,
+	       vector signed int vec_actual)
+{
+  int i;
+
+  printf("expected signed int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ull (vector unsigned long long vec_expected,
+		      vector unsigned long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ull (vector unsigned long long vec_expected,
+		vector unsigned long long vec_actual)
+{
+  int i;
+
+  printf("expected unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+	  //    printf(" %llu,", vec_expected[i]);
+    printf(" 0x%llx,", vec_expected[i]);
+
+  printf("\nactual unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+	  //    printf(" %llu,", vec_actual[i]);
+    printf("0x %llx,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_sll (vector signed long long vec_expected,
+		      vector signed long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_sll (vector signed long long vec_expected,
+		vector signed long long vec_actual)
+{
+  int i;
+
+  printf("expected signed long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_expected[i]);
+
+  printf("\nactual signed long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_u128 (vector __uint128_t vec_expected,
+		       vector __uint128_t vec_actual)
+{
+  int i;
+
+    if (vec_expected[0] != vec_actual[0])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_u128 (vector __uint128_t vec_expected,
+		 vector __uint128_t vec_actual)
+{
+  printf("expected uint128 data\n");
+  printf(" %llu%llu\n", (unsigned long long)(vec_expected[0] >> 64),
+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));
+
+  printf("\nactual uint128 data\n");
+  printf(" %llu%llu\n", (unsigned long long)(vec_actual[0] >> 64),
+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));
+}
+
+
+int result_wrong_s128 (vector __int128_t vec_expected,
+		       vector __int128_t vec_actual)
+{
+  int i;
+
+    if (vec_expected[0] != vec_actual[0])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_s128 (vector __int128 vec_expected,
+		 vector __int128 vec_actual)
+{
+  printf("expected int128 data\n");
+  printf(" %lld%llu\n", (signed long long)(vec_expected[0] >> 64),
+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));
+
+  printf("\nactual int128 data\n");
+  printf(" %lld%llu\n", (signed long long)(vec_actual[0] >> 64),
+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));
+}
+
+int result_wrong_d (vector double vec_expected,
+		    vector double vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_d (vector double vec_expected,
+	      vector double vec_actual)
+{
+  int i;
+
+  printf("expected double data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %f,", vec_expected[i]);
+
+  printf("\nactual double data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %f,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_f (vector float vec_expected,
+		    vector float vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_f (vector float vec_expected,
+	      vector float vec_actual)
+{
+  int i;
+
+  printf("expected float data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %f,", vec_expected[i]);
+
+  printf("\nactual float data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %f,", vec_actual[i]);
+  printf("\n");
+}
+
+int main() {
+   int i, j;
+   size_t len;
+   vector signed char store_data_sc;
+   vector unsigned char store_data_uc;
+   vector signed int store_data_si;
+   vector unsigned int store_data_ui;
+   vector __int128_t store_data_s128;
+   vector __uint128_t store_data_u128;
+   vector signed long long int store_data_sll;
+   vector unsigned long long int store_data_ull;
+   vector signed short store_data_ss;
+   vector unsigned short store_data_us;
+   vector double store_data_d;
+   vector float store_data_f;
+
+   signed char *address_sc;
+   unsigned char *address_uc;
+   signed int *address_si;
+   unsigned int *address_ui;
+   __int128_t *address_s128;
+   __uint128_t *address_u128;
+   signed long long int *address_sll;
+   unsigned long long int *address_ull;
+   signed short int *address_ss;
+   unsigned short int *address_us;
+   double *address_d;
+   float *address_f;
+
+   vector unsigned char *datap;
+
+   vector unsigned char vec_uc_expected1, vec_uc_result1;
+   vector signed char vec_sc_expected1, vec_sc_result1;
+   vector signed int vec_si_expected1, vec_si_result1;
+   vector unsigned int vec_ui_expected1, vec_ui_result1;
+   vector __int128_t vec_s128_expected1, vec_s128_result1;
+   vector __uint128_t vec_u128_expected1, vec_u128_result1;
+   vector signed long long int vec_sll_expected1, vec_sll_result1;
+   vector unsigned long long int vec_ull_expected1, vec_ull_result1;
+   vector signed short int vec_ss_expected1, vec_ss_result1;
+   vector unsigned short int vec_us_expected1, vec_us_result1;
+   vector double vec_d_expected1, vec_d_result1;
+   vector float vec_f_expected1, vec_f_result1;
+
+   signed long long disp;
+
+   /* VEC_XST */
+   disp = 0;
+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					    1, 2, 3, 4, 5, 6, 7, 8 };
+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,
+					  1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sc disp = 0, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 2;
+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,
+					     -1, 0, 1, 2, 3, 4, 5, 6 };
+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					 1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sc disp = 2, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					      8, 9, 10, 11, 12, 13, 14, 15 };
+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					   8, 9, 10, 11, 12, 13, 14, 15 };
+
+   for (i=0; i<16; i++)
+     vec_uc_result1[i] = 0;
+
+   address_uc = &vec_uc_result1[0];
+
+   vec_xst (store_data_uc, disp, address_uc);
+
+   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, uc disp = 0, result does not match expected result\n");
+       print_uc (vec_uc_expected1, vec_uc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+
+   for (i=0; i<8; i++)
+     vec_ss_result1[i] = 0;
+
+   address_ss = &vec_ss_result1[0];
+
+   vec_xst (store_data_ss, disp, address_ss);
+
+   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ss disp = 0, result does not match expected result\n");
+       print_ss (vec_ss_expected1, vec_ss_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+
+   for (i=0; i<8; i++)
+     vec_us_result1[i] = 0;
+
+   address_us = &vec_us_result1[0];
+
+   vec_xst (store_data_us, disp, address_us);
+
+   if (result_wrong_us (vec_us_expected1, vec_us_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, us disp = 0, result does not match expected result\n");
+       print_us (vec_us_expected1, vec_us_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };
+   store_data_si = (vector signed int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_si_result1[i] = 0;
+
+   address_si = &vec_si_result1[0];
+
+   vec_xst (store_data_si, disp, address_si);
+
+   if (result_wrong_si (vec_si_expected1, vec_si_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, si disp = 0, result does not match expected result\n");
+       print_si (vec_si_expected1, vec_si_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };
+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_ui_result1[i] = 0;
+
+   address_ui = &vec_ui_result1[0];
+
+   vec_xst (store_data_ui, disp, address_ui);
+
+   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ui disp = 0, result does not match expected result\n");
+       print_ui (vec_ui_expected1, vec_ui_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_sll_expected1 = (vector signed long long){ -1, 0 };
+   store_data_sll = (vector signed long long ){ -1, 0 };
+
+   for (i=0; i<2; i++)
+     vec_sll_result1[i] = 0;
+
+   address_sll = (signed long long *)(&vec_sll_result1[0]);
+
+   vec_xst (store_data_sll, disp, address_sll);
+
+   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sll disp = 0, result does not match expected result\n");
+       print_sll (vec_sll_expected1, vec_sll_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ull_expected1 = (vector unsigned long long){ 0, 1 };
+   store_data_ull = (vector unsigned long long){  0, 1 };
+
+   for (i=0; i<2; i++)
+     vec_ull_result1[i] = 0;
+
+   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);
+
+   vec_xst (store_data_ull, disp, address_ull);
+
+   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ull disp = 0, result does not match expected result\n");
+       print_ull (vec_ull_expected1, vec_ull_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_s128_expected1 = (vector __int128_t){ 12345 };
+   store_data_s128 = (vector __int128_t){  12345 };
+
+   vec_s128_result1[0] = 0;
+
+   address_s128 = (__int128_t *)(&vec_s128_result1[0]);
+
+   vec_xst (store_data_s128, disp, address_s128);
+
+   if (result_wrong_s128 (vec_s128_expected1, vec_s128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, s128 disp = 0, result does not match expected result\n");
+       print_s128 (vec_s128_expected1, vec_s128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_u128_expected1 = (vector __uint128_t){ 12345 };
+   store_data_u128 = (vector __uint128_t){  12345 };
+
+   vec_u128_result1[0] = 0;
+
+   address_u128 = (__int128_t *)(&vec_u128_result1[0]);
+
+   vec_xst (store_data_u128, disp, address_u128);
+
+   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, u128 disp = 0, result does not match expected result\n");
+       print_u128 (vec_u128_expected1, vec_u128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_d_expected1 = (vector double){ 0, 1 };
+   store_data_d = (vector double){  0, 1 };
+
+   for (i=0; i<2; i++)
+     vec_d_result1[i] = 0;
+
+   address_d = (double *)(&vec_d_result1[0]);
+
+   vec_xst (store_data_d, disp, address_d);
+
+   if (result_wrong_d (vec_d_expected1, vec_d_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, double disp = 0, result does not match expected result\n");
+       print_d (vec_d_expected1, vec_d_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_f_expected1 = (vector float){ 0, 1 };
+   store_data_f = (vector float){  0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_f_result1[i] = 0;
+
+   address_f = (float *)(&vec_f_result1[0]);
+
+   vec_xst (store_data_f, disp, address_f);
+
+   if (result_wrong_f (vec_f_expected1, vec_f_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, float disp = 0, result does not match expected result\n");
+       print_f (vec_f_expected1, vec_f_result1);
+#else
+       abort();
+#endif
+     }
+
+   /* VEC_XST_BE, these always load in BE order regardless of
+      machine endianess.  */
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					    1, 2, 3, 4, 5, 6, 7, 8 };
+#else
+   vec_sc_expected1 = (vector signed char){ 8, 7, 6, 5, 4, 3, 2, 1,
+					    0, -1, -2, -3, -4, -5, -6, -7 };
+#endif
+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,
+					  1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst_be (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sc disp = 0, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 2;
+#ifdef __BIG_ENDIAN__
+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,
+					     -1, 0, 1, 2, 3, 4, 5, 6 };
+#else
+   vec_sc_expected1 = (vector signed char){  0, 0, 8, 7, 6, 5, 4, 3,
+					     2, 1, 0, -1, -2, -3, -4, -5 };
+#endif
+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					 1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst_be (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sc disp = 2, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					      8, 9, 10, 11, 12, 13, 14, 15 };
+#else
+   vec_uc_expected1 = (vector unsigned char){ 15, 14, 13, 12, 11, 10, 9, 8,
+					      7, 6, 5, 4, 3, 2, 1 };
+#endif
+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					   8, 9, 10, 11, 12, 13, 14, 15 };
+
+   for (i=0; i<16; i++)
+     vec_uc_result1[i] = 0;
+
+   address_uc = &vec_uc_result1[0];
+
+   vec_xst_be (store_data_uc, disp, address_uc);
+
+   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, uc disp = 0, result does not match expected result\n");
+       print_uc (vec_uc_expected1, vec_uc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+#else
+   vec_ss_expected1 = (vector signed short int){ 3, 2, 1, 0, -1, -2, -3, -4 };
+#endif
+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+
+   for (i=0; i<8; i++)
+     vec_ss_result1[i] = 0;
+
+   address_ss = &vec_ss_result1[0];
+
+   vec_xst_be (store_data_ss, disp, address_ss);
+
+   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ss disp = 0, result does not match expected result\n");
+       print_ss (vec_ss_expected1, vec_ss_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+#else
+   vec_us_expected1 = (vector unsigned short int){ 7, 6, 5, 4, 3, 2, 1, 0 };
+#endif
+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+
+   for (i=0; i<8; i++)
+     vec_us_result1[i] = 0;
+
+   address_us = &vec_us_result1[0];
+
+   vec_xst_be (store_data_us, disp, address_us);
+
+   if (result_wrong_us (vec_us_expected1, vec_us_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, us disp = 0, result does not match expected result\n");
+       print_us (vec_us_expected1, vec_us_result1);
+#else
+       abort();
+#endif
+     }
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };
+#else
+   vec_si_expected1 = (vector signed int){ 1, 0, -1, -2 };
+#endif
+   store_data_si = (vector signed int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_si_result1[i] = 0;
+
+   address_si = &vec_si_result1[0];
+
+   vec_xst_be (store_data_si, disp, address_si);
+   if (result_wrong_si (vec_si_expected1, vec_si_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, si disp = 0, result does not match expected result\n");
+       print_si (vec_si_expected1, vec_si_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };
+#else
+   vec_ui_expected1 = (vector unsigned int){ 1, 0, -1, -2 };
+#endif
+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_ui_result1[i] = 0;
+
+   address_ui = &vec_ui_result1[0];
+
+   vec_xst_be (store_data_ui, disp, address_ui);
+
+   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ui disp = 0, result does not match expected result\n");
+       print_ui (vec_ui_expected1, vec_ui_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+   
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_sll_expected1 = (vector signed long long){ -1, 0 };
+#else
+   vec_sll_expected1 = (vector signed long long){ 0, -1 };
+#endif
+   store_data_sll = (vector signed long long ){ -1, 0 };
+
+   for (i=0; i<2; i++)
+     vec_sll_result1[i] = 0;
+
+   address_sll = (signed long long *)(&vec_sll_result1[0]);
+
+   vec_xst_be (store_data_sll, disp, address_sll);
+
+   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sll disp = 0, result does not match expected result\n");
+       print_sll (vec_sll_expected1, vec_sll_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ull_expected1 = (vector unsigned long long){ 0, 1234567890123456 };
+#else
+   vec_ull_expected1 = (vector unsigned long long){1234567890123456, 0 };
+#endif   
+   store_data_ull = (vector unsigned long long){  0, 1234567890123456 };
+
+   for (i=0; i<2; i++)
+     vec_ull_result1[i] = 0;
+
+   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);
+
+   vec_xst_be (store_data_ull, disp, address_ull);
+
+   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ull disp = 0, result does not match expected result\n");
+       print_ull (vec_ull_expected1, vec_ull_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+
+#ifdef __BIG_ENDIAN__
+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };
+#else
+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };
+#endif
+   store_data_s128 = (vector __int128_t)(__uint128_t){  12345678911121314 };
+
+   vec_s128_result1[0] = 0;
+
+   address_s128 = (__int128_t *)(&vec_s128_result1[0]);
+
+   vec_xst_be (store_data_s128, disp, address_s128);
+
+   if (result_wrong_s128 (vec_s128_expected1, vec_s128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, s128 disp = 0, result does not match expected result\n");
+       print_s128 (vec_s128_expected1, vec_s128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+#else
+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+#endif
+   store_data_u128 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+
+   vec_u128_result1[0] = 0;
+
+   address_u128 = (__int128_t *)(&vec_u128_result1[0]);
+
+   vec_xst_be (store_data_u128, disp, address_u128);
+
+   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, u128 disp = 0, result does not match expected result\n");
+       print_u128 (vec_u128_expected1, vec_u128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_d_expected1 = (vector double){ 0.0, 1.1 };
+#else
+   vec_d_expected1 = (vector double){ 1.1, 0.0 };
+#endif
+   store_data_d = (vector double){  0.0, 1.1 };
+
+   for (i=0; i<2; i++)
+     vec_d_result1[i] = 0;
+
+   address_d = (double *)(&vec_d_result1[0]);
+
+   vec_xst_be (store_data_d, disp, address_d);
+
+   if (result_wrong_d (vec_d_expected1, vec_d_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, double disp = 0, result does not match expected result\n");
+       print_d (vec_d_expected1, vec_d_result1);
+#else
+       abort();
+#endif
+     }
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_f_expected1 = (vector float){ 0.0, 1.2, 2.3, 3.4 };
+#else
+   vec_f_expected1 = (vector float){ 3.4, 2.3, 1.2, 0.0 };
+#endif
+   store_data_f = (vector float){ 0.0, 1.2, 2.3, 3.4 };
+
+   for (i=0; i<4; i++)
+     vec_f_result1[i] = 0;
+
+   address_f = (float *)(&vec_f_result1[0]);
+
+   vec_xst_be (store_data_f, disp, address_f);
+
+   if (result_wrong_f (vec_f_expected1, vec_f_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, float disp = 0, result does not match expected result\n");
+       print_f (vec_f_expected1, vec_f_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/powerpc.exp b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
index 93b3239b3..148acb1a1 100644
--- a/gcc/testsuite/gcc.target/powerpc/powerpc.exp
+++ b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
@@ -49,4 +49,16 @@ gcc-dg-runtest [list $srcdir/$subdir/savres.c] "" $alti
 
 # All done.
 torture-finish
+
+torture-init 
+# Test load/store builtins at multiple optimizations
+set-torture-options [list -O0 -Os -O1 -O2 -O3]
+gcc-dg-runtest [list $srcdir/$subdir/builtins-4-runnable.c \
+		$srcdir/$subdir/builtins-6-runnable.c \
+		$srcdir/$subdir/builtins-5-p9-runnable.c \
+	       	$srcdir/$subdir/builtins-6-p9-runnable.c] "" $DEFAULT_CFLAGS
+
+# All done.
+torture-finish
+
 dg-finish
-- 
2.11.0

Patch

diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index b17036c5a..757fd6d50 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1234,6 +1234,7 @@  BU_ALTIVEC_X (LVXL_V8HI,	"lvxl_v8hi",	    MEM)
 BU_ALTIVEC_X (LVXL_V16QI,	"lvxl_v16qi",	    MEM)
 BU_ALTIVEC_X (LVX,		"lvx",		    MEM)
 BU_ALTIVEC_X (LVX_V2DF,		"lvx_v2df",	    MEM)
+BU_ALTIVEC_X (LVX_V1TI,		"lvx_v1ti",	    MEM)
 BU_ALTIVEC_X (LVX_V2DI,		"lvx_v2di",	    MEM)
 BU_ALTIVEC_X (LVX_V4SF,		"lvx_v4sf",	    MEM)
 BU_ALTIVEC_X (LVX_V4SI,		"lvx_v4si",	    MEM)
@@ -1783,12 +1784,14 @@  BU_VSX_X (STXVW4X_V4SF,	      "stxvw4x_v4sf",	MEM)
 BU_VSX_X (STXVW4X_V4SI,	      "stxvw4x_v4si",	MEM)
 BU_VSX_X (STXVW4X_V8HI,	      "stxvw4x_v8hi",	MEM)
 BU_VSX_X (STXVW4X_V16QI,      "stxvw4x_v16qi",	MEM)
+BU_VSX_X (LD_ELEMREV_V1TI,    "ld_elemrev_v1ti",  MEM)
 BU_VSX_X (LD_ELEMREV_V2DF,    "ld_elemrev_v2df",  MEM)
 BU_VSX_X (LD_ELEMREV_V2DI,    "ld_elemrev_v2di",  MEM)
 BU_VSX_X (LD_ELEMREV_V4SF,    "ld_elemrev_v4sf",  MEM)
 BU_VSX_X (LD_ELEMREV_V4SI,    "ld_elemrev_v4si",  MEM)
 BU_VSX_X (LD_ELEMREV_V8HI,    "ld_elemrev_v8hi",  MEM)
 BU_VSX_X (LD_ELEMREV_V16QI,   "ld_elemrev_v16qi", MEM)
+BU_VSX_X (ST_ELEMREV_V1TI,    "st_elemrev_v1ti",  MEM)
 BU_VSX_X (ST_ELEMREV_V2DF,    "st_elemrev_v2df",  MEM)
 BU_VSX_X (ST_ELEMREV_V2DI,    "st_elemrev_v2di",  MEM)
 BU_VSX_X (ST_ELEMREV_V4SF,    "st_elemrev_v4sf",  MEM)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 123e46aa1..a0f790d39 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -3149,16 +3149,27 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_V1TI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_INTDI, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTDI, 0 },
+
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
   { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
@@ -3193,6 +3204,10 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V1TI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V1TI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
@@ -4076,6 +4091,10 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V1TI,
+    RS6000_BTI_void, RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V1TI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
   { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
@@ -4177,9 +4196,19 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_INTTI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTTI, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
     ~RS6000_BTI_unsigned_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V2DI, 0 },
   { VSX_BUILTIN_VEC_LD, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
@@ -4231,6 +4260,16 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTDI,
+    ~RS6000_BTI_long_long },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTDI,
+    ~RS6000_BTI_unsigned_long_long },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V1TI,
+    RS6000_BTI_void, RS6000_BTI_V1TI, RS6000_BTI_INTDI, ~RS6000_BTI_INTTI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V1TI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V1TI, RS6000_BTI_INTDI, ~RS6000_BTI_UINTTI },
+  { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
   { VSX_BUILTIN_VEC_ST, VSX_BUILTIN_STXVD2X_V2DI,
     RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
diff --git a/gcc/config/rs6000/rs6000-p8swap.c b/gcc/config/rs6000/rs6000-p8swap.c
index cb88ffbb5..f5f046720 100644
--- a/gcc/config/rs6000/rs6000-p8swap.c
+++ b/gcc/config/rs6000/rs6000-p8swap.c
@@ -734,10 +734,11 @@  insn_is_swappable_p (swap_web_entry *insn_entry, rtx insn,
   if (insn_entry[i].is_store)
     {
       if (GET_CODE (body) == SET
-	  && GET_CODE (SET_SRC (body)) != UNSPEC)
+	  && GET_CODE (SET_SRC (body)) != UNSPEC
+	  && GET_CODE (SET_SRC (body)) != VEC_SELECT)
 	{
 	  rtx lhs = SET_DEST (body);
-	  /* Even without a swap, the LHS might be a vec_select for, say,
+	  /* Even without a swap, the RHS might be a vec_select for, say,
 	     a byte-reversing store.  */
 	  if (GET_CODE (lhs) != MEM)
 	    return 0;
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 46e00dd9a..f0b396abd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15572,6 +15572,12 @@  altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
        unaligned-supporting store, so use a generic expander.  For
        little-endian, the exact element-reversing instruction must
        be used.  */
+   case VSX_BUILTIN_ST_ELEMREV_V1TI:
+     {
+       enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v1ti
+			      : CODE_FOR_vsx_st_elemrev_v1ti);
+       return altivec_expand_stv_builtin (code, exp);
+      }
     case VSX_BUILTIN_ST_ELEMREV_V2DF:
       {
 	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_store_v2df
@@ -15846,6 +15852,12 @@  altivec_expand_builtin (tree exp, rtx target, bool *expandedp)
 			       : CODE_FOR_vsx_ld_elemrev_v2df);
 	return altivec_expand_lv_builtin (code, exp, target, false);
       }
+    case VSX_BUILTIN_LD_ELEMREV_V1TI:
+      {
+	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v1ti
+			       : CODE_FOR_vsx_ld_elemrev_v1ti);
+	return altivec_expand_lv_builtin (code, exp, target, false);
+      }
     case VSX_BUILTIN_LD_ELEMREV_V2DI:
       {
 	enum insn_code code = (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_load_v2di
@@ -17383,6 +17395,10 @@  altivec_init_builtins (void)
     = build_function_type_list (void_type_node,
 				V2DF_type_node, long_integer_type_node,
 				pvoid_type_node, NULL_TREE);
+  tree void_ftype_v1ti_long_pvoid
+    = build_function_type_list (void_type_node,
+				V1TI_type_node, long_integer_type_node,
+				pvoid_type_node, NULL_TREE);
   tree void_ftype_v2di_long_pvoid
     = build_function_type_list (void_type_node,
 				V2DI_type_node, long_integer_type_node,
@@ -17538,6 +17554,8 @@  altivec_init_builtins (void)
 	       VSX_BUILTIN_LD_ELEMREV_V16QI);
   def_builtin ("__builtin_vsx_st_elemrev_v2df", void_ftype_v2df_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V2DF);
+  def_builtin ("__builtin_vsx_st_elemrev_v1ti", void_ftype_v1ti_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V1TI);
   def_builtin ("__builtin_vsx_st_elemrev_v2di", void_ftype_v2di_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V2DI);
   def_builtin ("__builtin_vsx_st_elemrev_v4sf", void_ftype_v4sf_long_pvoid,
@@ -17861,6 +17879,8 @@  altivec_init_builtins (void)
 	= build_function_type_list (void_type_node,
 				    V1TI_type_node, long_integer_type_node,
 				    pvoid_type_node, NULL_TREE);
+      def_builtin ("__builtin_vsx_ld_elemrev_v1ti", v1ti_ftype_long_pcvoid,
+		   VSX_BUILTIN_LD_ELEMREV_V1TI);
       def_builtin ("__builtin_vsx_lxvd2x_v1ti", v1ti_ftype_long_pcvoid,
 		   VSX_BUILTIN_LXVD2X_V1TI);
       def_builtin ("__builtin_vsx_stxvd2x_v1ti", void_ftype_v1ti_long_pvoid,
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 03f8ec2d6..75256afd8 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1093,6 +1093,18 @@  (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_ld_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
+        (vec_select:V1TI
+	  (match_operand:V1TI 1 "memory_operand" "Z")
+	  (parallel [(const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V1TImode) && !BYTES_BIG_ENDIAN"
+{
+  return "lxvd2x %x0,%y1; xxpermdi %x0,%x0,%x0,2";
+}
+  [(set_attr "type" "vecload")
+   (set_attr "length" "8")])
+
 (define_insn "vsx_ld_elemrev_v2df"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
         (vec_select:V2DF
@@ -1222,6 +1234,18 @@  (define_insn "*vsx_ld_elemrev_v16qi_internal"
   "lxvb16x %x0,%y1"
   [(set_attr "type" "vecload")])
 
+(define_insn "vsx_st_elemrev_v1ti"
+  [(set (match_operand:V1TI 0 "memory_operand" "=Z")
+        (vec_select:V1TI
+          (match_operand:V1TI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN"
+{
+  return "xxpermdi %x1,%x1,%x1,2; stxvd2x %x1,%y0";
+}
+  [(set_attr "type" "vecstore")
+   (set_attr "length" "8")])
+
 (define_insn "vsx_st_elemrev_v2df"
   [(set (match_operand:V2DF 0 "memory_operand" "=Z")
         (vec_select:V2DF
@@ -1272,7 +1296,7 @@  (define_expand "vsx_st_elemrev_v8hi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V8HImode);
       /* 2 is leftmost element in register */
       unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
@@ -1287,11 +1311,21 @@  (define_expand "vsx_st_elemrev_v8hi"
       emit_insn (gen_altivec_vperm_v8hi_direct (tmp, operands[1],
                                                 operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V8HImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V8HImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })
 
+(define_insn "*vsx_st_elemrev_v2di_internal"
+  [(set (match_operand:V2DI 0 "memory_operand" "=Z")
+        (vec_select:V2DI
+          (match_operand:V2DI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V2DImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
+  "stxvd2x %x1,%y0"
+  [(set_attr "type" "vecstore")])
+
 (define_insn "*vsx_st_elemrev_v8hi_internal"
   [(set (match_operand:V8HI 0 "memory_operand" "=Z")
         (vec_select:V8HI
@@ -1320,7 +1354,7 @@  (define_expand "vsx_st_elemrev_v16qi"
 {
   if (!TARGET_P9_VECTOR)
     {
-      rtx subreg, perm[16], pcv;
+      rtx mem_subreg, subreg, perm[16], pcv;
       rtx tmp = gen_reg_rtx (V16QImode);
       /* 3 is leftmost element in register */
       unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
@@ -1335,7 +1369,8 @@  (define_expand "vsx_st_elemrev_v16qi"
       emit_insn (gen_altivec_vperm_v16qi_direct (tmp, operands[1],
                                                  operands[1], pcv));
       subreg = simplify_gen_subreg (V4SImode, tmp, V16QImode, 0);
-      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      mem_subreg = simplify_gen_subreg (V4SImode, operands[0], V16QImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (mem_subreg, subreg));
       DONE;
     }
 })
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
index ed37424ca..de9b916de 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-runnable.c
@@ -1,10 +1,13 @@ 
 /* { dg-do run } */
 /* { dg-require-effective-target vsx_hw } */
-/* { dg-options "-maltivec -mvsx" } */  
+/* { dg-options "-maltivec -mvsx" } */
 
 #include <inttypes.h>
 #include <altivec.h> // vector
+
+#ifdef DEBUG
 #include <stdio.h>
+#endif
 
 void abort (void);
 
@@ -24,9 +27,11 @@  int main() {
 
   float data_f[100];
   double data_d[100];
-   
+  __uint128_t data_u128[100];
+  __int128_t data_128[100];
+
   signed long long disp;
-   
+
   vector signed char vec_c_expected1, vec_c_expected2, vec_c_result1, vec_c_result2;
   vector unsigned char vec_uc_expected1, vec_uc_expected2,
     vec_uc_result1, vec_uc_result2;
@@ -42,11 +47,13 @@  int main() {
     vec_sll_result1, vec_sll_result2;
   vector unsigned long long vec_ull_expected1, vec_ull_expected2,
     vec_ull_result1, vec_ull_result2;
+  vector __int128_t vec_128_expected1, vec_128_result1;
+  vector __uint128_t vec_u128_expected1, vec_u128_result1;
   vector float vec_f_expected1, vec_f_expected2, vec_f_result1, vec_f_result2;
   vector double vec_d_expected1, vec_d_expected2, vec_d_result1, vec_d_result2;
   char buf[20];
   signed long long zero = (signed long long) 0;
-  
+
   for (i = 0; i < 100; i++)
     {
       data_c[i] = i;
@@ -59,21 +66,304 @@  int main() {
       data_ull[i] = i+1001;
       data_f[i] = i+100000.0;
       data_d[i] = i+1000000.0;
+      data_128[i] = i + 12800000;
+      data_u128[i] = i + 12800001;
     }
-  
-  disp = 0;
+
+  // vec_xl() tests
+  disp = 1;
+
+  vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7,
+					 8, 9, 10, 11, 12, 13, 14, 15};
+  vec_c_result1 = vec_xl (0, data_c);
+
+  vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8, 9,
+					 10, 11, 12, 13, 14, 15, 16};
+  vec_c_result2 = vec_xl (disp, data_c);
+
+  vec_uc_expected1 = (vector unsigned char){1, 2, 3, 4, 5, 6, 7, 8, 9,
+					    10, 11, 12, 13, 14, 15, 16};
+  vec_uc_result1 = vec_xl (0, data_uc);
+
+  vec_uc_expected2 = (vector unsigned char){2, 3, 4, 5, 6, 7, 8, 9, 10,
+					    11, 12, 13, 14, 15, 16, 17};
+  vec_uc_result2 = vec_xl (disp, data_uc);
+
+  for (i = 0; i < 16; i++)
+    {
+      if (vec_c_result1[i] != vec_c_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",
+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_c_result2[i] != vec_c_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",
+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_uc_result1[i] != vec_uc_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_uc_result2[i] != vec_uc_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",
+	       i,  vec_uc_result2[i], i, vec_uc_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 2;
+  vec_ssi_expected1 = (vector signed short){10, 11, 12, 13, 14, 15, 16, 17};
+
+  vec_ssi_result1 = vec_xl (0, data_ssi);
+
+  vec_ssi_expected2 = (vector signed short){11, 12, 13, 14, 15, 16, 17, 18};
+  vec_ssi_result2 = vec_xl (disp, data_ssi);
+
+  vec_usi_expected1 = (vector unsigned short){11, 12, 13, 14, 15, 16, 17, 18};
+  vec_usi_result1 = vec_xl (0, data_usi);
+
+  vec_usi_expected2 = (vector unsigned short){12, 13, 14, 15, 16, 17, 18, 19};
+  vec_usi_result2 = vec_xl (disp, data_usi);
+
+
+  for (i = 0; i < 8; i++)
+    {
+      if (vec_ssi_result1[i] != vec_ssi_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",
+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_ssi_result2[i] != vec_ssi_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",
+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_usi_result1[i] != vec_usi_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",
+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_usi_result2[i] != vec_usi_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",
+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 4;
+  vec_si_result1 = vec_xl (zero, data_si);
+  vec_si_expected1 = (vector int){100, 101, 102, 103};
+
+  vec_si_result2 = vec_xl (disp, data_si);
+  vec_si_expected2 = (vector int){101, 102, 103, 104};
+
+  vec_ui_result1 = vec_xl (zero, data_ui);
+  vec_ui_expected1 = (vector unsigned int){101, 102, 103, 104};
+
+  vec_ui_result2 = vec_xl (disp, data_ui);
+  vec_ui_expected2 = (vector unsigned int){102, 103, 104, 105};
+
+  for (i = 0; i < 4; i++)
+    {
+      if (vec_si_result1[i] != vec_si_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",
+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);
+#else
+	abort ();
+#endif
+      if (vec_si_result2[i] != vec_si_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ui_result1[i] != vec_ui_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ui_result2[i] != vec_ui_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ui_result2[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 8;
+  vec_sll_result1 = vec_xl (zero, data_sll);
+  vec_sll_expected1 = (vector signed long long){1000, 1001};
+
+  vec_sll_result2 = vec_xl (disp, data_sll);
+  vec_sll_expected2 = (vector signed long long){1001, 1002};
+
+  vec_ull_result1 = vec_xl (zero, data_ull);
+  vec_ull_expected1 = (vector unsigned long long){1001, 1002};
+
+  vec_ull_result2 = vec_xl (disp, data_ull);
+  vec_ull_expected2 = (vector unsigned long long){1002, 1003};
+
+  for (i = 0; i < 2; i++)
+    {
+      if (vec_sll_result1[i] != vec_sll_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %lld\n",
+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_sll_result2[i] != vec_sll_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %lld\n",
+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ull_result1[i] != vec_ull_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %lld\n",
+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_ull_result2[i] != vec_ull_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %lld\n",
+	       i,  vec_ull_result2[i], i, vec_ull_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 4;
+  vec_f_result1 = vec_xl (zero, data_f);
+  vec_f_expected1 = (vector float){100000.0, 100001.0, 100002.0, 100003.0};
+
+  vec_f_result2 = vec_xl (disp, data_f);
+  vec_f_expected2 = (vector float){100001.0, 100002.0, 100003.0, 100004.0};
+
+  for (i = 0; i < 4; i++)
+    {
+      if (vec_f_result1[i] != vec_f_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_f_result2[i] != vec_f_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 8;
+  vec_d_result1 = vec_xl (zero, data_d);
+  vec_d_expected1 = (vector double){1000000.0, 1000001.0};
+
+  vec_d_result2 = vec_xl (disp, data_d);
+  vec_d_expected2 = (vector double){1000001.0, 1000002.0};
+
+  for (i = 0; i < 2; i++)
+    {
+      if (vec_d_result1[i] != vec_d_expected1[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_d_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_d_result1[i], i, vec_d_expected1[i]);
+#else
+	abort ();
+#endif
+
+      if (vec_d_result2[i] != vec_d_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_d_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  vec_128_expected1 = (vector __int128_t){12800000};
+  vec_128_result1 = vec_xl (zero, data_128);
+
+  if (vec_128_expected1[0] != vec_128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ",
+	       vec_128_result1[0] >> 64,
+	       vec_128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+	printf("vec_128_expected1[0] = %lld %llu\n",
+	       vec_128_expected1[0] >> 64,
+	       vec_128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+#else
+	abort ();
+#endif
+    }
+
+  vec_u128_result1 = vec_xl (zero, data_u128);
+  vec_u128_expected1 = (vector __uint128_t){12800001};
+  if (vec_u128_expected1[0] != vec_u128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl(), vec_u128_result1[0] = %lld; ",
+	       vec_u128_result1[0] >> 64,
+	       vec_u128_result1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+	printf("vec_u128_expected1[0] = %lld\n",
+	       vec_u128_expected1[0] >> 64,
+	       vec_u128_expected1[0] & (__int128_t)0xFFFFFFFFFFFFFFFF);
+#else
+	abort ();
+#endif
+    }
+
+  // vec_xl_be() tests
+  disp = 1;
 #ifdef __BIG_ENDIAN__
-  printf("BIG ENDIAN\n");
   vec_c_expected1 = (vector signed char){0, 1, 2, 3, 4, 5, 6, 7,
 					 8, 9, 10, 11, 12, 13, 14, 15};
 #else
-  printf("LITTLE ENDIAN\n");
   vec_c_expected1 = (vector signed char){15, 14, 13, 12, 11, 10, 9, 8,
 					 7, 6, 5, 4, 3, 2, 1, 0};
 #endif
   vec_c_result1 = vec_xl_be (0, data_c);
 
-  disp = 1;
+
 
 #ifdef __BIG_ENDIAN__
   vec_c_expected2 = (vector signed char){1, 2, 3, 4, 5, 6, 7, 8,
@@ -108,16 +398,36 @@  int main() {
   for (i = 0; i < 16; i++)
     {
       if (vec_c_result1[i] != vec_c_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_c_result1[%d] = %d; vec_c_expected1[%d] = %d\n",
+	       i,  vec_c_result1[i], i, vec_c_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_c_result2[i] != vec_c_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_c_result2[%d] = %d; vec_c_expected2[%d] = %d\n",
+	       i,  vec_c_result2[i], i, vec_c_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_uc_result1[i] != vec_uc_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_uc_result1[%d] = %d; vec_uc_expected1[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_uc_result2[i] != vec_uc_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_uc_result2[%d] = %d; vec_uc_expected2[%d] = %d\n",
+	       i,  vec_uc_result1[i], i, vec_uc_expected1[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_ssi_result1 = vec_xl_be (zero, data_ssi);
@@ -144,7 +454,7 @@  int main() {
 #else
   vec_usi_expected1 = (vector unsigned short){18, 17, 16, 15, 14, 13, 12, 11};
 #endif
-   
+
   disp = 2;
   vec_usi_result2 = vec_xl_be (disp, data_usi);
 
@@ -157,16 +467,36 @@  int main() {
   for (i = 0; i < 8; i++)
     {
       if (vec_ssi_result1[i] != vec_ssi_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ssi_result1[%d] = %d; vec_ssi_expected1[%d] = %d\n",
+	       i,  vec_ssi_result1[i], i, vec_ssi_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ssi_result2[i] != vec_ssi_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ssi_result2[%d] = %d; vec_ssi_expected2[%d] = %d\n",
+	       i,  vec_ssi_result2[i], i, vec_ssi_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_usi_result1[i] != vec_usi_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_usi_result1[%d] = %d; vec_usi_expected1[%d] = %d\n",
+	       i,  vec_usi_result1[i], i, vec_usi_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_usi_result2[i] != vec_usi_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_usi_result2[%d] = %d; vec_usi_expected2[%d] = %d\n",
+	       i,  vec_usi_result2[i], i, vec_usi_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_si_result1 = vec_xl_be (zero, data_si);
@@ -207,16 +537,36 @@  int main() {
   for (i = 0; i < 4; i++)
     {
       if (vec_si_result1[i] != vec_si_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_si_result1[%d] = %d; vec_si_expected1[%d] = %d\n",
+	       i,  vec_si_result1[i], i, vec_si_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_si_result2[i] != vec_si_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_si_result2[%d] = %d; vec_si_expected2[%d] = %d\n",
+	       i,  vec_si_result2[i], i, vec_si_expected2[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ui_result1[i] != vec_ui_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ui_result1[%d] = %d; vec_ui_expected1[%d] = %d\n",
+	       i,  vec_ui_result1[i], i, vec_ui_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ui_result2[i] != vec_ui_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ui_result2[%d] = %d; vec_ui_expected2[%d] = %d\n",
+	       i,  vec_ui_result2[i], i, vec_ui_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_sll_result1 = vec_xl_be (zero, data_sll);
@@ -257,16 +607,36 @@  int main() {
   for (i = 0; i < 2; i++)
     {
       if (vec_sll_result1[i] != vec_sll_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_sll_result1[%d] = %lld; vec_sll_expected1[%d] = %d\n",
+	       i,  vec_sll_result1[i], i, vec_sll_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_sll_result2[i] != vec_sll_expected2[i])
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_sll_result2[%d] = %lld; vec_sll_expected2[%d] = %d\n",
+	       i,  vec_sll_result2[i], i, vec_sll_expected2[i]);
+#else
 	abort ();
+#endif
 
       if (vec_ull_result1[i] != vec_ull_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ull_result1[%d] = %lld; vec_ull_expected1[%d] = %d\n",
+	       i,  vec_ull_result1[i], i, vec_ull_expected1[i]);
+#else
+	abort ();
+#endif
 
       if (vec_ull_result2[i] != vec_ull_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_ull_result2[%d] = %lld; vec_ull_expected2[%d] = %d\n",
+	       i,  vec_ull_result2[i], i, vec_sll_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_f_result1 = vec_xl_be (zero, data_f);
@@ -289,9 +659,20 @@  int main() {
   for (i = 0; i < 4; i++)
     {
       if (vec_f_result1[i] != vec_f_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_f_result1[%d] = %f; vec_f_expected1[%d] = %f\n",
+	       i,  vec_f_result1[i], i, vec_f_expected1[i]);
+#else
+	abort ();
+#endif
+
       if (vec_f_result2[i] != vec_f_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_f_result2[%d] = %f; vec_f_expected2[%d] = %f\n",
+	       i,  vec_f_result2[i], i, vec_f_expected2[i]);
+#else
+	abort ();
+#endif
     }
 
   vec_d_result1 = vec_xl_be (zero, data_d);
@@ -314,8 +695,63 @@  int main() {
   for (i = 0; i < 2; i++)
     {
       if (vec_d_result1[i] != vec_d_expected1[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+
       if (vec_d_result2[i] != vec_d_expected2[i])
-        abort ();
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_d_result2[%d] = %f; vec_d_expected2[%d] = %f\n",
+	       i,  vec_d_result2[i], i, vec_d_expected2[i]);
+#else
+	abort ();
+#endif
+    }
+
+  disp = 0;
+  vec_128_result1 = vec_xl_be (zero, data_128);
+#ifdef __BIG_ENDIAN__
+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };
+#else
+  vec_128_expected1 = (vector __int128_t){ (__int128_t)12800000 };
+#endif
+
+  if (vec_128_expected1[0] != vec_128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_128_result1[0] = %llu %llu;",
+	       vec_128_result1[0] >> 64,
+	       vec_128_result1[0] & 0xFFFFFFFFFFFFFFFF);
+	printf(" vec_128_expected1[0] = %llu %llu\n",
+	       vec_128_expected1[0] >> 64,
+	       vec_128_expected1[0] & 0xFFFFFFFFFFFFFFFF);
+#else
+      abort ();
+#endif
+    }
+
+#ifdef __BIG_ENDIAN__
+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };
+#else
+  vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)12800001 };
+#endif
+
+  vec_u128_result1 = vec_xl_be (zero, data_u128);
+
+  if (vec_u128_expected1[0] != vec_u128_result1[0])
+    {
+#ifdef DEBUG
+	printf("Error: vec_xl_be(), vec_u128_result1[0] = %llu %llu;",
+	       vec_u128_result1[0] >> 64,
+	       vec_u128_result1[0] & 0xFFFFFFFFFFFFFFFF);
+	printf(" vec_u128_expected1[0] = %llu %llu\n",
+	       vec_u128_expected1[0] >> 64,
+	       vec_u128_expected1[0] & 0xFFFFFFFFFFFFFFFF);
+#else
+      abort ();
+#endif
     }
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c
new file mode 100644
index 000000000..5d313124b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-6-runnable.c
@@ -0,0 +1,1001 @@ 
+/* { dg-do run { target { powerpc*-*-* && { lp64 && p8vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <inttypes.h>
+#include <altivec.h>
+
+#define TRUE 1
+#define FALSE 0
+
+#ifdef DEBUG
+#include <stdio.h>
+#endif
+
+void abort (void);
+
+int result_wrong_sc (vector signed char vec_expected,
+		     vector signed char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_sc (vector signed char vec_expected,
+	       vector signed char vec_actual)
+{
+  int i;
+
+  printf("expected signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_uc (vector unsigned char vec_expected,
+		     vector unsigned char vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_uc (vector unsigned char vec_expected,
+	       vector unsigned char vec_actual)
+{
+  int i;
+
+  printf("expected signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed char data\n");
+  for (i = 0; i < 16; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_us (vector unsigned short vec_expected,
+		     vector unsigned short vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_us (vector unsigned short vec_expected,
+	       vector unsigned short vec_actual)
+{
+  int i;
+
+  printf("expected unsigned short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ss (vector signed short vec_expected,
+		     vector signed short vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 8; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ss (vector signed short vec_expected,
+	       vector signed short vec_actual)
+{
+  int i;
+
+  printf("expected signed short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed short data\n");
+  for (i = 0; i < 8; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ui (vector unsigned int vec_expected,
+		     vector unsigned int vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ui (vector unsigned int vec_expected,
+	       vector unsigned int vec_actual)
+{
+  int i;
+
+  printf("expected unsigned int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual unsigned int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_si (vector signed int vec_expected,
+		     vector signed int vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_si (vector signed int vec_expected,
+	       vector signed int vec_actual)
+{
+  int i;
+
+  printf("expected signed int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_expected[i]);
+
+  printf("\nactual signed int data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %d,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_ull (vector unsigned long long vec_expected,
+		      vector unsigned long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_ull (vector unsigned long long vec_expected,
+		vector unsigned long long vec_actual)
+{
+  int i;
+
+  printf("expected unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+	  //    printf(" %llu,", vec_expected[i]);
+    printf(" 0x%llx,", vec_expected[i]);
+
+  printf("\nactual unsigned long long data\n");
+  for (i = 0; i < 2; i++)
+	  //    printf(" %llu,", vec_actual[i]);
+    printf("0x %llx,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_sll (vector signed long long vec_expected,
+		      vector signed long long vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_sll (vector signed long long vec_expected,
+		vector signed long long vec_actual)
+{
+  int i;
+
+  printf("expected signed long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_expected[i]);
+
+  printf("\nactual signed long long data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %lld,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_u128 (vector __uint128_t vec_expected,
+		       vector __uint128_t vec_actual)
+{
+  int i;
+
+    if (vec_expected[0] != vec_actual[0])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_u128 (vector __uint128_t vec_expected,
+		 vector __uint128_t vec_actual)
+{
+  printf("expected uint128 data\n");
+  printf(" %llu%llu\n", (unsigned long long)(vec_expected[0] >> 64),
+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));
+
+  printf("\nactual uint128 data\n");
+  printf(" %llu%llu\n", (unsigned long long)(vec_actual[0] >> 64),
+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));
+}
+
+
+int result_wrong_s128 (vector __int128_t vec_expected,
+		       vector __int128_t vec_actual)
+{
+  int i;
+
+    if (vec_expected[0] != vec_actual[0])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_s128 (vector __int128 vec_expected,
+		 vector __int128 vec_actual)
+{
+  printf("expected int128 data\n");
+  printf(" %lld%llu\n", (signed long long)(vec_expected[0] >> 64),
+	 (unsigned long long)(vec_expected[0] & 0xFFFFFFFFFFFFFFFF));
+
+  printf("\nactual int128 data\n");
+  printf(" %lld%llu\n", (signed long long)(vec_actual[0] >> 64),
+	 (unsigned long long)(vec_actual[0] & 0xFFFFFFFFFFFFFFFF));
+}
+
+int result_wrong_d (vector double vec_expected,
+		    vector double vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 2; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_d (vector double vec_expected,
+	      vector double vec_actual)
+{
+  int i;
+
+  printf("expected double data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %f,", vec_expected[i]);
+
+  printf("\nactual double data\n");
+  for (i = 0; i < 2; i++)
+    printf(" %f,", vec_actual[i]);
+  printf("\n");
+}
+
+int result_wrong_f (vector float vec_expected,
+		    vector float vec_actual)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+    if (vec_expected[i] != vec_actual[i])
+      return TRUE;
+
+  return FALSE;
+}
+
+void print_f (vector float vec_expected,
+	      vector float vec_actual)
+{
+  int i;
+
+  printf("expected float data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %f,", vec_expected[i]);
+
+  printf("\nactual float data\n");
+  for (i = 0; i < 4; i++)
+    printf(" %f,", vec_actual[i]);
+  printf("\n");
+}
+
+int main() {
+   int i, j;
+   size_t len;
+   vector signed char store_data_sc;
+   vector unsigned char store_data_uc;
+   vector signed int store_data_si;
+   vector unsigned int store_data_ui;
+   vector __int128_t store_data_s128;
+   vector __uint128_t store_data_u128;
+   vector signed long long int store_data_sll;
+   vector unsigned long long int store_data_ull;
+   vector signed short store_data_ss;
+   vector unsigned short store_data_us;
+   vector double store_data_d;
+   vector float store_data_f;
+
+   signed char *address_sc;
+   unsigned char *address_uc;
+   signed int *address_si;
+   unsigned int *address_ui;
+   __int128_t *address_s128;
+   __uint128_t *address_u128;
+   signed long long int *address_sll;
+   unsigned long long int *address_ull;
+   signed short int *address_ss;
+   unsigned short int *address_us;
+   double *address_d;
+   float *address_f;
+
+   vector unsigned char *datap;
+
+   vector unsigned char vec_uc_expected1, vec_uc_result1;
+   vector signed char vec_sc_expected1, vec_sc_result1;
+   vector signed int vec_si_expected1, vec_si_result1;
+   vector unsigned int vec_ui_expected1, vec_ui_result1;
+   vector __int128_t vec_s128_expected1, vec_s128_result1;
+   vector __uint128_t vec_u128_expected1, vec_u128_result1;
+   vector signed long long int vec_sll_expected1, vec_sll_result1;
+   vector unsigned long long int vec_ull_expected1, vec_ull_result1;
+   vector signed short int vec_ss_expected1, vec_ss_result1;
+   vector unsigned short int vec_us_expected1, vec_us_result1;
+   vector double vec_d_expected1, vec_d_result1;
+   vector float vec_f_expected1, vec_f_result1;
+
+   signed long long disp;
+
+   /* VEC_XST */
+   disp = 0;
+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					    1, 2, 3, 4, 5, 6, 7, 8 };
+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,
+					  1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sc disp = 0, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 2;
+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,
+					     -1, 0, 1, 2, 3, 4, 5, 6 };
+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					 1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sc disp = 2, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					      8, 9, 10, 11, 12, 13, 14, 15 };
+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					   8, 9, 10, 11, 12, 13, 14, 15 };
+
+   for (i=0; i<16; i++)
+     vec_uc_result1[i] = 0;
+
+   address_uc = &vec_uc_result1[0];
+
+   vec_xst (store_data_uc, disp, address_uc);
+
+   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, uc disp = 0, result does not match expected result\n");
+       print_uc (vec_uc_expected1, vec_uc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+
+   for (i=0; i<8; i++)
+     vec_ss_result1[i] = 0;
+
+   address_ss = &vec_ss_result1[0];
+
+   vec_xst (store_data_ss, disp, address_ss);
+
+   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ss disp = 0, result does not match expected result\n");
+       print_ss (vec_ss_expected1, vec_ss_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+
+   for (i=0; i<8; i++)
+     vec_us_result1[i] = 0;
+
+   address_us = &vec_us_result1[0];
+
+   vec_xst (store_data_us, disp, address_us);
+
+   if (result_wrong_us (vec_us_expected1, vec_us_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, us disp = 0, result does not match expected result\n");
+       print_us (vec_us_expected1, vec_us_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };
+   store_data_si = (vector signed int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_si_result1[i] = 0;
+
+   address_si = &vec_si_result1[0];
+
+   vec_xst (store_data_si, disp, address_si);
+
+   if (result_wrong_si (vec_si_expected1, vec_si_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, si disp = 0, result does not match expected result\n");
+       print_si (vec_si_expected1, vec_si_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };
+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_ui_result1[i] = 0;
+
+   address_ui = &vec_ui_result1[0];
+
+   vec_xst (store_data_ui, disp, address_ui);
+
+   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ui disp = 0, result does not match expected result\n");
+       print_ui (vec_ui_expected1, vec_ui_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_sll_expected1 = (vector signed long long){ -1, 0 };
+   store_data_sll = (vector signed long long ){ -1, 0 };
+
+   for (i=0; i<2; i++)
+     vec_sll_result1[i] = 0;
+
+   address_sll = (signed long long *)(&vec_sll_result1[0]);
+
+   vec_xst (store_data_sll, disp, address_sll);
+
+   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, sll disp = 0, result does not match expected result\n");
+       print_sll (vec_sll_expected1, vec_sll_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_ull_expected1 = (vector unsigned long long){ 0, 1 };
+   store_data_ull = (vector unsigned long long){  0, 1 };
+
+   for (i=0; i<2; i++)
+     vec_ull_result1[i] = 0;
+
+   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);
+
+   vec_xst (store_data_ull, disp, address_ull);
+
+   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, ull disp = 0, result does not match expected result\n");
+       print_ull (vec_ull_expected1, vec_ull_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_s128_expected1 = (vector __int128_t){ 12345 };
+   store_data_s128 = (vector __int128_t){  12345 };
+
+   vec_s128_result1[0] = 0;
+
+   address_s128 = (__int128_t *)(&vec_s128_result1[0]);
+
+   vec_xst (store_data_s128, disp, address_s128);
+
+   if (result_wrong_s128 (vec_s128_expected1, vec_s128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, s128 disp = 0, result does not match expected result\n");
+       print_s128 (vec_s128_expected1, vec_s128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_u128_expected1 = (vector __uint128_t){ 12345 };
+   store_data_u128 = (vector __uint128_t){  12345 };
+
+   vec_u128_result1[0] = 0;
+
+   address_u128 = (__int128_t *)(&vec_u128_result1[0]);
+
+   vec_xst (store_data_u128, disp, address_u128);
+
+   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, u128 disp = 0, result does not match expected result\n");
+       print_u128 (vec_u128_expected1, vec_u128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_d_expected1 = (vector double){ 0, 1 };
+   store_data_d = (vector double){  0, 1 };
+
+   for (i=0; i<2; i++)
+     vec_d_result1[i] = 0;
+
+   address_d = (double *)(&vec_d_result1[0]);
+
+   vec_xst (store_data_d, disp, address_d);
+
+   if (result_wrong_d (vec_d_expected1, vec_d_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, double disp = 0, result does not match expected result\n");
+       print_d (vec_d_expected1, vec_d_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+   vec_f_expected1 = (vector float){ 0, 1 };
+   store_data_f = (vector float){  0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_f_result1[i] = 0;
+
+   address_f = (float *)(&vec_f_result1[0]);
+
+   vec_xst (store_data_f, disp, address_f);
+
+   if (result_wrong_f (vec_f_expected1, vec_f_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst, float disp = 0, result does not match expected result\n");
+       print_f (vec_f_expected1, vec_f_result1);
+#else
+       abort();
+#endif
+     }
+
+   /* VEC_XST_BE, these always load in BE order regardless of
+      machine endianess.  */
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_sc_expected1 = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					    1, 2, 3, 4, 5, 6, 7, 8 };
+#else
+   vec_sc_expected1 = (vector signed char){ 8, 7, 6, 5, 4, 3, 2, 1,
+					    0, -1, -2, -3, -4, -5, -6, -7 };
+#endif
+   store_data_sc = (vector signed char){  -7, -6, -5, -4, -3, -2, -1, 0,
+					  1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst_be (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sc disp = 0, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 2;
+#ifdef __BIG_ENDIAN__
+   vec_sc_expected1 = (vector signed char){  0, 0, -7, -6, -5, -4, -3, -2,
+					     -1, 0, 1, 2, 3, 4, 5, 6 };
+#else
+   vec_sc_expected1 = (vector signed char){  0, 0, 8, 7, 6, 5, 4, 3,
+					     2, 1, 0, -1, -2, -3, -4, -5 };
+#endif
+   store_data_sc = (vector signed char){ -7, -6, -5, -4, -3, -2, -1, 0,
+					 1, 2, 3, 4, 5, 6, 7, 8 };
+
+   for (i=0; i<16; i++)
+     vec_sc_result1[i] = 0;
+
+   address_sc = &vec_sc_result1[0];
+
+   vec_xst_be (store_data_sc, disp, address_sc);
+
+   if (result_wrong_sc (vec_sc_expected1, vec_sc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sc disp = 2, result does not match expected result\n");
+       print_sc (vec_sc_expected1, vec_sc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_uc_expected1 = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					      8, 9, 10, 11, 12, 13, 14, 15 };
+#else
+   vec_uc_expected1 = (vector unsigned char){ 15, 14, 13, 12, 11, 10, 9, 8,
+					      7, 6, 5, 4, 3, 2, 1 };
+#endif
+   store_data_uc = (vector unsigned char){ 0, 1, 2, 3, 4, 5, 6, 7,
+					   8, 9, 10, 11, 12, 13, 14, 15 };
+
+   for (i=0; i<16; i++)
+     vec_uc_result1[i] = 0;
+
+   address_uc = &vec_uc_result1[0];
+
+   vec_xst_be (store_data_uc, disp, address_uc);
+
+   if (result_wrong_uc (vec_uc_expected1, vec_uc_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, uc disp = 0, result does not match expected result\n");
+       print_uc (vec_uc_expected1, vec_uc_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ss_expected1 = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+#else
+   vec_ss_expected1 = (vector signed short int){ 3, 2, 1, 0, -1, -2, -3, -4 };
+#endif
+   store_data_ss = (vector signed short int){ -4, -3, -2, -1, 0, 1, 2, 3 };
+
+   for (i=0; i<8; i++)
+     vec_ss_result1[i] = 0;
+
+   address_ss = &vec_ss_result1[0];
+
+   vec_xst_be (store_data_ss, disp, address_ss);
+
+   if (result_wrong_ss (vec_ss_expected1, vec_ss_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ss disp = 0, result does not match expected result\n");
+       print_ss (vec_ss_expected1, vec_ss_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_us_expected1 = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+#else
+   vec_us_expected1 = (vector unsigned short int){ 7, 6, 5, 4, 3, 2, 1, 0 };
+#endif
+   store_data_us = (vector unsigned short int){ 0, 1, 2, 3, 4, 5, 6, 7 };
+
+   for (i=0; i<8; i++)
+     vec_us_result1[i] = 0;
+
+   address_us = &vec_us_result1[0];
+
+   vec_xst_be (store_data_us, disp, address_us);
+
+   if (result_wrong_us (vec_us_expected1, vec_us_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, us disp = 0, result does not match expected result\n");
+       print_us (vec_us_expected1, vec_us_result1);
+#else
+       abort();
+#endif
+     }
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_si_expected1 = (vector signed int){ -2, -1, 0, 1 };
+#else
+   vec_si_expected1 = (vector signed int){ 1, 0, -1, -2 };
+#endif
+   store_data_si = (vector signed int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_si_result1[i] = 0;
+
+   address_si = &vec_si_result1[0];
+
+   vec_xst_be (store_data_si, disp, address_si);
+   if (result_wrong_si (vec_si_expected1, vec_si_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, si disp = 0, result does not match expected result\n");
+       print_si (vec_si_expected1, vec_si_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ui_expected1 = (vector unsigned int){ -2, -1, 0, 1 };
+#else
+   vec_ui_expected1 = (vector unsigned int){ 1, 0, -1, -2 };
+#endif
+   store_data_ui = (vector unsigned int){ -2, -1, 0, 1 };
+
+   for (i=0; i<4; i++)
+     vec_ui_result1[i] = 0;
+
+   address_ui = &vec_ui_result1[0];
+
+   vec_xst_be (store_data_ui, disp, address_ui);
+
+   if (result_wrong_ui (vec_ui_expected1, vec_ui_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ui disp = 0, result does not match expected result\n");
+       print_ui (vec_ui_expected1, vec_ui_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+   
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_sll_expected1 = (vector signed long long){ -1, 0 };
+#else
+   vec_sll_expected1 = (vector signed long long){ 0, -1 };
+#endif
+   store_data_sll = (vector signed long long ){ -1, 0 };
+
+   for (i=0; i<2; i++)
+     vec_sll_result1[i] = 0;
+
+   address_sll = (signed long long *)(&vec_sll_result1[0]);
+
+   vec_xst_be (store_data_sll, disp, address_sll);
+
+   if (result_wrong_sll (vec_sll_expected1, vec_sll_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, sll disp = 0, result does not match expected result\n");
+       print_sll (vec_sll_expected1, vec_sll_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_ull_expected1 = (vector unsigned long long){ 0, 1234567890123456 };
+#else
+   vec_ull_expected1 = (vector unsigned long long){1234567890123456, 0 };
+#endif   
+   store_data_ull = (vector unsigned long long){  0, 1234567890123456 };
+
+   for (i=0; i<2; i++)
+     vec_ull_result1[i] = 0;
+
+   address_ull = (unsigned long long int *)(&vec_ull_result1[0]);
+
+   vec_xst_be (store_data_ull, disp, address_ull);
+
+   if (result_wrong_ull (vec_ull_expected1, vec_ull_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, ull disp = 0, result does not match expected result\n");
+       print_ull (vec_ull_expected1, vec_ull_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+
+#ifdef __BIG_ENDIAN__
+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };
+#else
+   vec_s128_expected1 = (vector __int128_t){ (__uint128_t)12345678911121314 };
+#endif
+   store_data_s128 = (vector __int128_t)(__uint128_t){  12345678911121314 };
+
+   vec_s128_result1[0] = 0;
+
+   address_s128 = (__int128_t *)(&vec_s128_result1[0]);
+
+   vec_xst_be (store_data_s128, disp, address_s128);
+
+   if (result_wrong_s128 (vec_s128_expected1, vec_s128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, s128 disp = 0, result does not match expected result\n");
+       print_s128 (vec_s128_expected1, vec_s128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+#else
+   vec_u128_expected1 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+#endif
+   store_data_u128 = (vector __uint128_t){ (__uint128_t)1234567891112131415 };
+
+   vec_u128_result1[0] = 0;
+
+   address_u128 = (__int128_t *)(&vec_u128_result1[0]);
+
+   vec_xst_be (store_data_u128, disp, address_u128);
+
+   if (result_wrong_u128 (vec_u128_expected1, vec_u128_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, u128 disp = 0, result does not match expected result\n");
+       print_u128 (vec_u128_expected1, vec_u128_result1);
+#else
+       abort();
+#endif
+     }
+
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_d_expected1 = (vector double){ 0.0, 1.1 };
+#else
+   vec_d_expected1 = (vector double){ 1.1, 0.0 };
+#endif
+   store_data_d = (vector double){  0.0, 1.1 };
+
+   for (i=0; i<2; i++)
+     vec_d_result1[i] = 0;
+
+   address_d = (double *)(&vec_d_result1[0]);
+
+   vec_xst_be (store_data_d, disp, address_d);
+
+   if (result_wrong_d (vec_d_expected1, vec_d_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, double disp = 0, result does not match expected result\n");
+       print_d (vec_d_expected1, vec_d_result1);
+#else
+       abort();
+#endif
+     }
+
+#if 0
+   disp = 0;
+#ifdef __BIG_ENDIAN__
+   vec_f_expected1 = (vector float){ 0.0, 1.2, 2.3, 3.4 };
+#else
+   vec_f_expected1 = (vector float){ 3.4, 2.3, 1.2, 0.0 };
+#endif
+   store_data_f = (vector float){ 0.0, 1.2, 2.3, 3.4 };
+
+   for (i=0; i<4; i++)
+     vec_f_result1[i] = 0;
+
+   address_f = (float *)(&vec_f_result1[0]);
+
+   vec_xst_be (store_data_f, disp, address_f);
+
+   if (result_wrong_f (vec_f_expected1, vec_f_result1))
+     {
+#ifdef DEBUG
+       printf("Error: vec_xst_be, float disp = 0, result does not match expected result\n");
+       print_f (vec_f_expected1, vec_f_result1);
+#else
+       abort();
+#endif
+     }
+#endif
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/powerpc.exp b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
index 93b3239b3..17b741a58 100644
--- a/gcc/testsuite/gcc.target/powerpc/powerpc.exp
+++ b/gcc/testsuite/gcc.target/powerpc/powerpc.exp
@@ -49,4 +49,16 @@  gcc-dg-runtest [list $srcdir/$subdir/savres.c] "" $alti
 
 # All done.
 torture-finish
+
+torture-init 
+# Test load/store builtins at all optimizations
+set-torture-options [list -O0 -O1 -O2]
+gcc-dg-runtest [list $srcdir/$subdir/builtins-4-runnable.c \
+		     $srcdir/$subdir/builtins-6-runnable.c \
+		     $srcdir/$subdir/builtins-5-p9-runnable.c \
+	       	     $srcdir/$subdir/builtins-6-p9-runnable.c] "" $DEFAULT_CFLAGS
+
+# All done.
+torture-finish
+
 dg-finish