[5/6] rs6000, Add vector splat builtin support

Message ID 01260c60c5ab690757a79423ebc78ea3c6ab0d97.camel@us.ibm.com
State Superseded
Series
  • Permute Class Operations

Commit Message

Kewen.Lin via Gcc-patches June 1, 2020, 4:15 p.m.
GCC maintainers:

The following patch adds support for the vec_splati, vec_splatid and
vec_splati_ins builtins.

Note, this patch adds support for instructions that take a 32-bit
immediate value that represents a floating point value.  This support
adds new predicates and a support function to properly handle the
immediate value.
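
For readers unfamiliar with the idea, here is a minimal host-side sketch of
what "a 32-bit immediate that represents a floating point value" means.  It
assumes the host uses IEEE 754 binary32 for float; float_bits is a
hypothetical name used only for illustration and is not part of the patch,
which does the conversion inside the compiler with
REAL_VALUE_TO_TARGET_SINGLE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32, so it is exactly 32 bits.  */
_Static_assert (sizeof (float) == sizeof (uint32_t),
                "float is expected to be 32 bits wide");

/* Return the raw bit pattern of F as a 32-bit integer.  memcpy is the
   portable way to reinterpret the bytes without aliasing problems.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* A few well-known encodings as a sanity check.  */
void
float_bits_demo (void)
{
  assert (float_bits (1.0f) == 0x3f800000u);
  assert (float_bits (0.5f) == 0x3f000000u);
}
```

The integer returned here is the kind of value that would end up in the
immediate field of an instruction such as xxspltiw.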

The patch has been compiled and tested on

  powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regression errors.

The test case was compiled on a Power 9 system and then tested on
Mambo.

Please let me know if this patch is acceptable for the mainline
branch.  Thanks.

                         Carl Love
--------------------------------------------------------
gcc/ChangeLog

2020-05-30  Carl Love  <cel@us.ibm.com>

        * config/rs6000/altivec.h: Add define for vec_splati, vec_splatid
        and vec_splati_ins.
        * config/rs6000/altivec.md: Add UNSPEC_XXSPLTIW, UNSPEC_XXSPLTID
        and UNSPEC_XXSPLTI32DX.
        (define_insn): Add vxxspltiw_v4si, vxxspltiw_v4sf_inst,
        vxxspltidp_v2df_inst, vxxsplti32dx_v4si_inst, and
        vxxsplti32dx_v4sf_inst.
        (define_expand): Add vxxspltiw_v4sf, vxxspltidp_v2df,
        vxxsplti32dx_v4si, and vxxsplti32dx_v4sf.
        * config/rs6000/predicates.md: Add predicates u1bit_cint_operand,
        s32bit_cint_operand, c32bit_cint_operand, and f32bit_const_operand.
        * config/rs6000/rs6000-builtin.def (BU_FUTURE_V_1): Add definitions
        for VXXSPLTIW_V4SI, VXXSPLTIW_V4SF and VXXSPLTID.
        (BU_FUTURE_V_3): Add definitions for VXXSPLTI32DX_V4SI and
        VXXSPLTI32DX_V4SF.
        (BU_FUTURE_OVERLOAD_1): Add definitions XXSPLTIW and XXSPLTID.
        (BU_FUTURE_OVERLOAD_3): Add definition XXSPLTI32DX.
        * config/rs6000/rs6000-call.c: Add overloaded definitions for
        FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VEC_XXSPLTID and
        FUTURE_BUILTIN_VEC_XXSPLTI32DX.
        (builtin_function_type): Add cases for
        FUTURE_BUILTIN_VXXSPLTI32DX_V4SI and
        FUTURE_BUILTIN_VXXSPLTI32DX_V4SF.
        * config/rs6000/rs6000-protos.h: Add prototype definition for
        rs6000_constF32toI32.
        * config/rs6000/rs6000.c: Add function rs6000_constF32toI32.
        * doc/extend.texi: Add documentation for vec_splati,
        vec_splatid, and vec_splati_ins.
        * testsuite/gcc.target/powerpc/vec-splati-runnable.c: New test.
---
 gcc/config/rs6000/altivec.h                   |   3 +
 gcc/config/rs6000/altivec.md                  | 109 +++++++++++++
 gcc/config/rs6000/predicates.md               |  20 +++
 gcc/config/rs6000/rs6000-builtin.def          |  13 ++
 gcc/config/rs6000/rs6000-call.c               |  19 +++
 gcc/config/rs6000/rs6000-protos.h             |   1 +
 gcc/config/rs6000/rs6000.c                    |  16 ++
 gcc/doc/extend.texi                           |  35 +++++
 .../gcc.target/powerpc/vec-splati-runnable.c  | 145 ++++++++++++++++++
 9 files changed, 361 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c

+
-- 
2.17.1

Comments

Segher Boessenkool June 5, 2020, 9:28 p.m. | #1
Hi!

On Mon, Jun 01, 2020 at 09:15:00AM -0700, Carl Love wrote:
> The following patch adds support for the vec_splati, vec_splatid and
> vec_splati_ins builtins.

>         * config/rs6000/altivec.h: Add define for vec_splati, vec_splatid
>         and vec_splati_ins.


	* config/rs6000/altivec.h (vec_splati, vec_splatid, vec_splati_ins):
	New defines.

Etc.


> +(define_insn "vxxspltiw_v4si"
> +  [(set (match_operand:V4SI 0 "register_operand" "=wa")
> +        (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
> +                     UNSPEC_XXSPLTIW))]
> + "TARGET_FUTURE"
> + "xxspltiw %x0,%1"
> + [(set_attr "type" "vecsimple")])


I think we can do a nicer name than "vxx"?  The mode is part of the name
already, so that says it is vector?  And the exact insn used is not
usually something you want in the name.

Maybe just "splat_imm_v4si" or similar?  What do we do for the existing
immediate splats, hrm.  ...  Yeah, similar to what you do, so let's
just go with that for now...  But "xx", not "vxx"?

> +  /* Instruction uses destination as a source.  Do not overwrite
> source.  */


(Your patches line-wrapped in the mail, btw.)

> +   emit_move_insn (operands[0], operands[1]);
> +   emit_insn (gen_vxxsplti32dx_v4sf_inst (operands[0], GEN_INT( index ),


No spaces around "index" please.  But there should be a space before the
opening parenthesis.

> +;; Return 1 if op is a 32-bit constant signed integer
> +(define_predicate "s32bit_cint_operand"
> +  (and (match_code "const_int")
> +       (match_test "INTVAL (op) >= -2147483648
> +          && INTVAL (op) <= 2147483647")))


There probably is a nicer way to write this than with big decimal
numbers.  (I'll not suggest one here because I'll just make a fool of
myself with overflow or signed/unsigned etc. :-) )

> +;; Return 1 if op is a constant 32-bit signed or unsigned integer
> +(define_predicate "c32bit_cint_operand"
> +  (and (match_code "const_int")
> +       (match_test "((INTVAL (op) >> 32) == 0)")))


This does not work for negative 32-bit numbers?  In GCC the LHS
expression is -1 for those...  Not sure what it is for the C++11 we now
require, but in C11 it is implementation-defined, so not good either.

> +;; Return 1 if op is a constant 32-bit floating point value
> +(define_predicate "f32bit_const_operand"
> +  (match_code "const_double"))


Either the predicate name is misleading (if you do allow all
const_double values), or there should be some test for the allowed values
here.

> +extern long long rs6000_constF32toI32 (rtx operand);


Please use rs6000_const_f32_to_i32 or similar, or a more meaningful
name (neither "f32" nor "i32" means anything in GCC).

const_float_as_integer?  Something like that?

> +long long
> +rs6000_constF32toI32 (rtx operand)
> +{
> +  long long value;
> +  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
> +
> +  if (GET_MODE (operand) != SFmode)
> +    {
> +      printf("ERROR, rs6000_constF32toI32 mode not equal to SFmode.\n");


Let's not have the printf :-)

> +      gcc_unreachable ();


Is the gcc_unreachable still useful?  If so, write it as

  gcc_assert (GET_MODE (operand) == SFmode);

?  And if not, just drop it :-)

> +@smallexample
> +@exdent vector double vec_splatid (const float);
> +@end smallexample
> +
> +Convert a floating-point value to double-precision and splat the result to a
> +vector of double-precision floats.


You probably should say the floating-point value you start with is
single precision.


Segher
Kewen.Lin via Gcc-patches June 10, 2020, 12:01 a.m. | #2
Segher:

So I have been looking at the predicate definitions that I had created.

On Fri, 2020-06-05 at 16:28 -0500, Segher Boessenkool wrote:
> > +;; Return 1 if op is a 32-bit constant signed integer
> > +(define_predicate "s32bit_cint_operand"
> > +  (and (match_code "const_int")
> > +       (match_test "INTVAL (op) >= -2147483648
> > +          && INTVAL (op) <= 2147483647")))
>
> There probably is a nicer way to write this than with big decimal
> numbers.  (I'll not suggest one here because I'll just make a fool of
> myself with overflow or signed/unsigned etc. :-) )
>
> > +;; Return 1 if op is a constant 32-bit signed or unsigned integer
> > +(define_predicate "c32bit_cint_operand"
> > +  (and (match_code "const_int")
> > +       (match_test "((INTVAL (op) >> 32) == 0)")))


The more I look at the above two, the more they look the same.  Basically,
it boils down to: can the value, signed or unsigned, fit in 32 bits or
not?  It seems like both of the above just need to test whether INTVAL
(op) has any bits set above bits 0:31.  So it seems like
((INTVAL (op) >> 32) == 0) should be sufficient for both predicates, i.e.
replace the two with a single generic predicate "cint_32bit_operand".

For starters, I tried changing the definition for s32bit_cint_operand
to:

; Return 1 if op is a 32-bit constant signed integer
(define_predicate "s32bit_cint_operand"
  (and (match_code "const_int")
       (match_test "((INTVAL (op) >> 32) == 0)")))

Unfortunately it doesn't seem to work for

(define_insn "xxspltiw_v4si"
  [(set (match_operand:V4SI 0 "register_operand" "=wa")
        (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
                     UNSPEC_XXSPLTIW))]
 "TARGET_FUTURE"
 "xxspltiw %x0,%1"
 [(set_attr "type" "vecsimple")])

I get an unrecognized insn error.  It seems like ((INTVAL (op) >> 32) == 0)
should be true for any 32-bit integer, signed or unsigned?

Any thoughts as to why this doesn't work?

> 

> This does not work for negative 32-bit numbers?  In GCC the LHS
> expression is -1 for those...  Not sure what it is for the C++11 we now
> require, but in C11 it is implementation-defined, so not good either.
>
> > +;; Return 1 if op is a constant 32-bit floating point value
> > +(define_predicate "f32bit_const_operand"
> > +  (match_code "const_double"))
>
> Either the predicate name is misleading (if you do allow all
> const_double values), or there should be some test for the allowed
> values here.


The predicate is used to check for a 32-bit float constant.  Looking
through the code, I am not sure whether const_double is a 64-bit float.
I don't think that is what I want.  I want a 32-bit floating point value:
const_float, const_real?  I don't see a const_float or anything that
looks like that.  I am not having much luck finding where const_double
gets defined to see what other definitions there are.

I am assuming at this point in the compilation process, the constant
that is passed has type info (signed int, unsigned int, float)
associated with it from the front end parsing of the source code.

         Carl
Kewen.Lin via Gcc-patches June 10, 2020, 3:46 p.m. | #3
On Tue, 2020-06-09 at 17:01 -0700, Carl Love wrote:
> Segher:
>
> So I have been looking at the predicate definitions that I had
> created.
>
> On Fri, 2020-06-05 at 16:28 -0500, Segher Boessenkool wrote:
> > > +;; Return 1 if op is a 32-bit constant signed integer
> > > +(define_predicate "s32bit_cint_operand"
> > > +  (and (match_code "const_int")
> > > +       (match_test "INTVAL (op) >= -2147483648
> > > +          && INTVAL (op) <= 2147483647")))
> >
> > There probably is a nicer way to write this than with big decimal
> > numbers.  (I'll not suggest one here because I'll just make a fool of
> > myself with overflow or signed/unsigned etc. :-) )
> >
> > > +;; Return 1 if op is a constant 32-bit signed or unsigned integer
> > > +(define_predicate "c32bit_cint_operand"
> > > +  (and (match_code "const_int")
> > > +       (match_test "((INTVAL (op) >> 32) == 0)")))
>
> The more I look at the above two they really are the same.  Basically,
> it boils down to ... can the value signed or unsigned fit in 32-bits or
> not?  It seems like both of the above just need to test if the INTVAL
> (op) has any bits above bits 0:31 set.  So seems like (INTVAL (op) >>
> 32) == 0) should be sufficient for both predicates, i.e. replace the
> two with a single generic predicate "cint_32bit_operand".
>
> For starters, I tried changing the definition for s32bit_cint_operand
> to:
>
> ; Return 1 if op is a 32-bit constant signed integer
> (define_predicate "s32bit_cint_operand"
>   (and (match_code "const_int")
>        (match_test "((INTVAL (op) >> 32) == 0)")))



Compare that to the other predicates (config/rs6000/predicates.md)

Those have explicit checks against both ends of the valid range of
values, e.g.:

;; Return 1 if op is a signed 5-bit constant integer.
(define_predicate "s5bit_cint_operand"
  (and (match_code "const_int")
       (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15")))



> 

> Unfortunately it doesn't seem to work for
>
> (define_insn "xxspltiw_v4si"
>   [(set (match_operand:V4SI 0 "register_operand" "=wa")
>         (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
>                      UNSPEC_XXSPLTIW))]
>  "TARGET_FUTURE"
>  "xxspltiw %x0,%1"
>  [(set_attr "type" "vecsimple")])
>
> I get unrecognized insn.  It seems like (INTVAL (op) >> 32) == 0)
> should be true for any 32-bit integer signed or unsigned???
>
> Any thoughts as to why this doesn't work?
>
> > This does not work for negative 32-bit numbers?  In GCC the LHS
> > expression is -1 for those...  Not sure what it is for the C++11 we now
> > require, but in C11 it is implementation-defined, so not good either.
> >
> > > +;; Return 1 if op is a constant 32-bit floating point value
> > > +(define_predicate "f32bit_const_operand"
> > > +  (match_code "const_double"))
> >
> > Either the predicate name is misleading (if you do allow all
> > const_double values), or there should be some test for the allowed
> > values here.
>
> The predicate is used to check for a 32-bit float constant.  Looking
> thru the code not sure if const_double is a 64-bit float? I don't think
> that is what I want.  I want a 32-bit floating point value,
> const_float, const_real?  Don't see a const_float or anything that
> looks like that?  Not having much luck to see where const_double gets
> defined to see what other definitions there are.
>
> I am assuming at this point in the compilation process, the constant
> that is passed has type info (signed int, unsigned int, float)
> associated with it from the front end parsing of the source code.
>
>          Carl
Kewen.Lin via Gcc-patches June 10, 2020, 4:14 p.m. | #4
On Wed, 2020-06-10 at 10:46 -0500, will schmidt wrote:

<snip>
> > On Fri, 2020-06-05 at 16:28 -0500, Segher Boessenkool wrote:
> > > > +;; Return 1 if op is a 32-bit constant signed integer
> > > > +(define_predicate "s32bit_cint_operand"
> > > > +  (and (match_code "const_int")
> > > > +       (match_test "INTVAL (op) >= -2147483648
> > > > +          && INTVAL (op) <= 2147483647")))
> > >
> > > There probably is a nicer way to write this than with big decimal
> > > numbers.  (I'll not suggest one here because I'll just make a fool of
> > > myself with overflow or signed/unsigned etc. :-) )
> > >
> > > > +;; Return 1 if op is a constant 32-bit signed or unsigned integer
> > > > +(define_predicate "c32bit_cint_operand"
> > > > +  (and (match_code "const_int")
> > > > +       (match_test "((INTVAL (op) >> 32) == 0)")))
> >
> > The more I look at the above two they really are the same.  Basically,
> > it boils down to ... can the value signed or unsigned fit in 32-bits or
> > not?  It seems like both of the above just need to test if the INTVAL
> > (op) has any bits above bits 0:31 set.  So seems like (INTVAL (op) >>
> > 32) == 0) should be sufficient for both predicates, i.e. replace the
> > two with a single generic predicate "cint_32bit_operand".
> >
> > For starters, I tried changing the definition for s32bit_cint_operand
> > to:
> >
> > ; Return 1 if op is a 32-bit constant signed integer
> > (define_predicate "s32bit_cint_operand"
> >   (and (match_code "const_int")
> >        (match_test "((INTVAL (op) >> 32) == 0)")))
>
> Compare that to the other predicates (config/rs6000/predicates.md)
>
> Those have explicit checks against both ends of the valid range of
> values.   i.e.
>
> ;; Return 1 if op is a signed 5-bit constant integer.
> (define_predicate "s5bit_cint_operand"
>   (and (match_code "const_int")
>        (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15")))


Well, that is what I did originally.  But if you see your comment
above, "There probably is a nicer way to write this than with big
decimal numbers." so I was trying to figure out how to do it without
using big numbers.  It seemed like shifting the value right 32 bits and
checking if the result was zero would tell us that op fits in 32 bits,
but it doesn't seem to work.  So, now I have conflicting feedback.  :-)

                      Carl
Segher Boessenkool June 10, 2020, 6:47 p.m. | #5
Hi!

On Tue, Jun 09, 2020 at 05:01:45PM -0700, Carl Love wrote:
> On Fri, 2020-06-05 at 16:28 -0500, Segher Boessenkool wrote:
> > > +;; Return 1 if op is a 32-bit constant signed integer
> > > +(define_predicate "s32bit_cint_operand"
> > > +  (and (match_code "const_int")
> > > +       (match_test "INTVAL (op) >= -2147483648
> > > +          && INTVAL (op) <= 2147483647")))
> >
> > There probably is a nicer way to write this than with big decimal
> > numbers.  (I'll not suggest one here because I'll just make a fool of
> > myself with overflow or signed/unsigned etc. :-) )
> >
> > > +;; Return 1 if op is a constant 32-bit signed or unsigned integer
> > > +(define_predicate "c32bit_cint_operand"
> > > +  (and (match_code "const_int")
> > > +       (match_test "((INTVAL (op) >> 32) == 0)")))
>
> The more I look at the above two they really are the same.  Basically,
> it boils down to ... can the value signed or unsigned fit in 32-bits or
> not?

s32bit_cint_operand is testing if it fits in signed 32 bit.
c32bit_cint_operand is testing if it fits in unsigned 32 bit (but that
is not what its description says).

> It seems like both of the above just need to test if the INTVAL
> (op) has any bits above bits 0:31 set.  So seems like (INTVAL (op) >>
> 32) == 0) should be sufficient for both predicates, i.e. replace the
> two with a single generic predicate "cint_32bit_operand".


That isn't the range for signed 32 bit though.  As a 64-bit number, the
top half is the sign extension of the bottom half, so the top half is
-1 (i.e. all bits set) for negative numbers.  A 64-bit number with 0 as
the high half but the top bit of the low half set is out of range for a
signed 32-bit number.
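
The difference between the two ranges can be sketched in plain C.  This is
only an illustration: int64_t stands in for the HOST_WIDE_INT that INTVAL
returns (assumed to be 64 bits wide), and both function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed 32-bit range: the upper half must be the sign extension of
   bit 31, i.e. the value lies in [-2^31, 2^31 - 1].  */
bool
fits_signed_32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* The shift test from the patch: true only when the upper 32 bits are
   all zero, i.e. for the unsigned range [0, 0xffffffff].  Right-shifting
   a negative value is implementation-defined in C; it is nonzero for
   negative inputs with either an arithmetic or a logical shift.  */
bool
upper_half_is_zero (int64_t v)
{
  return (v >> 32) == 0;
}

void
range_demo (void)
{
  /* -1 is a valid signed 32-bit value, but the shift test rejects it.  */
  assert (fits_signed_32 (-1) && !upper_half_is_zero (-1));
  /* 0x80000000 passes the shift test, but is out of signed range.  */
  assert (!fits_signed_32 (0x80000000LL)
          && upper_half_is_zero (0x80000000LL));
}
```

So a predicate built on the shift test alone rejects every negative
immediate, which would match the unrecognized insn reported earlier in the
thread if the operand being matched was negative.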

> > > +;; Return 1 if op is a constant 32-bit floating point value
> > > +(define_predicate "f32bit_const_operand"
> > > +  (match_code "const_double"))
> >
> > Either the predicate name is misleading (if you do allow all
> > const_double values), or there should be some test for the allowed
> > values here.
>
> The predicate is used to check for a 32-bit float constant.  Looking
> thru the code not sure if const_double is a 64-bit float?

It is.

> I don't think
> that is what I want.  I want a 32-bit floating point value,
> const_float, const_real?

As a constant number that will still be done as a const_double.  You
should test its actual value to see if it is a valid single-precision
floating point number.  There probably already is a helper for that?

> Don't see a const_float or anything that
> looks like that?  Not having much luck to see where const_double gets
> defined to see what other definitions there are.


It is defined in rtl.def:
/* numeric floating point or integer constant.  If the mode is
   VOIDmode it is an int otherwise it has a floating point mode and a
   floating point value.  Operands hold the value.  They are all 'w'
   and there may be from 2 to 6; see real.h.  */
DEF_RTL_EXPR(CONST_DOUBLE, "const_double", CONST_DOUBLE_FORMAT, RTX_CONST_OBJ)

(this is probably more confusing than helpful, but you asked ;-) )

> I am assuming at this point in the compilation process, the constant
> that is passed has type info (signed int, unsigned int, float)
> associated with it from the front end parsing of the source code.


RTL uses modes, not types.  What is the mode of the const_double here?
If it is SFmode you are fine (but do test for that mode then).  If it is
DFmode, you need to check its actual value.  We want to support *both*,
and checking the value works in all cases.

(What you need to check is if that floating point value can be exactly
expressed as a 32-bit IEEE float, no rounding or truncation etc.)
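
In host C terms, the "exactly expressible" check amounts to a round trip
through float.  The sketch below only illustrates the idea and is not GCC
code: exactly_representable_as_float is a hypothetical name, the host is
assumed to use IEEE 754 float and double, and the input is assumed to be
finite and within float range.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if D survives a round trip through float unchanged, i.e.
   narrowing needs no rounding or truncation.  The volatile store forces
   the narrowed value to real float precision even on hosts that keep
   intermediates in wider registers.  NaN inputs return false.  */
bool
exactly_representable_as_float (double d)
{
  volatile float narrowed = (float) d;
  return (double) narrowed == d;
}

void
representable_demo (void)
{
  assert (exactly_representable_as_float (0.5));   /* a power of two */
  assert (!exactly_representable_as_float (0.1));  /* rounds differently */
}
```

A value such as 16777217.0 (2^24 + 1) fails the check because a float
significand has only 24 bits.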


Segher
Segher Boessenkool June 10, 2020, 7:10 p.m. | #6
Hi!

On Wed, Jun 10, 2020 at 09:14:07AM -0700, Carl Love wrote:
> On Wed, 2020-06-10 at 10:46 -0500, will schmidt wrote:
> > Compare that to the other predicates (config/rs6000/predicates.md)
> >
> > Those have explicit checks against both ends of the valid range of
> > values.   i.e.
> >
> > ;; Return 1 if op is a signed 5-bit constant integer.
> > (define_predicate "s5bit_cint_operand"
> >   (and (match_code "const_int")
> >        (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15")))
>
> Well, that is what I did originally.  But if you see your comment
> above, "There probably is a nicer way to write this than with big
> decimal numbers." so I was trying to figure out how to do it without
> using big numbers.


Big *decimal* numbers aren't great.  But you could use hex :-)
2147483647 looks like it could be 0x7fffffff, but so does 2147438647,
and that one is 0x7fff5037.

> It seemed like shifting the value right 32 bits and
> checking if the result was zero would tell us that op fits in 32-bits
> but it doesn't seem to work.  So, now I have conflicting feedback.  :-)


For signed you could do

((0x80000000 + UINTVAL (op)) >> 32) == 0

(or INTVAL even, makes no difference here), but that is much less
readable :-)

Maybe for signed it is neatest if you use trunc_int_for_mode for it?

INTVAL (op) == INTVAL (trunc_int_for_mode (op, SImode))

(which neatly uses the _target_ SImode).

trunc_int_for_mode always sign-extends; we don't have one that zero-
extends afaik, but that one is much easier to write anyway:

IN_RANGE (UINTVAL (op), 0, 0xffffffff)
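
The alternatives above can be tried out on the host with a small sketch.
It is not GCC code: int64_t and uint64_t stand in for INTVAL and UINTVAL
(assumed 64 bits wide), the function names are hypothetical, and the
second function only mirrors the idea of truncating to SImode and
comparing, without calling any GCC helper.  Converting an out-of-range
uint64_t to int64_t is implementation-defined in C; GCC wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bias trick: adding 2^31 maps [-2^31, 2^31 - 1] onto [0, 2^32 - 1];
   unsigned arithmetic wraps, so nothing overflows.  */
bool
s32_by_bias (int64_t v)
{
  return ((UINT64_C (0x80000000) + (uint64_t) v) >> 32) == 0;
}

/* Truncate-and-compare: sign-extend the low 32 bits and check that the
   result equals the original value.  */
bool
s32_by_truncation (int64_t v)
{
  uint64_t low = (uint64_t) v & UINT64_C (0xffffffff);
  int64_t extended
    = (int64_t) ((low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000));
  return extended == v;
}

/* Unsigned 32-bit range, the plain-C analogue of the IN_RANGE test.  */
bool
u32_in_range (int64_t v)
{
  return (uint64_t) v <= UINT64_C (0xffffffff);
}

void
bounds_demo (void)
{
  /* Hex makes the bounds easy to check; decimal hides typos.  */
  assert (2147483647 == 0x7fffffff);
  assert (2147438647 == 0x7fff5037);
  assert (s32_by_bias (-1) && s32_by_truncation (-1) && !u32_in_range (-1));
}
```

Both signed variants agree at the boundaries: they accept -2147483648 and
2147483647 and reject the values one step outside.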


Segher

Patch

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 0be68892aad..9ed41b1cbf1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -705,6 +705,9 @@  __altivec_scalar_pred(vec_any_nle,
 #define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
 #define vec_sldb(a, b, c)      __builtin_vec_sldb (a, b, c)
 #define vec_srdb(a, b, c)      __builtin_vec_srdb (a, b, c)
+#define vec_splati(a)  __builtin_vec_xxspltiw (a)
+#define vec_splatid(a) __builtin_vec_xxspltid (a)
+#define vec_splati_ins(a, b, c)        __builtin_vec_xxsplti32dx (a, b, c)
 
 #define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
 #define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index de79ae22fd4..47e8148029b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -173,6 +173,9 @@ 
    UNSPEC_VSTRIL
    UNSPEC_SLDB
    UNSPEC_SRDB
+   UNSPEC_XXSPLTIW
+   UNSPEC_XXSPLTID
+   UNSPEC_XXSPLTI32DX
 ])
 
 (define_c_enum "unspecv"
@@ -799,6 +802,112 @@ 
   "vs<SLDB_LR>dbi %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "vxxspltiw_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_FUTURE"
+ "xxspltiw %x0,%1"
+ [(set_attr "type" "vecsimple")])
+
+(define_expand "vxxspltiw_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "f32bit_const_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_FUTURE"
+{
+  long long value = rs6000_constF32toI32 (operands[1]);
+  emit_insn (gen_vxxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
+  DONE;
+})
+
+(define_insn "vxxspltiw_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW))]
+ "TARGET_FUTURE"
+ "xxspltiw %x0,%c1"
+ [(set_attr "type" "vecsimple")])
+
+(define_expand "vxxspltidp_v2df"
+  [(set (match_operand:V2DF 0 "register_operand" )
+	(unspec:V2DF [(match_operand:SF 1 "f32bit_const_operand")]
+		     UNSPEC_XXSPLTID))]
+ "TARGET_FUTURE"
+{
+  long value = rs6000_constF32toI32 (operands[1]);
+  emit_insn (gen_vxxspltidp_v2df_inst (operands[0], GEN_INT (value)));
+  DONE;
+})
+
+(define_insn "vxxspltidp_v2df_inst"
+  [(set (match_operand:V2DF 0 "register_operand" "=wa")
+	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTID))]
+  "TARGET_FUTURE"
+  "xxspltidp %x0,%c1"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vxxsplti32dx_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "wa")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+ "TARGET_FUTURE"
+{
+  int index = INTVAL (operands[2]);
+
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+   /* Instruction uses destination as a source.  Do not overwrite source.  */
+   emit_move_insn (operands[0], operands[1]);
+
+   emit_insn (gen_vxxsplti32dx_v4si_inst (operands[0], GEN_INT (index),
+					  operands[3]));
+   DONE;
+}
+ [(set_attr "type" "vecsimple")])
+
+(define_insn "vxxsplti32dx_v4si_inst"
+  [(set (match_operand:V4SI 0 "register_operand" "+wa")
+	(unspec:V4SI [(match_operand:QI 1 "u1bit_cint_operand" "n")
+		      (match_operand:SI 2 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_FUTURE"
+  "xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_expand "vxxsplti32dx_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "wa")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SF 3 "f32bit_const_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_FUTURE"
+{
+  int index = INTVAL (operands[2]);
+  long value = rs6000_constF32toI32 (operands[3]);
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+  /* Instruction uses destination as a source.  Do not overwrite source.  */
+   emit_move_insn (operands[0], operands[1]);
+   emit_insn (gen_vxxsplti32dx_v4sf_inst (operands[0], GEN_INT( index ),
+					  GEN_INT (value)));
+   DONE;
+})
+
+(define_insn "vxxsplti32dx_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "+wa")
+	(unspec:V4SF [(match_operand:QI 1 "u1bit_cint_operand" "n")
+		      (match_operand:SI 2 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_FUTURE"
+  "xxsplti32dx %x0,%1,%2"
+   [(set_attr "type" "vecsimple")])
+
 (define_expand "vstrir_<mode>"
   [(set (match_operand:VIshort 0 "altivec_register_operand")
 	(unspec:VIshort [(match_operand:VIshort 1 "altivec_register_operand")]
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index c3f460face2..ebd6f45e0b7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -214,6 +214,11 @@ 
   (and (match_code "const_int")
        (match_test "INTVAL (op) >= -16 && INTVAL (op) <= 15")))
 
+;; Return 1 if op is a unsigned 1-bit constant integer.
+(define_predicate "u1bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
+
 ;; Return 1 if op is a unsigned 3-bit constant integer.
 (define_predicate "u3bit_cint_operand"
   (and (match_code "const_int")
@@ -272,6 +277,21 @@ 
        (match_test "(unsigned HOST_WIDE_INT)
 		    (INTVAL (op) + 0x8000) >= 0x10000")))
 
+;; Return 1 if op is a 32-bit constant signed integer
+(define_predicate "s32bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) >= -2147483648
+          && INTVAL (op) <= 2147483647")))
+
+;; Return 1 if op is a constant 32-bit signed or unsigned integer
+(define_predicate "c32bit_cint_operand"
+  (and (match_code "const_int")
+       (match_test "((INTVAL (op) >> 32) == 0)")))
+
+;; Return 1 if op is a constant 32-bit floating point value
+(define_predicate "f32bit_const_operand"
+  (match_code "const_double"))
+
 ;; Return 1 if op is a positive constant integer that is an exact power of 2.
 (define_predicate "exact_log2_cint_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 2b198177ef0..7afd4c5e1d5 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2666,6 +2666,15 @@  BU_FUTURE_V_3 (VSRDB_V16QI, "vsrdb_v16qi", CONST, vsrdb_v16qi)
 BU_FUTURE_V_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, vsrdb_v8hi)
 BU_FUTURE_V_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_FUTURE_V_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
+
+BU_FUTURE_V_1 (VXXSPLTIW_V4SI, "vxxspltiw_v4si", CONST, vxxspltiw_v4si)
+BU_FUTURE_V_1 (VXXSPLTIW_V4SF, "vxxspltiw_v4sf", CONST, vxxspltiw_v4sf)
+
+BU_FUTURE_V_1 (VXXSPLTID, "vxxspltidp", CONST, vxxspltidp_v2df)
+
+BU_FUTURE_V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST, vxxsplti32dx_v4si)
+BU_FUTURE_V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST, vxxsplti32dx_v4sf)
+
 BU_FUTURE_V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
 BU_FUTURE_V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
 BU_FUTURE_V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
@@ -2697,6 +2706,10 @@  BU_FUTURE_OVERLOAD_1 (VSTRIL, "stril")
 
 BU_FUTURE_OVERLOAD_1 (VSTRIR_P, "strir_p")
 BU_FUTURE_OVERLOAD_1 (VSTRIL_P, "stril_p")
+
+BU_FUTURE_OVERLOAD_1 (XXSPLTIW, "xxspltiw")
+BU_FUTURE_OVERLOAD_1 (XXSPLTID, "xxspltid")
+BU_FUTURE_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx")
 
 /* 1 argument crypto functions.  */
 BU_CRYPTO_1 (VSBOX,		"vsbox",	  CONST, crypto_vsbox_v2di)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index dd9a4d28d7e..07c437b39ce 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5677,6 +5677,22 @@  const struct altivec_builtin_types altivec_overloaded_builtins[] = {
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
     RS6000_BTI_unsigned_V2DI, RS6000_BTI_UINTQI },
 
+  { FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VXXSPLTIW_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, 0, 0 },
+  { FUTURE_BUILTIN_VEC_XXSPLTIW, FUTURE_BUILTIN_VXXSPLTIW_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_float, 0, 0 },
+
+  { FUTURE_BUILTIN_VEC_XXSPLTID, FUTURE_BUILTIN_VXXSPLTID,
+    RS6000_BTI_V2DF, RS6000_BTI_float, 0, 0 },
+
+  { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_UINTQI, RS6000_BTI_INTSI },
+  { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_UINTQI,
+    RS6000_BTI_UINTSI },
+  { FUTURE_BUILTIN_VEC_XXSPLTI32DX, FUTURE_BUILTIN_VXXSPLTI32DX_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI, RS6000_BTI_float },
+
   { FUTURE_BUILTIN_VEC_SRDB, FUTURE_BUILTIN_VSRDB_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI,
     RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
@@ -13535,6 +13551,9 @@  builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
     case ALTIVEC_BUILTIN_VSRH:
     case ALTIVEC_BUILTIN_VSRW:
     case P8V_BUILTIN_VSRD:
+    /* Vector splat immediate insert */
+    case FUTURE_BUILTIN_VXXSPLTI32DX_V4SI:
+    case FUTURE_BUILTIN_VXXSPLTI32DX_V4SF:
       h.uns_p[2] = 1;
       break;
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 5508484ba19..3d165373750 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -274,6 +274,7 @@  extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
 					   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
+extern long long rs6000_constF32toI32 (rtx operand);
 
 /* Declare functions in rs6000-c.c */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..d804c023946 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26418,6 +26418,22 @@  rs6000_cannot_substitute_mem_equiv_p (rtx mem)
   return false;
 }
 
+long long
+rs6000_constF32toI32 (rtx operand)
+{
+  long long value;
+  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
+
+  if (GET_MODE (operand) != SFmode)
+    {
+      printf("ERROR, rs6000_constF32toI32 mode not equal to SFmode.\n");
+      gcc_unreachable ();
+    }
+
+  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
+  return value;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 711fe751bed..91ca453b8e6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21129,6 +21129,41 @@  this instruction must be endian-aware.
 
 @findex vec_srdb
 
+Vector Splat
+
+@smallexample
+@exdent vector signed int vec_splati (const signed int);
+@exdent vector float vec_splati (const float);
+@end smallexample
+
+Splat a 32-bit immediate into a vector of words.
+
+@findex vec_splati
+
+@smallexample
+@exdent vector double vec_splatid (const float);
+@end smallexample
+
+Convert a floating-point value to double-precision and splat the result to a
+vector of double-precision floats.
+
+@findex vec_splatid
+
+@smallexample
+@exdent vector signed int vec_splati_ins (vector signed int,
+const unsigned int, const signed int);
+@exdent vector unsigned int vec_splati_ins (vector unsigned int,
+const unsigned int, const unsigned int);
+@exdent vector float vec_splati_ins (vector float, const unsigned int,
+const float);
+@end smallexample
+
+Argument 2 must be either 0 or 1.  Splat the value of argument 3 into the word
+identified by argument 2 of each doubleword of argument 1 and return the
+result.  The other words of argument 1 are unchanged.
+
+@findex vec_splati_ins
+
 @smallexample
 @exdent vector unsigned long long int
 @exdent vec_pdep (vector unsigned long long int, vector unsigned long long int)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
new file mode 100644
index 00000000000..f9fa55ae0d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -0,0 +1,145 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_future_hw } */
+/* { dg-options "-mdejagnu-cpu=future" } */
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+#endif
+
+extern void abort (void);
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  vector int vsrc_a_int;
+  vector int vresult_int;
+  vector int expected_vresult_int;
+  int src_a_int = 13;
+
+  vector unsigned int vsrc_a_uint;
+  vector unsigned int vresult_uint;
+  vector unsigned int expected_vresult_uint;
+  unsigned int src_a_uint = 7;
+
+  vector float vresult_f;
+  vector float expected_vresult_f;
+  vector float vsrc_a_f;
+  float src_a_f = 23.0;
+
+  vector double vsrc_a_d;
+  vector double vresult_d;
+  vector double expected_vresult_d;
+ 
+  /* Vector splati word */
+  vresult_int = (vector signed int) { 1, 2, 3, 4 };
+  expected_vresult_int = (vector signed int) { -13, -13, -13, -13 }; 
+						 
+  vresult_int = vec_splati ( -13 );
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_splati (src_a_int)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%d] = %d, expected_vresult_int[%d] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+
+  vresult_f = (vector float) { 1.0, 2.0, 3.0, 4.0 };
+  expected_vresult_f = (vector float) { 23.0, 23.0, 23.0, 23.0 };
+						 
+  vresult_f = vec_splati (23.0f);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_splati (src_a_f)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%d] = %f, expected_vresult_f[%d] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector splati double */
+  vresult_d = (vector double) { 2.0, 3.0 };
+  expected_vresult_d = (vector double) { -31.0, -31.0 };
+						 
+  vresult_d = vec_splatid (-31.0f);
+
+  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
+#if DEBUG
+    printf("ERROR, vec_splatid (-31.0f)\n");
+    for(i = 0; i < 2; i++)
+      printf(" vresult_d[%i] = %f, expected_vresult_d[%i] = %f\n",
+	     i, vresult_d[i], i, expected_vresult_d[i]);
+#else
+    abort();
+#endif
+  }
+
+  /* Vector splat immediate */
+  vsrc_a_int = (vector int) { 2, 3, 4, 5 };
+  vresult_int = (vector int) { 1, 1, 1, 1 };
+  expected_vresult_int = (vector int) { 2, 20, 4, 20 };
+						 
+  vresult_int = vec_splati_ins (vsrc_a_int, 1, 20);
+
+  if (!vec_all_eq (vresult_int,  expected_vresult_int)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_int, 1, 20)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_int[%i] = %d, expected_vresult_int[%i] = %d\n",
+	     i, vresult_int[i], i, expected_vresult_int[i]);
+#else
+    abort();
+#endif
+  }
+  
+  vsrc_a_uint = (vector unsigned int) { 4, 5, 6, 7 };
+  vresult_uint = (vector unsigned int) { 1, 1, 1, 1 };
+  expected_vresult_uint = (vector unsigned int) { 4, 40, 6, 40 };
+						 
+  vresult_uint = vec_splati_ins (vsrc_a_uint, 1, 40);
+
+  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_uint, 1, 40)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_uint[%i] = %u, expected_vresult_uint[%i] = %u\n",
+	     i, vresult_uint[i], i, expected_vresult_uint[i]);
+#else
+    abort();
+#endif
+  }
+  
+  vsrc_a_f = (vector float) { 2.0, 3.0, 4.0, 5.0 };
+  vresult_f = (vector float) { 1.0, 1.0, 1.0, 1.0 };
+  expected_vresult_f = (vector float) { 2.0, 20.1, 4.0, 20.1 };
+						 
+  vresult_f = vec_splati_ins (vsrc_a_f, 1, 20.1f);
+
+  if (!vec_all_eq (vresult_f,  expected_vresult_f)) {
+#if DEBUG
+    printf("ERROR, vec_splati_ins (vsrc_a_f, 1, 20.1)\n");
+    for(i = 0; i < 4; i++)
+      printf(" vresult_f[%i] = %f, expected_vresult_f[%i] = %f\n",
+	     i, vresult_f[i], i, expected_vresult_f[i]);
+#else
+    abort();
+#endif
+  }
+  
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\msplati\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msrdbi\M} 6 } } */
+
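
For reference while reviewing the expected values in the test above, here
is a scalar sketch of the semantics described in the extend.texi text.  It
is not part of the patch.  The helper names ref_splati, ref_splatid and
ref_splati_ins are hypothetical, and the word numbering within each
doubleword is an assumption taken from the expected results in
vec-splati-runnable.c (element 1 and element 3 are replaced when argument 2
is 1), not from the ISA document.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reference model of vec_splati for the signed int case:
   every one of the four words receives the immediate value.  */
static void
ref_splati (int dst[4], int value)
{
  for (int i = 0; i < 4; i++)
    dst[i] = value;
}

/* Hypothetical reference model of vec_splatid: the single-precision
   value is converted to double and copied to both doublewords.  */
static void
ref_splatid (double dst[2], float value)
{
  dst[0] = (double) value;
  dst[1] = (double) value;
}

/* Hypothetical reference model of vec_splati_ins for the signed int
   case.  IDX selects one word in each doubleword; the other word of
   each doubleword is copied from SRC unchanged.  The element numbering
   is assumed to match the expected values in the test case.  */
static void
ref_splati_ins (int dst[4], const int src[4], unsigned int idx, int value)
{
  assert (idx <= 1);		/* Argument 2 must be either 0 or 1.  */
  memcpy (dst, src, 4 * sizeof (int));
  for (int dw = 0; dw < 2; dw++)
    dst[2 * dw + idx] = value;
}
```

Under that assumption, calling ref_splati_ins with src = { 2, 3, 4, 5 },
idx = 1 and value = 20 gives { 2, 20, 4, 20 }, which is the
expected_vresult_int used by the test.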