[rs6000] Support vrotr<mode>3 for int vector types

Message ID 27be90e6-4beb-5c4c-a163-9b136490d783@linux.ibm.com
State New
Headers show
Series
  • [rs6000] Support vrotr<mode>3 for int vector types
Related show

Commit Message

Kewen.Lin July 17, 2019, 8:32 a.m.
Hi all,

This patch follows the idea to improve rs6000 backend instead of
generic expander.  I think this is a better solution?  I was thinking
generic expander change may benefit other targets suffering similar
issues but the previous RFC seems too restricted on const rotation 
count, although it's possible to extend.  Any comments on their pros/
cons are really helpful to me (a noob).

Regression testing just launched, is it OK for trunk if it's bootstrapped
and regresstested on powerpc64le-unknown-linux-gnu?


Thanks,
Kewen

---- 

gcc/ChangeLog

2019-07-17  Kewen Lin  <linkw@gcc.gnu.org>

	* config/rs6000/predicates.md (vint_reg_or_const_vector): New predicate.
	* config/rs6000/vector.md (vrotr<mode>3): New define_expand.

gcc/testsuite/ChangeLog

2019-07-17  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.target/powerpc/vec_rotate-1.c: New test.
	* gcc.target/powerpc/vec_rotate-2.c: New test.

on 2019/7/16 下午4:45, Kewen.Lin wrote:
> Hi all,

> 

> Based on the previous comments (thank you!), I tried to update the 

> handling in expander and vectorizer.  Middle-end optimizes lrotate

> with const rotation count to rrotate all the time, it makes vectorizer

> fail to vectorize if rrotate isn't supported on the target.  We can at

> least teach it on const rotation count, the cost should be the same? 

> At the same time, the expander already tries to use the opposite 

> rotation optable for scalar, we can teach it to deal with vector as well.

> 

> Is it on the right track and reasonable?

> 

> 

> Thanks,

> Kewen

>

Comments

Jakub Jelinek July 17, 2019, 8:42 a.m. | #1
On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> --- a/gcc/config/rs6000/vector.md

> +++ b/gcc/config/rs6000/vector.md

> @@ -1260,6 +1260,32 @@

>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"

>    "")

>  

> +;; Expanders for rotatert to make use of vrotl

> +(define_expand "vrotr<mode>3"

> +  [(set (match_operand:VEC_I 0 "vint_operand")

> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"

> +{

> +  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);

> +  unsigned int bits = GET_MODE_PRECISION (inner_mode);

> +  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));

> +  rtx rot_count = gen_reg_rtx (<MODE>mode);

> +  if (GET_CODE (operands[2]) == CONST_VECTOR)

> +    {

> +      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,

> +						 operands[2]);

> +      rot_count = force_reg (<MODE>mode, imm_vec);

> +    }

> +  else

> +    {

> +      rtx imm_reg = force_reg (<MODE>mode, imm_vec);

> +      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));

> +    }


Is this actually correct if one or more elements in operands[2] are 0?
If vrotl<mode>3 acts with truncated shift count, that is not an issue
(but then perhaps you wouldn't have to compute imm_reg - operands[2] but
just - operands[2]), but if it does something else, then prec - 0 will be
prec and thus outside of the allowed rotate count.  Or does rs6000 allow
rotate counts to be 0 to prec inclusive?

	Jakub
Kewen.Lin July 17, 2019, 9:22 a.m. | #2
on 2019/7/17 下午4:42, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:

>> --- a/gcc/config/rs6000/vector.md

>> +++ b/gcc/config/rs6000/vector.md

>> @@ -1260,6 +1260,32 @@

>>    "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"

>>    "")

>>  

>> +;; Expanders for rotatert to make use of vrotl

>> +(define_expand "vrotr<mode>3"

>> +  [(set (match_operand:VEC_I 0 "vint_operand")

>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

>> +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"

>> +{

>> +  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);

>> +  unsigned int bits = GET_MODE_PRECISION (inner_mode);

>> +  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));

>> +  rtx rot_count = gen_reg_rtx (<MODE>mode);

>> +  if (GET_CODE (operands[2]) == CONST_VECTOR)

>> +    {

>> +      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,

>> +						 operands[2]);

>> +      rot_count = force_reg (<MODE>mode, imm_vec);

>> +    }

>> +  else

>> +    {

>> +      rtx imm_reg = force_reg (<MODE>mode, imm_vec);

>> +      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));

>> +    }

> 

> Is this actually correct if one or more elements in operands[2] are 0?

> If vrotl<mode>3 acts with truncated shift count, that is not an issue

> (but then perhaps you wouldn't have to compute imm_reg - operands[2] but

> just - operands[2]), but if it does something else, then prec - 0 will be

> prec and thus outside of the allowed rotate count.  Or does rs6000 allow

> rotate counts to be 0 to prec inclusive?

> 

> 	Jakub

> 


Hi Jakub,

Good question, the vector rotation for byte looks like (others are similar):

vrlb VRT,VRA,VRB
  do i=0 to 127 by 8
   sh = (VRB)[i+5:i+7]
   VRT[i:i+7] = (VRA)[i:i+7] <<< sh
  end

It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits]
So it's fine even operands[2] are zero or negative.

Take byte as example, prec is 8.
  - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0.
  - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original 
    rot count 9 was parsed as 1 (in 3 bits range).
  - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original
    rot count was parsed as 7 (in 3 bits range).

It's a good idea to just use negate!  Thanks!!


Kewen
Jakub Jelinek July 17, 2019, 9:38 a.m. | #3
On Wed, Jul 17, 2019 at 05:22:38PM +0800, Kewen.Lin wrote:
> Good question, the vector rotation for byte looks like (others are similar):

> 

> vrlb VRT,VRA,VRB

>   do i=0 to 127 by 8

>    sh = (VRB)[i+5:i+7]

>    VRT[i:i+7] = (VRA)[i:i+7] <<< sh

>   end

> 

> It only takes care of the counts from 0 to prec-1 (inclusive) [log2(prec) bits]

> So it's fine even operands[2] are zero or negative.

> 

> Take byte as example, prec is 8.

>   - rot count is 0, then minus res gets 8. (out of 3 bits range), same as 0.

>   - rot count is 9, then minus res gets -1. (3 bits parsed as 7), the original 

>     rot count 9 was parsed as 1 (in 3 bits range).

>   - rot count is -1, then minus res gets 9, (3 bits parsed as 1), the original

>     rot count was parsed as 7 (in 3 bits range).

> 

> It's a good idea to just use negate!  Thanks!!


Ok, so the hw for the vectors truncates, the question is how happy will the
RTL generic code with that.  rs6000 defines SHIFT_COUNT_TRUNCATED to 0,
so the generic code can't assume there is a truncation going on.  Either it
will punt some optimizations when it sees say negative or too large
shift/rotate count (that is the better case), or it might just assume there
is UB.
As the documentation says, for zero SHIFT_COUNT_TRUNCATED there is an option
of having a pattern with the truncation being explicit, so in your case
*vrotl<mode>3_and or similar that would have an explicit AND on the shift
operand with say {7, 7...} vector for the byte shifts etc. but emit in the
end identical instruction to vrotl<mode>3 and use the MINUS + that pattern
for vrotr<mode>3.  If the rotate argument is CONST_VECTOR, you can of course
just canonicalize, i.e. perform -operands[2] & mask, fold that into constant
and keep using vrotl<mode>3 in that case.

	Jakub
Segher Boessenkool July 17, 2019, 1:40 p.m. | #4
Hi Kewen,

On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
> Regression testing just launched, is it OK for trunk if it's bootstrapped

> and regresstested on powerpc64le-unknown-linux-gnu?


> +;; Expanders for rotatert to make use of vrotl

> +(define_expand "vrotr<mode>3"

> +  [(set (match_operand:VEC_I 0 "vint_operand")

> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]


Having any rotatert in a define_expand or define_insn will regress.

So, nope, sorry.


Segher
Kewen.Lin July 18, 2019, 5:44 a.m. | #5
Hi Segher,

on 2019/7/17 下午9:40, Segher Boessenkool wrote:
> Hi Kewen,

> 

> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:

>> Regression testing just launched, is it OK for trunk if it's bootstrapped

>> and regresstested on powerpc64le-unknown-linux-gnu?

> 

>> +;; Expanders for rotatert to make use of vrotl

>> +(define_expand "vrotr<mode>3"

>> +  [(set (match_operand:VEC_I 0 "vint_operand")

>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

> 

> Having any rotatert in a define_expand or define_insn will regress.

> 

> So, nope, sorry.

> 


Thanks for clarifying!  Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's 
a regression from design view?  Is it specific to rotatert and its 
related one like vrotr? 

If yes, it sounds we can't go with vrotr way. :(


Thanks,
Kewen
Segher Boessenkool July 18, 2019, 7:48 p.m. | #6
On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
> Hi Segher,

> 

> on 2019/7/17 下午9:40, Segher Boessenkool wrote:

> > Hi Kewen,

> > 

> > On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:

> >> Regression testing just launched, is it OK for trunk if it's bootstrapped

> >> and regresstested on powerpc64le-unknown-linux-gnu?

> > 

> >> +;; Expanders for rotatert to make use of vrotl

> >> +(define_expand "vrotr<mode>3"

> >> +  [(set (match_operand:VEC_I 0 "vint_operand")

> >> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

> >> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

> > 

> > Having any rotatert in a define_expand or define_insn will regress.

> > 

> > So, nope, sorry.

> 

> Thanks for clarifying!  Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's 

> a regression from design view?  Is it specific to rotatert and its 

> related one like vrotr? 


You will get HAVE_rotatert defined in insn-config.h if you do this patch,
and then simplify-rtx.c will not work correctly, generating rotatert by
an immediate, which we have no instructions for.

This might be masked because many of our rl*.c tests already fail because
of other changes, I should fix that :-/


Segher
Kewen.Lin July 19, 2019, 2:21 a.m. | #7
on 2019/7/19 上午3:48, Segher Boessenkool wrote:
> On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:

>> Hi Segher,

>>

>> on 2019/7/17 下午9:40, Segher Boessenkool wrote:

>>> Hi Kewen,

>>>

>>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:

>>>> Regression testing just launched, is it OK for trunk if it's bootstrapped

>>>> and regresstested on powerpc64le-unknown-linux-gnu?

>>>

>>>> +;; Expanders for rotatert to make use of vrotl

>>>> +(define_expand "vrotr<mode>3"

>>>> +  [(set (match_operand:VEC_I 0 "vint_operand")

>>>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

>>>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

>>>

>>> Having any rotatert in a define_expand or define_insn will regress.

>>>

>>> So, nope, sorry.

>>

>> Thanks for clarifying!  Since regression testing passed on powerpc64le,I'd like to double confirm the meaning of "regress", does it mean it's 

>> a regression from design view?  Is it specific to rotatert and its 

>> related one like vrotr? 

> 

> You will get HAVE_rotatert defined in insn-config.h if you do this patch,

> and then simplify-rtx.c will not work correctly, generating rotatert by

> an immediate, which we have no instructions for.

> 

> This might be masked because many of our rl*.c tests already fail because

> of other changes, I should fix that :-/

> 


Hi Segher,

Thanks for further explanation!  Sorry that, but I didn't find this 
HAVE_rotatert definition.  I guess it's due to the preparation is always 
"DONE"?  Then it doesn't really generate rotatert. 
although I can see rotatert in insn like below, it seems fine in note?

(insn 10 9 11 4 (set (reg:V4SI 122 [ vect__2.7 ])
        (rotate:V4SI (reg:V4SI 121 [ vect__1.6 ])
            (reg:V4SI 124))) "t.c":17:28 1596 {*altivec_vrlw}
     (expr_list:REG_EQUAL (rotatert:V4SI (reg:V4SI 121 [ vect__1.6 ])
            (const_vector:V4SI [
                    (const_int 8 [0x8]) repeated x4
                ]))
        (nil)))


Thanks,
Kewen
Segher Boessenkool July 19, 2019, 3:06 p.m. | #8
Hi!

On Fri, Jul 19, 2019 at 10:21:06AM +0800, Kewen.Lin wrote:
> on 2019/7/19 上午3:48, Segher Boessenkool wrote:

> > On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:

> >> on 2019/7/17 下午9:40, Segher Boessenkool wrote:

> >>> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:

> >>>> Regression testing just launched, is it OK for trunk if it's bootstrapped

> >>>> and regresstested on powerpc64le-unknown-linux-gnu?

> >>>

> >>>> +;; Expanders for rotatert to make use of vrotl

> >>>> +(define_expand "vrotr<mode>3"

> >>>> +  [(set (match_operand:VEC_I 0 "vint_operand")

> >>>> +	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")

> >>>> +		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]

> >>>

> >>> Having any rotatert in a define_expand or define_insn will regress.


This is wrong.  I don't know why I thought this for a while.

There shouldn't be any rotatert in anything that goes through recog, but
that is everything *except* define_expand.  So define_insn, define_split,
define_peephole, define_peephole2 (and define_insn_and_split, which is
just syntactic sugar).

> Thanks for further explanation!  Sorry that, but I didn't find this 

> HAVE_rotatert definition.  I guess it's due to the preparation is always 

> "DONE"?  Then it doesn't really generate rotatert. 


You only had one in a define_expand.  That is fine, that pattern is never
recognised against.  HAVE_rotatert means that something somewhere will
recognise rotatert RTL insns; if it isn't set, it doesn't make sense to
ever create them, because they will never match.

> although I can see rotatert in insn like below, it seems fine in note?


Sure, many things are allowed in notes that can never show up in RTL
proper.

So, this approach will work fine, and not be too bad.  Could you do a
new patch with it?  It's simple to do, and even if the generic thing
will happen eventually, this is a nice stepping stone for that.

Thanks, and sorry for the confusion,


Segher

Patch

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8ca98299950..c4c74630d26 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -163,6 +163,17 @@ 
   return VINT_REGNO_P (REGNO (op));
 })
 
+;; Return 1 if op is a vector register that operates on integer vectors
+;; or if op is a const vector with integer vector modes.
+(define_predicate "vint_reg_or_const_vector"
+  (match_code "reg,subreg,const_vector")
+{
+  if (GET_CODE (op) == CONST_VECTOR && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    return 1;
+
+  return vint_operand (op, mode);
+})
+
 ;; Return 1 if op is a vector register to do logical operations on (and, or,
 ;; xor, etc.)
 (define_predicate "vlogical_operand"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 70bcfe02e22..5c6a344e452 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1260,6 +1260,32 @@ 
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
   "")
 
+;; Expanders for rotatert to make use of vrotl
+(define_expand "vrotr<mode>3"
+  [(set (match_operand:VEC_I 0 "vint_operand")
+	(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
+		      (match_operand:VEC_I 2 "vint_reg_or_const_vector")))]
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
+{
+  machine_mode inner_mode = GET_MODE_INNER (<MODE>mode);
+  unsigned int bits = GET_MODE_PRECISION (inner_mode);
+  rtx imm_vec = gen_const_vec_duplicate (<MODE>mode, GEN_INT (bits));
+  rtx rot_count = gen_reg_rtx (<MODE>mode);
+  if (GET_CODE (operands[2]) == CONST_VECTOR)
+    {
+      imm_vec = simplify_const_binary_operation (MINUS, <MODE>mode, imm_vec,
+						 operands[2]);
+      rot_count = force_reg (<MODE>mode, imm_vec);
+    }
+  else
+    {
+      rtx imm_reg = force_reg (<MODE>mode, imm_vec);
+      emit_insn (gen_sub<mode>3 (rot_count, imm_reg, operands[2]));
+    }
+  emit_insn (gen_vrotl<mode>3 (operands[0], operands[1], rot_count));
+  DONE;
+})
+
 ;; Expanders for arithmetic shift left on each vector element
 (define_expand "vashl<mode>3"
   [(set (match_operand:VEC_I 0 "vint_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
new file mode 100644
index 00000000000..80aca1a94a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-1.c
@@ -0,0 +1,46 @@ 
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count is const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> 8) | (sud[i] << (sizeof (sud[0]) * 8 - 8));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> 8) | (suw[i] << (sizeof (suw[0]) * 8 - 8));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> 9)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - 9));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> 5)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - 5));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
new file mode 100644
index 00000000000..affda6c023b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_rotate-2.c
@@ -0,0 +1,47 @@ 
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+/* Check vectorizer can exploit vector rotation instructions on Power, mainly
+   for the case rotation count isn't const number.  */
+
+#define N 256
+unsigned long long sud[N], rud[N];
+unsigned int suw[N], ruw[N];
+unsigned short suh[N], ruh[N];
+unsigned char sub[N], rub[N];
+extern unsigned char rot_cnt;
+
+void
+testULL ()
+{
+  for (int i = 0; i < 256; ++i)
+    rud[i] = (sud[i] >> rot_cnt) | (sud[i] << (sizeof (sud[0]) * 8 - rot_cnt));
+}
+
+void
+testUW ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruw[i] = (suw[i] >> rot_cnt) | (suw[i] << (sizeof (suw[0]) * 8 - rot_cnt));
+}
+
+void
+testUH ()
+{
+  for (int i = 0; i < 256; ++i)
+    ruh[i] = (unsigned short) (suh[i] >> rot_cnt)
+	     | (unsigned short) (suh[i] << (sizeof (suh[0]) * 8 - rot_cnt));
+}
+
+void
+testUB ()
+{
+  for (int i = 0; i < 256; ++i)
+    rub[i] = (unsigned char) (sub[i] >> rot_cnt)
+	     | (unsigned char) (sub[i] << (sizeof (sub[0]) * 8 - rot_cnt));
+}
+
+/* { dg-final { scan-assembler {\mvrld\M} } } */
+/* { dg-final { scan-assembler {\mvrlw\M} } } */
+/* { dg-final { scan-assembler {\mvrlh\M} } } */
+/* { dg-final { scan-assembler {\mvrlb\M} } } */