[10/25] Convert BImode vectors.

Message ID 98eaa10bbdf6dd8d9142362184e23bc90c5d612f.1536144068.git.ams@codesourcery.com
Series
  • AMD GCN Port

Commit Message

Andrew Stubbs Sept. 5, 2018, 11:50 a.m.
GCN uses V64BImode to represent vector masks in the middle-end, and DImode
bit-masks to represent them in the back-end.  These must be converted at expand
time and the most convenient way is to simply use a SUBREG.

This works fine except that simplify_subreg needs to be able to convert
immediates, mostly for REG_EQUAL and REG_EQUIV, and currently does not know how
to convert vectors to integers where there is more than one element per byte.

This patch implements such conversions for the cases that we need.

I don't know why this is not a problem for other targets that use BImode
vectors, such as ARM SVE, so it's possible I missed some magic somewhere?

2018-09-05  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* simplify-rtx.c (convert_packed_vector): New function.
	(simplify_immed_subreg): Recognise Boolean vectors and call
	convert_packed_vector.
---
 gcc/simplify-rtx.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

Comments

Jakub Jelinek Sept. 5, 2018, 11:56 a.m. | #1
On Wed, Sep 05, 2018 at 12:50:25PM +0100, ams@codesourcery.com wrote:
> 2018-09-05  Andrew Stubbs  <ams@codesourcery.com>
>
> 	gcc/
> 	* simplify-rtx.c (convert_packed_vector): New function.
> 	(simplify_immed_subreg): Recognised Boolean vectors and call
> 	convert_packed_vector.
> ---

> +      int elem_bitsize = (GET_MODE_SIZE (from_mode).to_constant()

Further formatting nits, no space before (.

> +			  * BITS_PER_UNIT) / num_elem;
> +      int elem_mask = (1 << elem_bitsize) - 1;
> +      HOST_WIDE_INT subreg_mask =

= at the end of line.

> +	(sizeof (HOST_WIDE_INT) == GET_MODE_SIZE (to_mode)
> +	 ? -1
> +	 : (((HOST_WIDE_INT)1 << (GET_MODE_SIZE (to_mode) * BITS_PER_UNIT))
> +	    - 1));

> +  /* Vectors with multiple elements per byte are a special case.  */
> +  if ((VECTOR_MODE_P (innermode)
> +       && ((GET_MODE_NUNITS (innermode).to_constant()
> +	    / GET_MODE_SIZE(innermode).to_constant()) > 1))

Missing spaces before ( several times.

	Jakub
Richard Biener Sept. 5, 2018, 12:05 p.m. | #2
On Wed, Sep 5, 2018 at 1:51 PM <ams@codesourcery.com> wrote:
>
> GCN uses V64BImode to represent vector masks in the middle-end, and DImode
> bit-masks to represent them in the back-end.  These must be converted at expand
> time and the most convenient way is to simply use a SUBREG.

x86 with AVX512 uses SImode in the middle-end as well via the get_mask_mode
vectorization target hook.  Maybe you can avoid another special-case by
piggy-backing on that?

> This works fine except that simplify_subreg needs to be able to convert
> immediates, mostly for REG_EQUAL and REG_EQUIV, and currently does not know how
> to convert vectors to integers where there is more than one element per byte.
>
> This patch implements such conversions for the cases that we need.
>
> I don't know why this is not a problem for other targets that use BImode
> vectors, such as ARM SVE, so it's possible I missed some magic somewhere?
>
> 2018-09-05  Andrew Stubbs  <ams@codesourcery.com>
>
>         gcc/
>         * simplify-rtx.c (convert_packed_vector): New function.
>         (simplify_immed_subreg): Recognised Boolean vectors and call
>         convert_packed_vector.
> ---
>  gcc/simplify-rtx.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
Andrew Stubbs Sept. 5, 2018, 12:40 p.m. | #3
On 05/09/18 13:05, Richard Biener wrote:
> On Wed, Sep 5, 2018 at 1:51 PM <ams@codesourcery.com> wrote:
>>
>> GCN uses V64BImode to represent vector masks in the middle-end, and DImode
>> bit-masks to represent them in the back-end.  These must be converted at expand
>> time and the most convenient way is to simply use a SUBREG.
>
> x86 with AVX512 uses SImode in the middle-end as well via the get_mask_mode
> vectorization target hook.  Maybe you can avoid another special-case by
> piggy-backing on that?

That's exactly what I wanted to do, but I found that returning non-vector
modes ran into trouble further down the road.  I don't recall the exact
details now, but there were assertion failures and failures to vectorize.

That was in a GCC 8 codebase though, so is the AVX thing a recent change?

Andrew
Richard Biener Sept. 5, 2018, 12:43 p.m. | #4
On Wed, Sep 5, 2018 at 2:40 PM Andrew Stubbs <andrew_stubbs@mentor.com> wrote:
>
> On 05/09/18 13:05, Richard Biener wrote:
> > On Wed, Sep 5, 2018 at 1:51 PM <ams@codesourcery.com> wrote:
> >>
> >> GCN uses V64BImode to represent vector masks in the middle-end, and DImode
> >> bit-masks to represent them in the back-end.  These must be converted at expand
> >> time and the most convenient way is to simply use a SUBREG.
> >
> > x86 with AVX512 uses SImode in the middle-end as well via the get_mask_mode
> > vectorization target hook.  Maybe you can avoid another special-case
> > by piggy-backing on that?
>
> That's exactly what I wanted to do, but I found that returning
> non-vector modes ran into trouble further down the road.  I don't recall
> the exact details now, but there were assertion failures and failures to
> vectorize.
>
> That was in a GCC 8 codebase though, so is the AVX thing a recent change?

No.  You might want to look into the x86 backend if there's maybe more tweaks
needed when using non-vector mask modes.

Richard.

> Andrew
Andrew Stubbs Sept. 11, 2018, 2:36 p.m. | #5
On 05/09/18 13:43, Richard Biener wrote:
> No.  You might want to look into the x86 backend if there's maybe more tweaks
> needed when using non-vector mask modes.

I tracked it down to the vector alignment configuration.

Apparently the vectorizer likes to build a "truth" vector, but is
perfectly happy to put it in a non-vector mode.  Unfortunately that
causes TARGET_VECTOR_ALIGNMENT to be called with the non-vector mode,
which wasn't handled correctly.

I'm testing to see what happens with the reg_equal and reg_equiv
conversions, but we might be able to drop this patch.

Andrew
Richard Biener Sept. 12, 2018, 2:36 p.m. | #6
On Tue, Sep 11, 2018 at 4:36 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> On 05/09/18 13:43, Richard Biener wrote:
> > No.  You might want to look into the x86 backend if there's maybe more tweaks
> > needed when using non-vector mask modes.
>
> I tracked it down to the vector alignment configuration.
>
> Apparently the vectorizer likes to build a "truth" vector, but is
> perfectly happy to put it in a non-vector mode.  Unfortunately that
> causes TARGET_VECTOR_ALIGNMENT to be called with the non-vector mode,
> which wasn't handled correctly.
>
> I'm testing to see what happens with the reg_equal and reg_equiv
> conversions, but we might be able to drop this patch.

That's good news!

> Andrew
Richard Sandiford Sept. 17, 2018, 8:46 a.m. | #7
<ams@codesourcery.com> writes:
> GCN uses V64BImode to represent vector masks in the middle-end, and DImode
> bit-masks to represent them in the back-end.  These must be converted at expand
> time and the most convenient way is to simply use a SUBREG.
>
> This works fine except that simplify_subreg needs to be able to convert
> immediates, mostly for REG_EQUAL and REG_EQUIV, and currently does not know how
> to convert vectors to integers where there is more than one element per byte.
>
> This patch implements such conversions for the cases that we need.
>
> I don't know why this is not a problem for other targets that use BImode
> vectors, such as ARM SVE, so it's possible I missed some magic somewhere?

FWIW, SVE never converts predicates to integers: they stay as V..BImode.

Richard

Patch

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index b4c6883..89487f2 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5976,6 +5976,73 @@  simplify_ternary_operation (enum rtx_code code, machine_mode mode,
   return 0;
 }
 
+/* Convert a CONST_INT to a CONST_VECTOR, or vice versa.
+
+   This should only occur for VECTOR_BOOL_MODE types, so the semantics
+   specified by that are assumed.  In particular, the lowest value is
+   in the first byte.  */
+
+static rtx
+convert_packed_vector (fixed_size_mode to_mode, rtx op,
+		       machine_mode from_mode, unsigned int byte,
+		       unsigned int first_elem, unsigned int inner_bytes)
+{
+  /* Sizes greater than HOST_WIDE_INT would need a better implementation.  */
+  gcc_assert (GET_MODE_SIZE (to_mode) <= sizeof (HOST_WIDE_INT));
+
+  if (GET_CODE (op) == CONST_VECTOR)
+    {
+      gcc_assert (!VECTOR_MODE_P (to_mode));
+
+      int num_elem = GET_MODE_NUNITS (from_mode).to_constant();
+      int elem_bitsize = (GET_MODE_SIZE (from_mode).to_constant()
+			  * BITS_PER_UNIT) / num_elem;
+      int elem_mask = (1 << elem_bitsize) - 1;
+      HOST_WIDE_INT subreg_mask =
+	(sizeof (HOST_WIDE_INT) == GET_MODE_SIZE (to_mode)
+	 ? -1
+	 : (((HOST_WIDE_INT)1 << (GET_MODE_SIZE (to_mode) * BITS_PER_UNIT))
+	    - 1));
+
+      HOST_WIDE_INT val = 0;
+      for (int i = 0; i < num_elem; i++)
+	val |= ((INTVAL (CONST_VECTOR_ELT (op, i)) & elem_mask)
+		<< (i * elem_bitsize));
+
+      val >>= byte * BITS_PER_UNIT;
+      val &= subreg_mask;
+
+      return gen_rtx_CONST_INT (VOIDmode, val);
+    }
+  else if (GET_CODE (op) == CONST_INT)
+    {
+      /* Subregs of a vector not implemented yet.  */
+      gcc_assert (maybe_eq (GET_MODE_SIZE (to_mode),
+			    GET_MODE_SIZE (from_mode)));
+
+      gcc_assert (VECTOR_MODE_P (to_mode));
+
+      int num_elem = GET_MODE_NUNITS (to_mode);
+      int elem_bitsize = (GET_MODE_SIZE (to_mode) * BITS_PER_UNIT) / num_elem;
+      int elem_mask = (1 << elem_bitsize) - 1;
+
+      rtvec val = rtvec_alloc (num_elem);
+      rtx *elem = &RTVEC_ELT (val, 0);
+
+      for (int i = 0; i < num_elem; i++)
+	elem[i] = gen_rtx_CONST_INT (VOIDmode,
+				     (INTVAL (op) >> (i * elem_bitsize))
+				     & elem_mask);
+
+      return gen_rtx_CONST_VECTOR (to_mode, val);
+    }
+  else
+    {
+      gcc_unreachable ();
+      return op;
+    }
+}
+
 /* Evaluate a SUBREG of a CONST_INT or CONST_WIDE_INT or CONST_DOUBLE
    or CONST_FIXED or CONST_VECTOR, returning another CONST_INT or
    CONST_WIDE_INT or CONST_DOUBLE or CONST_FIXED or CONST_VECTOR.
@@ -6017,6 +6084,15 @@  simplify_immed_subreg (fixed_size_mode outermode, rtx op,
   if (COMPLEX_MODE_P (outermode))
     return NULL_RTX;
 
+  /* Vectors with multiple elements per byte are a special case.  */
+  if ((VECTOR_MODE_P (innermode)
+       && ((GET_MODE_NUNITS (innermode).to_constant()
+	    / GET_MODE_SIZE(innermode).to_constant()) > 1))
+      || (VECTOR_MODE_P (outermode)
+	  && (GET_MODE_NUNITS (outermode) / GET_MODE_SIZE(outermode) > 1)))
+    return convert_packed_vector (outermode, op, innermode, byte, first_elem,
+				  inner_bytes);
+
   /* We support any size mode.  */
   max_bitsize = MAX (GET_MODE_BITSIZE (outermode),
 		     inner_bytes * BITS_PER_UNIT);