Make the vectoriser drop to strided accesses for stores with gaps

Message ID 87effy2n0g.fsf@arm.com
State New
Headers show
Series
  • Make the vectoriser drop to strided accesses for stores with gaps
Related show

Commit Message

Richard Sandiford July 20, 2018, 10:57 a.m.
We could vectorise:

     for (...)
       {
         a[0] = ...;
         a[1] = ...;
         a[2] = ...;
         a[3] = ...;
         a += stride;
       }

(including the case when stride == 8) but not:

     for (...)
       {
         a[0] = ...;
         a[1] = ...;
         a[2] = ...;
         a[3] = ...;
         a += 8;
       }

(where the stride is always 8).  The former was treated as a "grouped
and strided" store, while the latter was treated as grouped store with
gaps, which we don't support.

This patch makes us treat groups of stores with gaps at the end as
strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P
and all vector uses of DR_STEP to see whether there were any hard-baked
assumptions, but couldn't see any.  I wondered whether we should relax:

  /* We do not have to consider dependences between accesses that belong
     to the same group, unless the stride could be smaller than the
     group size.  */
  if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
      && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)
          == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))
      && !STMT_VINFO_STRIDED_P (stmtinfo_a))
    return false;

for cases in which the step is constant and the absolute step is known
to be greater than the group size, but data dependence analysis should
already return chrec_known for those cases.

The new test is a version of vect-avg-15.c with the variable step
replaced by a constant one.

A natural follow-on would be to do the same for groups with gaps in
the middle:

          /* Check that the distance between two accesses is equal to the type
             size. Otherwise, we have gaps.  */
          diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))
                  - TREE_INT_CST_LOW (prev_init)) / type_size;
          if (diff != 1)
            {
              [...]
              if (DR_IS_WRITE (data_ref))
                {
                  if (dump_enabled_p ())
                    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                     "interleaved store with gaps\n");
                  return false;
                }

But I think we should do that separately and see what the fallout
from this change is first.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-07-20  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
	grouped stores with gaps to a strided group.

gcc/testsuite/
	* gcc.dg/vect/vect-avg-16.c: New test.
	* gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.
	* gcc.dg/vect/vect-strided-u8-i8-gap4.c,
	* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for
	the second loop in main1.

Comments

Richard Biener July 20, 2018, 11:08 a.m. | #1
On Fri, Jul 20, 2018 at 12:57 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>

> We could vectorise:

>

>      for (...)

>        {

>          a[0] = ...;

>          a[1] = ...;

>          a[2] = ...;

>          a[3] = ...;

>          a += stride;

>        }

>

> (including the case when stride == 8) but not:

>

>      for (...)

>        {

>          a[0] = ...;

>          a[1] = ...;

>          a[2] = ...;

>          a[3] = ...;

>          a += 8;

>        }

>

> (where the stride is always 8).  The former was treated as a "grouped

> and strided" store, while the latter was treated as grouped store with

> gaps, which we don't support.

>

> This patch makes us treat groups of stores with gaps at the end as

> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P

> and all vector uses of DR_STEP to see whether there were any hard-baked

> assumptions, but couldn't see any.  I wondered whether we should relax:

>

>   /* We do not have to consider dependences between accesses that belong

>      to the same group, unless the stride could be smaller than the

>      group size.  */

>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>       && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>           == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))

>       && !STMT_VINFO_STRIDED_P (stmtinfo_a))

>     return false;

>

> for cases in which the step is constant and the absolute step is known

> to be greater than the group size, but data dependence analysis should

> already return chrec_known for those cases.

>

> The new test is a version of vect-avg-15.c with the variable step

> replaced by a constant one.

>

> A natural follow-on would be to do the same for groups with gaps in

> the middle:

>

>           /* Check that the distance between two accesses is equal to the type

>              size. Otherwise, we have gaps.  */

>           diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))

>                   - TREE_INT_CST_LOW (prev_init)) / type_size;

>           if (diff != 1)

>             {

>               [...]

>               if (DR_IS_WRITE (data_ref))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                                      "interleaved store with gaps\n");

>                   return false;

>                 }

>

> But I think we should do that separately and see what the fallout

> from this change is first.


Agreed.

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf

> and x86_64-linux-gnu.  OK to install?


Do you need to set STMT_VINFO_STRIDED_P on all stmts?  I think
it is enough to set it on the first group element.

OK otherwise.
Thanks,
Richard.

> Richard

>

>

> 2018-07-20  Richard Sandiford  <richard.sandiford@arm.com>

>

> gcc/

>         * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert

>         grouped stores with gaps to a strided group.

>

> gcc/testsuite/

>         * gcc.dg/vect/vect-avg-16.c: New test.

>         * gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.

>         * gcc.dg/vect/vect-strided-u8-i8-gap4.c,

>         * gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for

>         the second loop in main1.

>

> Index: gcc/tree-vect-data-refs.c

> ===================================================================

> --- gcc/tree-vect-data-refs.c   2018-06-30 13:44:38.567611988 +0100

> +++ gcc/tree-vect-data-refs.c   2018-07-20 11:55:22.570911497 +0100

> @@ -2632,10 +2632,14 @@ vect_analyze_group_access_1 (struct data

>        if (groupsize != count

>           && !DR_IS_READ (dr))

>          {

> -         if (dump_enabled_p ())

> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                            "interleaved store with gaps\n");

> -         return false;

> +         groupsize = count;

> +         STMT_VINFO_STRIDED_P (stmt_info) = true;

> +         stmt_vec_info next_info = stmt_info;

> +         while (DR_GROUP_NEXT_ELEMENT (next_info))

> +           {

> +             next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));

> +             STMT_VINFO_STRIDED_P (next_info) = 1;

> +           }

>         }

>

>        /* If there is a gap after the last load in the group it is the

> @@ -2651,6 +2655,8 @@ vect_analyze_group_access_1 (struct data

>                            "Detected interleaving ");

>           if (DR_IS_READ (dr))

>             dump_printf (MSG_NOTE, "load ");

> +         else if (STMT_VINFO_STRIDED_P (stmt_info))

> +           dump_printf (MSG_NOTE, "strided store ");

>           else

>             dump_printf (MSG_NOTE, "store ");

>           dump_printf (MSG_NOTE, "of size %u starting with ",

> Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c

> ===================================================================

> --- /dev/null   2018-07-09 14:52:09.234750850 +0100

> +++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c     2018-07-20 11:55:22.570911497 +0100

> @@ -0,0 +1,52 @@

> +/* { dg-additional-options "-O3" } */

> +/* { dg-require-effective-target vect_int } */

> +

> +#include "tree-vect.h"

> +

> +#define N 80

> +

> +void __attribute__ ((noipa))

> +f (signed char *restrict a, signed char *restrict b,

> +   signed char *restrict c, int n)

> +{

> +  for (int j = 0; j < n; ++j)

> +    {

> +      for (int i = 0; i < 16; ++i)

> +       a[i] = (b[i] + c[i]) >> 1;

> +      a += 20;

> +      b += 20;

> +      c += 20;

> +    }

> +}

> +

> +#define BASE1 -126

> +#define BASE2 -42

> +

> +signed char a[N], b[N], c[N];

> +

> +int

> +main (void)

> +{

> +  check_vect ();

> +

> +  for (int i = 0; i < N; ++i)

> +    {

> +      a[i] = i;

> +      b[i] = BASE1 + i * 3;

> +      c[i] = BASE2 + i * 2;

> +      asm volatile ("" ::: "memory");

> +    }

> +  f (a, b, c, N / 20);

> +  for (int i = 0; i < N; ++i)

> +    {

> +      int d = (BASE1 + BASE2 + i * 5) >> 1;

> +      if (a[i] != (i % 20 < 16 ? d : i))

> +       __builtin_abort ();

> +    }

> +  return 0;

> +}

> +

> +/* { dg-final { scan-tree-dump "vect_recog_average_pattern: detected" "vect" } } */

> +/* { dg-final { scan-tree-dump {\.AVG_FLOOR} "vect" { target vect_avg_qi } } } */

> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target vect_avg_qi } } } */

> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_avg_qi } } } */

> Index: gcc/testsuite/gcc.dg/vect/slp-37.c

> ===================================================================

> --- gcc/testsuite/gcc.dg/vect/slp-37.c  2018-05-02 08:37:49.033604262 +0100

> +++ gcc/testsuite/gcc.dg/vect/slp-37.c  2018-07-20 11:55:22.570911497 +0100

> @@ -17,8 +17,8 @@ foo1 (s1 *arr)

>    int i;

>    s1 *ptr = arr;

>

> -  /* Different constant types - not SLPable.  The group size is not power of 2,

> -     interleaving is not supported either.  */

> +  /* Vectorized as a strided SLP pair of accesses to <a, b> and a single

> +     strided access to c.  */

>    for (i = 0; i < N; i++)

>      {

>        ptr->a = 6;

> @@ -58,6 +58,5 @@ int main (void)

>    return 0;

>  }

>

> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  } } */

> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */

> -

> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */

> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */

> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c

> ===================================================================

> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2018-05-02 08:37:49.021604375 +0100

> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2018-07-20 11:55:22.570911497 +0100

> @@ -53,7 +53,7 @@ main1 (s *arr)

>     }

>

>    ptr = arr;

> -  /* Not vectorizable: gap in store. */

> +  /* Vectorized as a strided SLP pair.  */

>    for (i = 0; i < N; i++)

>      {

>        res[i].a = ptr->b;

> @@ -97,5 +97,4 @@ int main (void)

>    return 0;

>  }

>

> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */

> -

> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */

> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c

> ===================================================================

> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c       2018-05-02 08:37:49.013604450 +0100

> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c       2018-07-20 11:55:22.570911497 +0100

> @@ -55,7 +55,7 @@ main1 (s *arr)

>     }

>

>    ptr = arr;

> -  /* Not vectorizable: gap in store.  */

> +  /* Vectorized as a strided SLP pair.  */

>    for (i = 0; i < N; i++)

>      {

>        res[i].a = ptr->b;

> @@ -110,5 +110,4 @@ int main (void)

>    return 0;

>  }

>

> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */

> -

> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */
Richard Sandiford July 20, 2018, 11:15 a.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jul 20, 2018 at 12:57 PM Richard Sandiford

> <richard.sandiford@arm.com> wrote:

>>

>> We could vectorise:

>>

>>      for (...)

>>        {

>>          a[0] = ...;

>>          a[1] = ...;

>>          a[2] = ...;

>>          a[3] = ...;

>>          a += stride;

>>        }

>>

>> (including the case when stride == 8) but not:

>>

>>      for (...)

>>        {

>>          a[0] = ...;

>>          a[1] = ...;

>>          a[2] = ...;

>>          a[3] = ...;

>>          a += 8;

>>        }

>>

>> (where the stride is always 8).  The former was treated as a "grouped

>> and strided" store, while the latter was treated as grouped store with

>> gaps, which we don't support.

>>

>> This patch makes us treat groups of stores with gaps at the end as

>> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P

>> and all vector uses of DR_STEP to see whether there were any hard-baked

>> assumptions, but couldn't see any.  I wondered whether we should relax:

>>

>>   /* We do not have to consider dependences between accesses that belong

>>      to the same group, unless the stride could be smaller than the

>>      group size.  */

>>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>>       && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>>           == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))

>>       && !STMT_VINFO_STRIDED_P (stmtinfo_a))

>>     return false;

>>

>> for cases in which the step is constant and the absolute step is known

>> to be greater than the group size, but data dependence analysis should

>> already return chrec_known for those cases.

>>

>> The new test is a version of vect-avg-15.c with the variable step

>> replaced by a constant one.

>>

>> A natural follow-on would be to do the same for groups with gaps in

>> the middle:

>>

>>           /* Check that the distance between two accesses is equal to the type

>>              size. Otherwise, we have gaps.  */

>>           diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))

>>                   - TREE_INT_CST_LOW (prev_init)) / type_size;

>>           if (diff != 1)

>>             {

>>               [...]

>>               if (DR_IS_WRITE (data_ref))

>>                 {

>>                   if (dump_enabled_p ())

>>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>>                                      "interleaved store with gaps\n");

>>                   return false;

>>                 }

>>

>> But I think we should do that separately and see what the fallout

>> from this change is first.

>

> Agreed.

>

>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf

>> and x86_64-linux-gnu.  OK to install?

>

> Do you need to set STMT_VINFO_STRIDED_P on all stmts?  I think

> it is enough to set it on the first group element.


get_load_store_type tests STMT_VINFO_STRIDED_P on the stmt it's
given rather than the first element, but I guess that's my bug...

I'm hoping this weekend to finally finish off the series to get
rid of vinfo_for_stmt, which will make it slightly easier to do this.

Thanks,
Richard

> OK otherwise.

> Thanks,

> Richard.

>

>> Richard

>>

>>

>> 2018-07-20  Richard Sandiford  <richard.sandiford@arm.com>

>>

>> gcc/

>>         * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert

>>         grouped stores with gaps to a strided group.

>>

>> gcc/testsuite/

>>         * gcc.dg/vect/vect-avg-16.c: New test.

>>         * gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.

>>         * gcc.dg/vect/vect-strided-u8-i8-gap4.c,

>>         * gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for

>>         the second loop in main1.

>>

>> Index: gcc/tree-vect-data-refs.c

>> ===================================================================

>> --- gcc/tree-vect-data-refs.c   2018-06-30 13:44:38.567611988 +0100

>> +++ gcc/tree-vect-data-refs.c   2018-07-20 11:55:22.570911497 +0100

>> @@ -2632,10 +2632,14 @@ vect_analyze_group_access_1 (struct data

>>        if (groupsize != count

>>           && !DR_IS_READ (dr))

>>          {

>> -         if (dump_enabled_p ())

>> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>> -                            "interleaved store with gaps\n");

>> -         return false;

>> +         groupsize = count;

>> +         STMT_VINFO_STRIDED_P (stmt_info) = true;

>> +         stmt_vec_info next_info = stmt_info;

>> +         while (DR_GROUP_NEXT_ELEMENT (next_info))

>> +           {

>> +             next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));

>> +             STMT_VINFO_STRIDED_P (next_info) = 1;

>> +           }

>>         }

>>

>>        /* If there is a gap after the last load in the group it is the

>> @@ -2651,6 +2655,8 @@ vect_analyze_group_access_1 (struct data

>>                            "Detected interleaving ");

>>           if (DR_IS_READ (dr))

>>             dump_printf (MSG_NOTE, "load ");

>> +         else if (STMT_VINFO_STRIDED_P (stmt_info))

>> +           dump_printf (MSG_NOTE, "strided store ");

>>           else

>>             dump_printf (MSG_NOTE, "store ");

>>           dump_printf (MSG_NOTE, "of size %u starting with ",

>> Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c

>> ===================================================================

>> --- /dev/null   2018-07-09 14:52:09.234750850 +0100

>> +++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c     2018-07-20 11:55:22.570911497 +0100

>> @@ -0,0 +1,52 @@

>> +/* { dg-additional-options "-O3" } */

>> +/* { dg-require-effective-target vect_int } */

>> +

>> +#include "tree-vect.h"

>> +

>> +#define N 80

>> +

>> +void __attribute__ ((noipa))

>> +f (signed char *restrict a, signed char *restrict b,

>> +   signed char *restrict c, int n)

>> +{

>> +  for (int j = 0; j < n; ++j)

>> +    {

>> +      for (int i = 0; i < 16; ++i)

>> +       a[i] = (b[i] + c[i]) >> 1;

>> +      a += 20;

>> +      b += 20;

>> +      c += 20;

>> +    }

>> +}

>> +

>> +#define BASE1 -126

>> +#define BASE2 -42

>> +

>> +signed char a[N], b[N], c[N];

>> +

>> +int

>> +main (void)

>> +{

>> +  check_vect ();

>> +

>> +  for (int i = 0; i < N; ++i)

>> +    {

>> +      a[i] = i;

>> +      b[i] = BASE1 + i * 3;

>> +      c[i] = BASE2 + i * 2;

>> +      asm volatile ("" ::: "memory");

>> +    }

>> +  f (a, b, c, N / 20);

>> +  for (int i = 0; i < N; ++i)

>> +    {

>> +      int d = (BASE1 + BASE2 + i * 5) >> 1;

>> +      if (a[i] != (i % 20 < 16 ? d : i))

>> +       __builtin_abort ();

>> +    }

>> +  return 0;

>> +}

>> +

>> +/* { dg-final { scan-tree-dump "vect_recog_average_pattern: detected" "vect" } } */

>> +/* { dg-final { scan-tree-dump {\.AVG_FLOOR} "vect" { target vect_avg_qi } } } */

>> +/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target vect_avg_qi } } } */

>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_avg_qi } } } */

>> Index: gcc/testsuite/gcc.dg/vect/slp-37.c

>> ===================================================================

>> --- gcc/testsuite/gcc.dg/vect/slp-37.c  2018-05-02 08:37:49.033604262 +0100

>> +++ gcc/testsuite/gcc.dg/vect/slp-37.c  2018-07-20 11:55:22.570911497 +0100

>> @@ -17,8 +17,8 @@ foo1 (s1 *arr)

>>    int i;

>>    s1 *ptr = arr;

>>

>> -  /* Different constant types - not SLPable.  The group size is not power of 2,

>> -     interleaving is not supported either.  */

>> +  /* Vectorized as a strided SLP pair of accesses to <a, b> and a single

>> +     strided access to c.  */

>>    for (i = 0; i < N; i++)

>>      {

>>        ptr->a = 6;

>> @@ -58,6 +58,5 @@ int main (void)

>>    return 0;

>>  }

>>

>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  } } */

>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */

>> -

>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */

>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */

>> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c

>> ===================================================================

>> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2018-05-02 08:37:49.021604375 +0100

>> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c 2018-07-20 11:55:22.570911497 +0100

>> @@ -53,7 +53,7 @@ main1 (s *arr)

>>     }

>>

>>    ptr = arr;

>> -  /* Not vectorizable: gap in store. */

>> +  /* Vectorized as a strided SLP pair.  */

>>    for (i = 0; i < N; i++)

>>      {

>>        res[i].a = ptr->b;

>> @@ -97,5 +97,4 @@ int main (void)

>>    return 0;

>>  }

>>

>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */

>> -

>> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */

>> Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c

>> ===================================================================

>> --- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c       2018-05-02 08:37:49.013604450 +0100

>> +++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c       2018-07-20 11:55:22.570911497 +0100

>> @@ -55,7 +55,7 @@ main1 (s *arr)

>>     }

>>

>>    ptr = arr;

>> -  /* Not vectorizable: gap in store.  */

>> +  /* Vectorized as a strided SLP pair.  */

>>    for (i = 0; i < N; i++)

>>      {

>>        res[i].a = ptr->b;

>> @@ -110,5 +110,4 @@ int main (void)

>>    return 0;

>>  }

>>

>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */

>> -

>> +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */
Richard Sandiford Aug. 22, 2018, 1:01 p.m. | #3
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jul 20, 2018 at 12:57 PM Richard Sandiford

> <richard.sandiford@arm.com> wrote:

>>

>> We could vectorise:

>>

>>      for (...)

>>        {

>>          a[0] = ...;

>>          a[1] = ...;

>>          a[2] = ...;

>>          a[3] = ...;

>>          a += stride;

>>        }

>>

>> (including the case when stride == 8) but not:

>>

>>      for (...)

>>        {

>>          a[0] = ...;

>>          a[1] = ...;

>>          a[2] = ...;

>>          a[3] = ...;

>>          a += 8;

>>        }

>>

>> (where the stride is always 8).  The former was treated as a "grouped

>> and strided" store, while the latter was treated as grouped store with

>> gaps, which we don't support.

>>

>> This patch makes us treat groups of stores with gaps at the end as

>> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P

>> and all vector uses of DR_STEP to see whether there were any hard-baked

>> assumptions, but couldn't see any.  I wondered whether we should relax:

>>

>>   /* We do not have to consider dependences between accesses that belong

>>      to the same group, unless the stride could be smaller than the

>>      group size.  */

>>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>>       && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>>           == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))

>>       && !STMT_VINFO_STRIDED_P (stmtinfo_a))

>>     return false;

>>

>> for cases in which the step is constant and the absolute step is known

>> to be greater than the group size, but data dependence analysis should

>> already return chrec_known for those cases.

>>

>> The new test is a version of vect-avg-15.c with the variable step

>> replaced by a constant one.

>>

>> A natural follow-on would be to do the same for groups with gaps in

>> the middle:

>>

>>           /* Check that the distance between two accesses is equal to the type

>>              size. Otherwise, we have gaps.  */

>>           diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))

>>                   - TREE_INT_CST_LOW (prev_init)) / type_size;

>>           if (diff != 1)

>>             {

>>               [...]

>>               if (DR_IS_WRITE (data_ref))

>>                 {

>>                   if (dump_enabled_p ())

>>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>>                                      "interleaved store with gaps\n");

>>                   return false;

>>                 }

>>

>> But I think we should do that separately and see what the fallout

>> from this change is first.

>

> Agreed.

>

>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf

>> and x86_64-linux-gnu.  OK to install?

>

> Do you need to set STMT_VINFO_STRIDED_P on all stmts?  I think

> it is enough to set it on the first group element.

>

> OK otherwise.


Thanks.  Here's what I installed after fixing the STMT_VINFO_STRIDED_P
uses in get_load_store_type.

Richard


2018-08-22  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_analyze_group_access_1): Convert
	grouped stores with gaps to a strided group.

gcc/testsuite/
	* gcc.dg/vect/vect-avg-16.c: New test.
	* gcc.dg/vect/slp-37.c: Expect the loop to be vectorized.
	* gcc.dg/vect/vect-strided-u8-i8-gap4.c,
	* gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c: Likewise for
	the second loop in main1.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-08-21 14:47:07.795167843 +0100
+++ gcc/tree-vect-data-refs.c	2018-08-22 10:24:22.070479189 +0100
@@ -2633,10 +2633,8 @@ vect_analyze_group_access_1 (dr_vec_info
       if (groupsize != count
 	  && !DR_IS_READ (dr))
         {
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "interleaved store with gaps\n");
-	  return false;
+	  groupsize = count;
+	  STMT_VINFO_STRIDED_P (stmt_info) = true;
 	}
 
       /* If there is a gap after the last load in the group it is the
@@ -2652,6 +2650,8 @@ vect_analyze_group_access_1 (dr_vec_info
 			   "Detected interleaving ");
 	  if (DR_IS_READ (dr))
 	    dump_printf (MSG_NOTE, "load ");
+	  else if (STMT_VINFO_STRIDED_P (stmt_info))
+	    dump_printf (MSG_NOTE, "strided store ");
 	  else
 	    dump_printf (MSG_NOTE, "store ");
 	  dump_printf (MSG_NOTE, "of size %u starting with ",
Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c
===================================================================
--- /dev/null	2018-07-26 10:26:13.137955424 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c	2018-08-22 10:24:22.070479189 +0100
@@ -0,0 +1,52 @@
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 80
+
+void __attribute__ ((noipa))
+f (signed char *restrict a, signed char *restrict b,
+   signed char *restrict c, int n)
+{
+  for (int j = 0; j < n; ++j)
+    {
+      for (int i = 0; i < 16; ++i)
+	a[i] = (b[i] + c[i]) >> 1;
+      a += 20;
+      b += 20;
+      c += 20;
+    }
+}
+
+#define BASE1 -126
+#define BASE2 -42
+
+signed char a[N], b[N], c[N];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = i;
+      b[i] = BASE1 + i * 3;
+      c[i] = BASE2 + i * 2;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, N / 20);
+  for (int i = 0; i < N; ++i)
+    {
+      int d = (BASE1 + BASE2 + i * 5) >> 1;
+      if (a[i] != (i % 20 < 16 ? d : i))
+	__builtin_abort ();
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_average_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump {\.AVG_FLOOR} "vect" { target vect_avg_qi } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target vect_avg_qi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_avg_qi } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-37.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-37.c	2018-08-01 10:55:10.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-37.c	2018-08-22 10:24:22.070479189 +0100
@@ -17,8 +17,8 @@ foo1 (s1 *arr)
   int i;
   s1 *ptr = arr;
 
-  /* Different constant types - not SLPable.  The group size is not power of 2,
-     interleaving is not supported either.  */
+  /* Vectorized as a strided SLP pair of accesses to <a, b> and a single
+     strided access to c.  */
   for (i = 0; i < N; i++)
     {
       ptr->a = 6;
@@ -58,6 +58,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */
-  
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2018-08-01 10:55:10.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2018-08-22 10:24:22.070479189 +0100
@@ -53,7 +53,7 @@ main1 (s *arr)
    }
 
   ptr = arr;
-  /* Not vectorizable: gap in store. */
+  /* Vectorized as a strided SLP pair.  */
   for (i = 0; i < N; i++)
     { 
       res[i].a = ptr->b;
@@ -97,5 +97,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
-  
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c	2018-08-01 10:55:10.000000000 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c	2018-08-22 10:24:22.070479189 +0100
@@ -55,7 +55,7 @@ main1 (s *arr)
    }
 
   ptr = arr;
-  /* Not vectorizable: gap in store.  */
+  /* Vectorized as a strided SLP pair.  */
   for (i = 0; i < N; i++)
     {
       res[i].a = ptr->b;
@@ -110,5 +110,4 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
-
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */
H.J. Lu Jan. 9, 2019, 3:53 a.m. | #4
On Fri, Jul 20, 2018 at 3:57 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>

> We could vectorise:

>

>      for (...)

>        {

>          a[0] = ...;

>          a[1] = ...;

>          a[2] = ...;

>          a[3] = ...;

>          a += stride;

>        }

>

> (including the case when stride == 8) but not:

>

>      for (...)

>        {

>          a[0] = ...;

>          a[1] = ...;

>          a[2] = ...;

>          a[3] = ...;

>          a += 8;

>        }

>

> (where the stride is always 8).  The former was treated as a "grouped

> and strided" store, while the latter was treated as grouped store with

> gaps, which we don't support.

>

> This patch makes us treat groups of stores with gaps at the end as

> strided groups too.  I tried to go through all uses of STMT_VINFO_STRIDED_P

> and all vector uses of DR_STEP to see whether there were any hard-baked

> assumptions, but couldn't see any.  I wondered whether we should relax:

>

>   /* We do not have to consider dependences between accesses that belong

>      to the same group, unless the stride could be smaller than the

>      group size.  */

>   if (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>       && (DR_GROUP_FIRST_ELEMENT (stmtinfo_a)

>           == DR_GROUP_FIRST_ELEMENT (stmtinfo_b))

>       && !STMT_VINFO_STRIDED_P (stmtinfo_a))

>     return false;

>

> for cases in which the step is constant and the absolute step is known

> to be greater than the group size, but data dependence analysis should

> already return chrec_known for those cases.

>

> The new test is a version of vect-avg-15.c with the variable step

> replaced by a constant one.

>

> A natural follow-on would be to do the same for groups with gaps in

> the middle:

>

>           /* Check that the distance between two accesses is equal to the type

>              size. Otherwise, we have gaps.  */

>           diff = (TREE_INT_CST_LOW (DR_INIT (data_ref))

>                   - TREE_INT_CST_LOW (prev_init)) / type_size;

>           if (diff != 1)

>             {

>               [...]

>               if (DR_IS_WRITE (data_ref))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                                      "interleaved store with gaps\n");

>                   return false;

>                 }

>

> But I think we should do that separately and see what the fallout

> from this change is first.

>

> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf

> and x86_64-linux-gnu.  OK to install?

>

> Richard

>

>

> 2018-07-20  Richard Sandiford  <richard.sandiford@arm.com>

>

> gcc/

>         * tree-vect-data-refs.c (vect_analyze_group_access_1): Convert

>         grouped stores with gaps to a strided group.

>


This patch caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87214

H.J.

Patch

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-06-30 13:44:38.567611988 +0100
+++ gcc/tree-vect-data-refs.c	2018-07-20 11:55:22.570911497 +0100
@@ -2632,10 +2632,14 @@  vect_analyze_group_access_1 (struct data
       if (groupsize != count
 	  && !DR_IS_READ (dr))
         {
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "interleaved store with gaps\n");
-	  return false;
+	  groupsize = count;
+	  STMT_VINFO_STRIDED_P (stmt_info) = true;
+	  stmt_vec_info next_info = stmt_info;
+	  while (DR_GROUP_NEXT_ELEMENT (next_info))
+	    {
+	      next_info = vinfo_for_stmt (DR_GROUP_NEXT_ELEMENT (next_info));
+	      STMT_VINFO_STRIDED_P (next_info) = 1;
+	    }
 	}
 
       /* If there is a gap after the last load in the group it is the
@@ -2651,6 +2655,8 @@  vect_analyze_group_access_1 (struct data
 			   "Detected interleaving ");
 	  if (DR_IS_READ (dr))
 	    dump_printf (MSG_NOTE, "load ");
+	  else if (STMT_VINFO_STRIDED_P (stmt_info))
+	    dump_printf (MSG_NOTE, "strided store ");
 	  else
 	    dump_printf (MSG_NOTE, "store ");
 	  dump_printf (MSG_NOTE, "of size %u starting with ",
Index: gcc/testsuite/gcc.dg/vect/vect-avg-16.c
===================================================================
--- /dev/null	2018-07-09 14:52:09.234750850 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-avg-16.c	2018-07-20 11:55:22.570911497 +0100
@@ -0,0 +1,52 @@ 
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 80
+
+void __attribute__ ((noipa))
+f (signed char *restrict a, signed char *restrict b,
+   signed char *restrict c, int n)
+{
+  for (int j = 0; j < n; ++j)
+    {
+      for (int i = 0; i < 16; ++i)
+	a[i] = (b[i] + c[i]) >> 1;
+      a += 20;
+      b += 20;
+      c += 20;
+    }
+}
+
+#define BASE1 -126
+#define BASE2 -42
+
+signed char a[N], b[N], c[N];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N; ++i)
+    {
+      a[i] = i;
+      b[i] = BASE1 + i * 3;
+      c[i] = BASE2 + i * 2;
+      asm volatile ("" ::: "memory");
+    }
+  f (a, b, c, N / 20);
+  for (int i = 0; i < N; ++i)
+    {
+      int d = (BASE1 + BASE2 + i * 5) >> 1;
+      if (a[i] != (i % 20 < 16 ? d : i))
+	__builtin_abort ();
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_average_pattern: detected" "vect" } } */
+/* { dg-final { scan-tree-dump {\.AVG_FLOOR} "vect" { target vect_avg_qi } } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" { target vect_avg_qi } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loop" 1 "vect" { target vect_avg_qi } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-37.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-37.c	2018-05-02 08:37:49.033604262 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-37.c	2018-07-20 11:55:22.570911497 +0100
@@ -17,8 +17,8 @@  foo1 (s1 *arr)
   int i;
   s1 *ptr = arr;
 
-  /* Different constant types - not SLPable.  The group size is not power of 2,
-     interleaving is not supported either.  */
+  /* Vectorized as a strided SLP pair of accesses to <a, b> and a single
+     strided access to c.  */
   for (i = 0; i < N; i++)
     {
       ptr->a = 6;
@@ -58,6 +58,5 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  } } */
-  
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2018-05-02 08:37:49.021604375 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4.c	2018-07-20 11:55:22.570911497 +0100
@@ -53,7 +53,7 @@  main1 (s *arr)
    }
 
   ptr = arr;
-  /* Not vectorizable: gap in store. */
+  /* Vectorized as a strided SLP pair.  */
   for (i = 0; i < N; i++)
     { 
       res[i].a = ptr->b;
@@ -97,5 +97,4 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
-  
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c	2018-05-02 08:37:49.013604450 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-strided-u8-i8-gap4-big-array.c	2018-07-20 11:55:22.570911497 +0100
@@ -55,7 +55,7 @@  main1 (s *arr)
    }
 
   ptr = arr;
-  /* Not vectorizable: gap in store.  */
+  /* Vectorized as a strided SLP pair.  */
   for (i = 0; i < N; i++)
     {
       res[i].a = ptr->b;
@@ -110,5 +110,4 @@  int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_strided8 } } } */
-
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target vect_strided8 } } } */