[vect] Disable vectorization of epilogues for loops with SIMDUID set

Message ID 63635e04-3faa-9b89-ca42-0a3b190f4106@arm.com
State New
Headers show
Series
  • [vect] Disable vectorization of epilogues for loops with SIMDUID set
Related show

Commit Message

Andre Vieira (lists) Nov. 4, 2019, 5:11 p.m.
Hi,

I was using loop->simdlen to detect whether it was a SIMD loop and I 
don't believe that was correct, as can be witnessed by the mass failures 
in libgomp. My apologies for not running this, didn't think of it!

I found that these were failing because we do not handle vectorization 
of epilogues correctly when SIMDUID is set. For now Jakub and I agreed 
to disable epilogue vectorization for loops where SIMDUID is set until 
we have fixed this. See further comments inline.

I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

This OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-04  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
         vectorization if loop->simduid is non null.

On 31/10/2019 16:58, Jakub Jelinek wrote:

> FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test

> FAIL: libgomp.c/examples-4/simd-3.c execution test

> FAIL: libgomp.c/pr58392.c execution test

> FAIL: libgomp.c/scan-13.c execution test

> FAIL: libgomp.c/scan-17.c execution test

> FAIL: libgomp.c/scan-19.c execution test

> FAIL: libgomp.c/scan-20.c execution test

> FAIL: libgomp.c/simd-10.c execution test

> FAIL: libgomp.c/simd-12.c execution test

> FAIL: libgomp.c/simd-13.c execution test

> FAIL: libgomp.c/simd-6.c execution test

> FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test

> FAIL: libgomp.c++/simd-8.C execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test

> FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test

> FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test

> FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

> FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test

> FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test


These should go away now, but we should revisit SIMDUID and epilogue 
vectorization later.  I tried to look into it, but I am afraid I know 
very little about SIMD loops to figure out how to make this work.
> 

> On i686-linux, I also see newly

> FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump vect "LOOP EPILOGUE VECTORIZED"

> FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE VECTORIZED"

> and in libgomp just


These, just like for arm should be skipped for i686, I suspect it 
doesn't know how to vectorize for lower VF's.  Could someone add the 
appropriate skip target triple for i686?

> FAIL: libgomp.c/examples-4/simd-3.c execution test

> FAIL: libgomp.c/scan-13.c execution test

> FAIL: libgomp.c/scan-17.c execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test

> FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test


Same as the other libgomp tests. Should go away now.

Comments

Richard Biener Nov. 5, 2019, 7:07 a.m. | #1
On Mon, 4 Nov 2019, Andre Vieira (lists) wrote:

> Hi,

> 

> I was using loop->simdlen to detect whether it was a SIMD loop and I don't

> believe that was correct, as can be witnessed by the mass failures in libgomp.

> My apologies for not running this, didn't think of it!

> 

> I found that these were failing because we do not handle vectorization of

> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable

> epilogue vectorization for loops where SIMDUID is set until we have fixed

> this. See further comments inline.

> 

> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

> 

> This OK for trunk?


OK.  Can you remove the simdlen == 0 check as a followup?

Thanks,
Richard.

> Cheers,

> Andre

> 

> gcc/ChangeLog:

> 2019-11-04  Andre Vieira  <andre.simoesdiasvieira@arm.com>

> 

>         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue

>         vectorization if loop->simduid is non null.

> 

> On 31/10/2019 16:58, Jakub Jelinek wrote:

> 

> > FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test

> > FAIL: libgomp.c/examples-4/simd-3.c execution test

> > FAIL: libgomp.c/pr58392.c execution test

> > FAIL: libgomp.c/scan-13.c execution test

> > FAIL: libgomp.c/scan-17.c execution test

> > FAIL: libgomp.c/scan-19.c execution test

> > FAIL: libgomp.c/scan-20.c execution test

> > FAIL: libgomp.c/simd-10.c execution test

> > FAIL: libgomp.c/simd-12.c execution test

> > FAIL: libgomp.c/simd-13.c execution test

> > FAIL: libgomp.c/simd-6.c execution test

> > FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test

> > FAIL: libgomp.c++/simd-8.C execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer

> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution

> > FAIL: test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test

> > FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test

> > FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test

> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer

> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution

> > FAIL: test

> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test

> > FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test

> 

> These should go away now, but we should revisit SIMDUID and epilogue

> vectorization later.  I tried to look into it, but I am afraid I know very

> little about SIMD loops to figure out how to make this work.

> > 

> > On i686-linux, I also see newly

> > FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump

> > FAIL: vect "LOOP EPILOGUE VECTORIZED"

> > FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE

> > FAIL: VECTORIZED"

> > and in libgomp just

> 

> These, just like for arm should be skipped for i686, I suspect it doesn't know

> how to vectorize for lower VF's.  Could someone add the appropriate skip

> target triple for i686?

> 

> > FAIL: libgomp.c/examples-4/simd-3.c execution test

> > FAIL: libgomp.c/scan-13.c execution test

> > FAIL: libgomp.c/scan-17.c execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer

> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution

> > FAIL: test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test

> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test

> 

> Same as the other libgomp tests. Should go away now.

> 

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Jakub Jelinek Nov. 5, 2019, 7:16 a.m. | #2
On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:
> > I was using loop->simdlen to detect whether it was a SIMD loop and I don't

> > believe that was correct, as can be witnessed by the mass failures in libgomp.

> > My apologies for not running this, didn't think of it!

> > 

> > I found that these were failing because we do not handle vectorization of

> > epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable

> > epilogue vectorization for loops where SIMDUID is set until we have fixed

> > this. See further comments inline.

> > 

> > I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

> > 

> > This OK for trunk?

> 

> OK.  Can you remove the simdlen == 0 check as a followup?


Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
All a non-zero simdlen says is a user assertion that certain inter-loop
depencencies don't exist.

	Jakub
Andre Vieira (lists) Nov. 7, 2019, 2:26 p.m. | #3
Hi,

Rebased the patch on top of Richard Sandiford's patches, with his fixes 
I can now allow for vectorization of epilogues after we match simdlen. 
This will however not turn on epilogue vectorization in cases where we 
specify a desired simdlen that is never matched.  This would require 
more work as before simdlen is matched we would need to analyze each 
vector_size after creating a "first_loop_vinfo" twice: once as an 
epilogue (for in the case we never match simdlen) and once as a main 
loop (in case simdlen would match its VF).  Maybe there is a different 
way of doing it but I don't see it right now.

Bootstrapped and regression tested (also ran libgomp tests) for x86_64 
and aarch64. Currently libgomp has 5 failures for aarch64, these are all 
openacc tests. The first one I looked at is due to a reduction seemingly 
performing too many iterations when defining '$acc parallel 
vector_length(vl)' I am looking into it.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
         vectorization for loops with SIMDUID set.  Enable epilogue
         vectorization for loops with SIMDLEN set after finding a main
         loop with a VF that matches it.

On 05/11/2019 07:16, Jakub Jelinek wrote:
> On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:

>>> I was using loop->simdlen to detect whether it was a SIMD loop and I don't

>>> believe that was correct, as can be witnessed by the mass failures in libgomp.

>>> My apologies for not running this, didn't think of it!

>>>

>>> I found that these were failing because we do not handle vectorization of

>>> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable

>>> epilogue vectorization for loops where SIMDUID is set until we have fixed

>>> this. See further comments inline.

>>>

>>> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

>>>

>>> This OK for trunk?

>>

>> OK.  Can you remove the simdlen == 0 check as a followup?

> 

> Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.

> All a non-zero simdlen says is a user assertion that certain inter-loop

> depencencies don't exist.

> 

> 	Jakub

>
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 	    delete loop_vinfo;
 
 	  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
-	     enabled, this is not a simd loop and it is the innermost loop.  */
-	  vect_epilogues = (!loop->simdlen
+	     enabled, SIMDUID is not set, it is the innermost loop and we have
+	     either already found the loop's SIMDLEN or there was no SIMDLEN to
+	     begin with.
+	     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
+	  vect_epilogues = (!simdlen
 			    && loop->inner == NULL
 			    && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
 			    && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
+			    && !loop->simduid
 			    /* For now only allow one epilogue loop.  */
 			    && first_loop_vinfo->epilogue_vinfos.is_empty ());
Jakub Jelinek Nov. 7, 2019, 2:32 p.m. | #4
On Thu, Nov 07, 2019 at 02:26:29PM +0000, Andre Vieira (lists) wrote:
> 2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>

> 

>         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue

>         vectorization for loops with SIMDUID set.  Enable epilogue

>         vectorization for loops with SIMDLEN set after finding a main

>         loop with a VF that matches it.

> 

> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c

> index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644

> --- a/gcc/tree-vect-loop.c

> +++ b/gcc/tree-vect-loop.c

> @@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)

>  	    delete loop_vinfo;

>  

>  	  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is

> -	     enabled, this is not a simd loop and it is the innermost loop.  */

> -	  vect_epilogues = (!loop->simdlen

> +	     enabled, SIMDUID is not set, it is the innermost loop and we have

> +	     either already found the loop's SIMDLEN or there was no SIMDLEN to

> +	     begin with.

> +	     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */

> +	  vect_epilogues = (!simdlen

>  			    && loop->inner == NULL

>  			    && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)

>  			    && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)

> +			    && !loop->simduid

>  			    /* For now only allow one epilogue loop.  */

>  			    && first_loop_vinfo->epilogue_vinfos.is_empty ());


LGTM.

	Jakub

Patch

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fa873e9b435037e5a81dda6615cab809d2d4de48..d3a0fa015332dbcccf84bc68531dc1e49550cc19 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2383,10 +2383,11 @@  vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
 
-  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, this
-     is not a simd loop and it is the most inner loop.  */
+  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled,
+     SIMDLEN is not set, SIMDUID is not set and this is the most inner loop.
+     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
   bool vect_epilogues
-    = !loop->simdlen && loop->inner == NULL
+    = loop->simdlen == 0 && loop->simduid == NULL_TREE && loop->inner == NULL
       && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
   while (1)
     {