[i386] Fix PR83008

Message ID alpine.LSU.2.20.1712081246540.12252@zhemvz.fhfr.qr
State New
Headers show
Series
  • [i386] Fix PR83008
Related show

Commit Message

Richard Biener Dec. 8, 2017, 11:48 a.m.
This restores the vec_construct cost dependence on the vector element
count.  Honza removed this (accidentially?) during the rework.

Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2017-12-08  Richard Biener  <rguenther@suse.de>

	PR target/83008
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Restore
	vec_construct dependence on vector element count.

Comments

Jan Hubicka Dec. 8, 2017, 3:51 p.m. | #1
> 

> This restores the vec_construct cost dependence on the vector element

> count.  Honza removed this (accidentially?) during the rework.

> 

> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for trunk?


Hmm, the false parameter to ix86_vec_cost is supposed to do that.  It uses:

if (!parallel)                                                                
    return cost * GET_MODE_NUNITS (mode);                                       

Why it doesn't work?

Honza
> 

> Thanks,

> Richard.

> 

> 2017-12-08  Richard Biener  <rguenther@suse.de>

> 

> 	PR target/83008

> 	* config/i386/i386.c (ix86_builtin_vectorization_cost): Restore

> 	vec_construct dependence on vector element count.

> 

> Index: gcc/config/i386/i386.c

> ===================================================================

> --- gcc/config/i386/i386.c	(revision 255499)

> +++ gcc/config/i386/i386.c	(working copy)

> @@ -44879,7 +44879,8 @@ ix86_builtin_vectorization_cost (enum ve

>  			      ix86_cost->sse_op, true);

>  

>        case vec_construct:

> -	return ix86_vec_cost (mode, ix86_cost->sse_op, false);

> +	return (ix86_vec_cost (mode, ix86_cost->sse_op, false)

> +		* (TYPE_VECTOR_SUBPARTS (vectype) - 1));

>  

>        default:

>          gcc_unreachable ();
Richard Biener Dec. 8, 2017, 4:12 p.m. | #2
On December 8, 2017 4:51:16 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> 

>> This restores the vec_construct cost dependence on the vector element

>> count.  Honza removed this (accidentially?) during the rework.

>> 

>> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for

>trunk?

>

>Hmm, the false parameter to ix86_vec_cost is supposed to do that.  It

>uses:

>

>if (!parallel)                                                         

>      

>return cost * GET_MODE_NUNITS (mode);                                  

>    

>

>Why it doesn't work?


I see. Not exactly the same. I guess we need to see why the costs favor a 16 element vector in it plus store over 16 scalar stores then... 

See the PR. 

>

>Honza

>> 

>> Thanks,

>> Richard.

>> 

>> 2017-12-08  Richard Biener  <rguenther@suse.de>

>> 

>> 	PR target/83008

>> 	* config/i386/i386.c (ix86_builtin_vectorization_cost): Restore

>> 	vec_construct dependence on vector element count.

>> 

>> Index: gcc/config/i386/i386.c

>> ===================================================================

>> --- gcc/config/i386/i386.c	(revision 255499)

>> +++ gcc/config/i386/i386.c	(working copy)

>> @@ -44879,7 +44879,8 @@ ix86_builtin_vectorization_cost (enum ve

>>  			      ix86_cost->sse_op, true);

>>  

>>        case vec_construct:

>> -	return ix86_vec_cost (mode, ix86_cost->sse_op, false);

>> +	return (ix86_vec_cost (mode, ix86_cost->sse_op, false)

>> +		* (TYPE_VECTOR_SUBPARTS (vectype) - 1));

>>  

>>        default:

>>          gcc_unreachable ();
Jan Hubicka Dec. 8, 2017, 4:45 p.m. | #3
> On December 8, 2017 4:51:16 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:

> >> 

> >> This restores the vec_construct cost dependence on the vector element

> >> count.  Honza removed this (accidentially?) during the rework.

> >> 

> >> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for

> >trunk?

> >

> >Hmm, the false parameter to ix86_vec_cost is supposed to do that.  It

> >uses:

> >

> >if (!parallel)                                                         

> >      

> >return cost * GET_MODE_NUNITS (mode);                                  

> >    

> >

> >Why it doesn't work?

> 

> I see. Not exactly the same. I guess we need to see why the costs favor a 16 element vector in it plus store over 16 scalar stores then... 


Isn't it what you proposed as profitable for Martin Jambor's copy by pieces
issue?  I think the reason is that stores are modeled as having latency (or
cost/2) of 4.  Construction is modelled as simple sse op (latency 1)*number of
parts.

So in this simplified model we cummulate latency of 16 stores as 16*4,
while construction plus one store as 16*1+4.

Honza
> 

> See the PR. 

> 

> >

> >Honza

> >> 

> >> Thanks,

> >> Richard.

> >> 

> >> 2017-12-08  Richard Biener  <rguenther@suse.de>

> >> 

> >> 	PR target/83008

> >> 	* config/i386/i386.c (ix86_builtin_vectorization_cost): Restore

> >> 	vec_construct dependence on vector element count.

> >> 

> >> Index: gcc/config/i386/i386.c

> >> ===================================================================

> >> --- gcc/config/i386/i386.c	(revision 255499)

> >> +++ gcc/config/i386/i386.c	(working copy)

> >> @@ -44879,7 +44879,8 @@ ix86_builtin_vectorization_cost (enum ve

> >>  			      ix86_cost->sse_op, true);

> >>  

> >>        case vec_construct:

> >> -	return ix86_vec_cost (mode, ix86_cost->sse_op, false);

> >> +	return (ix86_vec_cost (mode, ix86_cost->sse_op, false)

> >> +		* (TYPE_VECTOR_SUBPARTS (vectype) - 1));

> >>  

> >>        default:

> >>          gcc_unreachable ();
Richard Biener Dec. 8, 2017, 5 p.m. | #4
On December 8, 2017 5:45:23 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On December 8, 2017 4:51:16 PM GMT+01:00, Jan Hubicka

><hubicka@ucw.cz> wrote:

>> >> 

>> >> This restores the vec_construct cost dependence on the vector

>element

>> >> count.  Honza removed this (accidentially?) during the rework.

>> >> 

>> >> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for

>> >trunk?

>> >

>> >Hmm, the false parameter to ix86_vec_cost is supposed to do that. 

>It

>> >uses:

>> >

>> >if (!parallel)                                                      

>  

>> >      

>> >return cost * GET_MODE_NUNITS (mode);                               

>  

>> >    

>> >

>> >Why it doesn't work?

>> 

>> I see. Not exactly the same. I guess we need to see why the costs

>favor a 16 element vector in it plus store over 16 scalar stores

>then... 

>

>Isn't it what you proposed as profitable for Martin Jambor's copy by

>pieces

>issue?  I think the reason is that stores are modeled as having latency

>(or

>cost/2) of 4.  Construction is modelled as simple sse op (latency

>1)*number of

>parts.


Yes. 

>So in this simplified model we cummulate latency of 16 stores as 16*4,

>while construction plus one store as 16*1+4.


But vector vs. Scalar cost somehow mess up for avx512... 

>Honza

>> 

>> See the PR. 

>> 

>> >

>> >Honza

>> >> 

>> >> Thanks,

>> >> Richard.

>> >> 

>> >> 2017-12-08  Richard Biener  <rguenther@suse.de>

>> >> 

>> >> 	PR target/83008

>> >> 	* config/i386/i386.c (ix86_builtin_vectorization_cost): Restore

>> >> 	vec_construct dependence on vector element count.

>> >> 

>> >> Index: gcc/config/i386/i386.c

>> >>

>===================================================================

>> >> --- gcc/config/i386/i386.c	(revision 255499)

>> >> +++ gcc/config/i386/i386.c	(working copy)

>> >> @@ -44879,7 +44879,8 @@ ix86_builtin_vectorization_cost (enum ve

>> >>  			      ix86_cost->sse_op, true);

>> >>  

>> >>        case vec_construct:

>> >> -	return ix86_vec_cost (mode, ix86_cost->sse_op, false);

>> >> +	return (ix86_vec_cost (mode, ix86_cost->sse_op, false)

>> >> +		* (TYPE_VECTOR_SUBPARTS (vectype) - 1));

>> >>  

>> >>        default:

>> >>          gcc_unreachable ();

Patch

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 255499)
+++ gcc/config/i386/i386.c	(working copy)
@@ -44879,7 +44879,8 @@  ix86_builtin_vectorization_cost (enum ve
 			      ix86_cost->sse_op, true);
 
       case vec_construct:
-	return ix86_vec_cost (mode, ix86_cost->sse_op, false);
+	return (ix86_vec_cost (mode, ix86_cost->sse_op, false)
+		* (TYPE_VECTOR_SUBPARTS (vectype) - 1));
 
       default:
         gcc_unreachable ();