RFA: vectorizer patches 2/2: reduction splitting

Message ID 5BE7E51F.9010907@riscy-ip.com
State New
Series
  • RFA: vectorizer patches 2/2: reduction splitting

Commit Message

Joern Wolfgang Rennecke Nov. 11, 2018, 8:15 a.m.
It's nice to use the processor's vector arithmetic to good effect, but
it's all for naught when there are too many moves from/to general
registers cluttering up the loop.  With a double-vector reduction
variable, the standard final reduction code got so awkward that the
register allocator decided that the reduction variable must live in
general-purpose registers, not only after the loop, but across the loop
as well.

Splitting the reduction to force the first step to be done as a vector
operation seemed the obvious solution.  The hook was called, but the
vectorizer still generated the vanilla final reduction code.  It turns
out that the reduction splitting was calculated, but the result was not
used, and the calculation started anew.

The attached patch fixes this.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.
2018-10-31  Joern Rennecke  <joern.rennecke@riscy-ip.com>

	* tree-vect-loop.c (vect_create_epilog_for_reduction):
	If we split the reduction, use the result in Case 3 too.

Comments

Richard Biener Nov. 12, 2018, 2:36 p.m. | #1
On Sun, Nov 11, 2018 at 9:16 AM Joern Wolfgang Rennecke
<joern.rennecke@riscy-ip.com> wrote:
> [...]
>
> The attached patch fixes this.

That looks quite fragile to me or warrants further cleanups.  Can you
push up the new_phis.length assert further and elide the loop over
the PHIs?  It looks like at the very beginning we are reducing the
PHIs to a single PHI and new_phi_result is the one to look at
(and the vector is updated, but given we replace the PHI with an
assign using new_phi_result instead of the vector would be better).

Richard.

> bootstrapped and regression tested on x86_64-pc-linux-gnu .

Patch

Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 266008)
+++ tree-vect-loop.c	(working copy)
@@ -5139,6 +5139,7 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
       while (sz > sz1)
 	{
 	  gcc_assert (!slp_reduc);
+	  gcc_assert (new_phis.length () == 1);
 	  sz /= 2;
 	  vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
 
@@ -5301,7 +5302,12 @@  vect_create_epilog_for_reduction (vec<tree> vect_d
           FOR_EACH_VEC_ELT (new_phis, i, new_phi)
             {
               int bit_offset;
-              if (gimple_code (new_phi) == GIMPLE_PHI)
+
+	      /* If we did reduction splitting, make sure to use the result.  */
+	      if (!slp_reduc && new_phis.length () == 1
+		  && new_temp != new_phi_result)
+		vec_temp = new_temp;
+	      else if (gimple_code (new_phi) == GIMPLE_PHI)
                 vec_temp = PHI_RESULT (new_phi);
               else
                 vec_temp = gimple_assign_lhs (new_phi);