[0/3,POPCOUNT]

Message ID CAELXzTO1FAVbJff1V-RC=a4p6ngw309xM2x7t6LinazKtGtKRg@mail.gmail.com
Headers show
Series
  • [1/3,POPCOUNT] Handle COND_EXPR in expression_expensive_p
Related show

Message

Kugan Vivekanandarajah June 22, 2018, 9:11 a.m.
When we set niter with maybe_zero, currently final_value_relacement
will not happen due to expression_expensive_p not handling. Patch 1
adds this.

With that we have the following optimized gimple.

  <bb 2> [local count: 118111601]:
  if (b_4(D) != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _2 = (unsigned long) b_4(D);
  _9 = __builtin_popcountl (_2);
  c_3 = b_4(D) != 0 ? _9 : 1;

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <c_3(3), 0(2)>

I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the
latch execute zero times for b_4 == 0 means that the body will execute
ones.

The issue here is, since we are checking if (b_4(D) != 0) before
entering the loop means we don't need to set maybe_zero. Patch 2
handles this.

With that we have
  <bb 2> [local count: 118111601]:
  if (b_4(D) != 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 105119324]:
  _2 = (unsigned long) b_4(D);
  _9 = __builtin_popcountl (_2);

  <bb 4> [local count: 118111601]:
  # c_12 = PHI <0(2), _9(3)>

As advised earlier, patch 3 adds phiopt support to remove this.

Bootstrap and regression testing are ongoing.

Is this OK for trunk.

Thanks,
Kugan

Comments

Jeff Law June 22, 2018, 4:06 p.m. | #1
On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:
> When we set niter with maybe_zero, currently final_value_relacement

> will not happen due to expression_expensive_p not handling. Patch 1

> adds this.

> 

> With that we have the following optimized gimple.

> 

>   <bb 2> [local count: 118111601]:

>   if (b_4(D) != 0)

>     goto <bb 3>; [89.00%]

>   else

>     goto <bb 4>; [11.00%]

> 

>   <bb 3> [local count: 105119324]:

>   _2 = (unsigned long) b_4(D);

>   _9 = __builtin_popcountl (_2);

>   c_3 = b_4(D) != 0 ? _9 : 1;

> 

>   <bb 4> [local count: 118111601]:

>   # c_12 = PHI <c_3(3), 0(2)>

> 

> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

> latch execute zero times for b_4 == 0 means that the body will execute

> ones.

ISTM that DOM ought to have simplified the conditional, unless there's
some other way to get to bb3.  We know that b_4 is nonzero and thus c_3
must have the value _9.


> 

> The issue here is, since we are checking if (b_4(D) != 0) before

> entering the loop means we don't need to set maybe_zero. Patch 2

> handles this.

> 

> With that we have

>   <bb 2> [local count: 118111601]:

>   if (b_4(D) != 0)

>     goto <bb 3>; [89.00%]

>   else

>     goto <bb 4>; [11.00%]

> 

>   <bb 3> [local count: 105119324]:

>   _2 = (unsigned long) b_4(D);

>   _9 = __builtin_popcountl (_2);

> 

>   <bb 4> [local count: 118111601]:

>   # c_12 = PHI <0(2), _9(3)>

> 

> As advised earlier, patch 3 adds phiopt support to remove this.

So if DOM or some other pass fixed up the assignment to c_3 I'd hope we
wouldn't set maybe_zero.

So I'd like to start by first determining if we should have already
simplified the COND_EXPR in bb3.  Do you have a testcase for that?


jeff
Bin.Cheng June 25, 2018, 1:15 a.m. | #2
On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> When we set niter with maybe_zero, currently final_value_relacement

> will not happen due to expression_expensive_p not handling. Patch 1

> adds this.

>

> With that we have the following optimized gimple.

>

>   <bb 2> [local count: 118111601]:

>   if (b_4(D) != 0)

>     goto <bb 3>; [89.00%]

>   else

>     goto <bb 4>; [11.00%]

>

>   <bb 3> [local count: 105119324]:

>   _2 = (unsigned long) b_4(D);

>   _9 = __builtin_popcountl (_2);

>   c_3 = b_4(D) != 0 ? _9 : 1;

>

>   <bb 4> [local count: 118111601]:

>   # c_12 = PHI <c_3(3), 0(2)>

>

> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

No, it doesn't make much sense.  when b_4(D) == 0, the popcount
computed should be 0.  Point is you can never get b_4(D) == 0 with
guard condition in basic block 2.  So the result should simply be:

>   <bb 2> [local count: 118111601]:

>   if (b_4(D) != 0)

>     goto <bb 3>; [89.00%]

>   else

>     goto <bb 4>; [11.00%]

>

>   <bb 3> [local count: 105119324]:

>   _2 = (unsigned long) b_4(D);

>   c_3 = __builtin_popcountl (_2);

>

>   <bb 4> [local count: 118111601]:

>   # c_12 = PHI <c_3(3), 0(2)>


I think this is the code generated if maybe_zero is not set?  which it
should not be set here.
For the same reason, it can be further optimized into:

>   <bb 2> [local count: 118111601]:

>   _2 = (unsigned long) b_4(D);

>   c_12 = __builtin_popcountl (_2);

>


> latch execute zero times for b_4 == 0 means that the body will execute

> ones.

You never get zero times latch here with the aforementioned guard condition.

BTW, I didn't look at following patches which could be wanted optimizations.

Thanks,
bin
>

> The issue here is, since we are checking if (b_4(D) != 0) before

> entering the loop means we don't need to set maybe_zero. Patch 2

> handles this.

>

> With that we have

>   <bb 2> [local count: 118111601]:

>   if (b_4(D) != 0)

>     goto <bb 3>; [89.00%]

>   else

>     goto <bb 4>; [11.00%]

>

>   <bb 3> [local count: 105119324]:

>   _2 = (unsigned long) b_4(D);

>   _9 = __builtin_popcountl (_2);

>

>   <bb 4> [local count: 118111601]:

>   # c_12 = PHI <0(2), _9(3)>

>

> As advised earlier, patch 3 adds phiopt support to remove this.

>

> Bootstrap and regression testing are ongoing.

>

> Is this OK for trunk.

>

> Thanks,

> Kugan
Kugan Vivekanandarajah June 25, 2018, 2:41 a.m. | #3
Hi Jeff,

Thanks for the comments.

On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:
> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>> When we set niter with maybe_zero, currently final_value_relacement

>> will not happen due to expression_expensive_p not handling. Patch 1

>> adds this.

>>

>> With that we have the following optimized gimple.

>>

>>   <bb 2> [local count: 118111601]:

>>   if (b_4(D) != 0)

>>     goto <bb 3>; [89.00%]

>>   else

>>     goto <bb 4>; [11.00%]

>>

>>   <bb 3> [local count: 105119324]:

>>   _2 = (unsigned long) b_4(D);

>>   _9 = __builtin_popcountl (_2);

>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>

>>   <bb 4> [local count: 118111601]:

>>   # c_12 = PHI <c_3(3), 0(2)>

>>

>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>> latch execute zero times for b_4 == 0 means that the body will execute

>> ones.

> ISTM that DOM ought to have simplified the conditional, unless there's

> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3

> must have the value _9.

As of now, dom is not optimizing it. With the attached hack, it can be made to.

>

>

>>

>> The issue here is, since we are checking if (b_4(D) != 0) before

>> entering the loop means we don't need to set maybe_zero. Patch 2

>> handles this.

>>

>> With that we have

>>   <bb 2> [local count: 118111601]:

>>   if (b_4(D) != 0)

>>     goto <bb 3>; [89.00%]

>>   else

>>     goto <bb 4>; [11.00%]

>>

>>   <bb 3> [local count: 105119324]:

>>   _2 = (unsigned long) b_4(D);

>>   _9 = __builtin_popcountl (_2);

>>

>>   <bb 4> [local count: 118111601]:

>>   # c_12 = PHI <0(2), _9(3)>

>>

>> As advised earlier, patch 3 adds phiopt support to remove this.

> So if DOM or some other pass fixed up the assignment to c_3 I'd hope we

> wouldn't set maybe_zero.

>

> So I'd like to start by first determining if we should have already

> simplified the COND_EXPR in bb3.  Do you have a testcase for that?

Sorry, It is hidden in patch 3 (attaching now). You will need patch 1 as well.

Thanks,
Kugan

>

>

> jeff
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index a6f176c..77ae7d1b 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1991,6 +1991,25 @@ dom_opt_dom_walker::optimize_stmt (basic_block bb, gimple_stmt_iterator si)
 	    }
 	}
 
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_rhs_code (stmt) == COND_EXPR)
+	{
+	  /* If this is a conditional stmt, see if we can optimize the
+	     condition.  */
+	  x_vr_values = evrp_range_analyzer.get_vr_values ();
+	  tree exp = gimple_assign_rhs1 (stmt);
+	  tree lhs = TREE_OPERAND (exp, 0);
+	  tree rhs = TREE_OPERAND (exp, 1);
+	  tree val = x_vr_values->vrp_evaluate_conditional (TREE_CODE (exp),
+							    lhs, rhs, stmt);
+	  if (is_gimple_min_invariant (val))
+	    {
+	      gimple_assign_set_rhs1 (stmt, val);
+	      update_stmt (stmt);
+	    }
+	  x_vr_values = NULL;
+	}
+
       if (gimple_code (stmt) == GIMPLE_COND)
 	{
 	  tree lhs = gimple_cond_lhs (stmt);
int PopCount (long b) {
    int c = 0;

    while (b) {
	b &= b - 1;
	c++;
    }
    return c;
}
Kugan Vivekanandarajah June 25, 2018, 3:37 a.m. | #4
Hi Bin,

Thanks for your comments.

On 25 June 2018 at 11:15, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> When we set niter with maybe_zero, currently final_value_relacement

>> will not happen due to expression_expensive_p not handling. Patch 1

>> adds this.

>>

>> With that we have the following optimized gimple.

>>

>>   <bb 2> [local count: 118111601]:

>>   if (b_4(D) != 0)

>>     goto <bb 3>; [89.00%]

>>   else

>>     goto <bb 4>; [11.00%]

>>

>>   <bb 3> [local count: 105119324]:

>>   _2 = (unsigned long) b_4(D);

>>   _9 = __builtin_popcountl (_2);

>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>

>>   <bb 4> [local count: 118111601]:

>>   # c_12 = PHI <c_3(3), 0(2)>

>>

>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

> No, it doesn't make much sense.  when b_4(D) == 0, the popcount

> computed should be 0.  Point is you can never get b_4(D) == 0 with

> guard condition in basic block 2.  So the result should simply be:


When we do  calculate niter (for the copy header case), we set
may_be_zero as (which I think is correct)
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
                  build_zero_cst
                  (TREE_TYPE (src)));

Then in final_value_replacement_loop (struct loop *loop)

for the PHI stmt for which we are going to do the final value replacement,
we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

then we do
compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

where when we do chrec_apply to the polynomial_chrec with niter from
popcount which also has the may_be_zero, we end up with the 1.
Looking at this, I am not sure if this is wrong. May be I am missing something.

In this testcase, before we enter the loop we have a check for (b_4(D)
> 0). Thus, setting niter->may_be_zero is not strictly necessary but

conservatively correct (?).

Thanks,
Kugan

>

>>   <bb 2> [local count: 118111601]:

>>   if (b_4(D) != 0)

>>     goto <bb 3>; [89.00%]

>>   else

>>     goto <bb 4>; [11.00%]

>>

>>   <bb 3> [local count: 105119324]:

>>   _2 = (unsigned long) b_4(D);

>>   c_3 = __builtin_popcountl (_2);

>>

>>   <bb 4> [local count: 118111601]:

>>   # c_12 = PHI <c_3(3), 0(2)>

>

> I think this is the code generated if maybe_zero is not set?  which it

> should not be set here.

> For the same reason, it can be further optimized into:

>

>>   <bb 2> [local count: 118111601]:

>>   _2 = (unsigned long) b_4(D);

>>   c_12 = __builtin_popcountl (_2);

>>

>

>> latch execute zero times for b_4 == 0 means that the body will execute

>> ones.

> You never get zero times latch here with the aforementioned guard condition.

>

> BTW, I didn't look at following patches which could be wanted optimizations.

>

> Thanks,

> bin

>>

>> The issue here is, since we are checking if (b_4(D) != 0) before

>> entering the loop means we don't need to set maybe_zero. Patch 2

>> handles this.

>>

>> With that we have

>>   <bb 2> [local count: 118111601]:

>>   if (b_4(D) != 0)

>>     goto <bb 3>; [89.00%]

>>   else

>>     goto <bb 4>; [11.00%]

>>

>>   <bb 3> [local count: 105119324]:

>>   _2 = (unsigned long) b_4(D);

>>   _9 = __builtin_popcountl (_2);

>>

>>   <bb 4> [local count: 118111601]:

>>   # c_12 = PHI <0(2), _9(3)>

>>

>> As advised earlier, patch 3 adds phiopt support to remove this.

>>

>> Bootstrap and regression testing are ongoing.

>>

>> Is this OK for trunk.

>>

>> Thanks,

>> Kugan
Bin.Cheng June 25, 2018, 3:56 a.m. | #5
On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Bin,

>

> Thanks for your comments.

>

> On 25 June 2018 at 11:15, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> When we set niter with maybe_zero, currently final_value_relacement

>>> will not happen due to expression_expensive_p not handling. Patch 1

>>> adds this.

>>>

>>> With that we have the following optimized gimple.

>>>

>>>   <bb 2> [local count: 118111601]:

>>>   if (b_4(D) != 0)

>>>     goto <bb 3>; [89.00%]

>>>   else

>>>     goto <bb 4>; [11.00%]

>>>

>>>   <bb 3> [local count: 105119324]:

>>>   _2 = (unsigned long) b_4(D);

>>>   _9 = __builtin_popcountl (_2);

>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>

>>>   <bb 4> [local count: 118111601]:

>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>

>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount

>> computed should be 0.  Point is you can never get b_4(D) == 0 with

>> guard condition in basic block 2.  So the result should simply be:

>

> When we do  calculate niter (for the copy header case), we set

> may_be_zero as (which I think is correct)

> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

>                   build_zero_cst

>                   (TREE_TYPE (src)));

>

> Then in final_value_replacement_loop (struct loop *loop)

>

> for the PHI stmt for which we are going to do the final value replacement,

> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

>

> then we do

> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

>

> where when we do chrec_apply to the polynomial_chrec with niter from

> popcount which also has the may_be_zero, we end up with the 1.

> Looking at this, I am not sure if this is wrong. May be I am missing something.

I think it is wrong.  How could you get popcount == 1 when b_4(D) ==
0?  Though it never happens in this case.
>

> In this testcase, before we enter the loop we have a check for (b_4(D)

>> 0). Thus, setting niter->may_be_zero is not strictly necessary but

> conservatively correct (?).

Yes, but not necessarily.  Setting maybe_zero could confuse following
optimizations and we should avoid doing that whenever possible.  If
any pass goes wrong because it's not set conservatively, it is that
pass' responsibility and should be fixed accordingly.  Here IMHO, we
don't need to set it.

Thanks,
bin
>

> Thanks,

> Kugan

>

>>

>>>   <bb 2> [local count: 118111601]:

>>>   if (b_4(D) != 0)

>>>     goto <bb 3>; [89.00%]

>>>   else

>>>     goto <bb 4>; [11.00%]

>>>

>>>   <bb 3> [local count: 105119324]:

>>>   _2 = (unsigned long) b_4(D);

>>>   c_3 = __builtin_popcountl (_2);

>>>

>>>   <bb 4> [local count: 118111601]:

>>>   # c_12 = PHI <c_3(3), 0(2)>

>>

>> I think this is the code generated if maybe_zero is not set?  which it

>> should not be set here.

>> For the same reason, it can be further optimized into:

>>

>>>   <bb 2> [local count: 118111601]:

>>>   _2 = (unsigned long) b_4(D);

>>>   c_12 = __builtin_popcountl (_2);

>>>

>>

>>> latch execute zero times for b_4 == 0 means that the body will execute

>>> ones.

>> You never get zero times latch here with the aforementioned guard condition.

>>

>> BTW, I didn't look at following patches which could be wanted optimizations.

>>

>> Thanks,

>> bin

>>>

>>> The issue here is, since we are checking if (b_4(D) != 0) before

>>> entering the loop means we don't need to set maybe_zero. Patch 2

>>> handles this.

>>>

>>> With that we have

>>>   <bb 2> [local count: 118111601]:

>>>   if (b_4(D) != 0)

>>>     goto <bb 3>; [89.00%]

>>>   else

>>>     goto <bb 4>; [11.00%]

>>>

>>>   <bb 3> [local count: 105119324]:

>>>   _2 = (unsigned long) b_4(D);

>>>   _9 = __builtin_popcountl (_2);

>>>

>>>   <bb 4> [local count: 118111601]:

>>>   # c_12 = PHI <0(2), _9(3)>

>>>

>>> As advised earlier, patch 3 adds phiopt support to remove this.

>>>

>>> Bootstrap and regression testing are ongoing.

>>>

>>> Is this OK for trunk.

>>>

>>> Thanks,

>>> Kugan
Kugan Vivekanandarajah June 25, 2018, 5:50 a.m. | #6
Hi Bin,

On 25 June 2018 at 13:56, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Bin,

>>

>> Thanks for your comments.

>>

>> On 25 June 2018 at 11:15, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> When we set niter with maybe_zero, currently final_value_relacement

>>>> will not happen due to expression_expensive_p not handling. Patch 1

>>>> adds this.

>>>>

>>>> With that we have the following optimized gimple.

>>>>

>>>>   <bb 2> [local count: 118111601]:

>>>>   if (b_4(D) != 0)

>>>>     goto <bb 3>; [89.00%]

>>>>   else

>>>>     goto <bb 4>; [11.00%]

>>>>

>>>>   <bb 3> [local count: 105119324]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   _9 = __builtin_popcountl (_2);

>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>>

>>>>   <bb 4> [local count: 118111601]:

>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>

>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount

>>> computed should be 0.  Point is you can never get b_4(D) == 0 with

>>> guard condition in basic block 2.  So the result should simply be:

>>

>> When we do  calculate niter (for the copy header case), we set

>> may_be_zero as (which I think is correct)

>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

>>                   build_zero_cst

>>                   (TREE_TYPE (src)));

>>

>> Then in final_value_replacement_loop (struct loop *loop)

>>

>> for the PHI stmt for which we are going to do the final value replacement,

>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

>>

>> then we do

>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

>>

>> where when we do chrec_apply to the polynomial_chrec with niter from

>> popcount which also has the may_be_zero, we end up with the 1.

>> Looking at this, I am not sure if this is wrong. May be I am missing something.

> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==

> 0?  Though it never happens in this case.


We dont set popcount = 1. When we set niter for popcount pattern with
niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,
                  build_zero_cst (TREE_TYPE (src)));

Because of which, we have an niter in the final_value_replacement, we have
(gdb) p debug_tree (niter)
 <cond_expr 0x7ffff6a76a80
    type <integer_type 0x7ffff694d1f8 public unsigned DI
        size <integer_cst 0x7ffff692dcf0 constant 64>
        unit-size <integer_cst 0x7ffff692dd08 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst
0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0
18446744073709551615>>

    arg:0 <ne_expr 0x7ffff6a80910
        type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI
            size <integer_cst 0x7ffff692dde0 constant 8>
            unit-size <integer_cst 0x7ffff692ddf8 constant 1>
            align:8 warn_if_not_align:0 symtab:0 alias-set -1
canonical-type 0x7ffff6945b28 precision:1 min <integer_cst
0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>>

        arg:0 <ssa_name 0x7ffff6937a68 type <integer_type
0x7ffff6945738 long int>
            visited var <parm_decl 0x7ffff6a79000 b>
            def_stmt GIMPLE_NOPvolu
            version:4>
        arg:1 <integer_cst 0x7ffff6a64720 constant 0>>
    arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8>

        arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8>

            arg:0 <call_expr 0x7ffff69396c8 type <integer_type
0x7ffff69455e8 int>

                fn <addr_expr 0x7ffff6a883a0 type <pointer_type 0x7ffff6a55888>
                    readonly constant arg:0 <function_decl
0x7ffff69ff600 __builtin_popcountl>>
                arg:0 <nop_expr 0x7ffff6a88380 type <integer_type
0x7ffff694d1f8>
                    arg:0 <ssa_name 0x7ffff6937a68>>>>
        arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>>
    arg:2 <integer_cst 0x7ffff694a120 type <integer_type
0x7ffff694d1f8> constant 0>>

Then from there then we do compute_overall_effect_of_inner_loop for
scalar evolution of PHI with niter we get the 1.

>>

>> In this testcase, before we enter the loop we have a check for (b_4(D)

>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but

>> conservatively correct (?).

> Yes, but not necessarily.  Setting maybe_zero could confuse following

> optimizations and we should avoid doing that whenever possible.  If

> any pass goes wrong because it's not set conservatively, it is that

> pass' responsibility and should be fixed accordingly.  Here IMHO, we

> don't need to set it.


My patch 2 is for not setting this when we know know a_4(D) is not
zero in this path.

Thanks,
Kugan




>

> Thanks,

> bin

>>

>> Thanks,

>> Kugan

>>

>>>

>>>>   <bb 2> [local count: 118111601]:

>>>>   if (b_4(D) != 0)

>>>>     goto <bb 3>; [89.00%]

>>>>   else

>>>>     goto <bb 4>; [11.00%]

>>>>

>>>>   <bb 3> [local count: 105119324]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   c_3 = __builtin_popcountl (_2);

>>>>

>>>>   <bb 4> [local count: 118111601]:

>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>

>>> I think this is the code generated if maybe_zero is not set?  which it

>>> should not be set here.

>>> For the same reason, it can be further optimized into:

>>>

>>>>   <bb 2> [local count: 118111601]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   c_12 = __builtin_popcountl (_2);

>>>>

>>>

>>>> latch execute zero times for b_4 == 0 means that the body will execute

>>>> ones.

>>> You never get zero times latch here with the aforementioned guard condition.

>>>

>>> BTW, I didn't look at following patches which could be wanted optimizations.

>>>

>>> Thanks,

>>> bin

>>>>

>>>> The issue here is, since we are checking if (b_4(D) != 0) before

>>>> entering the loop means we don't need to set maybe_zero. Patch 2

>>>> handles this.

>>>>

>>>> With that we have

>>>>   <bb 2> [local count: 118111601]:

>>>>   if (b_4(D) != 0)

>>>>     goto <bb 3>; [89.00%]

>>>>   else

>>>>     goto <bb 4>; [11.00%]

>>>>

>>>>   <bb 3> [local count: 105119324]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   _9 = __builtin_popcountl (_2);

>>>>

>>>>   <bb 4> [local count: 118111601]:

>>>>   # c_12 = PHI <0(2), _9(3)>

>>>>

>>>> As advised earlier, patch 3 adds phiopt support to remove this.

>>>>

>>>> Bootstrap and regression testing are ongoing.

>>>>

>>>> Is this OK for trunk.

>>>>

>>>> Thanks,

>>>> Kugan
Bin.Cheng June 25, 2018, 9:29 a.m. | #7
On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Bin,

>

> On 25 June 2018 at 13:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Bin,

>>>

>>> Thanks for your comments.

>>>

>>> On 25 June 2018 at 11:15, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> When we set niter with maybe_zero, currently final_value_relacement

>>>>> will not happen due to expression_expensive_p not handling. Patch 1

>>>>> adds this.

>>>>>

>>>>> With that we have the following optimized gimple.

>>>>>

>>>>>   <bb 2> [local count: 118111601]:

>>>>>   if (b_4(D) != 0)

>>>>>     goto <bb 3>; [89.00%]

>>>>>   else

>>>>>     goto <bb 4>; [11.00%]

>>>>>

>>>>>   <bb 3> [local count: 105119324]:

>>>>>   _2 = (unsigned long) b_4(D);

>>>>>   _9 = __builtin_popcountl (_2);

>>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>>>

>>>>>   <bb 4> [local count: 118111601]:

>>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>>

>>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount

>>>> computed should be 0.  Point is you can never get b_4(D) == 0 with

>>>> guard condition in basic block 2.  So the result should simply be:

>>>

>>> When we do  calculate niter (for the copy header case), we set

>>> may_be_zero as (which I think is correct)

>>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

>>>                   build_zero_cst

>>>                   (TREE_TYPE (src)));

>>>

>>> Then in final_value_replacement_loop (struct loop *loop)

>>>

>>> for the PHI stmt for which we are going to do the final value replacement,

>>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

>>>

>>> then we do

>>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

>>>

>>> where when we do chrec_apply to the polynomial_chrec with niter from

>>> popcount which also has the may_be_zero, we end up with the 1.

>>> Looking at this, I am not sure if this is wrong. May be I am missing something.

>> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==

>> 0?  Though it never happens in this case.

>

> We dont set popcount = 1. When we set niter for popcount pattern with

> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

>                   build_zero_cst (TREE_TYPE (src)));

Hmm, I think this is unnecessary and causing the weird cond_expr in
following optimization.  What happens if you simply set it to false?

Thanks,
bin
>

> Because of which, we have an niter in the final_value_replacement, we have

> (gdb) p debug_tree (niter)

>  <cond_expr 0x7ffff6a76a80

>     type <integer_type 0x7ffff694d1f8 public unsigned DI

>         size <integer_cst 0x7ffff692dcf0 constant 64>

>         unit-size <integer_cst 0x7ffff692dd08 constant 8>

>         align:64 warn_if_not_align:0 symtab:0 alias-set -1

> canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst

> 0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0

> 18446744073709551615>>

>

>     arg:0 <ne_expr 0x7ffff6a80910

>         type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI

>             size <integer_cst 0x7ffff692dde0 constant 8>

>             unit-size <integer_cst 0x7ffff692ddf8 constant 1>

>             align:8 warn_if_not_align:0 symtab:0 alias-set -1

> canonical-type 0x7ffff6945b28 precision:1 min <integer_cst

> 0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>>

>

>         arg:0 <ssa_name 0x7ffff6937a68 type <integer_type

> 0x7ffff6945738 long int>

>             visited var <parm_decl 0x7ffff6a79000 b>

>             def_stmt GIMPLE_NOPvolu

>             version:4>

>         arg:1 <integer_cst 0x7ffff6a64720 constant 0>>

>     arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8>

>

>         arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8>

>

>             arg:0 <call_expr 0x7ffff69396c8 type <integer_type

> 0x7ffff69455e8 int>

>

>                 fn <addr_expr 0x7ffff6a883a0 type <pointer_type 0x7ffff6a55888>

>                     readonly constant arg:0 <function_decl

> 0x7ffff69ff600 __builtin_popcountl>>

>                 arg:0 <nop_expr 0x7ffff6a88380 type <integer_type

> 0x7ffff694d1f8>

>                     arg:0 <ssa_name 0x7ffff6937a68>>>>

>         arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>>

>     arg:2 <integer_cst 0x7ffff694a120 type <integer_type

> 0x7ffff694d1f8> constant 0>>

>

> Then from there then we do compute_overall_effect_of_inner_loop for

> scalar evolution of PHI with niter we get the 1.

>

>>>

>>> In this testcase, before we enter the loop we have a check for (b_4(D)

>>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but

>>> conservatively correct (?).

>> Yes, but not necessarily.  Setting maybe_zero could confuse following

>> optimizations and we should avoid doing that whenever possible.  If

>> any pass goes wrong because it's not set conservatively, it is that

>> pass' responsibility and should be fixed accordingly.  Here IMHO, we

>> don't need to set it.

>

> My patch 2 is for not setting this when we know know a_4(D) is not

> zero in this path.

>

> Thanks,

> Kugan

>

>

>

>

>>

>> Thanks,

>> bin

>>>

>>> Thanks,

>>> Kugan

>>>

>>>>

>>>>>   <bb 2> [local count: 118111601]:

>>>>>   if (b_4(D) != 0)

>>>>>     goto <bb 3>; [89.00%]

>>>>>   else

>>>>>     goto <bb 4>; [11.00%]

>>>>>

>>>>>   <bb 3> [local count: 105119324]:

>>>>>   _2 = (unsigned long) b_4(D);

>>>>>   c_3 = __builtin_popcountl (_2);

>>>>>

>>>>>   <bb 4> [local count: 118111601]:

>>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>

>>>> I think this is the code generated if maybe_zero is not set?  which it

>>>> should not be set here.

>>>> For the same reason, it can be further optimized into:

>>>>

>>>>>   <bb 2> [local count: 118111601]:

>>>>>   _2 = (unsigned long) b_4(D);

>>>>>   c_12 = __builtin_popcountl (_2);

>>>>>

>>>>

>>>>> latch execute zero times for b_4 == 0 means that the body will execute

>>>>> ones.

>>>> You never get zero times latch here with the aforementioned guard condition.

>>>>

>>>> BTW, I didn't look at following patches which could be wanted optimizations.

>>>>

>>>> Thanks,

>>>> bin

>>>>>

>>>>> The issue here is, since we are checking if (b_4(D) != 0) before

>>>>> entering the loop means we don't need to set maybe_zero. Patch 2

>>>>> handles this.

>>>>>

>>>>> With that we have

>>>>>   <bb 2> [local count: 118111601]:

>>>>>   if (b_4(D) != 0)

>>>>>     goto <bb 3>; [89.00%]

>>>>>   else

>>>>>     goto <bb 4>; [11.00%]

>>>>>

>>>>>   <bb 3> [local count: 105119324]:

>>>>>   _2 = (unsigned long) b_4(D);

>>>>>   _9 = __builtin_popcountl (_2);

>>>>>

>>>>>   <bb 4> [local count: 118111601]:

>>>>>   # c_12 = PHI <0(2), _9(3)>

>>>>>

>>>>> As advised earlier, patch 3 adds phiopt support to remove this.

>>>>>

>>>>> Bootstrap and regression testing are ongoing.

>>>>>

>>>>> Is this OK for trunk.

>>>>>

>>>>> Thanks,

>>>>> Kugan
Richard Biener June 25, 2018, 9:45 a.m. | #8
On Mon, Jun 25, 2018 at 11:30 AM Bin.Cheng <amker.cheng@gmail.com> wrote:
>

> On Mon, Jun 25, 2018 at 1:50 PM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

> > Hi Bin,

> >

> > On 25 June 2018 at 13:56, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >> On Mon, Jun 25, 2018 at 11:37 AM, Kugan Vivekanandarajah

> >> <kugan.vivekanandarajah@linaro.org> wrote:

> >>> Hi Bin,

> >>>

> >>> Thanks for your comments.

> >>>

> >>> On 25 June 2018 at 11:15, Bin.Cheng <amker.cheng@gmail.com> wrote:

> >>>> On Fri, Jun 22, 2018 at 5:11 PM, Kugan Vivekanandarajah

> >>>> <kugan.vivekanandarajah@linaro.org> wrote:

> >>>>> When we set niter with maybe_zero, currently final_value_relacement

> >>>>> will not happen due to expression_expensive_p not handling. Patch 1

> >>>>> adds this.

> >>>>>

> >>>>> With that we have the following optimized gimple.

> >>>>>

> >>>>>   <bb 2> [local count: 118111601]:

> >>>>>   if (b_4(D) != 0)

> >>>>>     goto <bb 3>; [89.00%]

> >>>>>   else

> >>>>>     goto <bb 4>; [11.00%]

> >>>>>

> >>>>>   <bb 3> [local count: 105119324]:

> >>>>>   _2 = (unsigned long) b_4(D);

> >>>>>   _9 = __builtin_popcountl (_2);

> >>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

> >>>>>

> >>>>>   <bb 4> [local count: 118111601]:

> >>>>>   # c_12 = PHI <c_3(3), 0(2)>

> >>>>>

> >>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

> >>>> No, it doesn't make much sense.  when b_4(D) == 0, the popcount

> >>>> computed should be 0.  Point is you can never get b_4(D) == 0 with

> >>>> guard condition in basic block 2.  So the result should simply be:

> >>>

> >>> When we do  calculate niter (for the copy header case), we set

> >>> may_be_zero as (which I think is correct)

> >>> niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

> >>>                   build_zero_cst

> >>>                   (TREE_TYPE (src)));

> >>>

> >>> Then in final_value_replacement_loop (struct loop *loop)

> >>>

> >>> for the PHI stmt for which we are going to do the final value replacement,

> >>> we analyze_scalar_evolution_in_loop which is POLYNOMIAL_CHREC.

> >>>

> >>> then we do

> >>> compute_overall_effect_of_inner_loop (struct loop *loop, tree evolution_fn)

> >>>

> >>> where when we do chrec_apply to the polynomial_chrec with niter from

> >>> popcount which also has the may_be_zero, we end up with the 1.

> >>> Looking at this, I am not sure if this is wrong. May be I am missing something.

> >> I think it is wrong.  How could you get popcount == 1 when b_4(D) ==

> >> 0?  Though it never happens in this case.

> >

> > We dont set popcount = 1. When we set niter for popcount pattern with

> > niter->may_be_zero = fold_build2 (EQ_EXPR, boolean_type_node, src,

> >                   build_zero_cst (TREE_TYPE (src)));

> Hmm, I think this is unnecessary and causing the weird cond_expr in

> following optimization.  What happens if you simply set it to false?


It is surely not unnecessary because we set it to non-NULL only when
the result is _not_ simply popcount() but popcount()-1.

All the above suggests that SCEV does sth wrong.  Dumps show

  (set_nb_iterations_in_loop = b_4(D) != 0 ? (unsigned long)
__builtin_popcountl ((unsigned long) b_4(D)) + 18446744073709551615 :
0))

(chrec_apply
  (varying_loop = 1
)
  (chrec = {1, +, 1}_1)
  (x = b_4(D) != 0 ? (int) ((unsigned int) __builtin_popcountl
((unsigned long) b_4(D)) + 4294967295) : 0)
  (res = b_4(D) != 0 ? __builtin_popcountl ((unsigned long) b_4(D)) : 1))
)

this is because the analysis works on a header-copied loop and thus
the IV for C is {1, +, 1}
so this is all correct and to be simply optimized by following passes
factoring in
the range of b_4(D) which was tested to be _not_zero before.

Richard.

> Thanks,

> bin

> >

> > Because of which, we have an niter in the final_value_replacement, we have

> > (gdb) p debug_tree (niter)

> >  <cond_expr 0x7ffff6a76a80

> >     type <integer_type 0x7ffff694d1f8 public unsigned DI

> >         size <integer_cst 0x7ffff692dcf0 constant 64>

> >         unit-size <integer_cst 0x7ffff692dd08 constant 8>

> >         align:64 warn_if_not_align:0 symtab:0 alias-set -1

> > canonical-type 0x7ffff694d1f8 precision:64 min <integer_cst

> > 0x7ffff694a120 0> max <integer_cst 0x7ffff692e5c0

> > 18446744073709551615>>

> >

> >     arg:0 <ne_expr 0x7ffff6a80910

> >         type <boolean_type 0x7ffff6945b28 _Bool public unsigned QI

> >             size <integer_cst 0x7ffff692dde0 constant 8>

> >             unit-size <integer_cst 0x7ffff692ddf8 constant 1>

> >             align:8 warn_if_not_align:0 symtab:0 alias-set -1

> > canonical-type 0x7ffff6945b28 precision:1 min <integer_cst

> > 0x7ffff694a048 0> max <integer_cst 0x7ffff694a078 1>>

> >

> >         arg:0 <ssa_name 0x7ffff6937a68 type <integer_type

> > 0x7ffff6945738 long int>

> >             visited var <parm_decl 0x7ffff6a79000 b>

> >             def_stmt GIMPLE_NOPvolu

> >             version:4>

> >         arg:1 <integer_cst 0x7ffff6a64720 constant 0>>

> >     arg:1 <plus_expr 0x7ffff6a808c0 type <integer_type 0x7ffff694d1f8>

> >

> >         arg:0 <nop_expr 0x7ffff6a883c0 type <integer_type 0x7ffff694d1f8>

> >

> >             arg:0 <call_expr 0x7ffff69396c8 type <integer_type

> > 0x7ffff69455e8 int>

> >

> >                 fn <addr_expr 0x7ffff6a883a0 type <pointer_type 0x7ffff6a55888>

> >                     readonly constant arg:0 <function_decl

> > 0x7ffff69ff600 __builtin_popcountl>>

> >                 arg:0 <nop_expr 0x7ffff6a88380 type <integer_type

> > 0x7ffff694d1f8>

> >                     arg:0 <ssa_name 0x7ffff6937a68>>>>

> >         arg:1 <integer_cst 0x7ffff692e5c0 constant 18446744073709551615>>

> >     arg:2 <integer_cst 0x7ffff694a120 type <integer_type

> > 0x7ffff694d1f8> constant 0>>

> >

> > Then from there then we do compute_overall_effect_of_inner_loop for

> > scalar evolution of PHI with niter we get the 1.

> >

> >>>

> >>> In this testcase, before we enter the loop we have a check for (b_4(D)

> >>>> 0). Thus, setting niter->may_be_zero is not strictly necessary but

> >>> conservatively correct (?).

> >> Yes, but not necessarily.  Setting maybe_zero could confuse following

> >> optimizations and we should avoid doing that whenever possible.  If

> >> any pass goes wrong because it's not set conservatively, it is that

> >> pass' responsibility and should be fixed accordingly.  Here IMHO, we

> >> don't need to set it.

> >

> > My patch 2 is for not setting this when we know know a_4(D) is not

> > zero in this path.

> >

> > Thanks,

> > Kugan

> >

> >

> >

> >

> >>

> >> Thanks,

> >> bin

> >>>

> >>> Thanks,

> >>> Kugan

> >>>

> >>>>

> >>>>>   <bb 2> [local count: 118111601]:

> >>>>>   if (b_4(D) != 0)

> >>>>>     goto <bb 3>; [89.00%]

> >>>>>   else

> >>>>>     goto <bb 4>; [11.00%]

> >>>>>

> >>>>>   <bb 3> [local count: 105119324]:

> >>>>>   _2 = (unsigned long) b_4(D);

> >>>>>   c_3 = __builtin_popcountl (_2);

> >>>>>

> >>>>>   <bb 4> [local count: 118111601]:

> >>>>>   # c_12 = PHI <c_3(3), 0(2)>

> >>>>

> >>>> I think this is the code generated if maybe_zero is not set?  which it

> >>>> should not be set here.

> >>>> For the same reason, it can be further optimized into:

> >>>>

> >>>>>   <bb 2> [local count: 118111601]:

> >>>>>   _2 = (unsigned long) b_4(D);

> >>>>>   c_12 = __builtin_popcountl (_2);

> >>>>>

> >>>>

> >>>>> latch execute zero times for b_4 == 0 means that the body will execute

> >>>>> ones.

> >>>> You never get zero times latch here with the aforementioned guard condition.

> >>>>

> >>>> BTW, I didn't look at following patches which could be wanted optimizations.

> >>>>

> >>>> Thanks,

> >>>> bin

> >>>>>

> >>>>> The issue here is, since we are checking if (b_4(D) != 0) before

> >>>>> entering the loop means we don't need to set maybe_zero. Patch 2

> >>>>> handles this.

> >>>>>

> >>>>> With that we have

> >>>>>   <bb 2> [local count: 118111601]:

> >>>>>   if (b_4(D) != 0)

> >>>>>     goto <bb 3>; [89.00%]

> >>>>>   else

> >>>>>     goto <bb 4>; [11.00%]

> >>>>>

> >>>>>   <bb 3> [local count: 105119324]:

> >>>>>   _2 = (unsigned long) b_4(D);

> >>>>>   _9 = __builtin_popcountl (_2);

> >>>>>

> >>>>>   <bb 4> [local count: 118111601]:

> >>>>>   # c_12 = PHI <0(2), _9(3)>

> >>>>>

> >>>>> As advised earlier, patch 3 adds phiopt support to remove this.

> >>>>>

> >>>>> Bootstrap and regression testing are ongoing.

> >>>>>

> >>>>> Is this OK for trunk.

> >>>>>

> >>>>> Thanks,

> >>>>> Kugan
Jeff Law June 28, 2018, 2:59 a.m. | #9
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,

> 

> Thanks for the comments.

> 

> On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:

>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>>> When we set niter with maybe_zero, currently final_value_relacement

>>> will not happen due to expression_expensive_p not handling. Patch 1

>>> adds this.

>>>

>>> With that we have the following optimized gimple.

>>>

>>>   <bb 2> [local count: 118111601]:

>>>   if (b_4(D) != 0)

>>>     goto <bb 3>; [89.00%]

>>>   else

>>>     goto <bb 4>; [11.00%]

>>>

>>>   <bb 3> [local count: 105119324]:

>>>   _2 = (unsigned long) b_4(D);

>>>   _9 = __builtin_popcountl (_2);

>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>

>>>   <bb 4> [local count: 118111601]:

>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>

>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>>> latch execute zero times for b_4 == 0 means that the body will execute

>>> ones.

>> ISTM that DOM ought to have simplified the conditional, unless there's

>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3

>> must have the value _9.

> As of now, dom is not optimizing it. With the attached hack, it can be made to.

Thanks.   It's not as much of a hack as you might think.

Right now there's two ways to get at the data we're looking for.  One is
to query the VRP engine like you've done.  The other is to query the
expression table which should have something like 1 = (b4 != 0) in it as
that expression holds on the 2->3 edge which dominates bb3.

One of the things I want to do this summer is integrate the VRP engine
queries better into DOM and stop relying on adding conditionals into the
expression table (which isn't as powerful and doesn't scale well).  So
your patch isn't a terrible hack.  It's just not the well integrated
solution I want to pull together :-)


I think it was just last year when I started wondering if we were
handling COND_EXPRs on the RHS of an assignment correctly, but it got
lost on my stack of TODO items.  It's good to have a testcase which
confirms my suspicion that we're not handling them as well as we should.

I've opened bz86339 to track this issue.

Jeff
Jeff Law July 5, 2018, 10:03 p.m. | #10
On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:
> Hi Jeff,

> 

> Thanks for the comments.

> 

> On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:

>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>>> When we set niter with maybe_zero, currently final_value_relacement

>>> will not happen due to expression_expensive_p not handling. Patch 1

>>> adds this.

>>>

>>> With that we have the following optimized gimple.

>>>

>>>   <bb 2> [local count: 118111601]:

>>>   if (b_4(D) != 0)

>>>     goto <bb 3>; [89.00%]

>>>   else

>>>     goto <bb 4>; [11.00%]

>>>

>>>   <bb 3> [local count: 105119324]:

>>>   _2 = (unsigned long) b_4(D);

>>>   _9 = __builtin_popcountl (_2);

>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>

>>>   <bb 4> [local count: 118111601]:

>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>

>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>>> latch execute zero times for b_4 == 0 means that the body will execute

>>> ones.

>> ISTM that DOM ought to have simplified the conditional, unless there's

>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3

>> must have the value _9.

> As of now, dom is not optimizing it. With the attached hack, it can be made to.

What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the
dumps I'm looking at.  Instead it's c_3 = _9, which is what I would
expect since we know that b_4 != 0


My tests have been on x86_64 and aarch64 linux targets.  I've tried with
patch#1 installed as well as with patch #1 and patch #2 together.

What target, what flags and what patches do I need to see this?

Jeff
Richard Biener July 6, 2018, 5:39 a.m. | #11
On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:

>> Hi Jeff,

>> 

>> Thanks for the comments.

>> 

>> On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:

>>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>>>> When we set niter with maybe_zero, currently final_value_relacement

>>>> will not happen due to expression_expensive_p not handling. Patch 1

>>>> adds this.

>>>>

>>>> With that we have the following optimized gimple.

>>>>

>>>>   <bb 2> [local count: 118111601]:

>>>>   if (b_4(D) != 0)

>>>>     goto <bb 3>; [89.00%]

>>>>   else

>>>>     goto <bb 4>; [11.00%]

>>>>

>>>>   <bb 3> [local count: 105119324]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   _9 = __builtin_popcountl (_2);

>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>>

>>>>   <bb 4> [local count: 118111601]:

>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>

>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when

>the

>>>> latch execute zero times for b_4 == 0 means that the body will

>execute

>>>> ones.

>>> ISTM that DOM ought to have simplified the conditional, unless

>there's

>>> some other way to get to bb3.  We know that b_4 is nonzero and thus

>c_3

>>> must have the value _9.

>> As of now, dom is not optimizing it. With the attached hack, it can

>be made to.

>What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of

>the

>dumps I'm looking at.  Instead it's c_3 = _9, which is what I would

>expect since we know that b_4 != 0

>

>

>My tests have been on x86_64 and aarch64 linux targets.  I've tried

>with

>patch#1 installed as well as with patch #1 and patch #2 together.

>

>What target, what flags and what patches do I need to see this?


I believe it has been fixed in niters analysis to avoid the condition if it is known to be true. 

Richard. 

>Jeff
Jeff Law July 6, 2018, 5:51 a.m. | #12
On 07/05/2018 11:39 PM, Richard Biener wrote:
> On July 6, 2018 12:03:11 AM GMT+02:00, Jeff Law <law@redhat.com> wrote:

>> On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:

>>> Hi Jeff,

>>>

>>> Thanks for the comments.

>>>

>>> On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:

>>>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>>>>> When we set niter with maybe_zero, currently final_value_relacement

>>>>> will not happen due to expression_expensive_p not handling. Patch 1

>>>>> adds this.

>>>>>

>>>>> With that we have the following optimized gimple.

>>>>>

>>>>>   <bb 2> [local count: 118111601]:

>>>>>   if (b_4(D) != 0)

>>>>>     goto <bb 3>; [89.00%]

>>>>>   else

>>>>>     goto <bb 4>; [11.00%]

>>>>>

>>>>>   <bb 3> [local count: 105119324]:

>>>>>   _2 = (unsigned long) b_4(D);

>>>>>   _9 = __builtin_popcountl (_2);

>>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>>>

>>>>>   <bb 4> [local count: 118111601]:

>>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>>

>>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when

>> the

>>>>> latch execute zero times for b_4 == 0 means that the body will

>> execute

>>>>> ones.

>>>> ISTM that DOM ought to have simplified the conditional, unless

>> there's

>>>> some other way to get to bb3.  We know that b_4 is nonzero and thus

>> c_3

>>>> must have the value _9.

>>> As of now, dom is not optimizing it. With the attached hack, it can

>> be made to.

>> What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of

>> the

>> dumps I'm looking at.  Instead it's c_3 = _9, which is what I would

>> expect since we know that b_4 != 0

>>

>>

>> My tests have been on x86_64 and aarch64 linux targets.  I've tried

>> with

>> patch#1 installed as well as with patch #1 and patch #2 together.

>>

>> What target, what flags and what patches do I need to see this?

> 

> I believe it has been fixed in niters analysis to avoid the condition if it is known to be true. 

Ah.  Presumably the change from 6-16.  I'll back that out and retry.

jeff
Kugan Vivekanandarajah July 6, 2018, 6:15 a.m. | #13
Hi Jeff,

Thanks for looking into it.

On 6 July 2018 at 08:03, Jeff Law <law@redhat.com> wrote:
> On 06/24/2018 08:41 PM, Kugan Vivekanandarajah wrote:

>> Hi Jeff,

>>

>> Thanks for the comments.

>>

>> On 23 June 2018 at 02:06, Jeff Law <law@redhat.com> wrote:

>>> On 06/22/2018 03:11 AM, Kugan Vivekanandarajah wrote:

>>>> When we set niter with maybe_zero, currently final_value_relacement

>>>> will not happen due to expression_expensive_p not handling. Patch 1

>>>> adds this.

>>>>

>>>> With that we have the following optimized gimple.

>>>>

>>>>   <bb 2> [local count: 118111601]:

>>>>   if (b_4(D) != 0)

>>>>     goto <bb 3>; [89.00%]

>>>>   else

>>>>     goto <bb 4>; [11.00%]

>>>>

>>>>   <bb 3> [local count: 105119324]:

>>>>   _2 = (unsigned long) b_4(D);

>>>>   _9 = __builtin_popcountl (_2);

>>>>   c_3 = b_4(D) != 0 ? _9 : 1;

>>>>

>>>>   <bb 4> [local count: 118111601]:

>>>>   # c_12 = PHI <c_3(3), 0(2)>

>>>>

>>>> I assume that 1 in  b_4(D) != 0 ? _9 : 1; is OK (?) because when the

>>>> latch execute zero times for b_4 == 0 means that the body will execute

>>>> ones.

>>> ISTM that DOM ought to have simplified the conditional, unless there's

>>> some other way to get to bb3.  We know that b_4 is nonzero and thus c_3

>>> must have the value _9.

>> As of now, dom is not optimizing it. With the attached hack, it can be made to.

> What's strange is I'm not getting the c_3 = (b_4 != 0) ... in any of the

> dumps I'm looking at.  Instead it's c_3 = _9, which is what I would

> expect since we know that b_4 != 0

>

>

> My tests have been on x86_64 and aarch64 linux targets.  I've tried with

> patch#1 installed as well as with patch #1 and patch #2 together.

>

> What target, what flags and what patches do I need to see this?

You need the patch 1 (attaching) to get that. With Patch 2 in this
series, it will be optimized.

I haven't committed the patches yet as I am testing all the three
patches. I will commit after testing on current trunk.

Thanks,
Kugan


>

> Jeff
From 12263df77931aa55d205b9db470436848d762684 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Fri, 22 Jun 2018 14:10:26 +1000
Subject: [PATCH 1/3] generate popcount when checked for zero

Change-Id: I951e6d487268b757cbdaa8dcf671ab1377490db6
---
 gcc/gimplify.c                          |  2 +-
 gcc/gimplify.h                          |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/pr64183.c |  2 +-
 gcc/testsuite/gcc.target/i386/pr85073.c |  2 +-
 gcc/tree-scalar-evolution.c             | 12 ++++++++++++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 48ac92e..c86ad1a 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -3878,7 +3878,7 @@ gimplify_pure_cond_expr (tree *expr_p, gimple_seq *pre_p)
    EXPR is GENERIC, while tree_could_trap_p can be called
    only on GIMPLE.  */
 
-static bool
+bool
 generic_expr_could_trap_p (tree expr)
 {
   unsigned i, n;
diff --git a/gcc/gimplify.h b/gcc/gimplify.h
index dd0e4c0..62ca869 100644
--- a/gcc/gimplify.h
+++ b/gcc/gimplify.h
@@ -83,6 +83,7 @@ extern enum gimplify_status gimplify_arg (tree *, gimple_seq *, location_t,
 extern void gimplify_function_tree (tree);
 extern enum gimplify_status gimplify_va_arg_expr (tree *, gimple_seq *,
 						  gimple_seq *);
+extern bool generic_expr_could_trap_p (tree expr);
 gimple *gimplify_assign (tree, tree, gimple_seq *);
 
 #endif /* GCC_GIMPLIFY_H */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
index 7a854fc..50d0c5a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fno-tree-vectorize -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -fno-tree-vectorize -fdisable-tree-sccp -fdump-tree-cunroll-details" } */
 
 int bits;
 unsigned int size;
diff --git a/gcc/testsuite/gcc.target/i386/pr85073.c b/gcc/testsuite/gcc.target/i386/pr85073.c
index 187102d..71a5d23 100644
--- a/gcc/testsuite/gcc.target/i386/pr85073.c
+++ b/gcc/testsuite/gcc.target/i386/pr85073.c
@@ -1,6 +1,6 @@
 /* PR target/85073 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -mbmi" } */
+/* { dg-options "-O2 -mbmi -fdisable-tree-sccp" } */
 
 int
 foo (unsigned x)
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 4b0ec02..8e29005 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3508,6 +3508,18 @@ expression_expensive_p (tree expr)
       return false;
     }
 
+  if (code == COND_EXPR)
+    return (expression_expensive_p (TREE_OPERAND (expr, 0))
+	    || (EXPR_P (TREE_OPERAND (expr, 1))
+		&& EXPR_P (TREE_OPERAND (expr, 2)))
+	    /* If either branch has side effects or could trap.  */
+	    || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 1))
+	    || generic_expr_could_trap_p (TREE_OPERAND (expr, 1))
+	    || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0))
+	    || generic_expr_could_trap_p (TREE_OPERAND (expr, 0))
+	    || expression_expensive_p (TREE_OPERAND (expr, 1))
+	    || expression_expensive_p (TREE_OPERAND (expr, 2)));
+
   switch (TREE_CODE_CLASS (code))
     {
     case tcc_binary: