[stage1] ipa-cp: Fix PGO regression caused by r278808

Message ID 20200601014557.10718-1-luoxhu@linux.ibm.com
State New
Headers show
Series
  • [stage1] ipa-cp: Fix PGO regression caused by r278808
Related show

Commit Message

Kees Cook via Gcc-patches June 1, 2020, 1:45 a.m.
resend the patch for stage1:
https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538186.html

The performance of exchange2 built with PGO will decrease ~28% by r278808
due to profile count set incorrectly.  The cloned nodes are updated to a
very small count caused later pass cunroll fail to unroll the recursive
function in exchange2.

digits_2 ->
digits_2.constprop.0, digits_2.constprop.1, etc.

1. Enable proportion orig_sum to the new nodes for self recursive node
   (k means multiple self recursive calls):
   new_sum = (orig_sum + new_sum) * 1 / k \
   * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).
2. Two mutually recursive functions are not supported in self recursive
   clone yet so also not supported in update_profiling_info here.
3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).
   If it calls itself two different sets, usually recursive boudary limit
   will stop the specialize first, otherwise it is slow even without
   recursive specialize.

gcc/ChangeLog:

	2020-05-29  Xionghu Luo  <luoxhu@linux.ibm.com>

	* ipa-cp.c (update_profiling_info): Check self_scc node.
	* params.opt (ipa-cp-max-recursive-depth): Set param range.

gcc/testsuite/ChangeLog:

	2020-05-29  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gcc.dg/ipa/ipa-clone-3.c: Update count.
---
 gcc/ipa-cp.c                           | 33 +++++++++++++++++++++++++-
 gcc/params.opt                         |  2 +-
 gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c |  2 +-
 3 files changed, 34 insertions(+), 3 deletions(-)

-- 
2.21.0.777.g83232e3864

Comments

Kees Cook via Gcc-patches June 15, 2020, 7:20 a.m. | #1
Gentle ping...


On 2020/6/1 09:45, Xionghu Luo wrote:
> resend the patch for stage1:

> https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538186.html

> 

> The performance of exchange2 built with PGO will decrease ~28% by r278808

> due to profile count set incorrectly.  The cloned nodes are updated to a

> very small count caused later pass cunroll fail to unroll the recursive

> function in exchange2.

> 

> digits_2 ->

> digits_2.constprop.0, digits_2.constprop.1, etc.

> 

> 1. Enable proportion orig_sum to the new nodes for self recursive node

>     (k means multiple self recursive calls):

>     new_sum = (orig_sum + new_sum) * 1 / k \

>     * self_recursive_probability * (1 / param_ipa_cp_max_recursive_depth).

> 2. Two mutually recursive functions are not supported in self recursive

>     clone yet so also not supported in update_profiling_info here.

> 3. Improve value range for param_ipa_cp_max_recursive_depth to (0, 8).

>     If it calls itself two different sets, usually recursive boudary limit

>     will stop the specialize first, otherwise it is slow even without

>     recursive specialize.

> 

> gcc/ChangeLog:

> 

> 	2020-05-29  Xionghu Luo  <luoxhu@linux.ibm.com>

> 

> 	* ipa-cp.c (update_profiling_info): Check self_scc node.

> 	* params.opt (ipa-cp-max-recursive-depth): Set param range.

> 

> gcc/testsuite/ChangeLog:

> 

> 	2020-05-29  Xionghu Luo  <luoxhu@linux.ibm.com>

> 

> 	* gcc.dg/ipa/ipa-clone-3.c: Update count.

> ---

>   gcc/ipa-cp.c                           | 33 +++++++++++++++++++++++++-

>   gcc/params.opt                         |  2 +-

>   gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c |  2 +-

>   3 files changed, 34 insertions(+), 3 deletions(-)

> 

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c

> index c64e9104a94..919c741ecfa 100644

> --- a/gcc/ipa-cp.c

> +++ b/gcc/ipa-cp.c

> @@ -2046,7 +2046,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs,

>         /* Recursively generate lattice values with a limited count.  */

>         FOR_EACH_VEC_ELT (val_seeds, i, src_val)

>   	{

> -	  for (int j = 1; j < max_recursive_depth; j++)

> +	  for (int j = 0; j < max_recursive_depth; j++)

>   	    {

>   	      tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,

>   						     src_val, res_type);

> @@ -4345,6 +4345,37 @@ update_profiling_info (struct cgraph_node *orig_node,

>   					      false);

>     new_sum = stats.count_sum;

>   

> +  class ipa_node_params *info = IPA_NODE_REF (orig_node);

> +  if (info && info->node_is_self_scc)

> +    {

> +      profile_count self_recursive_count = profile_count::zero ();

> +      unsigned k = 0;

> +

> +      /* The self recursive edge is on the orig_node.  */

> +      for (cs = orig_node->callees; cs; cs = cs->next_callee)

> +	if (ipa_edge_within_scc (cs))

> +	  {

> +	    cgraph_node *callee = cs->callee->function_symbol ();

> +	    if (callee && cs->caller == cs->callee)

> +	      {

> +		self_recursive_count += cs->count;

> +		k++;

> +	      }

> +	  }

> +

> +      /* Proportion count for multiple self recursive node.  */

> +      if (k)

> +	self_recursive_count.apply_scale (1, k);

> +

> +      /* Proportion count for self recursive node from all callers.  */

> +      new_sum

> +	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);

> +

> +      /* Proportion count for specialized cloned node.  */

> +      if (param_ipa_cp_max_recursive_depth)

> +	new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);

> +    }

> +

>     if (orig_node_count < orig_sum + new_sum)

>       {

>         if (dump_file)

> diff --git a/gcc/params.opt b/gcc/params.opt

> index 4aec480798b..ad000145dfb 100644

> --- a/gcc/params.opt

> +++ b/gcc/params.opt

> @@ -199,7 +199,7 @@ Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza

>   Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.

>   

>   -param=ipa-cp-max-recursive-depth=

> -Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param Optimization

> +Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) IntegerRange(0, 8) Param Optimization

>   Maximum depth of recursive cloning for self-recursive function.

>   

>   -param=ipa-cp-min-recursive-probability=

> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c b/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c

> index a73cb8b63fc..1efa23ae109 100644

> --- a/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c

> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c

> @@ -41,4 +41,4 @@ int main ()

>     return recur_fn (&v);

>   }

>   

> -/* { dg-final { scan-ipa-dump-times "Creating a specialized node of recur_fn/\[0-9\]*\\." 8 "cp" } } */

> +/* { dg-final { scan-ipa-dump-times "Creating a specialized node of recur_fn/\[0-9\]*\\." 9 "cp" } } */

>

Patch

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index c64e9104a94..919c741ecfa 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2046,7 +2046,7 @@  propagate_vals_across_arith_jfunc (cgraph_edge *cs,
       /* Recursively generate lattice values with a limited count.  */
       FOR_EACH_VEC_ELT (val_seeds, i, src_val)
 	{
-	  for (int j = 1; j < max_recursive_depth; j++)
+	  for (int j = 0; j < max_recursive_depth; j++)
 	    {
 	      tree cstval = get_val_across_arith_op (opcode, opnd1_type, opnd2,
 						     src_val, res_type);
@@ -4345,6 +4345,37 @@  update_profiling_info (struct cgraph_node *orig_node,
 					      false);
   new_sum = stats.count_sum;
 
+  class ipa_node_params *info = IPA_NODE_REF (orig_node);
+  if (info && info->node_is_self_scc)
+    {
+      profile_count self_recursive_count = profile_count::zero ();
+      unsigned k = 0;
+
+      /* The self recursive edge is on the orig_node.  */
+      for (cs = orig_node->callees; cs; cs = cs->next_callee)
+	if (ipa_edge_within_scc (cs))
+	  {
+	    cgraph_node *callee = cs->callee->function_symbol ();
+	    if (callee && cs->caller == cs->callee)
+	      {
+		self_recursive_count += cs->count;
+		k++;
+	      }
+	  }
+
+      /* Proportion count for multiple self recursive node.  */
+      if (k)
+	self_recursive_count.apply_scale (1, k);
+
+      /* Proportion count for self recursive node from all callers.  */
+      new_sum
+	= (orig_sum + new_sum).apply_scale (self_recursive_count, orig_sum);
+
+      /* Proportion count for specialized cloned node.  */
+      if (param_ipa_cp_max_recursive_depth)
+	new_sum = new_sum.apply_scale (1, param_ipa_cp_max_recursive_depth);
+    }
+
   if (orig_node_count < orig_sum + new_sum)
     {
       if (dump_file)
diff --git a/gcc/params.opt b/gcc/params.opt
index 4aec480798b..ad000145dfb 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -199,7 +199,7 @@  Common Joined UInteger Var(param_ipa_cp_loop_hint_bonus) Init(64) Param Optimiza
 Compile-time bonus IPA-CP assigns to candidates which make loop bounds or strides known.
 
 -param=ipa-cp-max-recursive-depth=
-Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(8) Param Optimization
+Common Joined UInteger Var(param_ipa_cp_max_recursive_depth) Init(7) IntegerRange(0, 8) Param Optimization
 Maximum depth of recursive cloning for self-recursive function.
 
 -param=ipa-cp-min-recursive-probability=
diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c b/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c
index a73cb8b63fc..1efa23ae109 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-clone-3.c
@@ -41,4 +41,4 @@  int main ()
   return recur_fn (&v);
 }
 
-/* { dg-final { scan-ipa-dump-times "Creating a specialized node of recur_fn/\[0-9\]*\\." 8 "cp" } } */
+/* { dg-final { scan-ipa-dump-times "Creating a specialized node of recur_fn/\[0-9\]*\\." 9 "cp" } } */