PR88751: Backport to GCC 8 and 9 branches?

Message ID bec04eb5-e703-279c-09f8-e62eac12fd3e@linux.ibm.com

Commit Message

Andreas Krebbel Sept. 6, 2019, 8:11 a.m.
Hi,

since this caused a critical performance regression in the OpenJ9 bytecode interpreter after
migrating from GCC 4.8 to GCC 7, I would like to backport this patch to the GCC 8 and 9 branches as well.

Ok to apply once bootstrap and regression tests have passed?

Andreas


commit d3dc20418aad41af83fe45ccba527deb0b334983
Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jun 6 11:35:04 2019 +0000

    Fix PR88751

    This patch implements a small improvement to the heuristic in IRA
    which decides when LRA has to activate its simpler register
    allocation algorithms.

    gcc/ChangeLog:

    2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>

            PR rtl-optimization/88751
            * ira.c (ira): Use the number of the actually referenced registers
            when calculating the threshold.



    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4

Comments

Richard Biener Sept. 6, 2019, 10:48 a.m. | #1
On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> Hi,
>
> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
>
> Ok - after bootstrap and regression test went fine?

Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
actual performance regression is gone.

Richard.

>
> Andreas
>
> commit d3dc20418aad41af83fe45ccba527deb0b334983
> Author: krebbel <krebbel@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Thu Jun 6 11:35:04 2019 +0000
>
>     Fix PR88751
>
>     This patch implements a small improvement for the heuristic in lra
>     which decides when it has to activate the simpler register allocation
>     algorithm.
>
>     gcc/ChangeLog:
>
>     2019-06-06  Andreas Krebbel  <krebbel@linux.ibm.com>
>
>             PR rtl-optimization/88751
>             * ira.c (ira): Use the number of the actually referenced registers
>             when calculating the threshold.
>
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271996 138bc75d-0d04-0410-961f-82ee72b054a4
>
> diff --git a/gcc/ira.c b/gcc/ira.c
> index 4a14fb31583..725636d8dc5 100644
> --- a/gcc/ira.c
> +++ b/gcc/ira.c
> @@ -5198,6 +5198,8 @@ ira (FILE *f)
>    int ira_max_point_before_emit;
>    bool saved_flag_caller_saves = flag_caller_saves;
>    enum ira_region saved_flag_ira_region = flag_ira_region;
> +  unsigned int i;
> +  int num_used_regs = 0;
>
>    clear_bb_flags ();
>
> @@ -5213,12 +5215,17 @@ ira (FILE *f)
>
>    ira_conflicts_p = optimize > 0;
>
> +  /* Determine the number of pseudos actually requiring coloring.  */
> +  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
> +    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
> +
>    /* If there are too many pseudos and/or basic blocks (e.g. 10K
>       pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
>       use simplified and faster algorithms in LRA.  */
>    lra_simple_p
>      = (ira_use_lra_p
> -       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
> +       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
> +
>    if (lra_simple_p)
>      {
>        /* It permits to skip live range splitting in LRA.  */
>
Andreas Krebbel Sept. 20, 2019, 9:27 a.m. | #2
On 06.09.19 12:48, Richard Biener wrote:
> On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
>> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
>>
>> Ok - after bootstrap and regression test went fine?
>
> Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
> actual performance regression is gone.


I've committed the patch to the GCC 7 and 8 branches after verifying that the change has the
desired effect on the source code file from OpenJ9.

The GCC 9 branch is currently frozen. Ok to apply there as well?

Andreas

>
> Richard.
>
>> Andreas
>>
>> [...]
Richard Biener Sept. 20, 2019, 12:02 p.m. | #3
On Fri, Sep 20, 2019 at 11:28 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
>
> On 06.09.19 12:48, Richard Biener wrote:
> > On Fri, Sep 6, 2019 at 10:11 AM Andreas Krebbel <krebbel@linux.ibm.com> wrote:
> >>
> >> Hi,
> >>
> >> since this caused a critical performance regression in the OpenJ9 byte code interpreter after
> >> migrating from GCC 4.8 to GCC 7 I would like to backport this patch also to GCC 8 and 9 branch.
> >>
> >> Ok - after bootstrap and regression test went fine?
> >
> > Looks reasonable to me.  But what about GCC 7?  I assume you also verified the
> > actual performance regression is gone.
>
> I've committed the patch to GCC 7 and 8 branch after verifying that the change has the desired
> effect on the source code file from OpenJ9.
>
> GCC 9 branch is currently frozen. Ok, to apply there as well?


Yes, it shouldn't be frozen anymore...

Richard.

> Andreas
>
> > Richard.
> >
> >> Andreas
> >>
> >> [...]

Patch

diff --git a/gcc/ira.c b/gcc/ira.c
index 4a14fb31583..725636d8dc5 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -5198,6 +5198,8 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
+  unsigned int i;
+  int num_used_regs = 0;

   clear_bb_flags ();

@@ -5213,12 +5215,17 @@ ira (FILE *f)

   ira_conflicts_p = optimize > 0;

+  /* Determine the number of pseudos actually requiring coloring.  */
+  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
+
   /* If there are too many pseudos and/or basic blocks (e.g. 10K
      pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
      use simplified and faster algorithms in LRA.  */
   lra_simple_p
     = (ira_use_lra_p
-       && max_reg_num () >= (1 << 26) / last_basic_block_for_fn (cfun));
+       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+
   if (lra_simple_p)
     {
       /* It permits to skip live range splitting in LRA.  */