[v2] combine: perform jump threading at the end

Message ID 20180905120059.32843-1-iii@linux.ibm.com
State New
Headers show
Series
  • [v2] combine: perform jump threading at the end
Related show

Commit Message

Ilya Leoshkevich Sept. 5, 2018, noon
gcc/ChangeLog:

2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* combine.c (rest_of_handle_combine): Perform jump threading.

gcc/testsuite/ChangeLog:

2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

	PR target/80080
	* gcc.target/s390/pr80080-4.c: New test.
---
 gcc/combine.c                             | 10 ++++++++--
 gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 ++++++++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

-- 
2.18.0

Comments

Richard Biener Sept. 5, 2018, 12:11 p.m. | #1
On Wed, Sep 5, 2018 at 2:01 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>

> gcc/ChangeLog:

>

> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>

>         PR target/80080

>         * combine.c (rest_of_handle_combine): Perform jump threading.

>

> gcc/testsuite/ChangeLog:

>

> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>

>         PR target/80080

>         * gcc.target/s390/pr80080-4.c: New test.

> ---

>  gcc/combine.c                             | 10 ++++++++--

>  gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 ++++++++++++++++

>  2 files changed, 24 insertions(+), 2 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

>

> diff --git a/gcc/combine.c b/gcc/combine.c

> index a2649b6d5a1..818b4c5b77d 100644

> --- a/gcc/combine.c

> +++ b/gcc/combine.c

> @@ -14960,10 +14960,16 @@ rest_of_handle_combine (void)

>         free_dominance_info (CDI_DOMINATORS);

>        timevar_push (TV_JUMP);

>        rebuild_jump_labels (get_insns ());

> -      cleanup_cfg (0);

> -      timevar_pop (TV_JUMP);

>      }

>

> +  /* Combining insns can change basic blocks in a way that they end up

> +     containing a single jump_insn. This creates an opportunity to improve code

> +     with jump threading.  */

> +  cleanup_cfg (CLEANUP_THREADING);

> +

> +  if (rebuild_jump_labels_after_combine)

> +    timevar_pop (TV_JUMP);


cleanup_cfg pushes its own timevar so it doesn't make sense to try covering it
with TV_JUMP.  And rebuild_jump_labels immediately pushes TV_REBUILD_JUMP.

So I suggest to remove the timevar_push/pop of TV_JUMP here.

No comment in general about the change, maybe we can detect transforms that
make jump-threading viable and conditionalize that properly?  Note the only
setter of CLEANUP_THREADING guards it with flag_thread_jumps so maybe better
do it above as well (avoids cost at -O0 for example).

Richard.

> +

>    regstat_free_n_sets_and_refs ();

>    return 0;

>  }

> diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c

> new file mode 100644

> index 00000000000..91d31ec7845

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c

> @@ -0,0 +1,16 @@

> +/* { dg-do compile } */

> +/* { dg-options "-march=z196 -O2" } */

> +

> +extern void bar(int *mem);

> +

> +void foo4(int *mem)

> +{

> +  int oldval = 0;

> +  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,

> +                                   1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))

> +    {

> +      bar (mem);

> +    }

> +}

> +

> +/* { dg-final { scan-assembler "\n\tlt\t.*\n\tjne\t(\\.L\\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\\1:\n\tjg\tbar\n" } } */

> --

> 2.18.0

>
Jeff Law Sept. 6, 2018, 6:11 p.m. | #2
On 09/05/2018 06:11 AM, Richard Biener wrote:
> On Wed, Sep 5, 2018 at 2:01 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:

>>

>> gcc/ChangeLog:

>>

>> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>>

>>         PR target/80080

>>         * combine.c (rest_of_handle_combine): Perform jump threading.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>>

>>         PR target/80080

>>         * gcc.target/s390/pr80080-4.c: New test.

>> ---

>>  gcc/combine.c                             | 10 ++++++++--

>>  gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 ++++++++++++++++

>>  2 files changed, 24 insertions(+), 2 deletions(-)

>>  create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

>>

>> diff --git a/gcc/combine.c b/gcc/combine.c

>> index a2649b6d5a1..818b4c5b77d 100644

>> --- a/gcc/combine.c

>> +++ b/gcc/combine.c

>> @@ -14960,10 +14960,16 @@ rest_of_handle_combine (void)

>>         free_dominance_info (CDI_DOMINATORS);

>>        timevar_push (TV_JUMP);

>>        rebuild_jump_labels (get_insns ());

>> -      cleanup_cfg (0);

>> -      timevar_pop (TV_JUMP);

>>      }

>>

>> +  /* Combining insns can change basic blocks in a way that they end up

>> +     containing a single jump_insn. This creates an opportunity to improve code

>> +     with jump threading.  */

>> +  cleanup_cfg (CLEANUP_THREADING);

>> +

>> +  if (rebuild_jump_labels_after_combine)

>> +    timevar_pop (TV_JUMP);

> 

> cleanup_cfg pushes its own timevar so it doesn't make sense to try covering it

> with TV_JUMP.  And rebuild_jump_labels immediately pushes TV_REBUILD_JUMP.

> 

> So I suggest to remove the timevar_push/pop of TV_JUMP here.

> 

> No comment in general about the change, maybe we can detect transforms that

> make jump-threading viable and conditionalize that properly?  Note the only

> setter of CLEANUP_THREADING guards it with flag_thread_jumps so maybe better

> do it above as well (avoids cost at -O0 for example).

The sad thing is I thought we'd killed the RTL jump threading code eons ago.

THe RTL jump threading code tries to prove that the target block has no
side effects and that we can statically determine the true/false
condition for the conditional branch at the end of the block.

This is (of course) much easier to do when the target block has no insns
other than the conditional branch.  So perhaps only do this when the
target block has just the conditional?

Hard to know if that'd work here since RTL wasn't posted.

Jeff
Segher Boessenkool Sept. 6, 2018, 10:39 p.m. | #3
On Thu, Sep 06, 2018 at 12:11:09PM -0600, Jeff Law wrote:
> On 09/05/2018 06:11 AM, Richard Biener wrote:

> > On Wed, Sep 5, 2018 at 2:01 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:

> >> +  /* Combining insns can change basic blocks in a way that they end up

> >> +     containing a single jump_insn. This creates an opportunity to improve code

> >> +     with jump threading.  */

> >> +  cleanup_cfg (CLEANUP_THREADING);


Please show an example of when this happens.  For almost all code it does
not happen, so please don't do it always.

Does it improve code at all?  There is a reason we do not run the expensive
cfg cleanups after every pass: they are expensive.  They are only done in
some strategically chosen places.


Segher
Ilya Leoshkevich Sept. 10, 2018, 10:58 a.m. | #4
> Am 06.09.2018 um 20:11 schrieb Jeff Law <law@redhat.com>:

> 

> On 09/05/2018 06:11 AM, Richard Biener wrote:

>> On Wed, Sep 5, 2018 at 2:01 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:

>>> 

>>> gcc/ChangeLog:

>>> 

>>> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>>> 

>>>        PR target/80080

>>>        * combine.c (rest_of_handle_combine): Perform jump threading.

>>> 

>>> gcc/testsuite/ChangeLog:

>>> 

>>> 2018-09-05  Ilya Leoshkevich  <iii@linux.ibm.com>

>>> 

>>>        PR target/80080

>>>        * gcc.target/s390/pr80080-4.c: New test.

>>> ---

>>> gcc/combine.c                             | 10 ++++++++--

>>> gcc/testsuite/gcc.target/s390/pr80080-4.c | 16 ++++++++++++++++

>>> 2 files changed, 24 insertions(+), 2 deletions(-)

>>> create mode 100644 gcc/testsuite/gcc.target/s390/pr80080-4.c

>>> 

>>> diff --git a/gcc/combine.c b/gcc/combine.c

>>> index a2649b6d5a1..818b4c5b77d 100644

>>> --- a/gcc/combine.c

>>> +++ b/gcc/combine.c

>>> @@ -14960,10 +14960,16 @@ rest_of_handle_combine (void)

>>>        free_dominance_info (CDI_DOMINATORS);

>>>       timevar_push (TV_JUMP);

>>>       rebuild_jump_labels (get_insns ());

>>> -      cleanup_cfg (0);

>>> -      timevar_pop (TV_JUMP);

>>>     }

>>> 

>>> +  /* Combining insns can change basic blocks in a way that they end up

>>> +     containing a single jump_insn. This creates an opportunity to improve code

>>> +     with jump threading.  */

>>> +  cleanup_cfg (CLEANUP_THREADING);

>>> +

>>> +  if (rebuild_jump_labels_after_combine)

>>> +    timevar_pop (TV_JUMP);

>> 

>> cleanup_cfg pushes its own timevar so it doesn't make sense to try covering it

>> with TV_JUMP.  And rebuild_jump_labels immediately pushes TV_REBUILD_JUMP.

>> 

>> So I suggest to remove the timevar_push/pop of TV_JUMP here.

>> 

>> No comment in general about the change, maybe we can detect transforms that

>> make jump-threading viable and conditionalize that properly?  Note the only

>> setter of CLEANUP_THREADING guards it with flag_thread_jumps so maybe better

>> do it above as well (avoids cost at -O0 for example).

> The sad thing is I thought we'd killed the RTL jump threading code eons ago.

Do you mean RTL jump threading is deprecated and/or we better rely on
something else to achieve the same results?
> 

> THe RTL jump threading code tries to prove that the target block has no

> side effects and that we can statically determine the true/false

> condition for the conditional branch at the end of the block.

> 

> This is (of course) much easier to do when the target block has no insns

> other than the conditional branch.  So perhaps only do this when the

> target block has just the conditional?

Sounds reasonable, I implemented this in the new patch.  The only thing
is that combine leaves (note NOTE_INSN_DELETED) around, so I needed to
also account for those.  I used side_effects_p for that.
> 

> Hard to know if that'd work here since RTL wasn't posted.

I will post the RTL with the updated patch shortly.
> 

> Jeff
Ilya Leoshkevich Sept. 10, 2018, 12:25 p.m. | #5
> Am 07.09.2018 um 00:39 schrieb Segher Boessenkool <segher@kernel.crashing.org>:

> 

> On Thu, Sep 06, 2018 at 12:11:09PM -0600, Jeff Law wrote:

>> On 09/05/2018 06:11 AM, Richard Biener wrote:

>>> On Wed, Sep 5, 2018 at 2:01 PM Ilya Leoshkevich <iii@linux.ibm.com> wrote:

>>>> +  /* Combining insns can change basic blocks in a way that they end up

>>>> +     containing a single jump_insn. This creates an opportunity to improve code

>>>> +     with jump threading.  */

>>>> +  cleanup_cfg (CLEANUP_THREADING);

> 

> Please show an example of when this happens.  For almost all code it does

> not happen, so please don't do it always.

This improves the code for the following example from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080:

extern void bar(int *mem);

void foo5(int *mem)
{
  int oldval = 0;
  __atomic_compare_exchange_n (mem, (void *) &oldval, 1,
                               1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  if (oldval != 0)
    bar (mem);
}

I posted the corresponding RTL here:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00495.html
> 

> Does it improve code at all?  There is a reason we do not run the expensive

> cfg cleanups after every pass: they are expensive.  They are only done in

> some strategically chosen places.

> 

Performance-wise, the net win is insignificant (checked with SPEC
CPU2006), but nevertheless the generated code contains less redundant
jumps.  I intended this to be a small unintrusive improvement, so it
definitely is not good if it would increase the compile times.  I will
check how this affects build times of gcc master and SPEC CPU2006
benchmarks.
> 

> Segher

>
Segher Boessenkool Sept. 14, 2018, 9:39 p.m. | #6
On Mon, Sep 10, 2018 at 12:58:19PM +0200, Ilya Leoshkevich wrote:
> > Am 06.09.2018 um 20:11 schrieb Jeff Law <law@redhat.com>:

> > This is (of course) much easier to do when the target block has no insns

> > other than the conditional branch.  So perhaps only do this when the

> > target block has just the conditional?

> Sounds reasonable, I implemented this in the new patch.  The only thing

> is that combine leaves (note NOTE_INSN_DELETED) around, so I needed to

> also account for those.  I used side_effects_p for that.


Combine cannot delete insns because various things in combine refer to
insns by uid.  It could of course delete insns after it is done, hrm,
I'll have a look at that.  Either way, you'll also have lots of other
notes still hanging around (esp. with debug info).


Segher
Segher Boessenkool Sept. 14, 2018, 9:56 p.m. | #7
On Wed, Sep 05, 2018 at 02:00:59PM +0200, Ilya Leoshkevich wrote:
> +/* { dg-final { scan-assembler "\n\tlt\t.*\n\tjne\t(\\.L\\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\\1:\n\tjg\tbar\n" } } */


About this RE.  "." matches anything, also newlines, so (.*\n)* is the
same as just .* (ignoring captures).  But you probably do not want this,
so you should put (?n) in your RE (traditionally, at the start of it).

You also might want to quote the string with {} instead of "".  Read
https://www.tcl.tk/man/tcl8.4/TclCmd/Tcl.htm for why (it is short; when
you have read it, you will know *all* Tcl syntax.  Well maybe read re_syntax
as well).


Segher

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index a2649b6d5a1..818b4c5b77d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14960,10 +14960,16 @@  rest_of_handle_combine (void)
 	free_dominance_info (CDI_DOMINATORS);
       timevar_push (TV_JUMP);
       rebuild_jump_labels (get_insns ());
-      cleanup_cfg (0);
-      timevar_pop (TV_JUMP);
     }
 
+  /* Combining insns can change basic blocks in a way that they end up
+     containing a single jump_insn. This creates an opportunity to improve code
+     with jump threading.  */
+  cleanup_cfg (CLEANUP_THREADING);
+
+  if (rebuild_jump_labels_after_combine)
+    timevar_pop (TV_JUMP);
+
   regstat_free_n_sets_and_refs ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/s390/pr80080-4.c b/gcc/testsuite/gcc.target/s390/pr80080-4.c
new file mode 100644
index 00000000000..91d31ec7845
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr80080-4.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=z196 -O2" } */
+
+extern void bar(int *mem);
+
+void foo4(int *mem)
+{
+  int oldval = 0;
+  if (!__atomic_compare_exchange_n (mem, (void *) &oldval, 1,
+				    1, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      bar (mem);
+    }
+}
+
+/* { dg-final { scan-assembler "\n\tlt\t.*\n\tjne\t(\\.L\\d+)\n(.*\n)*\tcs\t.*\n\tber\t%r14\n\\1:\n\tjg\tbar\n" } } */