[PR81611] improve auto-inc

Message ID ork1w7apnx.fsf@lxoliva.fsfla.org
State New
Headers show
Series
  • [PR81611] improve auto-inc
Related show

Commit Message

Alexandre Oliva Jan. 24, 2018, 3:42 a.m.
These two patches fix PR81611.

The first one improves forwprop so that we avoid adding SSA conflicting
by forwpropping the iv increment, which may cause both the incremented
and the original value to be live, even when the iv is copied between
the PHI node and the increment.  We already handled the case in which
there aren't any such copies.

Alas, this is not enough to address the problem on avr, even though it
fixes it on e.g. ppc.  The reason is that avr rejects var+offset
addresses, and this prevents the memory access in a post-inc code
sequence from being adjusted to an address that auto-inc-dec can
recognize.

The second patch adjusts auto-inc-dec to recognize a code sequence in
which the original, unincremented pseudo is used in an address after
it's incremented into another pseudo, and turn that into a post-inc
address, leaving the copying for subsequent passes to eliminate.

Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and
aarch64-linux-gnu.  Ok to install?


I'd appreciate suggestions on how to turn the submitted testcase into a
regression test; I suppose an avr-specific test that requires the
auto-inc transformation is a possibility, but that feels a bit too
limited, doesn't it?  Thoughts?  Thanks in advance,


[PR81611] accept copies in simple_iv_increment_p

If there are copies between the GIMPLE_PHI at the loop body and the
increment that reaches it (presumably through a back edge), still
regard it as a simple_iv_increment, so that we won't consider the
value in the back edge eligible for forwprop.  Doing so would risk
making the phi node and the incremented conflicting value live
within the loop, and the phi node to be preserved for propagated
uses after the loop.

for  gcc/ChangeLog

	PR tree-optimization/81611
	* tree-ssa-dom.c (simple_iv_increment_p): Skip intervening
	copies.
---
 gcc/tree-ssa-dom.c |   21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)



-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

Comments

Richard Biener Jan. 24, 2018, 2:19 p.m. | #1
On Wed, Jan 24, 2018 at 4:42 AM, Alexandre Oliva <aoliva@redhat.com> wrote:
> These two patches fix PR81611.

>

> The first one improves forwprop so that we avoid adding SSA conflicting

> by forwpropping the iv increment, which may cause both the incremented

> and the original value to be live, even when the iv is copied between

> the PHI node and the increment.  We already handled the case in which

> there aren't any such copies.

>

> Alas, this is not enough to address the problem on avr, even though it

> fixes it on e.g. ppc.  The reason is that avr rejects var+offset

> addresses, and this prevents the memory access in a post-inc code

> sequence from being adjusted to an address that auto-inc-dec can

> recognize.

>

> The second patch adjusts auto-inc-dec to recognize a code sequence in

> which the original, unincremented pseudo is used in an address after

> it's incremented into another pseudo, and turn that into a post-inc

> address, leaving the copying for subsequent passes to eliminate.

>

> Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and

> aarch64-linux-gnu.  Ok to install?

>

>

> I'd appreciate suggestions on how to turn the submitted testcase into a

> regression test; I suppose an avr-specific test that requires the

> auto-inc transformation is a possibility, but that feels a bit too

> limited, doesn't it?  Thoughts?  Thanks in advance,

>

>

> [PR81611] accept copies in simple_iv_increment_p

>

> If there are copies between the GIMPLE_PHI at the loop body and the

> increment that reaches it (presumably through a back edge), still

> regard it as a simple_iv_increment, so that we won't consider the

> value in the back edge eligible for forwprop.  Doing so would risk

> making the phi node and the incremented conflicting value live

> within the loop, and the phi node to be preserved for propagated

> uses after the loop.

>

> for  gcc/ChangeLog

>

>         PR tree-optimization/81611

>         * tree-ssa-dom.c (simple_iv_increment_p): Skip intervening

>         copies.

> ---

>  gcc/tree-ssa-dom.c |   21 +++++++++++++++++----

>  1 file changed, 17 insertions(+), 4 deletions(-)

>

> diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c

> index 2b371667253a..3c0ff9458342 100644

> --- a/gcc/tree-ssa-dom.c

> +++ b/gcc/tree-ssa-dom.c

> @@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies)

>  /* Returns true when STMT is a simple iv increment.  It detects the

>     following situation:

>

> -   i_1 = phi (..., i_2)

> -   i_2 = i_1 +/- ...  */

> +   i_1 = phi (..., i_k)

> +   [...]

> +   i_j = i_{j-1}  for each j : 2 <= j <= k-1

> +   [...]

> +   i_k = i_{k-1} +/- ...  */

>

>  bool

>  simple_iv_increment_p (gimple *stmt)

> @@ -1305,8 +1308,18 @@ simple_iv_increment_p (gimple *stmt)

>      return false;

>

>    phi = SSA_NAME_DEF_STMT (preinc);

> -  if (gimple_code (phi) != GIMPLE_PHI)

> -    return false;

> +  while (gimple_code (phi) != GIMPLE_PHI)

> +    {

> +      /* Follow trivial copies, but not the DEF used in a back edge,

> +        so that we don't prevent coalescing.  */

> +      if (gimple_code (phi) != GIMPLE_ASSIGN

> +         || gimple_assign_lhs (phi) != preinc

> +         || !gimple_assign_ssa_name_copy_p (phi))


given gimple_assign_ssa_name_copy checks it is an assign
just do

   if (!gimple_assign_ssa-anme_Copy_p (phi))

the lhs != preinc check is always false given you got to phi via
SSA_NAME_DEF_STMT of preinc.

The simple_iv_increment_p change is ok with that change.  The other
change is RTL which I
defer to somebody else.

Richard.

> +       return false;

> +

> +      preinc = gimple_assign_rhs1 (phi);

> +      phi = SSA_NAME_DEF_STMT (preinc);

> +    }

>

>    for (i = 0; i < gimple_phi_num_args (phi); i++)

>      if (gimple_phi_arg_def (phi, i) == lhs)

>

>

> [PR81611] turn inc-and-use-of-dead-orig into auto-inc

>

> When the addressing modes available on the machine don't allow offsets

> in addresses, odds are that post-increments will be represented in

> trees and RTL as:

>

>   y <= x + 1

>   ... *(x) ...

>   x <= y

>

> so deal with this form so as to create auto-inc addresses that we'd

> otherwise miss.

>

>

> for  gcc/ChangeLog

>

>         PR rtl-optimization/81611

>         * auto-inc-dec.c (attempt_change): Move dead note from

>         mem_insn if it's the next use of regno

>         (find_address): Take address use of reg holding

>         non-incremented value.

>         (merge_in_block): Attempt to use a mem insn that is the next

>         use of the original regno.

> ---

>  gcc/auto-inc-dec.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--

>  1 file changed, 44 insertions(+), 2 deletions(-)

>

> diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c

> index d02fa9d081c7..4ffbcf56a456 100644

> --- a/gcc/auto-inc-dec.c

> +++ b/gcc/auto-inc-dec.c

> @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg)

>          before the memory reference.  */

>        gcc_assert (mov_insn);

>        emit_insn_before (mov_insn, inc_insn.insn);

> -      move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);

> +      regno = REGNO (inc_insn.reg0);

> +      if (reg_next_use[regno] == mem_insn.insn)

> +       move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0);

> +      else

> +       move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);

>

>        regno = REGNO (inc_insn.reg_res);

>        reg_next_def[regno] = mov_insn;

> @@ -842,7 +846,7 @@ find_address (rtx *address_of_x)

>

>    if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res))

>      {

> -      /* Match with *reg0.  */

> +      /* Match with *reg_res.  */

>        mem_insn.mem_loc = address_of_x;

>        mem_insn.reg0 = inc_insn.reg_res;

>        mem_insn.reg1_is_const = true;

> @@ -850,6 +854,17 @@ find_address (rtx *address_of_x)

>        mem_insn.reg1 = GEN_INT (0);

>        return -1;

>      }

> +  if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0

> +      && rtx_equal_p (XEXP (x, 0), inc_insn.reg0))

> +    {

> +      /* Match with *reg0 AKA *(reg_res - reg1_val).  */

> +      mem_insn.mem_loc = address_of_x;

> +      mem_insn.reg0 = inc_insn.reg_res;

> +      mem_insn.reg1_is_const = true;

> +      mem_insn.reg1_val = -inc_insn.reg1_val;

> +      mem_insn.reg1 = GEN_INT (mem_insn.reg1_val);

> +      return -1;

> +    }

>    if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS

>        && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res))

>      {

> @@ -1370,6 +1385,33 @@ merge_in_block (int max_reg, basic_block bb)

>                           insn_is_add_or_inc = false;

>                         }

>                     }

> +

> +                 if (ok && insn_is_add_or_inc

> +                     && inc_insn.reg0 != inc_insn.reg_res)

> +                   {

> +                     regno = REGNO (inc_insn.reg0);

> +                     int luid = DF_INSN_LUID (mem_insn.insn);

> +                     mem_insn.insn = get_next_ref (regno, bb, reg_next_use);

> +

> +                     /* Try a mem use of reg0 before that of reg_res

> +                        too.  If there's no further use of reg_res,

> +                        there's no point in trying an auto-inc, and

> +                        if the use of reg0 is after that of reg_res,

> +                        it will be too late for the auto-inc to

> +                        compute reg_res's correct value.  */

> +                     if (mem_insn.insn

> +                         && luid > DF_INSN_LUID (mem_insn.insn)

> +                         && find_address (&PATTERN (mem_insn.insn)) == -1)

> +                       {

> +                         if (dump_file)

> +                           dump_mem_insn (dump_file);

> +                         if (try_merge ())

> +                           {

> +                             success_in_block++;

> +                             insn_is_add_or_inc = false;

> +                           }

> +                       }

> +                   }

>                 }

>             }

>         }

>

>

> --

> Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/

> You must be the change you wish to see in the world. -- Gandhi

> Be Free! -- http://FSFLA.org/   FSF Latin America board member

> Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Alexandre Oliva Jan. 25, 2018, 8:23 p.m. | #2
On Jan 24, 2018, Richard Biener <richard.guenther@gmail.com> wrote:

> given gimple_assign_ssa_name_copy checks it is an assign


*nod*

> the lhs != preinc check is always false given you got to phi via

> SSA_NAME_DEF_STMT of preinc.


*nod*

> The simple_iv_increment_p change is ok with that change.


Thanks, I'll retest with the simplified test (just in case; for I can't
recall why I ended up with all those redundant conditions), repost and
install.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Alexandre Oliva Jan. 30, 2018, 5:38 p.m. | #3
On Jan 25, 2018, Alexandre Oliva <aoliva@redhat.com> wrote:

> Thanks, I'll retest with the simplified test (just in case; for I can't

> recall why I ended up with all those redundant conditions),


As suspected, removing the redundant tests didn't regress anything.  I
suppose they mattered in some earlier experimental version of the patch
(I vaguely recall having a long sequence of tests within the loop at
some point), and I just failed to further simplify the final form.
Thanks for catching the further simplification opportunities!

FTR, here's the patch I'm installing, while awaiting a review from
someone else on the rtl auto-inc patch (the second and last patch in
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01994.html).


[PR81611] accept copies in simple_iv_increment_p

If there are copies between the GIMPLE_PHI at the loop body and the
increment that reaches it (presumably through a back edge), still
regard it as a simple_iv_increment, so that we won't consider the
value in the back edge eligible for forwprop.  Doing so would risk
making the phi node and the incremented conflicting value live
within the loop, and the phi node to be preserved for propagated
uses after the loop.

for  gcc/ChangeLog

	PR tree-optimization/81611
	* tree-ssa-dom.c (simple_iv_increment_p): Skip intervening
	copies.
---
 gcc/tree-ssa-dom.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 2b371667253a..a6f176c5def0 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1276,8 +1276,11 @@ record_equality (tree x, tree y, class const_and_copies *const_and_copies)
 /* Returns true when STMT is a simple iv increment.  It detects the
    following situation:
 
-   i_1 = phi (..., i_2)
-   i_2 = i_1 +/- ...  */
+   i_1 = phi (..., i_k)
+   [...]
+   i_j = i_{j-1}  for each j : 2 <= j <= k-1
+   [...]
+   i_k = i_{k-1} +/- ...  */
 
 bool
 simple_iv_increment_p (gimple *stmt)
@@ -1305,8 +1308,15 @@ simple_iv_increment_p (gimple *stmt)
     return false;
 
   phi = SSA_NAME_DEF_STMT (preinc);
-  if (gimple_code (phi) != GIMPLE_PHI)
-    return false;
+  while (gimple_code (phi) != GIMPLE_PHI)
+    {
+      /* Follow trivial copies, but not the DEF used in a back edge,
+	 so that we don't prevent coalescing.  */
+      if (!gimple_assign_ssa_name_copy_p (phi))
+	return false;
+      preinc = gimple_assign_rhs1 (phi);
+      phi = SSA_NAME_DEF_STMT (preinc);
+    }
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
     if (gimple_phi_arg_def (phi, i) == lhs)


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Jeff Law Feb. 14, 2018, 5:06 a.m. | #4
On 01/23/2018 08:42 PM, Alexandre Oliva wrote:
> These two patches fix PR81611.

> 

> The first one improves forwprop so that we avoid adding SSA conflicting

> by forwpropping the iv increment, which may cause both the incremented

> and the original value to be live, even when the iv is copied between

> the PHI node and the increment.  We already handled the case in which

> there aren't any such copies.

> 

> Alas, this is not enough to address the problem on avr, even though it

> fixes it on e.g. ppc.  The reason is that avr rejects var+offset

> addresses, and this prevents the memory access in a post-inc code

> sequence from being adjusted to an address that auto-inc-dec can

> recognize.

> 

> The second patch adjusts auto-inc-dec to recognize a code sequence in

> which the original, unincremented pseudo is used in an address after

> it's incremented into another pseudo, and turn that into a post-inc

> address, leaving the copying for subsequent passes to eliminate.

> 

> Regstrapped on x86_64-linux-gnu, i686-linux-gnu, ppc64-linux-gnu and

> aarch64-linux-gnu.  Ok to install?

> 

> 

> I'd appreciate suggestions on how to turn the submitted testcase into a

> regression test; I suppose an avr-specific test that requires the

> auto-inc transformation is a possibility, but that feels a bit too

> limited, doesn't it?  Thoughts?  Thanks in advance,

> 

> 


[ snip the DOM change which was approved and installed ]

> [PR81611] turn inc-and-use-of-dead-orig into auto-inc

> 

> When the addressing modes available on the machine don't allow offsets

> in addresses, odds are that post-increments will be represented in

> trees and RTL as:

> 

>   y <= x + 1

>   ... *(x) ...

>   x <= y

> 

> so deal with this form so as to create auto-inc addresses that we'd

> otherwise miss.

> 

> 

> for  gcc/ChangeLog

> 

> 	PR rtl-optimization/81611

> 	* auto-inc-dec.c (attempt_change): Move dead note from

> 	mem_insn if it's the next use of regno

> 	(find_address): Take address use of reg holding

> 	non-incremented value.

> 	(merge_in_block): Attempt to use a mem insn that is the next

> 	use of the original regno.

> ---

>  gcc/auto-inc-dec.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--

>  1 file changed, 44 insertions(+), 2 deletions(-)

> 

> diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c

> index d02fa9d081c7..4ffbcf56a456 100644

> --- a/gcc/auto-inc-dec.c

> +++ b/gcc/auto-inc-dec.c

> @@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg)

>  	 before the memory reference.  */

>        gcc_assert (mov_insn);

>        emit_insn_before (mov_insn, inc_insn.insn);

> -      move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);

> +      regno = REGNO (inc_insn.reg0);

> +      if (reg_next_use[regno] == mem_insn.insn)

> +	move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0);

> +      else

> +	move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);

>  

>        regno = REGNO (inc_insn.reg_res);

>        reg_next_def[regno] = mov_insn;


> @@ -1370,6 +1385,33 @@ merge_in_block (int max_reg, basic_block bb)

>  			  insn_is_add_or_inc = false;

>  			}

>  		    }

> +

> +		  if (ok && insn_is_add_or_inc

> +		      && inc_insn.reg0 != inc_insn.reg_res)

> +		    {

> +		      regno = REGNO (inc_insn.reg0);

> +		      int luid = DF_INSN_LUID (mem_insn.insn);

> +		      mem_insn.insn = get_next_ref (regno, bb, reg_next_use);

So I think a comment is warranted  right as we enter the TRUE arm.

At that point INC_INSN is an inc/dec.  But MEM_INSN is not necessarily a
memory reference.  It could be a memory reference, it could be a copy,
it could be something completely different (it's just the next insn that
references the result of the increment).  In the case we care about we
want it to be a copy of INC_INSN's REG_RES back to REG0.

ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before
we call get_next_ref for reg0 is advisable and probably good from a
compile-time standpoint by avoiding calls into find_address.

After we call get_next_ref to get the next reference of the source of
the increment, then we're hoping to find a memory reference that uses
REG0.  But it's not guaranteed it's a memory reference insn.

I was having an awful time understanding how this code could work from
the comments until I put it under a debugger and got a sense of the
state as we entered that IF block.  Then it was much clearer :-)

I believe Georg had other testcases in subsequent comments in the BZ,
but I don't believe they were flagged as regressions.  So I think once
you check in your patch we note in the BZ that the original testcase
(and thus the regression) is fixed, but that the later tests are not.
We then remove the regression marker.

If Georg (or anyone else) does the analysis to show those later tests
are also regressions, then we'll add the regression marker back.

Seem reasonable?

Jeff
Alexandre Oliva Feb. 27, 2018, 11:18 p.m. | #5
On Feb 14, 2018, Jeff Law <law@redhat.com> wrote:

>> +		      regno = REGNO (inc_insn.reg0);

>> +		      int luid = DF_INSN_LUID (mem_insn.insn);

>> +		      mem_insn.insn = get_next_ref (regno, bb, reg_next_use);

> So I think a comment is warranted  right as we enter the TRUE arm.


> At that point INC_INSN is an inc/dec.  But MEM_INSN is not necessarily a

> memory reference.  It could be a memory reference, it could be a copy,

> it could be something completely different (it's just the next insn that

> references the result of the increment).  In the case we care about we

> want it to be a copy of INC_INSN's REG_RES back to REG0.


> ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before

> we call get_next_ref for reg0 is advisable and probably good from a

> compile-time standpoint by avoiding calls into find_address.


But we don't need it to be a copy.  The transformation is just as
legitimate if the regs go independent ways after that point.  We have
reg_res set to reg0+reg1, and then a use of reg0 in a MEM before any
other use of reg_res.  We turn that into a copy of reg0 to reg_res, and
the MEM addr into a post_add of reg_res with reg1 (possibly a post_inc),
so that the MEM dereferences reg_res while it's still equal to reg0, and
after the MEM, reg_res becomes reg0+reg1, as it should for any
subsequent uses, and reg0 is unmodified.  Whether or not a subsequent
copy from reg_res to reg0 is to be found won't make the transformation
any more or less legitimate.

> After we call get_next_ref to get the next reference of the source of

> the increment, then we're hoping to find a memory reference that uses

> REG0.  But it's not guaranteed it's a memory reference insn.


Yeah, find_address will determine if it contains any of the MEM patterns
we might be interested in, but it could be anything whatsoever.  The MEM
pattern might appear virtually anywhere in the insn.

> I was having an awful time understanding how this code could work from

> the comments until I put it under a debugger and got a sense of the

> state as we entered that IF block.  Then it was much clearer :-)


Sorry, I realize the comments were written based on a lot of context
about the overall behavior of the pass, that I had learned while trying
to figure it out.  At the risk of making it redundant, I've expanded the
comments, and added further tests that won't affect current behavior in
any significant way, but that might speed things up a bit and will save
us trouble should find_address be extended to catch additional patterns.


> I believe Georg had other testcases in subsequent comments in the BZ,

> but I don't believe they were flagged as regressions.


However, with the testcases I realized the incremented register could
still be live, even if we didn't find a subsequent use for it.
Adjusting for that made those testcases use post_inc too.

Here's the improved patch, regstrapped on aarch64-, ppc64-, and
ppc64el-linux-gnu.  Ok to install?


[PR81611] turn inc-and-use-of-dead-orig into auto-inc

When the addressing modes available on the machine don't allow offsets
in addresses, odds are that post-increments will be represented in
trees and RTL as:

  y <= x + 1
  ... *(x) ...
  x <= y

so deal with it by turning such RTL as:

  (set y (plus x n))
  ... (mem x) ...

without intervening uses of y into

  (set y x)
  ... (mem (post_add y n)) ...

so as to create auto-inc addresses that we'd otherwise miss.


for  gcc/ChangeLog

	PR rtl-optimization/81611
	* auto-inc-dec.c (attempt_change): Move dead note from
	mem_insn if it's the next use of regno
	(find_address): Take address use of reg holding
	non-incremented value.  Add parm to limit search to the named
	reg only.
	(merge_in_block): Attempt to use a mem insn that is the next
	use of the original regno.
---
 gcc/auto-inc-dec.c |  140 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 130 insertions(+), 10 deletions(-)

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index d02fa9d081c7..e6dc1c30d716 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -508,7 +508,11 @@ attempt_change (rtx new_addr, rtx inc_reg)
 	 before the memory reference.  */
       gcc_assert (mov_insn);
       emit_insn_before (mov_insn, inc_insn.insn);
-      move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
+      regno = REGNO (inc_insn.reg0);
+      if (reg_next_use[regno] == mem_insn.insn)
+	move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0);
+      else
+	move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
 
       regno = REGNO (inc_insn.reg_res);
       reg_next_def[regno] = mov_insn;
@@ -825,13 +829,15 @@ parse_add_or_inc (rtx_insn *insn, bool before_mem)
 
 /* A recursive function that checks all of the mem uses in
    ADDRESS_OF_X to see if any single one of them is compatible with
-   what has been found in inc_insn.
+   what has been found in inc_insn.  To avoid accidental matches, we
+   will only find MEMs with FINDREG, be it inc_insn.reg_res, be it
+   inc_insn.reg0.
 
    -1 is returned for success.  0 is returned if nothing was found and
    1 is returned for failure. */
 
 static int
-find_address (rtx *address_of_x)
+find_address (rtx *address_of_x, rtx findreg)
 {
   rtx x = *address_of_x;
   enum rtx_code code = GET_CODE (x);
@@ -840,9 +846,10 @@ find_address (rtx *address_of_x)
   int value = 0;
   int tem;
 
-  if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res))
+  if (code == MEM && findreg == inc_insn.reg_res
+      && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res))
     {
-      /* Match with *reg0.  */
+      /* Match with *reg_res.  */
       mem_insn.mem_loc = address_of_x;
       mem_insn.reg0 = inc_insn.reg_res;
       mem_insn.reg1_is_const = true;
@@ -850,7 +857,21 @@ find_address (rtx *address_of_x)
       mem_insn.reg1 = GEN_INT (0);
       return -1;
     }
-  if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS
+  if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0
+      && findreg == inc_insn.reg0
+      && rtx_equal_p (XEXP (x, 0), inc_insn.reg0))
+    {
+      /* Match with *reg0, assumed to be equivalent to
+         *(reg_res - reg1_val); callers must check whether this is the case.  */
+      mem_insn.mem_loc = address_of_x;
+      mem_insn.reg0 = inc_insn.reg_res;
+      mem_insn.reg1_is_const = true;
+      mem_insn.reg1_val = -inc_insn.reg1_val;
+      mem_insn.reg1 = GEN_INT (mem_insn.reg1_val);
+      return -1;
+    }
+  if (code == MEM && findreg == inc_insn.reg_res
+      && GET_CODE (XEXP (x, 0)) == PLUS
       && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res))
     {
       rtx b = XEXP (XEXP (x, 0), 1);
@@ -879,7 +900,7 @@ find_address (rtx *address_of_x)
     {
       /* If REG occurs inside a MEM used in a bit-field reference,
 	 that is unacceptable.  */
-      if (find_address (&XEXP (x, 0)))
+      if (find_address (&XEXP (x, 0), findreg))
 	return 1;
     }
 
@@ -891,7 +912,7 @@ find_address (rtx *address_of_x)
     {
       if (fmt[i] == 'e')
 	{
-	  tem = find_address (&XEXP (x, i));
+	  tem = find_address (&XEXP (x, i), findreg);
 	  /* If this is the first use, let it go so the rest of the
 	     insn can be checked.  */
 	  if (value == 0)
@@ -905,7 +926,7 @@ find_address (rtx *address_of_x)
 	  int j;
 	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
 	    {
-	      tem = find_address (&XVECEXP (x, i, j));
+	      tem = find_address (&XVECEXP (x, i, j), findreg);
 	      /* If this is the first use, let it go so the rest of
 		 the insn can be checked.  */
 	      if (value == 0)
@@ -1360,7 +1381,106 @@ merge_in_block (int max_reg, basic_block bb)
 		  if (dump_file)
 		    dump_inc_insn (dump_file);
 
-		  if (ok && find_address (&PATTERN (mem_insn.insn)) == -1)
+		  if (ok && find_address (&PATTERN (mem_insn.insn),
+					  inc_insn.reg_res) == -1)
+		    {
+		      if (dump_file)
+			dump_mem_insn (dump_file);
+		      if (try_merge ())
+			{
+			  success_in_block++;
+			  insn_is_add_or_inc = false;
+			}
+		    }
+		}
+
+	      if (insn_is_add_or_inc
+		  /* find_address will only recognize an address
+		     with a reg0 that's not reg_res when
+		     reg1_is_const, so cut it off early if we
+		     already know it won't match.  */
+		  && inc_insn.reg1_is_const
+		  && inc_insn.reg0
+		  && inc_insn.reg0 != inc_insn.reg_res)
+		{
+		  /* If we identified an inc_insn that uses two
+		     different pseudos, it's of the form
+
+		     (set reg_res (plus reg0 reg1))
+
+		     where reg1 is a constant (*).
+
+		     The next use of reg_res was not idenfied by
+		     find_address as a mem_insn that we could turn
+		     into auto-inc, so see if we find a suitable
+		     MEM in the next use of reg0, as long as it's
+		     before any subsequent use of reg_res:
+
+		     ... (mem (... reg0 ...)) ...
+
+		     ... reg_res ...
+
+		     In this case, we can turn the plus into a
+		     copy, and the reg0 in the MEM address into a
+		     post_inc of reg_res:
+
+		     (set reg_res reg0)
+
+		     ... (mem (... (post_add reg_res reg1) ...)) ...
+
+		     reg_res will then have the correct value at
+		     subsequent uses, and reg0 will remain
+		     unchanged.
+
+		     (*) We could support non-const reg1, but then
+		     we'd have to check that reg1 remains
+		     unchanged all the way to the modified MEM,
+		     and we'd have to extend find_address to
+		     represent a non-const negated reg1.  */
+		  regno = REGNO (inc_insn.reg0);
+		  rtx_insn *reg0_use = get_next_ref (regno, bb,
+						     reg_next_use);
+
+		  /* Give up if the next use of reg0 is after the next
+		     use of reg_res (same insn is ok; we might have
+		     found a MEM with reg_res before, and that failed,
+		     but now we try reg0, which might work), or defs
+		     of reg_res (same insn is not ok, we'd introduce
+		     another def in the same insn) or reg0.  */
+		  if (reg0_use)
+		    {
+		      int luid = DF_INSN_LUID (reg0_use);
+
+		      /* It might seem pointless to introduce an
+			 auto-inc if there's no subsequent use of
+			 reg_res (i.e., mem_insn.insn == NULL), but
+			 the next use might be in the next iteration
+			 of a loop, and it won't hurt if we make the
+			 change even if it's not needed.  */
+		      if (mem_insn.insn
+			  && luid > DF_INSN_LUID (mem_insn.insn))
+			reg0_use = NULL;
+
+		      rtx_insn *other_insn
+			= get_next_ref (REGNO (inc_insn.reg_res), bb,
+					reg_next_def);
+
+		      if (other_insn && luid >= DF_INSN_LUID (other_insn))
+			reg0_use = NULL;
+
+		      other_insn
+			= get_next_ref (REGNO (inc_insn.reg0), bb,
+					reg_next_def);
+
+		      if (other_insn && luid > DF_INSN_LUID (other_insn))
+			reg0_use = NULL;
+		    }
+
+		  mem_insn.insn = reg0_use;
+
+		  if (mem_insn.insn
+		      && find_address (&PATTERN (mem_insn.insn),
+				       inc_insn.reg0) == -1)
 		    {
 		      if (dump_file)
 			dump_mem_insn (dump_file);


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
Jeff Law Feb. 28, 2018, 2:39 a.m. | #6
On 02/27/2018 04:18 PM, Alexandre Oliva wrote:
> On Feb 14, 2018, Jeff Law <law@redhat.com> wrote:

> 

>>> +		      regno = REGNO (inc_insn.reg0);

>>> +		      int luid = DF_INSN_LUID (mem_insn.insn);

>>> +		      mem_insn.insn = get_next_ref (regno, bb, reg_next_use);

>> So I think a comment is warranted  right as we enter the TRUE arm.

> 

>> At that point INC_INSN is an inc/dec.  But MEM_INSN is not necessarily a

>> memory reference.  It could be a memory reference, it could be a copy,

>> it could be something completely different (it's just the next insn that

>> references the result of the increment).  In the case we care about we

>> want it to be a copy of INC_INSN's REG_RES back to REG0.

> 

>> ISTM that verifying MEM_INSN is a reg->reg copy (reg_res -> reg0) before

>> we call get_next_ref for reg0 is advisable and probably good from a

>> compile-time standpoint by avoiding calls into find_address.

> 

> But we don't need it to be a copy.  The transformation is just as

> legitimate if the regs go independent ways after that point.  We have

> reg_res set to reg0+reg1, and then a use of reg0 in a MEM before any

> other use of reg_res.  We turn that into a copy of reg0 to reg_res, and

> the MEM addr into a post_add of reg_res with reg1 (possibly a post_inc),

> so that the MEM dereferences reg_res while it's still equal to reg0, and

> after the MEM, reg_res becomes reg0+reg1, as it should for any

> subsequent uses, and reg0 is unmodified.  Whether or not a subsequent

> copy from reg_res to reg0 is to be found won't make the transformation

> any more or less legitimate.

> 

>> After we call get_next_ref to get the next reference of the source of

>> the increment, then we're hoping to find a memory reference that uses

>> REG0.  But it's not guaranteed it's a memory reference insn.

> 

> Yeah, find_address will determine if it contains any of the MEM patterns

> we might be interested in, but it could be anything whatsoever.  The MEM

> pattern might appear virtually anywhere in the insn.

> 

>> I was having an awful time understanding how this code could work from

>> the comments until I put it under a debugger and got a sense of the

>> state as we entered that IF block.  Then it was much clearer :-)

> 

> Sorry, I realize the comments were written based on a lot of context

> about the overall behavior of the pass, that I had learned while trying

> to figure it out.  At the risk of making it redundant, I've expanded the

> comments, and added further tests that won't affect current behavior in

> any significant way, but that might speed things up a bit and will save

> us trouble should find_address be extended to catch additional patterns.

> 

> 

>> I believe Georg had other testcases in subsequent comments in the BZ,

>> but I don't believe they were flagged as regressions.

> 

> However, with the testcases I realized the incremented register could

> still be live, even if we didn't find a subsequent use for it.

> Adjusting for that made those testcases use post_inc too.

> 

> Here's the improved patch, regstrapped on aarch64-, ppc64-, and

> ppc64el-linux-gnu.  Ok to install?

> 

> 

> [PR81611] turn inc-and-use-of-dead-orig into auto-inc

> 

> When the addressing modes available on the machine don't allow offsets

> in addresses, odds are that post-increments will be represented in

> trees and RTL as:

> 

>   y <= x + 1

>   ... *(x) ...

>   x <= y

> 

> so deal with it by turning such RTL as:

> 

>   (set y (plus x n))

>   ... (mem x) ...

> 

> without intervening uses of y into

> 

>   (set y x)

>   ... (mem (post_add y n)) ...

> 

> so as to create auto-inc addresses that we'd otherwise miss.

> 

> 

> for  gcc/ChangeLog

> 

> 	PR rtl-optimization/81611

> 	* auto-inc-dec.c (attempt_change): Move dead note from

> 	mem_insn if it's the next use of regno

> 	(find_address): Take address use of reg holding

> 	non-incremented value.  Add parm to limit search to the named

> 	reg only.

> 	(merge_in_block): Attempt to use a mem insn that is the next

> 	use of the original regno.

OK.  Thanks!

Jeff

Patch

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 2b371667253a..3c0ff9458342 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1276,8 +1276,11 @@  record_equality (tree x, tree y, class const_and_copies *const_and_copies)
 /* Returns true when STMT is a simple iv increment.  It detects the
    following situation:
 
-   i_1 = phi (..., i_2)
-   i_2 = i_1 +/- ...  */
+   i_1 = phi (..., i_k)
+   [...]
+   i_j = i_{j-1}  for each j : 2 <= j <= k-1
+   [...]
+   i_k = i_{k-1} +/- ...  */
 
 bool
 simple_iv_increment_p (gimple *stmt)
@@ -1305,8 +1308,18 @@  simple_iv_increment_p (gimple *stmt)
     return false;
 
   phi = SSA_NAME_DEF_STMT (preinc);
-  if (gimple_code (phi) != GIMPLE_PHI)
-    return false;
+  while (gimple_code (phi) != GIMPLE_PHI)
+    {
+      /* Follow trivial copies, but not the DEF used in a back edge,
+	 so that we don't prevent coalescing.  */
+      if (gimple_code (phi) != GIMPLE_ASSIGN
+	  || gimple_assign_lhs (phi) != preinc
+	  || !gimple_assign_ssa_name_copy_p (phi))
+	return false;
+
+      preinc = gimple_assign_rhs1 (phi);
+      phi = SSA_NAME_DEF_STMT (preinc);
+    }
 
   for (i = 0; i < gimple_phi_num_args (phi); i++)
     if (gimple_phi_arg_def (phi, i) == lhs)


[PR81611] turn inc-and-use-of-dead-orig into auto-inc

When the addressing modes available on the machine don't allow offsets
in addresses, odds are that post-increments will be represented in
trees and RTL as:

  y <= x + 1
  ... *(x) ...
  x <= y

so deal with this form so as to create auto-inc addresses that we'd
otherwise miss.


for  gcc/ChangeLog

	PR rtl-optimization/81611
	* auto-inc-dec.c (attempt_change): Move dead note from
	mem_insn if it's the next use of regno
	(find_address): Take address use of reg holding
	non-incremented value.
	(merge_in_block): Attempt to use a mem insn that is the next
	use of the original regno.
---
 gcc/auto-inc-dec.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/gcc/auto-inc-dec.c b/gcc/auto-inc-dec.c
index d02fa9d081c7..4ffbcf56a456 100644
--- a/gcc/auto-inc-dec.c
+++ b/gcc/auto-inc-dec.c
@@ -508,7 +508,11 @@  attempt_change (rtx new_addr, rtx inc_reg)
 	 before the memory reference.  */
       gcc_assert (mov_insn);
       emit_insn_before (mov_insn, inc_insn.insn);
-      move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
+      regno = REGNO (inc_insn.reg0);
+      if (reg_next_use[regno] == mem_insn.insn)
+	move_dead_notes (mov_insn, mem_insn.insn, inc_insn.reg0);
+      else
+	move_dead_notes (mov_insn, inc_insn.insn, inc_insn.reg0);
 
       regno = REGNO (inc_insn.reg_res);
       reg_next_def[regno] = mov_insn;
@@ -842,7 +846,7 @@  find_address (rtx *address_of_x)
 
   if (code == MEM && rtx_equal_p (XEXP (x, 0), inc_insn.reg_res))
     {
-      /* Match with *reg0.  */
+      /* Match with *reg_res.  */
       mem_insn.mem_loc = address_of_x;
       mem_insn.reg0 = inc_insn.reg_res;
       mem_insn.reg1_is_const = true;
@@ -850,6 +854,17 @@  find_address (rtx *address_of_x)
       mem_insn.reg1 = GEN_INT (0);
       return -1;
     }
+  if (code == MEM && inc_insn.reg1_is_const && inc_insn.reg0
+      && rtx_equal_p (XEXP (x, 0), inc_insn.reg0))
+    {
+      /* Match with *reg0 AKA *(reg_res - reg1_val).  */
+      mem_insn.mem_loc = address_of_x;
+      mem_insn.reg0 = inc_insn.reg_res;
+      mem_insn.reg1_is_const = true;
+      mem_insn.reg1_val = -inc_insn.reg1_val;
+      mem_insn.reg1 = GEN_INT (mem_insn.reg1_val);
+      return -1;
+    }
   if (code == MEM && GET_CODE (XEXP (x, 0)) == PLUS
       && rtx_equal_p (XEXP (XEXP (x, 0), 0), inc_insn.reg_res))
     {
@@ -1370,6 +1385,33 @@  merge_in_block (int max_reg, basic_block bb)
 			  insn_is_add_or_inc = false;
 			}
 		    }
+
+		  if (ok && insn_is_add_or_inc
+		      && inc_insn.reg0 != inc_insn.reg_res)
+		    {
+		      regno = REGNO (inc_insn.reg0);
+		      int luid = DF_INSN_LUID (mem_insn.insn);
+		      mem_insn.insn = get_next_ref (regno, bb, reg_next_use);
+
+		      /* Try a mem use of reg0 before that of reg_res
+			 too.  If there's no further use of reg_res,
+			 there's no point in trying an auto-inc, and
+			 if the use of reg0 is after that of reg_res,
+			 it will be too late for the auto-inc to
+			 compute reg_res's correct value.  */
+		      if (mem_insn.insn
+			  && luid > DF_INSN_LUID (mem_insn.insn)
+			  && find_address (&PATTERN (mem_insn.insn)) == -1)
+			{
+			  if (dump_file)
+			    dump_mem_insn (dump_file);
+			  if (try_merge ())
+			    {
+			      success_in_block++;
+			      insn_is_add_or_inc = false;
+			    }
+			}
+		    }
 		}
 	    }
 	}