[RFA,PR,rtl-optimization/90275] Handle nop reg->reg copies in cse

Message ID db7befb8786ea4d635ce6181d1d649e5426ddb95.camel@redhat.com
State New
Headers show
Series
  • [RFA,PR,rtl-optimization/90275] Handle nop reg->reg copies in cse
Related show

Commit Message

Jeff Law Feb. 5, 2020, 1:04 a.m.
Richard & Segher, if y'all could check my analysis here, it'd be
appreciated.

pr90275 is a P2 regression that is only triggering on ARM.  David's
testcase in c#1 is the best for this problem as it doesn't require
magic flags like -fno-dce to trigger.

The block in question:

> (code_label 89 88 90 24 15 (nil) [0 uses])

> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)

> (insn 97 90 98 24 (parallel [

>             (set (reg:CC 100 cc)

>                 (compare:CC (reg:SI 131 [ d_lsm.21 ])

>                     (const_int 0 [0])))

>             (set (reg:SI 135 [ d_lsm.21 ])

>                 (reg:SI 131 [ d_lsm.21 ]))

>         ]) "pr90275.c":21:45 248 {*movsi_compare0}

>      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])

>         (nil)))

> (insn 98 97 151 24 (set (reg:SI 136 [+4 ])

>         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])

>         (expr_list:REG_DEAD (reg:CC 100 cc)

>             (nil))))

> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])

>         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])

>         (nil)))

> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])

>         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>      (expr_list:REG_DEAD (reg:SI 136 [+4 ])

>         (nil)))

> 

insns 97 and 151 are the most important.

We process insn 97 which creates an equivalency between r135 and r131. 
This is expressed by putting both on on the "same_value" chain
(table_elt->{next,prev}_same_value).

When we put the REGs on the chain we'll set REG_QTY to a positive
number which indicates their values are valid.

We continue processing insns forward and run into insn 151 which is a
self-copy.

First CSE will invalidate r131 (because its set).  Invalidation is
accomplished by setting REG_QTY for r131 to a negative value.  It does
not remove r131 from the same value chains.

Then CSE will call insert_regs for r131.  The qty is not valid, so we
get into this code:

>      if (modified || ! qty_valid)

>         {

>           if (classp)

>             for (classp = classp->first_same_value;

>                  classp != 0;

>                  classp = classp->next_same_value)

>               if (REG_P (classp->exp)

>                   && GET_MODE (classp->exp) == GET_MODE (x))

>                 {

>                   unsigned c_regno = REGNO (classp->exp);

> 

>                   gcc_assert (REGNO_QTY_VALID_P (c_regno));

> [ ... ]


So we walk the chain of same values for r131.  WHen walking we run into
r131 itself.  Since r131 has been invalidated  we trip the assert.


The fix is pretty simple.  We just arrange to stop processing insns
that are nop reg->reg copies much like we already do for mem->mem
copies and (set (pc) (pc)).

This has bootstrapped and regression tested on x86_64.  I've also
verified it fixes the testcase in c#1 of pr90275, the test in pr93125
and pr92388.  Interestingly enough I couldn't trigger the original
testcase in 90275, but I'm confident this is ultimately all the same
problem.

OK for the trunk?

Thanks,
Jeff

Comments

Jakub Jelinek Feb. 5, 2020, 6:26 a.m. | #1
On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote:
> --- a/gcc/cse.c

> +++ b/gcc/cse.c

> @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)

>  	  sets[i].rtl = 0;

>  	}

>  

> +      /* Similarly for no-op moves.  */

> +      if (n_sets == 1

> +	  && GET_CODE (src) == REG


Just nits:
REG_P (src) ?

> +	  && src == dest)


Is pointer comparison ok?  I mean, shouldn't we instead do
rtx_equal_p (src, dest), set_noop_p (sets[i].rtl) or noop_move_p (insn)?

> +	* gcc.c-torture/compile/pr90275.c: New test


Missing full stop.

> +++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c

> @@ -0,0 +1,27 @@

> +a, b, c;


int 

> +

> +long long d;

> +

> +e() {


void

(unless the ommission of those makes it not reproduce anymore, which I
doubt).

> +

> +  char f;

> +

> +  for (;;) {

> +

> +    c = a = c ? 5 : 0;

> +

> +    if (f) {

> +

> +      b = a;

> +

> +      f = d;

> +

> +    }

> +

> +    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);

> +

> +  }

> +

> +}



	Jakub
Segher Boessenkool Feb. 5, 2020, 12:08 p.m. | #2
Hi all,

On Wed, Feb 05, 2020 at 07:26:03AM +0100, Jakub Jelinek wrote:
> On Tue, Feb 04, 2020 at 06:04:09PM -0700, Jeff Law wrote:

> > --- a/gcc/cse.c

> > +++ b/gcc/cse.c

> > @@ -5572,6 +5572,16 @@ cse_insn (rtx_insn *insn)

> >  	  sets[i].rtl = 0;

> >  	}

> >  

> > +      /* Similarly for no-op moves.  */


It says "no-op MEM moves" right above, so it should say "no-op REG
moves" here?

> > +      if (n_sets == 1

> > +	  && GET_CODE (src) == REG

> 

> Just nits:

> REG_P (src) ?


Hey that is my nit!  Find your own!  ;-)

> > +	  && src == dest)

> 

> Is pointer comparison ok?


It isn't, it doesn't work for hard registers.

> I mean, shouldn't we instead do

> rtx_equal_p (src, dest),


This does not see e.g.
  (set (reg:SI 123) (subreg:SI (reg:DI 123) 0))
as no-op move.

> set_noop_p (sets[i].rtl)


This doesn't catch all such cases either.

> or noop_move_p (insn)?


And this one is plain wrong (it should be called something with "maybe"
in the name, it returns false if it thinks that may lead to better
optimisation, see the REG_EQUAL handling).

What we need here is a test whether CSE can ignore this insn, and we
will run into this problem if we don't (Jeff's analysis).  Does CSE
already have everything it uses to make that decision scribbled away
somewhere, when we get here?  It would be good if we could use the
exact same condition (same predicate function for example) as what
would lead to the problem later, or we'll be playing whack-a-mole for
a while (or CSE is completely rewritten soon, my preferred option, but
"soon" on a GCC timescale will take way too long for the PR).


Segher
Richard Sandiford Feb. 5, 2020, 1:30 p.m. | #3
Jeff Law <law@redhat.com> writes:
> Richard & Segher, if y'all could check my analysis here, it'd be

> appreciated.

>

> pr90275 is a P2 regression that is only triggering on ARM.  David's

> testcase in c#1 is the best for this problem as it doesn't require

> magic flags like -fno-dce to trigger.

>

> The block in question:

>

>> (code_label 89 88 90 24 15 (nil) [0 uses])

>> (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)

>> (insn 97 90 98 24 (parallel [

>>             (set (reg:CC 100 cc)

>>                 (compare:CC (reg:SI 131 [ d_lsm.21 ])

>>                     (const_int 0 [0])))

>>             (set (reg:SI 135 [ d_lsm.21 ])

>>                 (reg:SI 131 [ d_lsm.21 ]))

>>         ]) "pr90275.c":21:45 248 {*movsi_compare0}

>>      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])

>>         (nil)))

>> (insn 98 97 151 24 (set (reg:SI 136 [+4 ])

>>         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>>      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])

>>         (expr_list:REG_DEAD (reg:CC 100 cc)

>>             (nil))))

>> (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])

>>         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>>      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])

>>         (nil)))

>> (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])

>>         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

>>      (expr_list:REG_DEAD (reg:SI 136 [+4 ])

>>         (nil)))

>> 

> insns 97 and 151 are the most important.

>

> We process insn 97 which creates an equivalency between r135 and r131. 

> This is expressed by putting both on on the "same_value" chain

> (table_elt->{next,prev}_same_value).

>

> When we put the REGs on the chain we'll set REG_QTY to a positive

> number which indicates their values are valid.

>

> We continue processing insns forward and run into insn 151 which is a

> self-copy.

>

> First CSE will invalidate r131 (because its set).  Invalidation is

> accomplished by setting REG_QTY for r131 to a negative value.  It does

> not remove r131 from the same value chains.

>

> Then CSE will call insert_regs for r131.  The qty is not valid, so we

> get into this code:

>

>>      if (modified || ! qty_valid)

>>         {

>>           if (classp)

>>             for (classp = classp->first_same_value;

>>                  classp != 0;

>>                  classp = classp->next_same_value)

>>               if (REG_P (classp->exp)

>>                   && GET_MODE (classp->exp) == GET_MODE (x))

>>                 {

>>                   unsigned c_regno = REGNO (classp->exp);

>> 

>>                   gcc_assert (REGNO_QTY_VALID_P (c_regno));

>> [ ... ]

>

> So we walk the chain of same values for r131.  WHen walking we run into

> r131 itself.  Since r131 has been invalidated  we trip the assert.

>

>

> The fix is pretty simple.  We just arrange to stop processing insns

> that are nop reg->reg copies much like we already do for mem->mem

> copies and (set (pc) (pc)).

>

> This has bootstrapped and regression tested on x86_64.  I've also

> verified it fixes the testcase in c#1 of pr90275, the test in pr93125

> and pr92388.  Interestingly enough I couldn't trigger the original

> testcase in 90275, but I'm confident this is ultimately all the same

> problem.


This looks similar to the infamous (to me):

   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html

which had to be reverted because it broke powerpc64 bootstrap.
The problem was that n_sets is misleading for calls:

   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html

That's easy to fix (and I have a fix).  But given the damage this caused
last time, I think it's probably best left to GCC 11.

Thanks,
Richard
Jeff Law Feb. 5, 2020, 6:48 p.m. | #4
On Wed, 2020-02-05 at 13:30 +0000, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:

> > Richard & Segher, if y'all could check my analysis here, it'd be

> > appreciated.

> > 

> > pr90275 is a P2 regression that is only triggering on ARM.  David's

> > testcase in c#1 is the best for this problem as it doesn't require

> > magic flags like -fno-dce to trigger.

> > 

> > The block in question:

> > 

> > > (code_label 89 88 90 24 15 (nil) [0 uses])

> > > (note 90 89 97 24 [bb 24] NOTE_INSN_BASIC_BLOCK)

> > > (insn 97 90 98 24 (parallel [

> > >             (set (reg:CC 100 cc)

> > >                 (compare:CC (reg:SI 131 [ d_lsm.21 ])

> > >                     (const_int 0 [0])))

> > >             (set (reg:SI 135 [ d_lsm.21 ])

> > >                 (reg:SI 131 [ d_lsm.21 ]))

> > >         ]) "pr90275.c":21:45 248 {*movsi_compare0}

> > >      (expr_list:REG_DEAD (reg:SI 131 [ d_lsm.21 ])

> > >         (nil)))

> > > (insn 98 97 151 24 (set (reg:SI 136 [+4 ])

> > >         (reg:SI 132 [ d_lsm.21+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

> > >      (expr_list:REG_DEAD (reg:SI 132 [ d_lsm.21+4 ])

> > >         (expr_list:REG_DEAD (reg:CC 100 cc)

> > >             (nil))))

> > > (insn 151 98 152 24 (set (reg:SI 131 [ d_lsm.21 ])

> > >         (reg:SI 131 [ d_lsm.21 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

> > >      (expr_list:REG_DEAD (reg:SI 135 [ d_lsm.21 ])

> > >         (nil)))

> > > (insn 152 151 103 24 (set (reg:SI 132 [ d_lsm.21+4 ])

> > >         (reg:SI 136 [+4 ])) "pr90275.c":21:45 241 {*arm_movsi_insn}

> > >      (expr_list:REG_DEAD (reg:SI 136 [+4 ])

> > >         (nil)))

> > > 

> > insns 97 and 151 are the most important.

> > 

> > We process insn 97 which creates an equivalency between r135 and r131. 

> > This is expressed by putting both on on the "same_value" chain

> > (table_elt->{next,prev}_same_value).

> > 

> > When we put the REGs on the chain we'll set REG_QTY to a positive

> > number which indicates their values are valid.

> > 

> > We continue processing insns forward and run into insn 151 which is a

> > self-copy.

> > 

> > First CSE will invalidate r131 (because its set).  Invalidation is

> > accomplished by setting REG_QTY for r131 to a negative value.  It does

> > not remove r131 from the same value chains.

> > 

> > Then CSE will call insert_regs for r131.  The qty is not valid, so we

> > get into this code:

> > 

> > >      if (modified || ! qty_valid)

> > >         {

> > >           if (classp)

> > >             for (classp = classp->first_same_value;

> > >                  classp != 0;

> > >                  classp = classp->next_same_value)

> > >               if (REG_P (classp->exp)

> > >                   && GET_MODE (classp->exp) == GET_MODE (x))

> > >                 {

> > >                   unsigned c_regno = REGNO (classp->exp);

> > > 

> > >                   gcc_assert (REGNO_QTY_VALID_P (c_regno));

> > > [ ... ]

> > 

> > So we walk the chain of same values for r131.  WHen walking we run into

> > r131 itself.  Since r131 has been invalidated  we trip the assert.

> > 

> > 

> > The fix is pretty simple.  We just arrange to stop processing insns

> > that are nop reg->reg copies much like we already do for mem->mem

> > copies and (set (pc) (pc)).

> > 

> > This has bootstrapped and regression tested on x86_64.  I've also

> > verified it fixes the testcase in c#1 of pr90275, the test in pr93125

> > and pr92388.  Interestingly enough I couldn't trigger the original

> > testcase in 90275, but I'm confident this is ultimately all the same

> > problem.

> 

> This looks similar to the infamous (to me):

> 

>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01581.html

> 

> which had to be reverted because it broke powerpc64 bootstrap.

> The problem was that n_sets is misleading for calls:

> 

>    https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01858.html

> 

> That's easy to fix (and I have a fix).  But given the damage this caused

> last time, I think it's probably best left to GCC 11.

Yea, it's closely related.  In your case you need to effectively ignore
the nop insn to get the optimization you want.  In mine that nop insn
causes an ICE.

I think we could take your cse bits + adding a !CALL_P separately from
the simplify-rtx stuff which Segher objected to.  THat'd likely solve
the ARM ICEs and take you a tiny step forward on optimizing that SVE
case.  Thoughts?

Jeff
Segher Boessenkool Feb. 6, 2020, 1:56 p.m. | #5
On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:
> Yea, it's closely related.  In your case you need to effectively ignore

> the nop insn to get the optimization you want.  In mine that nop insn

> causes an ICE.

> 

> I think we could take your cse bits + adding a !CALL_P separately from

> the simplify-rtx stuff which Segher objected to.  THat'd likely solve

> the ARM ICEs and take you a tiny step forward on optimizing that SVE

> case.  Thoughts?


CSE should consistently keep track of what insns are no-op moves (in its
definition, all passes have a slightly different definition of this),
and use that everywhere consistently.

(Or we should rewrite CSE).


Segher
Jeff Law Feb. 7, 2020, 4 p.m. | #6
On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:
> On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:

> > Yea, it's closely related.  In your case you need to effectively ignore

> > the nop insn to get the optimization you want.  In mine that nop insn

> > causes an ICE.

> > 

> > I think we could take your cse bits + adding a !CALL_P separately from

> > the simplify-rtx stuff which Segher objected to.  THat'd likely solve

> > the ARM ICEs and take you a tiny step forward on optimizing that SVE

> > case.  Thoughts?

> 

> CSE should consistently keep track of what insns are no-op moves (in its

> definition, all passes have a slightly different definition of this),

> and use that everywhere consistently.

So does that mean you object to the cse.c portion of Richard's patch?

Jeff
>
Segher Boessenkool Feb. 8, 2020, 4:41 p.m. | #7
On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote:
> On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:

> > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:

> > > Yea, it's closely related.  In your case you need to effectively ignore

> > > the nop insn to get the optimization you want.  In mine that nop insn

> > > causes an ICE.

> > > 

> > > I think we could take your cse bits + adding a !CALL_P separately from

> > > the simplify-rtx stuff which Segher objected to.  THat'd likely solve

> > > the ARM ICEs and take you a tiny step forward on optimizing that SVE

> > > case.  Thoughts?

> > 

> > CSE should consistently keep track of what insns are no-op moves (in its

> > definition, all passes have a slightly different definition of this),

> > and use that everywhere consistently.

> So does that mean you object to the cse.c portion of Richard's patch?


It's more a "what we need to do in the future" thing, it is stage 4 now,
it is too big a change to do now.

What patch?  The "34" patch?  https://gcc.gnu.org/r278411 .

I don't think each stanza of code should use it's own "noop-ness", no.

I don't know if this patch makes matters worse or not.  It doesn't seem
suitable for stage 4 though.  And Richard said the cse.c part breaks
rs6000, if that is true, yes I do object ;-)


Segher
Christophe Lyon via Gcc-patches March 12, 2020, 6:03 p.m. | #8
On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:
> On Fri, Feb 07, 2020 at 09:00:40AM -0700, Jeff Law wrote:

> > On Thu, 2020-02-06 at 07:56 -0600, Segher Boessenkool wrote:

> > > On Wed, Feb 05, 2020 at 11:48:23AM -0700, Jeff Law wrote:

> > > > Yea, it's closely related.  In your case you need to effectively ignore

> > > > the nop insn to get the optimization you want.  In mine that nop insn

> > > > causes an ICE.

> > > > 

> > > > I think we could take your cse bits + adding a !CALL_P separately from

> > > > the simplify-rtx stuff which Segher objected to.  THat'd likely solve

> > > > the ARM ICEs and take you a tiny step forward on optimizing that SVE

> > > > case.  Thoughts?

> > > 

> > > CSE should consistently keep track of what insns are no-op moves (in its

> > > definition, all passes have a slightly different definition of this),

> > > and use that everywhere consistently.

> > So does that mean you object to the cse.c portion of Richard's patch?

> 

> It's more a "what we need to do in the future" thing, it is stage 4 now,

> it is too big a change to do now.

I suspect you're referring to the simplify-rtx bits from his patch which I agree
are not appropriate for stage4.  The cse bits from that patch are are simple.

> 

> What patch?  The "34" patch?  https://gcc.gnu.org/r278411 .

> 

> I don't think each stanza of code should use it's own "noop-ness", no.

Richard's patch is actually better than mine in that regard as it handles mem and
reg nop moves in an indentical way.  I don't think refactoring the cse.c code is
advisable now though -- but I do want to fix the multiply-reported ICE on ARM and
Richard's cse changes are the cleanest way to do that that I can see.



> 

> I don't know if this patch makes matters worse or not.  It doesn't seem

> suitable for stage 4 though.  And Richard said the cse.c part breaks

> rs6000, if that is true, yes I do object ;-)

The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it through
my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native
and more.

Concretely I'm proposing the following patch to address 90275 and its duplicates.
PR rtl-optimization/90275
	* cse.c (cse_insn): Delete no-op register moves too.

	PR rtl-optimization/90275
	* gcc.c-torture/compile/pr90275.c: New test.

diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..1779bb9a333 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4625,7 +4625,7 @@ cse_insn (rtx_insn *insn)
   for (i = 0; i < n_sets; i++)
     {
       bool repeat = false;
-      bool mem_noop_insn = false;
+      bool noop_insn = false;
       rtx src, dest;
       rtx src_folded;
       struct table_elt *elt = 0, *p;
@@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)
 	    }
 
 	  /* Similarly, lots of targets don't allow no-op
-	     (set (mem x) (mem x)) moves.  */
+	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))
+	     might be impossible for certain registers (like CC registers).  */
 	  else if (n_sets == 1
-		   && MEM_P (trial)
+		   && ! CALL_P (insn)
+		   && (MEM_P (trial) || REG_P (trial))
 		   && MEM_P (dest)
 		   && rtx_equal_p (trial, dest)
 		   && !side_effects_p (dest)
@@ -5334,7 +5336,7 @@ cse_insn (rtx_insn *insn)
 		       || insn_nothrow_p (insn)))
 	    {
 	      SET_SRC (sets[i].rtl) = trial;
-	      mem_noop_insn = true;
+	      noop_insn = true;
 	      break;
 	    }
 
@@ -5562,8 +5564,8 @@ cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
-      /* Similarly for no-op MEM moves.  */
-      else if (mem_noop_insn)
+      /* Similarly for no-op moves.  */
+      else if (noop_insn)
 	{
 	  if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
 	    cse_cfg_altered = true;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 00000000000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+    c = a = c ? 5 : 0;
+
+    if (f) {
+
+      b = a;
+
+      f = d;
+
+    }
+
+    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+
Segher Boessenkool March 12, 2020, 6:23 p.m. | #9
On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:
> On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:

> > I don't think each stanza of code should use it's own "noop-ness", no.

> Richard's patch is actually better than mine in that regard as it handles mem and

> reg nop moves in an indentical way.  I don't think refactoring the cse.c code is

> advisable now though -- but I do want to fix the multiply-reported ICE on ARM and

> Richard's cse changes are the cleanest way to do that that I can see.


It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but
this patch does not make things worse.

> > I don't know if this patch makes matters worse or not.  It doesn't seem

> > suitable for stage 4 though.  And Richard said the cse.c part breaks

> > rs6000, if that is true, yes I do object ;-)

> The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it through

> my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86 native

> and more.


I don't see anything rs6000 below?  Is it just this generic code?

> @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)

>  	    }

>  

>  	  /* Similarly, lots of targets don't allow no-op

> -	     (set (mem x) (mem x)) moves.  */

> +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))

> +	     might be impossible for certain registers (like CC registers).  */

>  	  else if (n_sets == 1

> -		   && MEM_P (trial)

> +		   && ! CALL_P (insn)

> +		   && (MEM_P (trial) || REG_P (trial))

>  		   && MEM_P (dest)

>  		   && rtx_equal_p (trial, dest)

>  		   && !side_effects_p (dest)


This adds the !CALL_P (no space btw) condition, why is that?

(Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have
patterns for that, or *should* have at least!)


Segher
Christophe Lyon via Gcc-patches March 12, 2020, 6:47 p.m. | #10
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> 

> > > I don't know if this patch makes matters worse or not.  It doesn't seem

> > > suitable for stage 4 though.  And Richard said the cse.c part breaks

> > > rs6000, if that is true, yes I do object ;-)

> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it

> > through

> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86

> > native

> > and more.

> 

> I don't see anything rs6000 below?  Is it just this generic code?

It's just generic code.  THe rs6000 issue is fixed by the !CALL_P condition.

> 

> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)

> >  	    }

> >  

> >  	  /* Similarly, lots of targets don't allow no-op

> > -	     (set (mem x) (mem x)) moves.  */

> > +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))

> > +	     might be impossible for certain registers (like CC registers).  */

> >  	  else if (n_sets == 1

> > -		   && MEM_P (trial)

> > +		   && ! CALL_P (insn)

> > +		   && (MEM_P (trial) || REG_P (trial))

> >  		   && MEM_P (dest)

> >  		   && rtx_equal_p (trial, dest)

> >  		   && !side_effects_p (dest)

> 

> This adds the !CALL_P (no space btw) condition, why is that?

Because n_sets is not valid for CALL_P insns which resulted in a failure on ppc. 
See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine if
we had a nop register set in parallel with a (set (reg) (call ...)).  We'd end up
deleting the entire PARALLEL which is obviously wrong.

One could argue that find_sets_in_insn should be fixed as well.  I'd be worried
about fallout from that.

jeff
Segher Boessenkool March 12, 2020, 8:26 p.m. | #11
On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:
> On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:

> > >  	  else if (n_sets == 1

> > > -		   && MEM_P (trial)

> > > +		   && ! CALL_P (insn)

> > > +		   && (MEM_P (trial) || REG_P (trial))

> > >  		   && MEM_P (dest)

> > >  		   && rtx_equal_p (trial, dest)

> > >  		   && !side_effects_p (dest)

> > 

> > This adds the !CALL_P (no space btw) condition, why is that?

> Because n_sets is not valid for CALL_P insns which resulted in a failure on ppc. 

> See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine if

> we had a nop register set in parallel with a (set (reg) (call ...)).  We'd end up

> deleting the entire PARALLEL which is obviously wrong.


Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE
sees a TOC set parallel with a call as a no-op because it is set to the
same value (an unspec, not an unspec_volatile) that GCC can derive is
already in the TOC reg?  Or is this some other case?

The change sounds fine, fwiw.

> One could argue that find_sets_in_insn should be fixed as well.  I'd be worried

> about fallout from that.


Should it ignore all SETs in parallel with a call?  Or what do you want
to fix there?


Segher
Christophe Lyon via Gcc-patches March 12, 2020, 8:56 p.m. | #12
On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:

> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:

> > > >  	  else if (n_sets == 1

> > > > -		   && MEM_P (trial)

> > > > +		   && ! CALL_P (insn)

> > > > +		   && (MEM_P (trial) || REG_P (trial))

> > > >  		   && MEM_P (dest)

> > > >  		   && rtx_equal_p (trial, dest)

> > > >  		   && !side_effects_p (dest)

> > > 

> > > This adds the !CALL_P (no space btw) condition, why is that?

> > Because n_sets is not valid for CALL_P insns which resulted in a failure on

> > ppc. 

> > See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine

> > if

> > we had a nop register set in parallel with a (set (reg) (call ...)).  We'd

> > end up

> > deleting the entire PARALLEL which is obviously wrong.

> 

> Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE

> sees a TOC set parallel with a call as a no-op because it is set to the

> same value (an unspec, not an unspec_volatile) that GCC can derive is

> already in the TOC reg?  Or is this some other case?

Not entirely sure.  Richard's message didn't include the precise details. 

> 

> The change sounds fine, fwiw.

> 

> > One could argue that find_sets_in_insn should be fixed as well.  I'd be

> > worried

> > about fallout from that.

> 

> Should it ignore all SETs in parallel with a call?  Or what do you want

> to fix there?

I was thinking the other way.  Make it include sets even those for the function
return value.  Having n_sets's meaning be different for CALL_INSNs vs other INSNs
just seems like a poor implementation decision because of the inconsistency.

jeff
Christophe Lyon via Gcc-patches March 12, 2020, 10:11 p.m. | #13
On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:
> On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:

> > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:

> > > I don't think each stanza of code should use it's own "noop-ness", no.

> > Richard's patch is actually better than mine in that regard as it handles mem

> > and

> > reg nop moves in an indentical way.  I don't think refactoring the cse.c code

> > is

> > advisable now though -- but I do want to fix the multiply-reported ICE on ARM

> > and

> > Richard's cse changes are the cleanest way to do that that I can see.

> 

> It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but

> this patch does not make things worse.

> 

> > > I don't know if this patch makes matters worse or not.  It doesn't seem

> > > suitable for stage 4 though.  And Richard said the cse.c part breaks

> > > rs6000, if that is true, yes I do object ;-)

> > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it

> > through

> > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86

> > native

> > and more.

> 

> I don't see anything rs6000 below?  Is it just this generic code?

> 

> > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)

> >  	    }

> >  

> >  	  /* Similarly, lots of targets don't allow no-op

> > -	     (set (mem x) (mem x)) moves.  */

> > +	     (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))

> > +	     might be impossible for certain registers (like CC registers).  */

> >  	  else if (n_sets == 1

> > -		   && MEM_P (trial)

> > +		   && ! CALL_P (insn)

> > +		   && (MEM_P (trial) || REG_P (trial))

> >  		   && MEM_P (dest)

> >  		   && rtx_equal_p (trial, dest)

> >  		   && !side_effects_p (dest)

> 

> This adds the !CALL_P (no space btw) condition, why is that?

> 

> (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have

> patterns for that, or *should* have at least!)

I fixed the extraneous whitespace and committed the change.

THanks,
jeff
>
Christophe Lyon via Gcc-patches March 13, 2020, 8:09 a.m. | #14
Hi,


On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>

> On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:

> > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:

> > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:

> > > > I don't think each stanza of code should use it's own "noop-ness", no.

> > > Richard's patch is actually better than mine in that regard as it handles mem

> > > and

> > > reg nop moves in an indentical way.  I don't think refactoring the cse.c code

> > > is

> > > advisable now though -- but I do want to fix the multiply-reported ICE on ARM

> > > and

> > > Richard's cse changes are the cleanest way to do that that I can see.

> >

> > It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but

> > this patch does not make things worse.

> >

> > > > I don't know if this patch makes matters worse or not.  It doesn't seem

> > > > suitable for stage 4 though.  And Richard said the cse.c part breaks

> > > > rs6000, if that is true, yes I do object ;-)

> > > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it

> > > through

> > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86

> > > native

> > > and more.

> >

> > I don't see anything rs6000 below?  Is it just this generic code?

> >

> > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)

> > >         }

> > >

> > >       /* Similarly, lots of targets don't allow no-op

> > > -        (set (mem x) (mem x)) moves.  */

> > > +        (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))

> > > +        might be impossible for certain registers (like CC registers).  */

> > >       else if (n_sets == 1

> > > -              && MEM_P (trial)

> > > +              && ! CALL_P (insn)

> > > +              && (MEM_P (trial) || REG_P (trial))

> > >                && MEM_P (dest)

> > >                && rtx_equal_p (trial, dest)

> > >                && !side_effects_p (dest)

> >

> > This adds the !CALL_P (no space btw) condition, why is that?

> >

> > (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have

> > patterns for that, or *should* have at least!)

> I fixed the extraneous whitespace and committed the change.

>


The new test fails on ARM:
FAIL: gcc.c-torture/compile/pr90275.c   -O3 -g  (internal compiler error)
during RTL pass: cse_local
/gcc/testsuite/gcc.c-torture/compile/pr90275.c: In function 'e':
/gcc/testsuite/gcc.c-torture/compile/pr90275.c:25:1: internal compiler
error: in insert_regs, at cse.c:1128
0x15725bd insert_regs
        /gcc/cse.c:1128
0x1579731 cse_insn
        /gcc/cse.c:5957
0x157aff6 cse_extended_basic_block
        /gcc/cse.c:6615
0x157aff6 cse_main
        /gcc/cse.c:6794
0x157bc0d rest_of_handle_cse_after_global_opts
        /gcc/cse.c:7766
0x157bc0d execute
        /gcc/cse.c:7817
Please submit a full bug report,


Is the patch supposed to fix all the ICEs on ARM?

I see this with cross-compilers, it seems OK on native builds? (I
can't find error reports about this on gcc-testresults)

Christophe


> THanks,

> jeff

> >

>
Richard Sandiford March 13, 2020, 10:29 a.m. | #15
Jeff, thanks for picking this up.

Jeff Law <law@redhat.com> writes:
> On Thu, 2020-03-12 at 15:26 -0500, Segher Boessenkool wrote:

>> On Thu, Mar 12, 2020 at 12:47:04PM -0600, Jeff Law wrote:

>> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:

>> > > >  	  else if (n_sets == 1

>> > > > -		   && MEM_P (trial)

>> > > > +		   && ! CALL_P (insn)

>> > > > +		   && (MEM_P (trial) || REG_P (trial))

>> > > >  		   && MEM_P (dest)

>> > > >  		   && rtx_equal_p (trial, dest)

>> > > >  		   && !side_effects_p (dest)

>> > > 

>> > > This adds the !CALL_P (no space btw) condition, why is that?

>> > Because n_sets is not valid for CALL_P insns which resulted in a failure on

>> > ppc. 

>> > See find_sets_in_insn which ignores the set on the LHS of a call.  So imagine

>> > if

>> > we had a nop register set in parallel with a (set (reg) (call ...)).  We'd

>> > end up

>> > deleting the entire PARALLEL which is obviously wrong.

>> 

>> Ah, I see.  So this is exposed on Power by the TOC stuff, I guess?  CSE

>> sees a TOC set parallel with a call as a no-op because it is set to the

>> same value (an unspec, not an unspec_volatile) that GCC can derive is

>> already in the TOC reg?  Or is this some other case?

> Not entirely sure.  Richard's message didn't include the precise details. 


Yeah, that was exactly it.

On the bright side, removing many calls as dead made for an easy-to-debug
bootstrap failure :-)

Richard
Christophe Lyon via Gcc-patches March 13, 2020, 9:49 p.m. | #16
On Fri, 2020-03-13 at 09:09 +0100, Christophe Lyon wrote:
> Hi,

> 

> 

> On Thu, 12 Mar 2020 at 23:12, Jeff Law via Gcc-patches

> <gcc-patches@gcc.gnu.org> wrote:

> > On Thu, 2020-03-12 at 13:23 -0500, Segher Boessenkool wrote:

> > > On Thu, Mar 12, 2020 at 12:03:08PM -0600, Jeff Law wrote:

> > > > On Sat, 2020-02-08 at 10:41 -0600, Segher Boessenkool wrote:

> > > > > I don't think each stanza of code should use it's own "noop-ness", no.

> > > > Richard's patch is actually better than mine in that regard as it handles

> > > > mem

> > > > and

> > > > reg nop moves in an indentical way.  I don't think refactoring the cse.c

> > > > code

> > > > is

> > > > advisable now though -- but I do want to fix the multiply-reported ICE on

> > > > ARM

> > > > and

> > > > Richard's cse changes are the cleanest way to do that that I can see.

> > > 

> > > It looks pretty simple, yeah...  All of CSE is hopelessly fragile, but

> > > this patch does not make things worse.

> > > 

> > > > > I don't know if this patch makes matters worse or not.  It doesn't seem

> > > > > suitable for stage 4 though.  And Richard said the cse.c part breaks

> > > > > rs6000, if that is true, yes I do object ;-)

> > > > The rs6000 port breakage is trivial to fix.  In fact, I did so and ran it

> > > > through

> > > > my tester, which includes ppc64le and ppc64 a slew of *-elf targets x86

> > > > native

> > > > and more.

> > > 

> > > I don't see anything rs6000 below?  Is it just this generic code?

> > > 

> > > > @@ -5324,9 +5324,11 @@ cse_insn (rtx_insn *insn)

> > > >         }

> > > > 

> > > >       /* Similarly, lots of targets don't allow no-op

> > > > -        (set (mem x) (mem x)) moves.  */

> > > > +        (set (mem x) (mem x)) moves.  Even (set (reg x) (reg x))

> > > > +        might be impossible for certain registers (like CC

> > > > registers).  */

> > > >       else if (n_sets == 1

> > > > -              && MEM_P (trial)

> > > > +              && ! CALL_P (insn)

> > > > +              && (MEM_P (trial) || REG_P (trial))

> > > >                && MEM_P (dest)

> > > >                && rtx_equal_p (trial, dest)

> > > >                && !side_effects_p (dest)

> > > 

> > > This adds the !CALL_P (no space btw) condition, why is that?

> > > 

> > > (Is that CCmode reg-reg move comment about rs6000?  Huh, we *do* have

> > > patterns for that, or *should* have at least!)

> > I fixed the extraneous whitespace and committed the change.

> > 

> 

> The new test fails on ARM:

THanks.  I see what's happening, though I'm not sure *how* it happened.  Anyway,
doing some testing on the fix now.

jeff
>

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d6b5ded32a4..90d9f9d92d3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@ 
+2020-02-04  Jeff Law <law@redhat.com>
+
+	PR rtl-optimization/90275
+	* cse.c (cse_insn): Stop processing reg->reg nop moves early.
+
 2020-02-04  Richard Biener  <rguenther@suse.de>
 
 	PR tree-optimization/93538
diff --git a/gcc/cse.c b/gcc/cse.c
index 79ee0ce80e3..6e18bdae85f 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -5572,6 +5572,16 @@  cse_insn (rtx_insn *insn)
 	  sets[i].rtl = 0;
 	}
 
+      /* Similarly for no-op moves.  */
+      if (n_sets == 1
+	  && GET_CODE (src) == REG
+	  && src == dest)
+	{
+	  cse_cfg_altered |= delete_insn_and_edges (insn);
+	  /* No more processing for this set.  */
+	  sets[i].rtl = 0;
+	}
+
       /* If this SET is now setting PC to a label, we know it used to
 	 be a conditional or computed branch.  */
       else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index d2dc6648bc4..7be52bd6d2a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2020-02-04  Jeff Law  <law@redhat.com>
+
+	PR rtl-optimization/90275
+	* gcc.c-torture/compile/pr90275.c: New test
+
 2020-02-04  David Malcolm  <dmalcolm@redhat.com>
 
 	* gcc.dg/analyzer/data-model-1.c (struct coord): Convert fields
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr90275.c b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
new file mode 100644
index 00000000000..83e0df77226
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr90275.c
@@ -0,0 +1,27 @@ 
+a, b, c;
+
+long long d;
+
+e() {
+
+  char f;
+
+  for (;;) {
+
+    c = a = c ? 5 : 0;
+
+    if (f) {
+
+      b = a;
+
+      f = d;
+
+    }
+
+    (d || b) < (a > e) ?: (b ? 0 : f) || (d -= f);
+
+  }
+
+}
+
+