lower-subreg: PR94123, SVN r273240, causes gcc.target/powerpc/pr87507.c to fail

Message ID 20ee8944-f0bf-cec1-e3d1-5dd5e9c6a4ef@linux.ibm.com
State New
Headers show
Series
  • lower-subreg: PR94123, SVN r273240, causes gcc.target/powerpc/pr87507.c to fail
Related show

Commit Message

Xionghu Luo via Gcc-patches March 27, 2020, 10:41 p.m.
The pr87507.c test case regressed due to Segher's commit that added
-fsplit-wide-types-early.  The issue is that the lower-subreg pass only
decomposes the TImode code in the example if there is a pseudo reg to pseudo
reg copy.  When the lower-subreg pass is called late (its old location),
then combine changes the generated code by adding a TImode pseudo reg to
pseudo reg copy and lower-subreg successfully decomposes it.

When we run lower-subreg before combine, that copy isn't there so we
do not decompose our TImode uses.  I'm not sure why we require a pseudo
to pseudo copy before we decompose things, but changing find_pseudo_copy
to allow pseudo and hard regs to be copied into a pseudo like below fixes
the issue.

Ian, do you remember why we don't just decompose all wide types and instead
require a pseudo to pseudo copy to exist?

This patch survived bootstrap and regtesting on powerpc64le-linux with
no regressions.

	* lower-subreg.c (find_pseudo_copy): Allow copies from hard registers
	too.



Does anyone have any preferences on the patches above or comments?
If we go with the first patch for stage1, I'll add -fno-split-wide-types-early
to pr87507.c so it doesn't FAIL until the patch goes in.

Peter

Comments

Ian Lance Taylor March 27, 2020, 11:26 p.m. | #1
Peter Bergner <bergner@linux.ibm.com> writes:

> Ian, do you remember why we don't just decompose all wide types and instead

> require a pseudo to pseudo copy to exist?


No, sorry, I don't remember this at all.

Ian
Segher Boessenkool March 28, 2020, 7:22 p.m. | #2
On Fri, Mar 27, 2020 at 05:41:36PM -0500, Peter Bergner wrote:
> Given we're late in stage4, the above patch (assuming it's ok) probably

> shouldn't go in until stage1, since it is changing lower-subreg's behaviour

> slightly.

> 

> A different (ie, safer) approach would be to just rerun lower-subreg at

> its old location, regardless of whether we used -fsplit-wide-types-early.


That is my preference, for a simpler reason even: when I added the new
pass I disabled the old one, thinking it wouldn't do anything useful
anymore.  Here you show that isn't true.

> This way, we are not changing lower-subreg's behaviour, just running it an

> extra time (3 times instead of twice when using -fsplit-wide-types-early).

> I don't think lower-subreg is too expensive to run an extra time


Yes, I think so too.

> and we'd only do it when using -fsplit-wide-types-early.


But note that is true by default for some targets (rs6000 and avr
currently, I think).

>    /* opt_pass methods: */

> -  virtual bool gate (function *) { return flag_split_wide_types

> -					  && !flag_split_wide_types_early; }

> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }


I think you mean
  return flag_split_wide_types != 0 != 0 != 0;

:-P


Segher
Xionghu Luo via Gcc-patches March 28, 2020, 11:39 p.m. | #3
On 3/28/20 2:22 PM, Segher Boessenkool wrote:
> On Fri, Mar 27, 2020 at 05:41:36PM -0500, Peter Bergner wrote:

>> A different (ie, safer) approach would be to just rerun lower-subreg at

>> its old location, regardless of whether we used -fsplit-wide-types-early.

> 

> That is my preference, for a simpler reason even: when I added the new

> pass I disabled the old one, thinking it wouldn't do anything useful

> anymore.  Here you show that isn't true.

>

>> This way, we are not changing lower-subreg's behaviour, just running it an

>> extra time (3 times instead of twice when using -fsplit-wide-types-early).

>> I don't think lower-subreg is too expensive to run an extra time

> 

> Yes, I think so too.


Right.  However, like I said though, the downside is that we don't expose
the decomposed uses to passes in between subreg2 and subreg3, like combine,
etc.  Isn't that why you moved it early in the first place?  Then again,
maybe you're getting the important cases now and subreg3 is just cleanup?

That said, I'm fine with whatever you, richi and others want.



>>    /* opt_pass methods: */

>> -  virtual bool gate (function *) { return flag_split_wide_types

>> -					  && !flag_split_wide_types_early; }

>> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }

> 

> I think you mean

>   return flag_split_wide_types != 0 != 0 != 0;


Heh, I was just reverting it to the code prior to your change.  I can make
that just "return flag_split_wide_types;" if you like and we end up going
with this version.

Peter
Richard Sandiford March 30, 2020, 8:50 a.m. | #4
Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> The pr87507.c test case regressed due to Segher's commit that added

> -fsplit-wide-types-early.  The issue is that the lower-subreg pass only

> decomposes the TImode code in the example if there is a pseudo reg to pseudo

> reg copy.  When the lower-subreg pass is called late (its old location),

> then combine changes the generated code by adding a TImode pseudo reg to

> pseudo reg copy and lower-subreg successfully decomposes it.

>

> When we run lower-subreg before combine, that copy isn't there so we

> do not decompose our TImode uses.  I'm not sure why we require a pseudo

> to pseudo copy before we decompose things, but changing find_pseudo_copy

> to allow pseudo and hard regs to be copied into a pseudo like below fixes

> the issue.

>

> Ian, do you remember why we don't just decompose all wide types and instead

> require a pseudo to pseudo copy to exist?

>

> This patch survived bootstrap and regtesting on powerpc64le-linux with

> no regressions.

>

> 	* lower-subreg.c (find_pseudo_copy): Allow copies from hard registers

> 	too.

>

> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c

> index 4c8bc835f93..c6816a34489 100644

> --- a/gcc/lower-subreg.c

> +++ b/gcc/lower-subreg.c

> @@ -419,7 +419,7 @@ find_pseudo_copy (rtx set)

>  

>    rd = REGNO (dest);

>    rs = REGNO (src);

> -  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))

> +  if (HARD_REGISTER_NUM_P (rd))

>      return false;

>  

>    b = reg_copy_graph[rs];


I guess this would also work if we dropped the rd check instead.
So how about s/||/&&/ instead, to avoid the assymetry?

I agree something like this is a better fix long-term, since we
shouldn't be relying on make_more_copies outside combine.

> Given we're late in stage4, the above patch (assuming it's ok) probably

> shouldn't go in until stage1, since it is changing lower-subreg's behaviour

> slightly.

>

> A different (ie, safer) approach would be to just rerun lower-subreg at

> its old location, regardless of whether we used -fsplit-wide-types-early.

> This way, we are not changing lower-subreg's behaviour, just running it an

> extra time (3 times instead of twice when using -fsplit-wide-types-early).

> I don't think lower-subreg is too expensive to run an extra time and we'd

> only do it when using -fsplit-wide-types-early.  The only downside (if any)

> is that we don't decompose these TImode uses early like the patch above does,

> so combine, etc. can't see what they will eventually become.  This does fix

> the bug and also survives bootstrap and regtesting on powerpc64le-linux

> with no regressions.

>

> 	* lower-subreg.c (pass_lower_subreg3::gate): Remove test for

> 	flag_split_wide_types_early.

>

> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c

> index 4c8bc835f93..807ad398b64 100644

> --- a/gcc/lower-subreg.c

> +++ b/gcc/lower-subreg.c

> @@ -1844,8 +1844,7 @@ public:

>    {}

>  

>    /* opt_pass methods: */

> -  virtual bool gate (function *) { return flag_split_wide_types

> -					  && !flag_split_wide_types_early; }

> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }

>    virtual unsigned int execute (function *)

>      {

>        decompose_multiword_subregs (true);


Looks good to me with the s/ != 0// that Segher mentioned.

With this change, the only remaining function of -fsplit-wide-types-early
is to act as a double lock on one pass.  IMO it'd make more sense to remove
that double lock and make -fsplit-wide-types-early and -fsplit-wide-types
act as independent options, a bit like -fschedule-insns{,2}.

Thanks,
Richard
Segher Boessenkool March 30, 2020, 11:26 a.m. | #5
Hi!

On Mon, Mar 30, 2020 at 09:50:05AM +0100, Richard Sandiford wrote:
> Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> > -  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))

> > +  if (HARD_REGISTER_NUM_P (rd))

> >      return false;

> >  

> >    b = reg_copy_graph[rs];

> 

> I guess this would also work if we dropped the rd check instead.

> So how about s/||/&&/ instead, to avoid the assymetry?

> 

> I agree something like this is a better fix long-term, since we

> shouldn't be relying on make_more_copies outside combine.


Yes; on the other hand, most RTL passes should do something to not have
hard registers forwarded into non-move instructions (where they can
cause problems later).  (make_more_copies itself is a technicality
specific to how combine works, and we might be able to drop it in the
future).

> With this change, the only remaining function of -fsplit-wide-types-early

> is to act as a double lock on one pass.  IMO it'd make more sense to remove

> that double lock and make -fsplit-wide-types-early and -fsplit-wide-types

> act as independent options, a bit like -fschedule-insns{,2}.


Sure, that would simplify things a bit (at least conceptually).

With or without that change, the documentation could use some tweaking
as well, after this patch.


Segher
Segher Boessenkool March 30, 2020, 4:06 p.m. | #6
Hi!

On Mon, Mar 30, 2020 at 09:50:05AM +0100, Richard Sandiford wrote:
> With this change, the only remaining function of -fsplit-wide-types-early

> is to act as a double lock on one pass.  IMO it'd make more sense to remove

> that double lock and make -fsplit-wide-types-early and -fsplit-wide-types

> act as independent options, a bit like -fschedule-insns{,2}.


But then, -fsplit-wide-types would control two passes, one of which is
identical to -fsplit-wide-types-early, just in a different spot in the
pass pipeline.  That is bad enough on its own, already :-/

I still think all three passes should be controlled by -fsplit-wide-types,
just as they already are (and the "old" two have been for years).


Segher
Xionghu Luo via Gcc-patches March 30, 2020, 4:23 p.m. | #7
On 3/30/20 6:26 AM, Segher Boessenkool wrote:
> On Mon, Mar 30, 2020 at 09:50:05AM +0100, Richard Sandiford wrote:

>> Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

>>> -  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))

>>> +  if (HARD_REGISTER_NUM_P (rd))

>>>      return false;

>>>  

>>>    b = reg_copy_graph[rs];

>>

>> I guess this would also work if we dropped the rd check instead.

>> So how about s/||/&&/ instead, to avoid the assymetry?

>>

>> I agree something like this is a better fix long-term, since we

>> shouldn't be relying on make_more_copies outside combine.

> 

> Yes; on the other hand, most RTL passes should do something to not have

> hard registers forwarded into non-move instructions (where they can

> cause problems later).  (make_more_copies itself is a technicality

> specific to how combine works, and we might be able to drop it in the

> future).


I kind of agree with Richard above on making it more applicable/symmetric,
but why can't we just remove the HARD_REGISTER_NUM_P() tests altogether?
It's not like lower-subreg is extending hard register lifetime usage than
what is already there in the rtl.  We're just decomposing what's already
there into smaller register sized chunks.  Is there a problem with that
I'm not aware of?


Peter
Xionghu Luo via Gcc-patches March 30, 2020, 4:26 p.m. | #8
On 3/30/20 11:23 AM, Peter Bergner wrote:
> I kind of agree with Richard above on making it more applicable/symmetric,

> but why can't we just remove the HARD_REGISTER_NUM_P() tests altogether?

> It's not like lower-subreg is extending hard register lifetime usage than

> what is already there in the rtl.  We're just decomposing what's already

> there into smaller register sized chunks.  Is there a problem with that

> I'm not aware of?


...or maybe there was an issue when combine used to extend hard register
lifetimes (which it doesn't anymore) and the test above was just a workaround?

Peter
Segher Boessenkool March 30, 2020, 4:39 p.m. | #9
On Mon, Mar 30, 2020 at 11:23:12AM -0500, Peter Bergner wrote:
> On 3/30/20 6:26 AM, Segher Boessenkool wrote:

> > On Mon, Mar 30, 2020 at 09:50:05AM +0100, Richard Sandiford wrote:

> >> Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> >>> -  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))

> >>> +  if (HARD_REGISTER_NUM_P (rd))

> >>>      return false;

> >>>  

> >>>    b = reg_copy_graph[rs];

> >>

> >> I guess this would also work if we dropped the rd check instead.

> >> So how about s/||/&&/ instead, to avoid the assymetry?

> >>

> >> I agree something like this is a better fix long-term, since we

> >> shouldn't be relying on make_more_copies outside combine.

> > 

> > Yes; on the other hand, most RTL passes should do something to not have

> > hard registers forwarded into non-move instructions (where they can

> > cause problems later).  (make_more_copies itself is a technicality

> > specific to how combine works, and we might be able to drop it in the

> > future).

> 

> I kind of agree with Richard above on making it more applicable/symmetric,

> but why can't we just remove the HARD_REGISTER_NUM_P() tests altogether?

> It's not like lower-subreg is extending hard register lifetime usage than

> what is already there in the rtl.  We're just decomposing what's already

> there into smaller register sized chunks.  Is there a problem with that

> I'm not aware of?


The function comment is (since day 1):
  /* If SET is a copy from one multi-word pseudo-register to another,
     record that in reg_copy_graph.  Return whether it is such a
     copy.  */
so a) that needs fixing; and b) what does this change for the (single)
caller of find_pseudo_copy?  The comment there isn't very enlightening:
  /* We mark pseudo-to-pseudo copies as decomposable during the
     second pass only.  The first pass is so early that there is
     good chance such moves will be optimized away completely by
     subsequent optimizations anyway.

     However, we call find_pseudo_copy even during the first pass
     so as to properly set up the reg_copy_graph.  */

(The function *name* should change as well, if you make this change).


Segher
Xionghu Luo via Gcc-patches April 1, 2020, 5:48 p.m. | #10
On 3/30/20 3:50 AM, Richard Sandiford wrote:
> Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

>> 	* lower-subreg.c (pass_lower_subreg3::gate): Remove test for

>> 	flag_split_wide_types_early.

>>

>> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c

>> index 4c8bc835f93..807ad398b64 100644

>> --- a/gcc/lower-subreg.c

>> +++ b/gcc/lower-subreg.c

>> @@ -1844,8 +1844,7 @@ public:

>>    {}

>>  

>>    /* opt_pass methods: */

>> -  virtual bool gate (function *) { return flag_split_wide_types

>> -					  && !flag_split_wide_types_early; }

>> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }

>>    virtual unsigned int execute (function *)

>>      {

>>        decompose_multiword_subregs (true);

> 

> Looks good to me with the s/ != 0// that Segher mentioned.

> 

> With this change, the only remaining function of -fsplit-wide-types-early

> is to act as a double lock on one pass.  IMO it'd make more sense to remove

> that double lock and make -fsplit-wide-types-early and -fsplit-wide-types

> act as independent options, a bit like -fschedule-insns{,2}.


Have we come to consensus on whether to split the options or not?
I think Segher is against it given we actually have 3 passes of
lower-subreg and -fsplit-wide-types would control the 1st and 3rd
passes and -fsplit-wide-types-early would control the second.
That does seem strange to me too.

Peter
Richard Sandiford April 1, 2020, 6:32 p.m. | #11
Peter Bergner <bergner@linux.ibm.com> writes:
> On 3/30/20 3:50 AM, Richard Sandiford wrote:

>> Peter Bergner via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

>>> 	* lower-subreg.c (pass_lower_subreg3::gate): Remove test for

>>> 	flag_split_wide_types_early.

>>>

>>> diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c

>>> index 4c8bc835f93..807ad398b64 100644

>>> --- a/gcc/lower-subreg.c

>>> +++ b/gcc/lower-subreg.c

>>> @@ -1844,8 +1844,7 @@ public:

>>>    {}

>>>  

>>>    /* opt_pass methods: */

>>> -  virtual bool gate (function *) { return flag_split_wide_types

>>> -					  && !flag_split_wide_types_early; }

>>> +  virtual bool gate (function *) { return flag_split_wide_types != 0; }

>>>    virtual unsigned int execute (function *)

>>>      {

>>>        decompose_multiword_subregs (true);

>> 

>> Looks good to me with the s/ != 0// that Segher mentioned.

>> 

>> With this change, the only remaining function of -fsplit-wide-types-early

>> is to act as a double lock on one pass.  IMO it'd make more sense to remove

>> that double lock and make -fsplit-wide-types-early and -fsplit-wide-types

>> act as independent options, a bit like -fschedule-insns{,2}.

>

> Have we come to consensus on whether to split the options or not?

> I think Segher is against it given we actually have 3 passes of

> lower-subreg and -fsplit-wide-types would control the 1st and 3rd

> passes and -fsplit-wide-types-early would control the second.

> That does seem strange to me too.


I guess the name of the option is a bit weird, since it'll control
the middle pass of three.  That's going to be true either way though.

We're talking about having independent options controlling independent
passes, which seems like a Good Thing in general and doesn't seem that
strange to me in this case.  But I'm certainly happy to yield given the
strong opinions the other way.

Thanks,
Richard
Xionghu Luo via Gcc-patches April 1, 2020, 7:43 p.m. | #12
On 4/1/20 1:32 PM, Richard Sandiford wrote:
> Peter Bergner <bergner@linux.ibm.com> writes:

>> Have we come to consensus on whether to split the options or not?

>> I think Segher is against it given we actually have 3 passes of

>> lower-subreg and -fsplit-wide-types would control the 1st and 3rd

>> passes and -fsplit-wide-types-early would control the second.

>> That does seem strange to me too.

> 

> I guess the name of the option is a bit weird, since it'll control

> the middle pass of three.  That's going to be true either way though.

> 

> We're talking about having independent options controlling independent

> passes, which seems like a Good Thing in general and doesn't seem that

> strange to me in this case.  But I'm certainly happy to yield given the

> strong opinions the other way.


Ok, I pushed the patch without breaking them apart.  We can maybe revisit
the issue in stage1, when I'll start testing the first patch that allows
hard registers to be decomposed.

Thanks!

Peter
Segher Boessenkool April 2, 2020, 9:56 p.m. | #13
On Sat, Mar 28, 2020 at 06:39:56PM -0500, Peter Bergner wrote:
> On 3/28/20 2:22 PM, Segher Boessenkool wrote:

> > On Fri, Mar 27, 2020 at 05:41:36PM -0500, Peter Bergner wrote:

> >> A different (ie, safer) approach would be to just rerun lower-subreg at

> >> its old location, regardless of whether we used -fsplit-wide-types-early.

> > 

> > That is my preference, for a simpler reason even: when I added the new

> > pass I disabled the old one, thinking it wouldn't do anything useful

> > anymore.  Here you show that isn't true.

> >

> >> This way, we are not changing lower-subreg's behaviour, just running it an

> >> extra time (3 times instead of twice when using -fsplit-wide-types-early).

> >> I don't think lower-subreg is too expensive to run an extra time

> > 

> > Yes, I think so too.

> 

> Right.  However, like I said though, the downside is that we don't expose

> the decomposed uses to passes in between subreg2 and subreg3, like combine,

> etc.  Isn't that why you moved it early in the first place?  Then again,

> maybe you're getting the important cases now and subreg3 is just cleanup?


Yeah.  subreg1 is the limited one; subreg2 and subreg3 are the "full"
one.  subreg2 is the "early" one that only some archs have by default.
subreg3 will not do much (if subreg2 is enabled), except all the usual
RTL passes (CSE, *prop, combine, etc.) can expose more possibilities
for lower-subreg in some cases.


Segher

Patch

diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index 4c8bc835f93..c6816a34489 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -419,7 +419,7 @@  find_pseudo_copy (rtx set)
 
   rd = REGNO (dest);
   rs = REGNO (src);
-  if (HARD_REGISTER_NUM_P (rd) || HARD_REGISTER_NUM_P (rs))
+  if (HARD_REGISTER_NUM_P (rd))
     return false;
 
   b = reg_copy_graph[rs];

Given we're late in stage4, the above patch (assuming it's ok) probably
shouldn't go in until stage1, since it is changing lower-subreg's behaviour
slightly.

A different (ie, safer) approach would be to just rerun lower-subreg at
its old location, regardless of whether we used -fsplit-wide-types-early.
This way, we are not changing lower-subreg's behaviour, just running it an
extra time (3 times instead of twice when using -fsplit-wide-types-early).
I don't think lower-subreg is too expensive to run an extra time and we'd
only do it when using -fsplit-wide-types-early.  The only downside (if any)
is that we don't decompose these TImode uses early like the patch above does,
so combine, etc. can't see what they will eventually become.  This does fix
the bug and also survives bootstrap and regtesting on powerpc64le-linux
with no regressions.

	* lower-subreg.c (pass_lower_subreg3::gate): Remove test for
	flag_split_wide_types_early.

diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c
index 4c8bc835f93..807ad398b64 100644
--- a/gcc/lower-subreg.c
+++ b/gcc/lower-subreg.c
@@ -1844,8 +1844,7 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_split_wide_types
-					  && !flag_split_wide_types_early; }
+  virtual bool gate (function *) { return flag_split_wide_types != 0; }
   virtual unsigned int execute (function *)
     {
       decompose_multiword_subregs (true);