One more patch for PR93564

Message ID cec451f9-eb63-1d8e-dfa4-c7dd24d745cd@redhat.com
State New
Headers show
Series
  • One more patch for PR93564
Related show

Commit Message

Vladimir Makarov Feb. 28, 2020, 4:39 p.m.
The following patch is dealing with arm failures after submitting 
original patch for PR93564.

   Changing heuristics in the original patch resulted in different order 
of allocation and creating gaps in hard reg file which were not enough 
for pseudos requiring double regs.  So RA started to use caller-saved 
regs and additional store/load insns in function prologue. That is the 
reason for some arm failures.

   The patch was successfully bootstrapped and benchmarked on x86-64.  
On x86-64 SPEC2000 the patch generates a bit smaller and faster in 
average code.

Comments

Christophe Lyon March 2, 2020, 2:37 p.m. | #1
On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmakarov@redhat.com> wrote:
>

>   The following patch is dealing with arm failures after submitting

> original patch for PR93564.

>

>    Changing heuristics in the original patch resulted in different order

> of allocation and creating gaps in hard reg file which were not enough

> for pseudos requiring double regs.  So RA started to use caller-saved

> regs and additional store/load insns in function prologue. That is the

> reason for some arm failures.

>

>    The patch was successfully bootstrapped and benchmarked on x86-64.

> On x86-64 SPEC2000 the patch generates a bit smaller and faster in

> average code.

>


Hi,

This is causing another set of regressions on arm.
For instance on arm-linux-gnueabihf --with-cpu cortex-a9
--with-fpu neon-fp16:
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0 1
FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1

Christophe
Jeff Law March 2, 2020, 3:17 p.m. | #2
On Mon, 2020-03-02 at 15:37 +0100, Christophe Lyon wrote:
> On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmakarov@redhat.com> wrote:

> >   The following patch is dealing with arm failures after submitting

> > original patch for PR93564.

> > 

> >    Changing heuristics in the original patch resulted in different order

> > of allocation and creating gaps in hard reg file which were not enough

> > for pseudos requiring double regs.  So RA started to use caller-saved

> > regs and additional store/load insns in function prologue. That is the

> > reason for some arm failures.

> > 

> >    The patch was successfully bootstrapped and benchmarked on x86-64.

> > On x86-64 SPEC2000 the patch generates a bit smaller and faster in

> > average code.

> > 

> 

> Hi,

> 

> This is causing another set of regressions on arm.

> For instance on arm-linux-gnueabihf --with-cpu cortex-a9

> --with-fpu neon-fp16:

> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16

> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0

> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0

> FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0 1

> FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1

I suspect at least some of these are likely just register assignments changing.

Jeff
>
Jeff Law March 2, 2020, 3:40 p.m. | #3
On Mon, 2020-03-02 at 08:17 -0700, Jeff Law wrote:
> On Mon, 2020-03-02 at 15:37 +0100, Christophe Lyon wrote:

> > On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmakarov@redhat.com> wrote:

> > >   The following patch is dealing with arm failures after submitting

> > > original patch for PR93564.

> > > 

> > >    Changing heuristics in the original patch resulted in different order

> > > of allocation and creating gaps in hard reg file which were not enough

> > > for pseudos requiring double regs.  So RA started to use caller-saved

> > > regs and additional store/load insns in function prologue. That is the

> > > reason for some arm failures.

> > > 

> > >    The patch was successfully bootstrapped and benchmarked on x86-64.

> > > On x86-64 SPEC2000 the patch generates a bit smaller and faster in

> > > average code.

> > > 

> > 

> > Hi,

> > 

> > This is causing another set of regressions on arm.

> > For instance on arm-linux-gnueabihf --with-cpu cortex-a9

> > --with-fpu neon-fp16:

> > FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16

> > FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0

> > FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0

> > FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0 1

> > FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1

> I suspect at least some of these are likely just register assignments

> changing.

In fact, I'm certain that's the case for fuse-caller-save.c.  I'll be looking
at armv8_2-fp16-move-1.c as well since my tester tripped over that as well.  If
you could evaluate the others it'd be appreciated.

jeff
Vladimir Makarov March 2, 2020, 3:46 p.m. | #4
On 2020-03-02 10:17 a.m., Jeff Law wrote:
> On Mon, 2020-03-02 at 15:37 +0100, Christophe Lyon wrote:

>> On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmakarov@redhat.com> wrote:

>>>    The following patch is dealing with arm failures after submitting

>>> original patch for PR93564.

>>>

>>>     Changing heuristics in the original patch resulted in different order

>>> of allocation and creating gaps in hard reg file which were not enough

>>> for pseudos requiring double regs.  So RA started to use caller-saved

>>> regs and additional store/load insns in function prologue. That is the

>>> reason for some arm failures.

>>>

>>>     The patch was successfully bootstrapped and benchmarked on x86-64.

>>> On x86-64 SPEC2000 the patch generates a bit smaller and faster in

>>> average code.

>>>

>> Hi,

>>

>> This is causing another set of regressions on arm.

>> For instance on arm-linux-gnueabihf --with-cpu cortex-a9

>> --with-fpu neon-fp16:

>> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16

>> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0

>> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0

>> FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0 1

>> FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1

> I suspect at least some of these are likely just register assignments changing.

>

   It is a generation of unexpected but still correct code. Changing 
heursitics can create small gaps in hard reg files which are not enough 
to fit multi-regs pseudos and there will be more probability of usage of 
callee-saved regs which means loads/stores in prologue/epilogue.

   As assigning to multi-regs pseudos first was never the highest 
priority in the assignment (execution frequency has a higher priority), 
we were lucky enough to generate the expected code.  In general, these 
kind failures are for very small functions without loops where even 
stack is not used.  The more important cases are RA for big functions 
(as we have aggressive inlining) with loops and for these cases the 
latest patch decreases SPEC2000 code size and improved the performance 
visibly at least for x86-64.

   In any case, I'll look at these tests but fixing all RA performance 
issues and tests checking them is might be just chasing a rainbow.
Jeff Law March 2, 2020, 4:11 p.m. | #5
On Mon, 2020-03-02 at 08:40 -0700, Jeff Law wrote:
> On Mon, 2020-03-02 at 08:17 -0700, Jeff Law wrote:

> > On Mon, 2020-03-02 at 15:37 +0100, Christophe Lyon wrote:

> > > On Fri, 28 Feb 2020 at 17:39, Vladimir Makarov <vmakarov@redhat.com>

> > > wrote:

> > > >   The following patch is dealing with arm failures after submitting

> > > > original patch for PR93564.

> > > > 

> > > >    Changing heuristics in the original patch resulted in different

> > > > order

> > > > of allocation and creating gaps in hard reg file which were not enough

> > > > for pseudos requiring double regs.  So RA started to use caller-saved

> > > > regs and additional store/load insns in function prologue. That is the

> > > > reason for some arm failures.

> > > > 

> > > >    The patch was successfully bootstrapped and benchmarked on x86-64.

> > > > On x86-64 SPEC2000 the patch generates a bit smaller and faster in

> > > > average code.

> > > > 

> > > 

> > > Hi,

> > > 

> > > This is causing another set of regressions on arm.

> > > For instance on arm-linux-gnueabihf --with-cpu cortex-a9

> > > --with-fpu neon-fp16:

> > > FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-not vmov\\.f16

> > > FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0

> > > FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0

> > > FAIL: gcc.target/arm/fuse-caller-save.c scan-assembler-times mov\tr3, r0

> > > 1

> > > FAIL: gcc.target/arm/unaligned-argument-2.c scan-assembler-times stm 1

> > I suspect at least some of these are likely just register assignments

> > changing.

> In fact, I'm certain that's the case for fuse-caller-save.c.  I'll be looking

> at armv8_2-fp16-move-1.c as well since my tester tripped over that as

> well.  If you could evaluate the others it'd be appreciated.

And I'm now certain armv8_2-fp16-move-1.c is of a similar nature.

In that test we get a slightly different packing of registers after Vlad's IRA
changes.  The different packing into registers ultimately results in one hard
register cprop not happening after Vlad's changes.  As a result we end up with
an extra reg->reg copy and the test fails.

This may be one we just have to live with.  As we come into cprop_hardreg we
have this after Vlad's changes:

(set (reg 0) (reg 18))
(set (reg 18) (float_extend ...)

[ ... ]
set (reg 17) (reg 0)


Obviously the set to reg18 in the middle insn blocks the ability to propagate
the source of the first set into the source of the last set.  Prior to Vlad's
change that middle set used a different hard register and thus didn't block the
hard register cprop.  But that was more of an accident than anything -- Vlad's
work results in, IMHO a better hard register allocation -- which in turn
inhibits hard register cprop.

I think the thing to do is either expect the single copy or xfail the test. 
I'm going to leave it to the ARM maintainers to decide how they can to handle
that.   I don't think this very minor code quality regression is significant
enough to warrant backing out Vlad's change.

Another approach would be to see if register renaming helps here, but that's a
can of worms I don't think we want to open at this point.


Jeff

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f6a9ae2375e..1451807f7d9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@ 
+2020-02-28  Vladimir Makarov  <vmakarov@redhat.com>
+
+	PR rtl-optimization/93564
+	* ira-color.c (assign_hard_reg): Prefer smaller hard regno when we
+	do not honor reg alloc order.
+
 2020-02-27  Joel Hutton  <Joel.Hutton@arm.com>
 
 	PR target/87612
diff --git a/gcc/ira-color.c b/gcc/ira-color.c
index 0ffdd192020..a2bf108c38e 100644
--- a/gcc/ira-color.c
+++ b/gcc/ira-color.c
@@ -1925,7 +1925,9 @@  assign_hard_reg (ira_allocno_t a, bool retry_p)
 	}
       if (min_cost > cost)
 	min_cost = cost;
-      if (min_full_cost > full_cost)
+      if (min_full_cost > full_cost
+	  || (!HONOR_REG_ALLOC_ORDER && min_full_cost == full_cost
+	      && best_hard_regno > hard_regno))
 	{
 	  min_full_cost = full_cost;
 	  best_hard_regno = hard_regno;