Tighten LRA cycling check

Message ID 87wp0x8gad.fsf@linaro.org
State New
Headers show
Series
  • Tighten LRA cycling check
Related show

Commit Message

Richard Sandiford Jan. 4, 2018, 9:28 p.m.
LRA has code to try to prevent cycling, by avoiding reloads that
look too similar to the instruction being reloaded.  E.g. if we
have a R<-C move for some constant C, reloading the source with
another R<-C move is unlikely to be a good idea.

However, this safeguard unnecessarily triggered in tests like
the one in the patch.  We started with instructions like:

(insn 12 9 13 5 (set (reg:DI 0 x0)
        (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}
     (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)
        (nil)))

where r459 didn't get allocated a register and is equivalent to
constant x00.  LRA would then handle it like this:

Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'
            1 Non-pseudo reload: reject+=2
            1 Non input pseudo reload: reject++
          alt=0,overall=609,losers=1,rld_nregs=1
[...]
          alt=13,overall=9,losers=1,rld_nregs=1
[...]
         Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}

In other words, to avoid loading the constant x00 into another GPR,
LRA decided instead to move it into a floating-point register,
then move that floating-point register into x0:

      Creating newreg=630, assigning class FP_REGS to r630
      Set class ALL_REGS for r631
   12: x0:DI=r630:DI
      REG_EQUAL `x00'
    Inserting insn reload before:
  815: r631:DI=high(`x00')
  816: r630:DI=r631:DI+low(`x00')
      REG_EQUAL `x00'

That's inefficient and doesn't really help to resolve a cycling
problem, since the r630 destination of 816 needs to be reloaded into
a GPR anyway.

The cycling check already had an exception for source values that are
the result of an elimination.  This patch extends it to include the
result of equivalence substitution.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the before and after assembly output for at
least one target per CPU directory.  The targets most affected were:

  aarch64_be-linux-gn
  aarch64-linux-gnu
  powerpc-eabispe
  riscv32-elf
  riscv64-elf
  sparc64-linux-gnu
  sparc-linux-gnu
  spu-elf

for which it improved code size overall.  There were minor register
renaming differences on some other targets.  x86 and powerpc*-linux-gnu
targets were unaffected.

OK to install?

Richard


2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* lra-constraints.c (process_alt_operands): Test for the equivalence
	substitutions when detecting a possible reload cycle.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-1.c: New test.

Comments

Jeff Law Jan. 5, 2018, 11:11 p.m. | #1
On 01/04/2018 02:28 PM, Richard Sandiford wrote:
> LRA has code to try to prevent cycling, by avoiding reloads that

> look too similar to the instruction being reloaded.  E.g. if we

> have a R<-C move for some constant C, reloading the source with

> another R<-C move is unlikely to be a good idea.

> 

> However, this safeguard unnecessarily triggered in tests like

> the one in the patch.  We started with instructions like:

> 

> (insn 12 9 13 5 (set (reg:DI 0 x0)

>         (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}

>      (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)

>         (nil)))

> 

> where r459 didn't get allocated a register and is equivalent to

> constant x00.  LRA would then handle it like this:

> 

> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'

>             1 Non-pseudo reload: reject+=2

>             1 Non input pseudo reload: reject++

>           alt=0,overall=609,losers=1,rld_nregs=1

> [...]

>           alt=13,overall=9,losers=1,rld_nregs=1

> [...]

>          Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}

> 

> In other words, to avoid loading the constant x00 into another GPR,

> LRA decided instead to move it into a floating-point register,

> then move that floating-point register into x0:

> 

>       Creating newreg=630, assigning class FP_REGS to r630

>       Set class ALL_REGS for r631

>    12: x0:DI=r630:DI

>       REG_EQUAL `x00'

>     Inserting insn reload before:

>   815: r631:DI=high(`x00')

>   816: r630:DI=r631:DI+low(`x00')

>       REG_EQUAL `x00'

> 

> That's inefficient and doesn't really help to resolve a cycling

> problem, since the r630 destination of 816 needs to be reloaded into

> a GPR anyway.

> 

> The cycling check already had an exception for source values that are

> the result of an elimination.  This patch extends it to include the

> result of equivalence substitution.

> 

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.

> Also tested by comparing the before and after assembly output for at

> least one target per CPU directory.  The targets most affected were:

> 

>   aarch64_be-linux-gn

>   aarch64-linux-gnu

>   powerpc-eabispe

>   riscv32-elf

>   riscv64-elf

>   sparc64-linux-gnu

>   sparc-linux-gnu

>   spu-elf

> 

> for which it improved code size overall.  There were minor register

> renaming differences on some other targets.  x86 and powerpc*-linux-gnu

> targets were unaffected.

> 

> OK to install?

> 

> Richard

> 

> 

> 2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>

> 

> gcc/

> 	* lra-constraints.c (process_alt_operands): Test for the equivalence

> 	substitutions when detecting a possible reload cycle.

> 

> gcc/testsuite/

> 	* gcc.target/aarch64/reg-alloc-1.c: New test.

OK.  Presumably related to 83699?  Do you want to reference it in the
ChangeLog?

jeff
Richard Sandiford Jan. 6, 2018, 11:11 a.m. | #2
Jeff Law <law@redhat.com> writes:
> On 01/04/2018 02:28 PM, Richard Sandiford wrote:

>> LRA has code to try to prevent cycling, by avoiding reloads that

>> look too similar to the instruction being reloaded.  E.g. if we

>> have a R<-C move for some constant C, reloading the source with

>> another R<-C move is unlikely to be a good idea.

>> 

>> However, this safeguard unnecessarily triggered in tests like

>> the one in the patch.  We started with instructions like:

>> 

>> (insn 12 9 13 5 (set (reg:DI 0 x0)

>>         (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}

>>      (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)

>>         (nil)))

>> 

>> where r459 didn't get allocated a register and is equivalent to

>> constant x00.  LRA would then handle it like this:

>> 

>> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'

>>             1 Non-pseudo reload: reject+=2

>>             1 Non input pseudo reload: reject++

>>           alt=0,overall=609,losers=1,rld_nregs=1

>> [...]

>>           alt=13,overall=9,losers=1,rld_nregs=1

>> [...]

>>          Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}

>> 

>> In other words, to avoid loading the constant x00 into another GPR,

>> LRA decided instead to move it into a floating-point register,

>> then move that floating-point register into x0:

>> 

>>       Creating newreg=630, assigning class FP_REGS to r630

>>       Set class ALL_REGS for r631

>>    12: x0:DI=r630:DI

>>       REG_EQUAL `x00'

>>     Inserting insn reload before:

>>   815: r631:DI=high(`x00')

>>   816: r630:DI=r631:DI+low(`x00')

>>       REG_EQUAL `x00'

>> 

>> That's inefficient and doesn't really help to resolve a cycling

>> problem, since the r630 destination of 816 needs to be reloaded into

>> a GPR anyway.

>> 

>> The cycling check already had an exception for source values that are

>> the result of an elimination.  This patch extends it to include the

>> result of equivalence substitution.

>> 

>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.

>> Also tested by comparing the before and after assembly output for at

>> least one target per CPU directory.  The targets most affected were:

>> 

>>   aarch64_be-linux-gn

>>   aarch64-linux-gnu

>>   powerpc-eabispe

>>   riscv32-elf

>>   riscv64-elf

>>   sparc64-linux-gnu

>>   sparc-linux-gnu

>>   spu-elf

>> 

>> for which it improved code size overall.  There were minor register

>> renaming differences on some other targets.  x86 and powerpc*-linux-gnu

>> targets were unaffected.

>> 

>> OK to install?

>> 

>> Richard

>> 

>> 

>> 2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>

>> 

>> gcc/

>> 	* lra-constraints.c (process_alt_operands): Test for the equivalence

>> 	substitutions when detecting a possible reload cycle.

>> 

>> gcc/testsuite/

>> 	* gcc.target/aarch64/reg-alloc-1.c: New test.

> OK.


Thanks.

> Presumably related to 83699?  Do you want to reference it in the

> ChangeLog?


It's actually an unrelated problem that I'd originally seen on SVE.
I guess I was just lucky to have two cycling bugs around the same time :-)

Richard

Patch

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2018-01-04 21:28:22.495890864 +0000
+++ gcc/lra-constraints.c	2018-01-04 21:28:22.637885000 +0000
@@ -2866,7 +2866,12 @@  process_alt_operands (int only_alternati
 		  /* If it is a result of recent elimination in move
 		     insn we can transform it into an add still by
 		     using this alternative.  */
-		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS)))
+		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS
+		  /* Likewise if the source has been replaced with an
+		     equivalent value.  This only happens once -- the reload
+		     will use the equivalent value instead of the register it
+		     replaces -- so there should be no danger of cycling.  */
+		  && !equiv_substition_p[1])))
 	{
 	  /* We have a move insn and a new reload insn will be similar
 	     to the current insn.  We should avoid such situation as
Index: gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c
===================================================================
--- /dev/null	2018-01-04 17:16:45.681472446 +0000
+++ gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c	2018-01-04 21:28:22.637885000 +0000
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-reload-details" } */
+
+#define R1(X, Y) X (Y##0) X (Y##1) X (Y##2) X (Y##3) \
+                 X (Y##4) X (Y##5) X (Y##6) X (Y##7)
+#define R2(X) R1 (X, 0) R1 (X, 1) R1 (X, 2) R1 (X, 3)
+
+#define DEFINE(N) extern int x##N;
+R2 (DEFINE)
+
+void b1 (int *);
+
+void
+foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+#define CALL(N) b1 (&x##N);
+      R2 (CALL);
+      R2 (CALL);
+    }
+#define INC(N) x##N += 1;
+  R2 (INC);
+}
+
+/* { dg-final { scan-rtl-dump-not "DI 32 v0" "reload" } } */