RFA: a x86 test modification

Message ID ada59c67-0b8e-5ceb-cdc5-0ee82d39bc45@redhat.com
State New
Headers show
Series
  • RFA: a x86 test modification
Related show

Commit Message

Vladimir Makarov Nov. 28, 2018, 9:47 p.m.
The patch I committed today recently for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207

   creates a new regression for pr34256.c.  2 moves is expected but gcc 
with the patch generates 3 moves.  I think now RA generates the right code.

We have the following code before RA

(insn 7 6 13 2 (set (reg:V2SI 88)
         (plus:V2SI (reg:V2SI 89 [ x ])
             (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2]  <var_decl 
0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3}
      (expr_list:REG_DEAD (reg:V2SI 89 [ x ])
         (nil)))
(insn 13 7 14 2 (set (reg/i:DI 0 ax)
         (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal}
      (expr_list:REG_DEAD (reg:V2SI 88)

The test is expected to assign mmx reg to pseudo 88 but gcc with the 
patch assigns memory to it.  The cost of mmx to general reg move is 13, 
while overall cost of mmx to mem and mem to general moves is 10.  So IRA 
now chooses memory for pseudo 88 according to the minimal cost.

Now, if we want still assign mmx reg to the pseudo 88, we should change 
the costs in machine-dependent x86 code.  But I think it might create 
other unexpected code generation.  As mmx is basically not used nowadays 
the test is not important, I just propose the following patch.

Is it ok for the trunk?

Comments

Jeff Law Nov. 28, 2018, 11:42 p.m. | #1
On 11/28/18 2:47 PM, Vladimir Makarov wrote:
>   The patch I committed today recently for

> 

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207

> 

>   creates a new regression for pr34256.c.  2 moves is expected but gcc

> with the patch generates 3 moves.  I think now RA generates the right code.

> 

> We have the following code before RA

> 

> (insn 7 6 13 2 (set (reg:V2SI 88)

>         (plus:V2SI (reg:V2SI 89 [ x ])

>             (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2]  <var_decl

> 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3}

> 

>      (expr_list:REG_DEAD (reg:V2SI 89 [ x ])

>         (nil)))

> (insn 13 7 14 2 (set (reg/i:DI 0 ax)

>         (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal}

>      (expr_list:REG_DEAD (reg:V2SI 88)

> 

> The test is expected to assign mmx reg to pseudo 88 but gcc with the

> patch assigns memory to it.  The cost of mmx to general reg move is 13,

> while overall cost of mmx to mem and mem to general moves is 10.  So IRA

> now chooses memory for pseudo 88 according to the minimal cost.

> 

> Now, if we want still assign mmx reg to the pseudo 88, we should change

> the costs in machine-dependent x86 code.  But I think it might create

> other unexpected code generation.  As mmx is basically not used nowadays

> the test is not important, I just propose the following patch.

> 

> Is it ok for the trunk?

> 

> 

> 

> 

> pr88207-2.patch

> 

> Index: testsuite/ChangeLog

> ===================================================================

> --- testsuite/ChangeLog	(revision 266582)

> +++ testsuite/ChangeLog	(working copy)

> @@ -1,3 +1,7 @@

> +2018-11-28  Vladimir Makarov  <vmakarov@redhat.com>

> +

> +	* gcc.target/i386/pr34256.c: Adjust the number of expected moves.

OK.
jeff
Uros Bizjak Nov. 29, 2018, 7:03 a.m. | #2
On Thu, Nov 29, 2018 at 12:43 AM Jeff Law <law@redhat.com> wrote:
>

> On 11/28/18 2:47 PM, Vladimir Makarov wrote:

> >   The patch I committed today recently for

> >

> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207

> >

> >   creates a new regression for pr34256.c.  2 moves is expected but gcc

> > with the patch generates 3 moves.  I think now RA generates the right code.

> >

> > We have the following code before RA

> >

> > (insn 7 6 13 2 (set (reg:V2SI 88)

> >         (plus:V2SI (reg:V2SI 89 [ x ])

> >             (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2]  <var_decl

> > 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3}

> >

> >      (expr_list:REG_DEAD (reg:V2SI 89 [ x ])

> >         (nil)))

> > (insn 13 7 14 2 (set (reg/i:DI 0 ax)

> >         (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal}

> >      (expr_list:REG_DEAD (reg:V2SI 88)

> >

> > The test is expected to assign mmx reg to pseudo 88 but gcc with the

> > patch assigns memory to it.  The cost of mmx to general reg move is 13,

> > while overall cost of mmx to mem and mem to general moves is 10.  So IRA

> > now chooses memory for pseudo 88 according to the minimal cost.

> >

> > Now, if we want still assign mmx reg to the pseudo 88, we should change

> > the costs in machine-dependent x86 code.  But I think it might create

> > other unexpected code generation.  As mmx is basically not used nowadays

> > the test is not important, I just propose the following patch.

> >

> > Is it ok for the trunk?


The generated code is correct, in i386.c, secondary_memory_needed, we have:

  /* ??? This is a lie.  We do have moves between mmx/general, and for
     mmx/sse2.  But by saying we need secondary memory we discourage the
     register allocator from using the mmx registers unless needed.  */
  if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
    return true;

But I noticed that for -mtune=generic we still generate direct move.
The cost of interunit MMX -> GR moves should always be higher than the
cost of indirect move (due to secondary_memory_needed), c.f. the
comment in ix86_register_move_cost:

  /* In case we require secondary memory, compute cost of the store followed
     by load.  In order to avoid bad register allocation choices, we need
     for this to be *at least* as high as the symmetric MEMORY_MOVE_COST.  */

IMO, the issue with -mtune=generic warrants some additional analysis.
I'll look at target cost calculations.

Thanks,
Uros.
Uros Bizjak Nov. 29, 2018, 9:18 a.m. | #3
On Thu, Nov 29, 2018 at 8:03 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On Thu, Nov 29, 2018 at 12:43 AM Jeff Law <law@redhat.com> wrote:

> >

> > On 11/28/18 2:47 PM, Vladimir Makarov wrote:

> > >   The patch I committed today recently for

> > >

> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88207

> > >

> > >   creates a new regression for pr34256.c.  2 moves is expected but gcc

> > > with the patch generates 3 moves.  I think now RA generates the right code.

> > >

> > > We have the following code before RA

> > >

> > > (insn 7 6 13 2 (set (reg:V2SI 88)

> > >         (plus:V2SI (reg:V2SI 89 [ x ])

> > >             (mem/c:V2SI (symbol_ref:DI ("y") [flags 0x2]  <var_decl

> > > 0x7faad08b5b40 y>) [1 y+0 S8 A64])))"mmintrin.h":306:18 1115 {*mmx_addv2si3}

> > >

> > >      (expr_list:REG_DEAD (reg:V2SI 89 [ x ])

> > >         (nil)))

> > > (insn 13 7 14 2 (set (reg/i:DI 0 ax)

> > >         (subreg:DI (reg:V2SI 88) 0)) "pr34256.c":11:1 66 {*movdi_internal}

> > >      (expr_list:REG_DEAD (reg:V2SI 88)

> > >

> > > The test is expected to assign mmx reg to pseudo 88 but gcc with the

> > > patch assigns memory to it.  The cost of mmx to general reg move is 13,

> > > while overall cost of mmx to mem and mem to general moves is 10.  So IRA

> > > now chooses memory for pseudo 88 according to the minimal cost.

> > >

> > > Now, if we want still assign mmx reg to the pseudo 88, we should change

> > > the costs in machine-dependent x86 code.  But I think it might create

> > > other unexpected code generation.  As mmx is basically not used nowadays

> > > the test is not important, I just propose the following patch.

> > >

> > > Is it ok for the trunk?

>

> The generated code is correct, in i386.c, secondary_memory_needed, we have:

>

>   /* ??? This is a lie.  We do have moves between mmx/general, and for

>      mmx/sse2.  But by saying we need secondary memory we discourage the

>      register allocator from using the mmx registers unless needed.  */

>   if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))

>     return true;

>

> But I noticed that for -mtune=generic we still generate direct move.

> The cost of interunit MMX -> GR moves should always be higher than the

> cost of indirect move (due to secondary_memory_needed), c.f. the

> comment in ix86_register_move_cost:

>

>   /* In case we require secondary memory, compute cost of the store followed

>      by load.  In order to avoid bad register allocation choices, we need

>      for this to be *at least* as high as the symmetric MEMORY_MOVE_COST.  */

>

> IMO, the issue with -mtune=generic warrants some additional analysis.

> I'll look at target cost calculations.


It looks that there are some issues how RA handles costs.

For -mtune=intel, we have following costs:

MMX->GR = 13
MMX->mem = 6
mem->GR = 4

The direct move has higher cost, so indirect move is emitted (which is correct).

and for -mtune=generic, we have:

MMX->GR = 13
MMX->mem = 6
mem->GR = 6

Although the direct move has slightly higher cost, direct move is
emitted. According to the above comment in ix86_register_move_cost, we
do have *at least* as high cost for direct move as for indirect move
sequence (in fact, we will always have higher cost for direct move),
but RA still emits direct move.

While there are some target-dependent issues with handling of MMX regs
(corrected by attached patch), the patch doesn't change the generated
code. I suspect that since in failing -mtune=generic case we have
costs that are very close to each other (12 vs 13), that perhaps some
threshold in RA cost compares is off by one.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cef809f575fb..9c850678401c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -39528,7 +39528,7 @@ inline_memory_move_cost (machine_mode mode, enum reg_class regclass,
 	  default:
 	    return 100;
 	}
-      if (in)
+      if (in == 2)
         return MAX (ix86_cost->mmx_load [index], ix86_cost->mmx_store [index]);
       return in ? ix86_cost->mmx_load [index] : ix86_cost->mmx_store [index];
     }
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e60b2296ab6b..66da7bcd7e5d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -208,9 +208,9 @@
 	   ]
 	   (const_string "DI")))
    (set (attr "preferred_for_speed")
-     (cond [(eq_attr "alternative" "10,15")
+     (cond [(eq_attr "alternative" "9,15")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_FROM_VEC")
-	    (eq_attr "alternative" "11,16")
+	    (eq_attr "alternative" "10,16")
 	      (symbol_ref "TARGET_INTER_UNIT_MOVES_TO_VEC")
 	   ]
 	   (symbol_ref "true")))])

Patch

Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 266582)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,7 @@ 
+2018-11-28  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* gcc.target/i386/pr34256.c: Adjust the number of expected moves.
+
 2018-11-28  Marek Polacek  <polacek@redhat.com>
 
 	PR c++/88222 - ICE with bit-field with invalid type.
Index: testsuite/gcc.target/i386/pr34256.c
===================================================================
--- testsuite/gcc.target/i386/pr34256.c	(revision 266155)
+++ testsuite/gcc.target/i386/pr34256.c	(working copy)
@@ -10,5 +10,5 @@  unsigned long long  foo(__m64 m) {
   return _mm_cvtm64_si64(_mm_add_pi32(x, y));
 }
 
-/* { dg-final { scan-assembler-times "mov" 2 { target { nonpic || pie_enabled } } } } */
-/* { dg-final { scan-assembler-times "mov" 4 { target { { ! nonpic } && { ! pie_enabled } } } } } */
+/* { dg-final { scan-assembler-times "mov" 3 { target { nonpic || pie_enabled } } } } */
+/* { dg-final { scan-assembler-times "mov" 5 { target { { ! nonpic } && { ! pie_enabled } } } } } */