RFA: fix PR90553, IRA assigning a call-clobbered reg to call with post-increment

Message ID 201905210158.x4L1wcBm005908@ignucius.se.axis.com
State New
Headers show
Series
  • RFA: fix PR90553, IRA assigning a call-clobbered reg to call with post-increment
Related show

Commit Message

Hans-Peter Nilsson May 21, 2019, 1:58 a.m.
I was looking into why I couldn't trivially move cris-elf to
"use init_array".  It appeared that it wasn't the hooks into
that machinery that went wrong, but that a compiler bug is
plaguing __libc_init_array.  It's been there since at least
4.7-era, hiding under the covers of the __init_array being empty
(everything being in .init).

Looking into it, it seems that IRA is treating this insn:

(call_insn 16 14 17 4 (parallel [
            (call (mem:QI (mem/f:SI (post_inc:SI (reg:SI 27 [ ivtmp.7 ])) [1 MEM[base: _8, offset: 0B]+0 S4 A8]) [0 *_1 S1 A8])
                (const_int 0 [0]))
            (clobber (reg:SI 16 srp))
        ]) "t.c":7:5 220 {*expanded_call_non_v32}
     (expr_list:REG_INC (reg:SI 27 [ ivtmp.7 ])
        (expr_list:REG_CALL_DECL (nil)
            (nil)))
    (nil))

...as if the read-part of the post-increment happens before the
call, and the write-part to happen after the call, supposedly
with the value magically kept unclobbered or treated as some
kind of return-value.  Looking around, it seems only the VAX
port would also be affected; grepping around I see no other port
having a call instruction capable of loading a value indirectly,
with a side-effect.

Now, I'm ok with deliberately forbidding autoinc and other
side-effect constructs on call insns during register allocation
and will do the documentation legwork of that part (and the more
involved target-fixing) if there's consensus for that, but it
seems that for IRA the fix is as simple as follows.

LRA seems to have the same issue, but I have no way to reproduce
it there; I'll just have to watch out when I move the port to
LRA.  I don't know if reload is affected, but I believe
autoincdec doesn't count as an output reload.  (Please correct
me if I'm wrong!  An output-reload on a call insn is not
allowed, says a comment in find_reloads, but AFAICS that's still
undocumented.)

So, is this ok?  Regtested on cris-elf and x86_64-linux-gnu
(though the latter uses LRA).  Note that this does *not* cause
the return-value for f3 and f4 in the test-case to be allocated
a call-saved register after the value.

gcc:
	* lra-lives.c (process_bb_node_lives): Consider defs
	for a call insn to be die before the call, not after.


brgds, H-P

Comments

Vladimir Makarov May 21, 2019, 9:05 p.m. | #1
On 2019-05-20 9:58 p.m., Hans-Peter Nilsson wrote:
> I was looking into why I couldn't trivially move cris-elf to

> "use init_array".  It appeared that it wasn't the hooks into

> that machinery that went wrong, but that a compiler bug is

> plaguing __libc_init_array.  It's been there since at least

> 4.7-era, hiding under the covers of the __init_array being empty

> (everything being in .init).

>

> Looking into it, it seems that IRA is treating this insn:

>

> (call_insn 16 14 17 4 (parallel [

>              (call (mem:QI (mem/f:SI (post_inc:SI (reg:SI 27 [ ivtmp.7 ])) [1 MEM[base: _8, offset: 0B]+0 S4 A8]) [0 *_1 S1 A8])

>                  (const_int 0 [0]))

>              (clobber (reg:SI 16 srp))

>          ]) "t.c":7:5 220 {*expanded_call_non_v32}

>       (expr_list:REG_INC (reg:SI 27 [ ivtmp.7 ])

>          (expr_list:REG_CALL_DECL (nil)

>              (nil)))

>      (nil))

>

> ...as if the read-part of the post-increment happens before the

> call, and the write-part to happen after the call, supposedly

> with the value magically kept unclobbered or treated as some

> kind of return-value.  Looking around, it seems only the VAX

> port would also be affected; grepping around I see no other port

> having a call instruction capable of loading a value indirectly,

> with a side-effect.

>

> Now, I'm ok with deliberately forbidding autoinc and other

> side-effect constructs on call insns during register allocation

> and will do the documentation legwork of that part (and the more

> involved target-fixing) if there's consensus for that, but it

> seems that for IRA the fix is as simple as follows.

>

> LRA seems to have the same issue, but I have no way to reproduce

> it there; I'll just have to watch out when I move the port to

> LRA.  I don't know if reload is affected, but I believe

> autoincdec doesn't count as an output reload.  (Please correct

> me if I'm wrong!  An output-reload on a call insn is not

> allowed, says a comment in find_reloads, but AFAICS that's still

> undocumented.)

Probably this case was a reason to prohibit output reloads for calls.
> So, is this ok?  Regtested on cris-elf and x86_64-linux-gnu

> (though the latter uses LRA).  Note that this does *not* cause

> the return-value for f3 and f4 in the test-case to be allocated

> a call-saved register after the value.


Yes, the patch is ok to commit.  Thank you for working on the problem.

It is hard to reproduce the same problem in LRA as LRA mostly follows 
IRA decisions.

I'll probably do the analogous patch for LRA on this week.

>

> gcc:

> 	* lra-lives.c (process_bb_node_lives): Consider defs

It should be ira-lives.c here.
> 	for a call insn to be die before the call, not after.

Patch

--- gcc/ira-lives.c.orig	Fri Apr 19 03:22:15 2019
+++ gcc/ira-lives.c	Sun May 19 07:31:33 2019
@@ -1241,11 +1241,6 @@  process_bb_node_lives (ira_loop_tree_nod
 	  preprocess_constraints (insn);
 	  process_single_reg_class_operands (false, freq);
 
-	  /* See which defined values die here.  */
-	  FOR_EACH_INSN_DEF (def, insn)
-	    if (!call_p || !DF_REF_FLAGS_IS_SET (def, DF_REF_MAY_CLOBBER))
-	      mark_ref_dead (def);
-
 	  if (call_p)
 	    {
 	      /* Try to find a SET in the CALL_INSN_FUNCTION_USAGE, and from
@@ -1308,6 +1303,17 @@  process_bb_node_lives (ira_loop_tree_nod
 		    ALLOCNO_CHEAP_CALLS_CROSSED_NUM (a)++;
 		}
 	    }
+
+	  /* See which defined values die here.  Note that we include
+	     the call insn in the lifetimes of these values, so we don't
+	     mistakenly consider, for e.g. an addressing mode with a
+	     side-effect like a post-increment fetching the address,
+	     that the use happens before the call, and the def to happen
+	     after the call: we believe both to happen before the actual
+	     call.  (We don't handle return-values here.) */
+	  FOR_EACH_INSN_DEF (def, insn)
+	    if (!call_p || !DF_REF_FLAGS_IS_SET (def, DF_REF_MAY_CLOBBER))
+	      mark_ref_dead (def);
 
 	  make_early_clobber_and_input_conflicts ();

For the test-suite, a few variants on the same theme, where the
functions called are found to clobber the pre-patch chosen
call-clobbered register:

gcc/testsuite:
	* gcc.dg/torture/pr90553.c: New test.

--- /dev/null	Tue Oct 29 15:57:07 2002
+++ gcc/testsuite/gcc.dg/torture/pr90553.c	Mon May 20 05:26:43 2019
@@ -0,0 +1,128 @@ 
+/* { dg-do run } */
+
+__attribute__((__noipa__))
+void f1(int x, void (*p1 []) (int, int))
+{
+  int i;
+  for (i = 0; i < x; i++)
+    p1[i](42, 666);
+}
+
+int z1_called = 0;
+int w1_called = 0;
+
+__attribute__((__noipa__))
+void z1(int a, int b)
+{
+  if (w1_called || z1_called)
+    __builtin_abort();
+  z1_called++;
+}
+
+__attribute__((__noipa__))
+void w1(int a, int b)
+{
+  if (w1_called || !z1_called)
+    __builtin_abort();
+  w1_called++;
+}
+
+int z2_called = 0;
+int w2_called = 0;
+
+__attribute__((__noipa__))
+void z2(void)
+{
+  if (w2_called || z2_called)
+    __builtin_abort();
+  z2_called++;
+}
+
+__attribute__((__noipa__))
+void w2(void)
+{
+  if (w2_called || !z2_called)
+    __builtin_abort();
+  w2_called++;
+}
+
+void (*p2 []) () = { w2, z2 };
+
+__attribute__((__noipa__))
+void f2(int x)
+{
+  void (**q) (void) = p2 + x;
+  int i;
+  for (i = 0; i < x; i++)
+    (*(--q))();
+}
+
+__attribute__((__noipa__))
+void f3(int x, int (*p3 []) (int))
+{
+  int i;
+  int next = x;
+  for (i = 0; i < x; i++)
+    next = p3[i](next);
+}
+
+int z3_called = 0;
+int w3_called = 0;
+
+__attribute__((__noipa__))
+int z3(int a)
+{
+  if (w3_called || z3_called || a != 2)
+    __builtin_abort();
+  z3_called++;
+  return 42;
+}
+
+__attribute__((__noipa__))
+int w3(int a)
+{
+  if (w3_called || !z3_called || a != 42)
+    __builtin_abort();
+  w3_called++;
+  return 4096;
+}
+
+int (*p4 []) (int) = { z3, w3 };
+
+__attribute__((__noipa__))
+void f4(int x)
+{
+  int (**q) (int) = p4;
+  int (**r) (int) = p4 + x;
+
+  int next = x;
+  for (; q < r; q++)
+    next = (*q)(next);
+}
+
+int main(void)
+{
+  static int (*p3 []) (int) = { z3, w3 };
+
+  static void (*p1 []) (int, int) = { z1, w1 };
+
+  f1(2, p1);
+  if (z1_called != 1 || w1_called != 1)
+    __builtin_abort();
+
+  f2(2);
+  if (z2_called != 1 || w2_called != 1)
+    __builtin_abort();
+
+  f3(2, p3);
+  if (z3_called != 1 || w3_called != 1)
+    __builtin_abort();
+
+  z3_called = 0;
+  w3_called = 0;
+  f4(2);
+  if (z3_called != 1 || w3_called != 1)
+    __builtin_abort();
+
+  __builtin_exit(0);
+}