[ARM,PR,target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi

Message ID 65a280a5-574c-05bc-39db-7a5289b09492@arm.com
State Superseded
Headers show
Series
  • [ARM,PR,target/84826] Fix ICE in extract_insn, at recog.c:2304 on arm-linux-gnueabi
Related show

Commit Message

Sudakshina Das March 21, 2018, 5:44 p.m.
Hi

The ICE in the bug report was happening because the macro
USE_RETURN_INSN (FALSE) was returning different values at different
points in the compilation. This was internally occurring because the
function arm_compute_static_chain_stack_bytes () which was dependent on
arm_r3_live_at_start_p () was giving a different value after the
cond_exec instructions were created in ce3 causing the liveness of r3
to escape up to the start block.

The function arm_compute_static_chain_stack_bytes () should really
only compute the value once during epilogue/prologue stage. This pass
introduces a new member 'static_chain_stack_bytes' to the target
definition of the struct machine_function which gets calculated in
expand_prologue and is the value that is returned by
arm_compute_static_chain_stack_bytes () beyond that.

Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf
and added the reported test to the testsuite.

Is this ok for trunk?

Sudi


ChangeLog entries:

*** gcc/ChangeLog ***

2018-03-21  Sudakshina Das  <sudi.das@arm.com>

         PR target/84826
         * config/arm/arm.h (machine_function): Add
         static_chain_stack_bytes.
         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):
         Avoid re-computing once computed.
         (arm_expand_prologue): Compute machine->static_chain_stack_bytes.
         (arm_init_machine_status): Initialize
         machine->static_chain_stack_bytes.

*** gcc/testsuite/ChangeLog ***

2018-03-21  Sudakshina Das  <sudi.das@arm.com>

         PR target/84826
         * gcc.target/arm/pr84826.c: New test

Comments

Kyrill Tkachov March 22, 2018, 4:08 p.m. | #1
Hi Sudi,

On 21/03/18 17:44, Sudakshina Das wrote:
> Hi

>

> The ICE in the bug report was happening because the macro

> USE_RETURN_INSN (FALSE) was returning different values at different

> points in the compilation. This was internally occurring because the

> function arm_compute_static_chain_stack_bytes () which was dependent on

> arm_r3_live_at_start_p () was giving a different value after the

> cond_exec instructions were created in ce3 causing the liveness of r3

> to escape up to the start block.

>

> The function arm_compute_static_chain_stack_bytes () should really

> only compute the value once during epilogue/prologue stage. This pass

> introduces a new member 'static_chain_stack_bytes' to the target

> definition of the struct machine_function which gets calculated in

> expand_prologue and is the value that is returned by

> arm_compute_static_chain_stack_bytes () beyond that.

>

> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

> and added the reported test to the testsuite.

>

> Is this ok for trunk?

>


Thanks for working on this.
I agree with the approach, I have a couple of comments inline.

> Sudi

>

>

> ChangeLog entries:

>

> *** gcc/ChangeLog ***

>

> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>

>         PR target/84826

>         * config/arm/arm.h (machine_function): Add

>         static_chain_stack_bytes.

>         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>         Avoid re-computing once computed.

>         (arm_expand_prologue): Compute machine->static_chain_stack_bytes.

>         (arm_init_machine_status): Initialize

>         machine->static_chain_stack_bytes.

>

> *** gcc/testsuite/ChangeLog ***

>

> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>

>         PR target/84826

>         * gcc.target/arm/pr84826.c: New test


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function
    machine_mode thumb1_cc_mode;
    /* Set to 1 after arm_reorg has started.  */
    int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+     stack, above the stack frame.  */
+  int static_chain_stack_bytes;
  }
  machine_function;
  #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)
  static int
  arm_compute_static_chain_stack_bytes (void)
  {
+  /* Once the value is updated from the init value of -1, do not
+     re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+

My concern is that this approach caches the first value that is computed for static_chain_stack_bytes.
I believe the layout frame code is called multiple times during register allocation as it goes through
the motions and I think we want the last value it computes during reload

How about we do something like:
   if (cfun->machine->static_chain_stack_bytes != -1 &&epilogue_completed)
     return cfun->machine->static_chain_stack_bytes;

?...

  /* See the defining assertion in arm_expand_prologue.  */
    if (IS_NESTED (arm_current_func_type ())
        && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21699,6 +21704,11 @@ arm_expand_prologue (void)
        emit_insn (gen_movsi (stack_pointer_rtx, r1));
      }
  
+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+     now the value must the -1 as stored by arm_init_machine_status ().  */

... this comment would need to be tweaked as cfun->machine->static_chain_stack_bytes may hold
an intermediate value computed in reload or some other point before epilogue_completed.
  
+  cfun->machine->static_chain_stack_bytes
+    = arm_compute_static_chain_stack_bytes ();
+


Thanks,
Kyrill
Sudakshina Das March 22, 2018, 4:20 p.m. | #2
Hi Kyrill

On 22/03/18 16:08, Kyrill Tkachov wrote:
> Hi Sudi,

> 

> On 21/03/18 17:44, Sudakshina Das wrote:

>> Hi

>>

>> The ICE in the bug report was happening because the macro

>> USE_RETURN_INSN (FALSE) was returning different values at different

>> points in the compilation. This was internally occurring because the

>> function arm_compute_static_chain_stack_bytes () which was dependent on

>> arm_r3_live_at_start_p () was giving a different value after the

>> cond_exec instructions were created in ce3 causing the liveness of r3

>> to escape up to the start block.

>>

>> The function arm_compute_static_chain_stack_bytes () should really

>> only compute the value once during epilogue/prologue stage. This pass

>> introduces a new member 'static_chain_stack_bytes' to the target

>> definition of the struct machine_function which gets calculated in

>> expand_prologue and is the value that is returned by

>> arm_compute_static_chain_stack_bytes () beyond that.

>>

>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>> and added the reported test to the testsuite.

>>

>> Is this ok for trunk?

>>

> 

> Thanks for working on this.

> I agree with the approach, I have a couple of comments inline.

> 

>> Sudi

>>

>>

>> ChangeLog entries:

>>

>> *** gcc/ChangeLog ***

>>

>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>

>>         PR target/84826

>>         * config/arm/arm.h (machine_function): Add

>>         static_chain_stack_bytes.

>>         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>         Avoid re-computing once computed.

>>         (arm_expand_prologue): Compute machine->static_chain_stack_bytes.

>>         (arm_init_machine_status): Initialize

>>         machine->static_chain_stack_bytes.

>>

>> *** gcc/testsuite/ChangeLog ***

>>

>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>

>>         PR target/84826

>>         * gcc.target/arm/pr84826.c: New test

> 

> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

> index bbf3937..2809112 100644

> --- a/gcc/config/arm/arm.h

> +++ b/gcc/config/arm/arm.h

> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>     machine_mode thumb1_cc_mode;

>     /* Set to 1 after arm_reorg has started.  */

>     int after_arm_reorg;

> +  /* The number of bytes used to store the static chain register on the

> +     stack, above the stack frame.  */

> +  int static_chain_stack_bytes;

>   }

>   machine_function;

>   #endif

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index cb6ab81..bc31810 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>   static int

>   arm_compute_static_chain_stack_bytes (void)

>   {

> +  /* Once the value is updated from the init value of -1, do not

> +     re-compute.  */

> +  if (cfun->machine->static_chain_stack_bytes != -1)

> +    return cfun->machine->static_chain_stack_bytes;

> +

> 

> My concern is that this approach caches the first value that is computed 

> for static_chain_stack_bytes.

> I believe the layout frame code is called multiple times during register 

> allocation as it goes through

> the motions and I think we want the last value it computes during reload

> 

> How about we do something like:

>    if (cfun->machine->static_chain_stack_bytes != -1 &&epilogue_completed)

>      return cfun->machine->static_chain_stack_bytes;

> 

> ?...

> 

>   /* See the defining assertion in arm_expand_prologue.  */

>     if (IS_NESTED (arm_current_func_type ())

>         && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>         emit_insn (gen_movsi (stack_pointer_rtx, r1));

>       }

> 

> +  /* Let's compute the static_chain_stack_bytes required and store it.  

> Right

> +     now the value must the -1 as stored by arm_init_machine_status 

> ().  */

> 

> ... this comment would need to be tweaked as 

> cfun->machine->static_chain_stack_bytes may hold

> an intermediate value computed in reload or some other point before 

> epilogue_completed.

> 

> +  cfun->machine->static_chain_stack_bytes

> +    = arm_compute_static_chain_stack_bytes ();

> +


Maybe I did not understand this completely, but my idea was that I am 
initializing the value of cfun->machine->static_chain_stack_bytes to be 
-1 in arm_init_machine_status () and then it stays as -1 all throughout 
reload and hence the function arm_compute_static_chain_stack_bytes () 
will keep computing the value instead of returning the cached value. 
Only during expand_prologue (which I assumed occurs much after reload), 
I overwrite the initial -1 and after that any call to 
arm_compute_static_chain_stack_bytes () would return this cached value.

I did start out writing the patch with a epilogue_completed check but 
realized that even during this stage 
arm_compute_static_chain_stack_bytes () was called several times and 
thus to avoid those re-computations, (again assuming by this stage we 
already should have a fixed value) I re-wrote it with the initialization 
to -1 approach.

Thanks
Sudi

> 

> 

> Thanks,

> Kyrill

>
Kyrill Tkachov March 22, 2018, 4:52 p.m. | #3
On 22/03/18 16:20, Sudakshina Das wrote:
> Hi Kyrill

>

> On 22/03/18 16:08, Kyrill Tkachov wrote:

>> Hi Sudi,

>>

>> On 21/03/18 17:44, Sudakshina Das wrote:

>>> Hi

>>>

>>> The ICE in the bug report was happening because the macro

>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>> points in the compilation. This was internally occurring because the

>>> function arm_compute_static_chain_stack_bytes () which was dependent on

>>> arm_r3_live_at_start_p () was giving a different value after the

>>> cond_exec instructions were created in ce3 causing the liveness of r3

>>> to escape up to the start block.

>>>

>>> The function arm_compute_static_chain_stack_bytes () should really

>>> only compute the value once during epilogue/prologue stage. This pass

>>> introduces a new member 'static_chain_stack_bytes' to the target

>>> definition of the struct machine_function which gets calculated in

>>> expand_prologue and is the value that is returned by

>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>

>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>> and added the reported test to the testsuite.

>>>

>>> Is this ok for trunk?

>>>

>>

>> Thanks for working on this.

>> I agree with the approach, I have a couple of comments inline.

>>

>>> Sudi

>>>

>>>

>>> ChangeLog entries:

>>>

>>> *** gcc/ChangeLog ***

>>>

>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>

>>>         PR target/84826

>>>         * config/arm/arm.h (machine_function): Add

>>>         static_chain_stack_bytes.

>>>         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>         Avoid re-computing once computed.

>>>         (arm_expand_prologue): Compute machine->static_chain_stack_bytes.

>>>         (arm_init_machine_status): Initialize

>>>         machine->static_chain_stack_bytes.

>>>

>>> *** gcc/testsuite/ChangeLog ***

>>>

>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>

>>>         PR target/84826

>>>         * gcc.target/arm/pr84826.c: New test

>>

>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>> index bbf3937..2809112 100644

>> --- a/gcc/config/arm/arm.h

>> +++ b/gcc/config/arm/arm.h

>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>     machine_mode thumb1_cc_mode;

>>     /* Set to 1 after arm_reorg has started.  */

>>     int after_arm_reorg;

>> +  /* The number of bytes used to store the static chain register on the

>> +     stack, above the stack frame.  */

>> +  int static_chain_stack_bytes;

>>   }

>>   machine_function;

>>   #endif

>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>> index cb6ab81..bc31810 100644

>> --- a/gcc/config/arm/arm.c

>> +++ b/gcc/config/arm/arm.c

>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>   static int

>>   arm_compute_static_chain_stack_bytes (void)

>>   {

>> +  /* Once the value is updated from the init value of -1, do not

>> +     re-compute.  */

>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>> +    return cfun->machine->static_chain_stack_bytes;

>> +

>>

>> My concern is that this approach caches the first value that is computed for static_chain_stack_bytes.

>> I believe the layout frame code is called multiple times during register allocation as it goes through

>> the motions and I think we want the last value it computes during reload

>>

>> How about we do something like:

>>    if (cfun->machine->static_chain_stack_bytes != -1 &&epilogue_completed)

>>      return cfun->machine->static_chain_stack_bytes;

>>

>> ?...

>>

>>   /* See the defining assertion in arm_expand_prologue.  */

>>     if (IS_NESTED (arm_current_func_type ())

>>         && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>         emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>       }

>>

>> +  /* Let's compute the static_chain_stack_bytes required and store it.  Right

>> +     now the value must the -1 as stored by arm_init_machine_status ().  */

>>

>> ... this comment would need to be tweaked as cfun->machine->static_chain_stack_bytes may hold

>> an intermediate value computed in reload or some other point before epilogue_completed.

>>

>> +  cfun->machine->static_chain_stack_bytes

>> +    = arm_compute_static_chain_stack_bytes ();

>> +

>

> Maybe I did not understand this completely, but my idea was that I am initializing the value of cfun->machine->static_chain_stack_bytes to be -1 in arm_init_machine_status () and then it stays as -1 all throughout reload and hence the function arm_compute_static_chain_stack_bytes () will keep computing the value instead of returning the cached value. Only during expand_prologue (which I assumed occurs much after reload), I overwrite the initial -1 and after that any call to arm_compute_static_chain_stack_bytes () would return this cached value.

>

> I did start out writing the patch with a epilogue_completed check but realized that even during this stage arm_compute_static_chain_stack_bytes () was called several times and thus to avoid those re-computations, (again assuming by this stage we already should have a fixed value) I re-wrote it with the initialization to -1 approach.

>


Right, I had read the patch too quickly, sorry.
You perform the caching of cfun->machine->static_chain_stack_bytes in arm_expand_prologue, not arm_compute_static_chain_stack_bytes.
That does give you the right semantics I think.

The patch is ok then with a small typo fix:

+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+     now the value must the -1 as stored by arm_init_machine_status ().  */

s/the/be/

+  cfun->machine->static_chain_stack_bytes
+    = arm_compute_static_chain_stack_bytes ();
+


Thanks,
Kyrill
Sudakshina Das March 22, 2018, 5:26 p.m. | #4
Hi

On 22/03/18 16:52, Kyrill Tkachov wrote:
> 

> On 22/03/18 16:20, Sudakshina Das wrote:

>> Hi Kyrill

>>

>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>> Hi Sudi,

>>>

>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>> Hi

>>>>

>>>> The ICE in the bug report was happening because the macro

>>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>>> points in the compilation. This was internally occurring because the

>>>> function arm_compute_static_chain_stack_bytes () which was dependent on

>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>> cond_exec instructions were created in ce3 causing the liveness of r3

>>>> to escape up to the start block.

>>>>

>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>> only compute the value once during epilogue/prologue stage. This pass

>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>> definition of the struct machine_function which gets calculated in

>>>> expand_prologue and is the value that is returned by

>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>

>>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>>> and added the reported test to the testsuite.

>>>>

>>>> Is this ok for trunk?

>>>>

>>>

>>> Thanks for working on this.

>>> I agree with the approach, I have a couple of comments inline.

>>>

>>>> Sudi

>>>>

>>>>

>>>> ChangeLog entries:

>>>>

>>>> *** gcc/ChangeLog ***

>>>>

>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>

>>>>         PR target/84826

>>>>         * config/arm/arm.h (machine_function): Add

>>>>         static_chain_stack_bytes.

>>>>         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>>         Avoid re-computing once computed.

>>>>         (arm_expand_prologue): Compute 

>>>> machine->static_chain_stack_bytes.

>>>>         (arm_init_machine_status): Initialize

>>>>         machine->static_chain_stack_bytes.

>>>>

>>>> *** gcc/testsuite/ChangeLog ***

>>>>

>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>

>>>>         PR target/84826

>>>>         * gcc.target/arm/pr84826.c: New test

>>>

>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>> index bbf3937..2809112 100644

>>> --- a/gcc/config/arm/arm.h

>>> +++ b/gcc/config/arm/arm.h

>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>     machine_mode thumb1_cc_mode;

>>>     /* Set to 1 after arm_reorg has started.  */

>>>     int after_arm_reorg;

>>> +  /* The number of bytes used to store the static chain register on the

>>> +     stack, above the stack frame.  */

>>> +  int static_chain_stack_bytes;

>>>   }

>>>   machine_function;

>>>   #endif

>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>> index cb6ab81..bc31810 100644

>>> --- a/gcc/config/arm/arm.c

>>> +++ b/gcc/config/arm/arm.c

>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>   static int

>>>   arm_compute_static_chain_stack_bytes (void)

>>>   {

>>> +  /* Once the value is updated from the init value of -1, do not

>>> +     re-compute.  */

>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>> +    return cfun->machine->static_chain_stack_bytes;

>>> +

>>>

>>> My concern is that this approach caches the first value that is 

>>> computed for static_chain_stack_bytes.

>>> I believe the layout frame code is called multiple times during 

>>> register allocation as it goes through

>>> the motions and I think we want the last value it computes during reload

>>>

>>> How about we do something like:

>>>    if (cfun->machine->static_chain_stack_bytes != -1 

>>> &&epilogue_completed)

>>>      return cfun->machine->static_chain_stack_bytes;

>>>

>>> ?...

>>>

>>>   /* See the defining assertion in arm_expand_prologue.  */

>>>     if (IS_NESTED (arm_current_func_type ())

>>>         && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>         emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>       }

>>>

>>> +  /* Let's compute the static_chain_stack_bytes required and store 

>>> it.  Right

>>> +     now the value must the -1 as stored by arm_init_machine_status 

>>> ().  */

>>>

>>> ... this comment would need to be tweaked as 

>>> cfun->machine->static_chain_stack_bytes may hold

>>> an intermediate value computed in reload or some other point before 

>>> epilogue_completed.

>>>

>>> +  cfun->machine->static_chain_stack_bytes

>>> +    = arm_compute_static_chain_stack_bytes ();

>>> +

>>

>> Maybe I did not understand this completely, but my idea was that I am 

>> initializing the value of cfun->machine->static_chain_stack_bytes to 

>> be -1 in arm_init_machine_status () and then it stays as -1 all 

>> throughout reload and hence the function 

>> arm_compute_static_chain_stack_bytes () will keep computing the value 

>> instead of returning the cached value. Only during expand_prologue 

>> (which I assumed occurs much after reload), I overwrite the initial -1 

>> and after that any call to arm_compute_static_chain_stack_bytes () 

>> would return this cached value.

>>

>> I did start out writing the patch with a epilogue_completed check but 

>> realized that even during this stage 

>> arm_compute_static_chain_stack_bytes () was called several times and 

>> thus to avoid those re-computations, (again assuming by this stage we 

>> already should have a fixed value) I re-wrote it with the 

>> initialization to -1 approach.

>>

> 

> Right, I had read the patch too quickly, sorry.

> You perform the caching of cfun->machine->static_chain_stack_bytes in 

> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

> That does give you the right semantics I think.

> 

> The patch is ok then with a small typo fix:


Thanks! Committed as r258777.

Sudi

> 

> +  /* Let's compute the static_chain_stack_bytes required and store it.  

> Right

> +     now the value must the -1 as stored by arm_init_machine_status 

> ().  */

> 

> s/the/be/

> 

> +  cfun->machine->static_chain_stack_bytes

> +    = arm_compute_static_chain_stack_bytes ();

> +

> 

> 

> Thanks,

> Kyrill

>
Christophe Lyon March 23, 2018, 8:47 a.m. | #5
Hi Sudi,


On 22 March 2018 at 18:26, Sudakshina Das <sudi.das@arm.com> wrote:
> Hi

>

>

> On 22/03/18 16:52, Kyrill Tkachov wrote:

>>

>>

>> On 22/03/18 16:20, Sudakshina Das wrote:

>>>

>>> Hi Kyrill

>>>

>>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>>>

>>>> Hi Sudi,

>>>>

>>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>>>

>>>>> Hi

>>>>>

>>>>> The ICE in the bug report was happening because the macro

>>>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>>>> points in the compilation. This was internally occurring because the

>>>>> function arm_compute_static_chain_stack_bytes () which was dependent on

>>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>>> cond_exec instructions were created in ce3 causing the liveness of r3

>>>>> to escape up to the start block.

>>>>>

>>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>>> only compute the value once during epilogue/prologue stage. This pass

>>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>>> definition of the struct machine_function which gets calculated in

>>>>> expand_prologue and is the value that is returned by

>>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>>

>>>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>>>> and added the reported test to the testsuite.

>>>>>

>>>>> Is this ok for trunk?

>>>>>

>>>>

>>>> Thanks for working on this.

>>>> I agree with the approach, I have a couple of comments inline.

>>>>

>>>>> Sudi

>>>>>

>>>>>

>>>>> ChangeLog entries:

>>>>>

>>>>> *** gcc/ChangeLog ***

>>>>>

>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>

>>>>>         PR target/84826

>>>>>         * config/arm/arm.h (machine_function): Add

>>>>>         static_chain_stack_bytes.

>>>>>         * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>>>         Avoid re-computing once computed.

>>>>>         (arm_expand_prologue): Compute

>>>>> machine->static_chain_stack_bytes.

>>>>>         (arm_init_machine_status): Initialize

>>>>>         machine->static_chain_stack_bytes.

>>>>>

>>>>> *** gcc/testsuite/ChangeLog ***

>>>>>

>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>

>>>>>         PR target/84826

>>>>>         * gcc.target/arm/pr84826.c: New test

>>>>

The new test fails on
arm-none-linux-gnueabi
--with-mode thumb
--with-cpu cortex-a9
--with-fpu default
Dejagnu flags: -march=armv5t

Because:
/gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':
/gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:
-fstack-check=specific for Thumb-1
compiler exited with status 1
FAIL: gcc.target/arm/pr84826.c (test for excess errors)

You probably have to add a require-effective-target to skip the test
in such cases.

Thanks,

Christophe

>>>>

>>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>>> index bbf3937..2809112 100644

>>>> --- a/gcc/config/arm/arm.h

>>>> +++ b/gcc/config/arm/arm.h

>>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>>     machine_mode thumb1_cc_mode;

>>>>     /* Set to 1 after arm_reorg has started.  */

>>>>     int after_arm_reorg;

>>>> +  /* The number of bytes used to store the static chain register on the

>>>> +     stack, above the stack frame.  */

>>>> +  int static_chain_stack_bytes;

>>>>   }

>>>>   machine_function;

>>>>   #endif

>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>>> index cb6ab81..bc31810 100644

>>>> --- a/gcc/config/arm/arm.c

>>>> +++ b/gcc/config/arm/arm.c

>>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>>   static int

>>>>   arm_compute_static_chain_stack_bytes (void)

>>>>   {

>>>> +  /* Once the value is updated from the init value of -1, do not

>>>> +     re-compute.  */

>>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>>> +    return cfun->machine->static_chain_stack_bytes;

>>>> +

>>>>

>>>> My concern is that this approach caches the first value that is computed

>>>> for static_chain_stack_bytes.

>>>> I believe the layout frame code is called multiple times during register

>>>> allocation as it goes through

>>>> the motions and I think we want the last value it computes during reload

>>>>

>>>> How about we do something like:

>>>>    if (cfun->machine->static_chain_stack_bytes != -1

>>>> &&epilogue_completed)

>>>>      return cfun->machine->static_chain_stack_bytes;

>>>>

>>>> ?...

>>>>

>>>>   /* See the defining assertion in arm_expand_prologue.  */

>>>>     if (IS_NESTED (arm_current_func_type ())

>>>>         && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

>>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>>         emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>>       }

>>>>

>>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>>> Right

>>>> +     now the value must the -1 as stored by arm_init_machine_status ().

>>>> */

>>>>

>>>> ... this comment would need to be tweaked as

>>>> cfun->machine->static_chain_stack_bytes may hold

>>>> an intermediate value computed in reload or some other point before

>>>> epilogue_completed.

>>>>

>>>> +  cfun->machine->static_chain_stack_bytes

>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>> +

>>>

>>>

>>> Maybe I did not understand this completely, but my idea was that I am

>>> initializing the value of cfun->machine->static_chain_stack_bytes to be -1

>>> in arm_init_machine_status () and then it stays as -1 all throughout reload

>>> and hence the function arm_compute_static_chain_stack_bytes () will keep

>>> computing the value instead of returning the cached value. Only during

>>> expand_prologue (which I assumed occurs much after reload), I overwrite the

>>> initial -1 and after that any call to arm_compute_static_chain_stack_bytes

>>> () would return this cached value.

>>>

>>> I did start out writing the patch with a epilogue_completed check but

>>> realized that even during this stage arm_compute_static_chain_stack_bytes ()

>>> was called several times and thus to avoid those re-computations, (again

>>> assuming by this stage we already should have a fixed value) I re-wrote it

>>> with the initialization to -1 approach.

>>>

>>

>> Right, I had read the patch too quickly, sorry.

>> You perform the caching of cfun->machine->static_chain_stack_bytes in

>> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

>> That does give you the right semantics I think.

>>

>> The patch is ok then with a small typo fix:

>

>

> Thanks! Committed as r258777.

>

> Sudi

>

>

>>

>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>> Right

>> +     now the value must the -1 as stored by arm_init_machine_status ().

>> */

>>

>> s/the/be/

>>

>> +  cfun->machine->static_chain_stack_bytes

>> +    = arm_compute_static_chain_stack_bytes ();

>> +

>>

>>

>> Thanks,

>> Kyrill

>>

>
Kyrill Tkachov March 23, 2018, 9:12 a.m. | #6
On 23/03/18 08:47, Christophe Lyon wrote:
> Hi Sudi,

>

>

> On 22 March 2018 at 18:26, Sudakshina Das <sudi.das@arm.com> wrote:

>> Hi

>>

>>

>> On 22/03/18 16:52, Kyrill Tkachov wrote:

>>>

>>> On 22/03/18 16:20, Sudakshina Das wrote:

>>>> Hi Kyrill

>>>>

>>>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>>>> Hi Sudi,

>>>>>

>>>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>>>> Hi

>>>>>>

>>>>>> The ICE in the bug report was happening because the macro

>>>>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>>>>> points in the compilation. This was internally occurring because the

>>>>>> function arm_compute_static_chain_stack_bytes () which was dependent on

>>>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>>>> cond_exec instructions were created in ce3 causing the liveness of r3

>>>>>> to escape up to the start block.

>>>>>>

>>>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>>>> only compute the value once during epilogue/prologue stage. This pass

>>>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>>>> definition of the struct machine_function which gets calculated in

>>>>>> expand_prologue and is the value that is returned by

>>>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>>>

>>>>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>>>>> and added the reported test to the testsuite.

>>>>>>

>>>>>> Is this ok for trunk?

>>>>>>

>>>>> Thanks for working on this.

>>>>> I agree with the approach, I have a couple of comments inline.

>>>>>

>>>>>> Sudi

>>>>>>

>>>>>>

>>>>>> ChangeLog entries:

>>>>>>

>>>>>> *** gcc/ChangeLog ***

>>>>>>

>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>

>>>>>>          PR target/84826

>>>>>>          * config/arm/arm.h (machine_function): Add

>>>>>>          static_chain_stack_bytes.

>>>>>>          * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>>>>          Avoid re-computing once computed.

>>>>>>          (arm_expand_prologue): Compute

>>>>>> machine->static_chain_stack_bytes.

>>>>>>          (arm_init_machine_status): Initialize

>>>>>>          machine->static_chain_stack_bytes.

>>>>>>

>>>>>> *** gcc/testsuite/ChangeLog ***

>>>>>>

>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>

>>>>>>          PR target/84826

>>>>>>          * gcc.target/arm/pr84826.c: New test

> The new test fails on

> arm-none-linux-gnueabi

> --with-mode thumb

> --with-cpu cortex-a9

> --with-fpu default

> Dejagnu flags: -march=armv5t

>

> Because:

> /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':

> /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:

> -fstack-check=specific for Thumb-1

> compiler exited with status 1

> FAIL: gcc.target/arm/pr84826.c (test for excess errors)

>

> You probably have to add a require-effective-target to skip the test

> in such cases.


Yeah, these tests need a
{ dg-require-effective-target supports_stack_clash_protection }

A patch to add that is pre-approved.
Sorry for missing it in review.

Kyrill

> Thanks,

>

> Christophe

>

>>>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>>>> index bbf3937..2809112 100644

>>>>> --- a/gcc/config/arm/arm.h

>>>>> +++ b/gcc/config/arm/arm.h

>>>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>>>      machine_mode thumb1_cc_mode;

>>>>>      /* Set to 1 after arm_reorg has started.  */

>>>>>      int after_arm_reorg;

>>>>> +  /* The number of bytes used to store the static chain register on the

>>>>> +     stack, above the stack frame.  */

>>>>> +  int static_chain_stack_bytes;

>>>>>    }

>>>>>    machine_function;

>>>>>    #endif

>>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>>>> index cb6ab81..bc31810 100644

>>>>> --- a/gcc/config/arm/arm.c

>>>>> +++ b/gcc/config/arm/arm.c

>>>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>>>    static int

>>>>>    arm_compute_static_chain_stack_bytes (void)

>>>>>    {

>>>>> +  /* Once the value is updated from the init value of -1, do not

>>>>> +     re-compute.  */

>>>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>>>> +    return cfun->machine->static_chain_stack_bytes;

>>>>> +

>>>>>

>>>>> My concern is that this approach caches the first value that is computed

>>>>> for static_chain_stack_bytes.

>>>>> I believe the layout frame code is called multiple times during register

>>>>> allocation as it goes through

>>>>> the motions and I think we want the last value it computes during reload

>>>>>

>>>>> How about we do something like:

>>>>>     if (cfun->machine->static_chain_stack_bytes != -1

>>>>> &&epilogue_completed)

>>>>>       return cfun->machine->static_chain_stack_bytes;

>>>>>

>>>>> ?...

>>>>>

>>>>>    /* See the defining assertion in arm_expand_prologue.  */

>>>>>      if (IS_NESTED (arm_current_func_type ())

>>>>>          && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

>>>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>>>          emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>>>        }

>>>>>

>>>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>>>> Right

>>>>> +     now the value must the -1 as stored by arm_init_machine_status ().

>>>>> */

>>>>>

>>>>> ... this comment would need to be tweaked as

>>>>> cfun->machine->static_chain_stack_bytes may hold

>>>>> an intermediate value computed in reload or some other point before

>>>>> epilogue_completed.

>>>>>

>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>> +

>>>>

>>>> Maybe I did not understand this completely, but my idea was that I am

>>>> initializing the value of cfun->machine->static_chain_stack_bytes to be -1

>>>> in arm_init_machine_status () and then it stays as -1 all throughout reload

>>>> and hence the function arm_compute_static_chain_stack_bytes () will keep

>>>> computing the value instead of returning the cached value. Only during

>>>> expand_prologue (which I assumed occurs much after reload), I overwrite the

>>>> initial -1 and after that any call to arm_compute_static_chain_stack_bytes

>>>> () would return this cached value.

>>>>

>>>> I did start out writing the patch with a epilogue_completed check but

>>>> realized that even during this stage arm_compute_static_chain_stack_bytes ()

>>>> was called several times and thus to avoid those re-computations, (again

>>>> assuming by this stage we already should have a fixed value) I re-wrote it

>>>> with the initialization to -1 approach.

>>>>

>>> Right, I had read the patch too quickly, sorry.

>>> You perform the caching of cfun->machine->static_chain_stack_bytes in

>>> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

>>> That does give you the right semantics I think.

>>>

>>> The patch is ok then with a small typo fix:

>>

>> Thanks! Committed as r258777.

>>

>> Sudi

>>

>>

>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>> Right

>>> +     now the value must the -1 as stored by arm_init_machine_status ().

>>> */

>>>

>>> s/the/be/

>>>

>>> +  cfun->machine->static_chain_stack_bytes

>>> +    = arm_compute_static_chain_stack_bytes ();

>>> +

>>>

>>>

>>> Thanks,

>>> Kyrill

>>>
Sudakshina Das March 23, 2018, 1:31 p.m. | #7
On 23/03/18 09:12, Kyrill Tkachov wrote:
> 

> On 23/03/18 08:47, Christophe Lyon wrote:

>> Hi Sudi,

>>

>>

>> On 22 March 2018 at 18:26, Sudakshina Das <sudi.das@arm.com> wrote:

>>> Hi

>>>

>>>

>>> On 22/03/18 16:52, Kyrill Tkachov wrote:

>>>>

>>>> On 22/03/18 16:20, Sudakshina Das wrote:

>>>>> Hi Kyrill

>>>>>

>>>>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>>>>> Hi Sudi,

>>>>>>

>>>>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>>>>> Hi

>>>>>>>

>>>>>>> The ICE in the bug report was happening because the macro

>>>>>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>>>>>> points in the compilation. This was internally occurring because the

>>>>>>> function arm_compute_static_chain_stack_bytes () which was 

>>>>>>> dependent on

>>>>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>>>>> cond_exec instructions were created in ce3 causing the liveness 

>>>>>>> of r3

>>>>>>> to escape up to the start block.

>>>>>>>

>>>>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>>>>> only compute the value once during epilogue/prologue stage. This 

>>>>>>> pass

>>>>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>>>>> definition of the struct machine_function which gets calculated in

>>>>>>> expand_prologue and is the value that is returned by

>>>>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>>>>

>>>>>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>>>>>> and added the reported test to the testsuite.

>>>>>>>

>>>>>>> Is this ok for trunk?

>>>>>>>

>>>>>> Thanks for working on this.

>>>>>> I agree with the approach, I have a couple of comments inline.

>>>>>>

>>>>>>> Sudi

>>>>>>>

>>>>>>>

>>>>>>> ChangeLog entries:

>>>>>>>

>>>>>>> *** gcc/ChangeLog ***

>>>>>>>

>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>

>>>>>>>          PR target/84826

>>>>>>>          * config/arm/arm.h (machine_function): Add

>>>>>>>          static_chain_stack_bytes.

>>>>>>>          * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>>>>>          Avoid re-computing once computed.

>>>>>>>          (arm_expand_prologue): Compute

>>>>>>> machine->static_chain_stack_bytes.

>>>>>>>          (arm_init_machine_status): Initialize

>>>>>>>          machine->static_chain_stack_bytes.

>>>>>>>

>>>>>>> *** gcc/testsuite/ChangeLog ***

>>>>>>>

>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>

>>>>>>>          PR target/84826

>>>>>>>          * gcc.target/arm/pr84826.c: New test

>> The new test fails on

>> arm-none-linux-gnueabi

>> --with-mode thumb

>> --with-cpu cortex-a9

>> --with-fpu default

>> Dejagnu flags: -march=armv5t

>>

>> Because:

>> /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':

>> /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:

>> -fstack-check=specific for Thumb-1

>> compiler exited with status 1

>> FAIL: gcc.target/arm/pr84826.c (test for excess errors)

>>

>> You probably have to add a require-effective-target to skip the test

>> in such cases.

> 

> Yeah, these tests need a

> { dg-require-effective-target supports_stack_clash_protection }

> 

> A patch to add that is pre-approved.

> Sorry for missing it in review.

> 

> Kyrill

> 


Hi Christophe and Kyrill

How about the attached patch?
{ dg-require-effective-target supports_stack_clash_protection } is not
enabled for any of ARM targets, so this is my work around for that. There is
a comment in target-supports.exp which makes me a little hesitant in
tinkering with the require effective target code.

proc check_effective_target_supports_stack_clash_protection { } {

    # Temporary until the target bits are fully ACK'd.
#  if { [istarget aarch*-*-*] } {
#       return 1
#  }

     if { [istarget x86_64-*-*] || [istarget i?86-*-*]
           || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
           || [istarget s390*-*-*] } {
         return 1
     }
   return 0
}

I have opened a new PR 85005 which mentions this that is meant
for GCC 9 as part for a bigger cleanup and redesign of the stack
clash protection code on ARM backend.


*** gcc/testsuite/ChangeLog ***

2018-03-23  Sudakshina Das  <sudi.das@arm.com>

	PR target/84826
	* gcc.target/arm/pr84826.c: Add dg directive.

Thanks
Sudi

>> Thanks,

>>

>> Christophe

>>

>>>>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>>>>> index bbf3937..2809112 100644

>>>>>> --- a/gcc/config/arm/arm.h

>>>>>> +++ b/gcc/config/arm/arm.h

>>>>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>>>>      machine_mode thumb1_cc_mode;

>>>>>>      /* Set to 1 after arm_reorg has started.  */

>>>>>>      int after_arm_reorg;

>>>>>> +  /* The number of bytes used to store the static chain register 

>>>>>> on the

>>>>>> +     stack, above the stack frame.  */

>>>>>> +  int static_chain_stack_bytes;

>>>>>>    }

>>>>>>    machine_function;

>>>>>>    #endif

>>>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>>>>> index cb6ab81..bc31810 100644

>>>>>> --- a/gcc/config/arm/arm.c

>>>>>> +++ b/gcc/config/arm/arm.c

>>>>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>>>>    static int

>>>>>>    arm_compute_static_chain_stack_bytes (void)

>>>>>>    {

>>>>>> +  /* Once the value is updated from the init value of -1, do not

>>>>>> +     re-compute.  */

>>>>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>>>>> +    return cfun->machine->static_chain_stack_bytes;

>>>>>> +

>>>>>>

>>>>>> My concern is that this approach caches the first value that is 

>>>>>> computed

>>>>>> for static_chain_stack_bytes.

>>>>>> I believe the layout frame code is called multiple times during 

>>>>>> register

>>>>>> allocation as it goes through

>>>>>> the motions and I think we want the last value it computes during 

>>>>>> reload

>>>>>>

>>>>>> How about we do something like:

>>>>>>     if (cfun->machine->static_chain_stack_bytes != -1

>>>>>> &&epilogue_completed)

>>>>>>       return cfun->machine->static_chain_stack_bytes;

>>>>>>

>>>>>> ?...

>>>>>>

>>>>>>    /* See the defining assertion in arm_expand_prologue.  */

>>>>>>      if (IS_NESTED (arm_current_func_type ())

>>>>>>          && ((TARGET_APCS_FRAME && frame_pointer_needed && 

>>>>>> TARGET_ARM)

>>>>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>>>>          emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>>>>        }

>>>>>>

>>>>>> +  /* Let's compute the static_chain_stack_bytes required and 

>>>>>> store it.

>>>>>> Right

>>>>>> +     now the value must the -1 as stored by 

>>>>>> arm_init_machine_status ().

>>>>>> */

>>>>>>

>>>>>> ... this comment would need to be tweaked as

>>>>>> cfun->machine->static_chain_stack_bytes may hold

>>>>>> an intermediate value computed in reload or some other point before

>>>>>> epilogue_completed.

>>>>>>

>>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>>> +

>>>>>

>>>>> Maybe I did not understand this completely, but my idea was that I am

>>>>> initializing the value of cfun->machine->static_chain_stack_bytes 

>>>>> to be -1

>>>>> in arm_init_machine_status () and then it stays as -1 all 

>>>>> throughout reload

>>>>> and hence the function arm_compute_static_chain_stack_bytes () will 

>>>>> keep

>>>>> computing the value instead of returning the cached value. Only during

>>>>> expand_prologue (which I assumed occurs much after reload), I 

>>>>> overwrite the

>>>>> initial -1 and after that any call to 

>>>>> arm_compute_static_chain_stack_bytes

>>>>> () would return this cached value.

>>>>>

>>>>> I did start out writing the patch with a epilogue_completed check but

>>>>> realized that even during this stage 

>>>>> arm_compute_static_chain_stack_bytes ()

>>>>> was called several times and thus to avoid those re-computations, 

>>>>> (again

>>>>> assuming by this stage we already should have a fixed value) I 

>>>>> re-wrote it

>>>>> with the initialization to -1 approach.

>>>>>

>>>> Right, I had read the patch too quickly, sorry.

>>>> You perform the caching of cfun->machine->static_chain_stack_bytes in

>>>> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

>>>> That does give you the right semantics I think.

>>>>

>>>> The patch is ok then with a small typo fix:

>>>

>>> Thanks! Committed as r258777.

>>>

>>> Sudi

>>>

>>>

>>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>>> Right

>>>> +     now the value must the -1 as stored by arm_init_machine_status 

>>>> ().

>>>> */

>>>>

>>>> s/the/be/

>>>>

>>>> +  cfun->machine->static_chain_stack_bytes

>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>> +

>>>>

>>>>

>>>> Thanks,

>>>> Kyrill

>>>>

>
diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c
index c61c548..a7af61d 100644
--- a/gcc/testsuite/gcc.target/arm/pr84826.c
+++ b/gcc/testsuite/gcc.target/arm/pr84826.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-Ofast -fstack-clash-protection" } */
 
 void d (void *);
Kyrill Tkachov March 23, 2018, 1:50 p.m. | #8
On 23/03/18 13:31, Sudakshina Das wrote:
> On 23/03/18 09:12, Kyrill Tkachov wrote:

>>

>> On 23/03/18 08:47, Christophe Lyon wrote:

>>> Hi Sudi,

>>>

>>>

>>> On 22 March 2018 at 18:26, Sudakshina Das <sudi.das@arm.com> wrote:

>>>> Hi

>>>>

>>>>

>>>> On 22/03/18 16:52, Kyrill Tkachov wrote:

>>>>>

>>>>> On 22/03/18 16:20, Sudakshina Das wrote:

>>>>>> Hi Kyrill

>>>>>>

>>>>>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>>>>>> Hi Sudi,

>>>>>>>

>>>>>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>>>>>> Hi

>>>>>>>>

>>>>>>>> The ICE in the bug report was happening because the macro

>>>>>>>> USE_RETURN_INSN (FALSE) was returning different values at different

>>>>>>>> points in the compilation. This was internally occurring because the

>>>>>>>> function arm_compute_static_chain_stack_bytes () which was dependent on

>>>>>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>>>>>> cond_exec instructions were created in ce3 causing the liveness of r3

>>>>>>>> to escape up to the start block.

>>>>>>>>

>>>>>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>>>>>> only compute the value once during epilogue/prologue stage. This pass

>>>>>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>>>>>> definition of the struct machine_function which gets calculated in

>>>>>>>> expand_prologue and is the value that is returned by

>>>>>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>>>>>

>>>>>>>> Testing done: Bootstrapped and regtested on arm-none-linux-gnueabihf

>>>>>>>> and added the reported test to the testsuite.

>>>>>>>>

>>>>>>>> Is this ok for trunk?

>>>>>>>>

>>>>>>> Thanks for working on this.

>>>>>>> I agree with the approach, I have a couple of comments inline.

>>>>>>>

>>>>>>>> Sudi

>>>>>>>>

>>>>>>>>

>>>>>>>> ChangeLog entries:

>>>>>>>>

>>>>>>>> *** gcc/ChangeLog ***

>>>>>>>>

>>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>>

>>>>>>>>          PR target/84826

>>>>>>>>          * config/arm/arm.h (machine_function): Add

>>>>>>>>          static_chain_stack_bytes.

>>>>>>>>          * config/arm/arm.c (arm_compute_static_chain_stack_bytes):

>>>>>>>>          Avoid re-computing once computed.

>>>>>>>>          (arm_expand_prologue): Compute

>>>>>>>> machine->static_chain_stack_bytes.

>>>>>>>>          (arm_init_machine_status): Initialize

>>>>>>>>          machine->static_chain_stack_bytes.

>>>>>>>>

>>>>>>>> *** gcc/testsuite/ChangeLog ***

>>>>>>>>

>>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>>

>>>>>>>>          PR target/84826

>>>>>>>>          * gcc.target/arm/pr84826.c: New test

>>> The new test fails on

>>> arm-none-linux-gnueabi

>>> --with-mode thumb

>>> --with-cpu cortex-a9

>>> --with-fpu default

>>> Dejagnu flags: -march=armv5t

>>>

>>> Because:

>>> /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':

>>> /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:

>>> -fstack-check=specific for Thumb-1

>>> compiler exited with status 1

>>> FAIL: gcc.target/arm/pr84826.c (test for excess errors)

>>>

>>> You probably have to add a require-effective-target to skip the test

>>> in such cases.

>>

>> Yeah, these tests need a

>> { dg-require-effective-target supports_stack_clash_protection }

>>

>> A patch to add that is pre-approved.

>> Sorry for missing it in review.

>>

>> Kyrill

>>

>

> Hi Christophe and Kyrill

>

> How about the attached patch?

> { dg-require-effective-target supports_stack_clash_protection } is not

> enabled for any of ARM targets, so this is my work around for that. There is

> a comment in target-supports.exp which makes me a little hesitant in

> tinkering with the require effective target code.

>

> proc check_effective_target_supports_stack_clash_protection { } {

>

>    # Temporary until the target bits are fully ACK'd.

> #  if { [istarget aarch*-*-*] } {

> #       return 1

> #  }

>

>     if { [istarget x86_64-*-*] || [istarget i?86-*-*]

>           || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]

>           || [istarget s390*-*-*] } {

>         return 1

>     }

>   return 0

> }

>

> I have opened a new PR 85005 which mentions this that is meant

> for GCC 9 as part for a bigger cleanup and redesign of the stack

> clash protection code on ARM backend.

>

>


Ok. Thanks for doing this.
Kyrill

> *** gcc/testsuite/ChangeLog ***

>

> 2018-03-23  Sudakshina Das  <sudi.das@arm.com>

>

>     PR target/84826

>     * gcc.target/arm/pr84826.c: Add dg directive.

>

> Thanks

> Sudi

>

>>> Thanks,

>>>

>>> Christophe

>>>

>>>>>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>>>>>> index bbf3937..2809112 100644

>>>>>>> --- a/gcc/config/arm/arm.h

>>>>>>> +++ b/gcc/config/arm/arm.h

>>>>>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>>>>>      machine_mode thumb1_cc_mode;

>>>>>>>      /* Set to 1 after arm_reorg has started.  */

>>>>>>>      int after_arm_reorg;

>>>>>>> +  /* The number of bytes used to store the static chain register on the

>>>>>>> +     stack, above the stack frame.  */

>>>>>>> +  int static_chain_stack_bytes;

>>>>>>>    }

>>>>>>>    machine_function;

>>>>>>>    #endif

>>>>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>>>>>> index cb6ab81..bc31810 100644

>>>>>>> --- a/gcc/config/arm/arm.c

>>>>>>> +++ b/gcc/config/arm/arm.c

>>>>>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>>>>>    static int

>>>>>>>    arm_compute_static_chain_stack_bytes (void)

>>>>>>>    {

>>>>>>> +  /* Once the value is updated from the init value of -1, do not

>>>>>>> +     re-compute.  */

>>>>>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>>>>>> +    return cfun->machine->static_chain_stack_bytes;

>>>>>>> +

>>>>>>>

>>>>>>> My concern is that this approach caches the first value that is computed

>>>>>>> for static_chain_stack_bytes.

>>>>>>> I believe the layout frame code is called multiple times during register

>>>>>>> allocation as it goes through

>>>>>>> the motions and I think we want the last value it computes during reload

>>>>>>>

>>>>>>> How about we do something like:

>>>>>>>     if (cfun->machine->static_chain_stack_bytes != -1

>>>>>>> &&epilogue_completed)

>>>>>>>       return cfun->machine->static_chain_stack_bytes;

>>>>>>>

>>>>>>> ?...

>>>>>>>

>>>>>>>    /* See the defining assertion in arm_expand_prologue.  */

>>>>>>>      if (IS_NESTED (arm_current_func_type ())

>>>>>>>          && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)

>>>>>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>>>>>          emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>>>>>        }

>>>>>>>

>>>>>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>>>>>> Right

>>>>>>> +     now the value must the -1 as stored by arm_init_machine_status ().

>>>>>>> */

>>>>>>>

>>>>>>> ... this comment would need to be tweaked as

>>>>>>> cfun->machine->static_chain_stack_bytes may hold

>>>>>>> an intermediate value computed in reload or some other point before

>>>>>>> epilogue_completed.

>>>>>>>

>>>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>>>> +

>>>>>>

>>>>>> Maybe I did not understand this completely, but my idea was that I am

>>>>>> initializing the value of cfun->machine->static_chain_stack_bytes to be -1

>>>>>> in arm_init_machine_status () and then it stays as -1 all throughout reload

>>>>>> and hence the function arm_compute_static_chain_stack_bytes () will keep

>>>>>> computing the value instead of returning the cached value. Only during

>>>>>> expand_prologue (which I assumed occurs much after reload), I overwrite the

>>>>>> initial -1 and after that any call to arm_compute_static_chain_stack_bytes

>>>>>> () would return this cached value.

>>>>>>

>>>>>> I did start out writing the patch with a epilogue_completed check but

>>>>>> realized that even during this stage arm_compute_static_chain_stack_bytes ()

>>>>>> was called several times and thus to avoid those re-computations, (again

>>>>>> assuming by this stage we already should have a fixed value) I re-wrote it

>>>>>> with the initialization to -1 approach.

>>>>>>

>>>>> Right, I had read the patch too quickly, sorry.

>>>>> You perform the caching of cfun->machine->static_chain_stack_bytes in

>>>>> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

>>>>> That does give you the right semantics I think.

>>>>>

>>>>> The patch is ok then with a small typo fix:

>>>>

>>>> Thanks! Committed as r258777.

>>>>

>>>> Sudi

>>>>

>>>>

>>>>> +  /* Let's compute the static_chain_stack_bytes required and store it.

>>>>> Right

>>>>> +     now the value must the -1 as stored by arm_init_machine_status ().

>>>>> */

>>>>>

>>>>> s/the/be/

>>>>>

>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>> +

>>>>>

>>>>>

>>>>> Thanks,

>>>>> Kyrill

>>>>>

>>

>
Sudakshina Das March 23, 2018, 1:58 p.m. | #9
On 23/03/18 13:50, Kyrill Tkachov wrote:
> 

> On 23/03/18 13:31, Sudakshina Das wrote:

>> On 23/03/18 09:12, Kyrill Tkachov wrote:

>>>

>>> On 23/03/18 08:47, Christophe Lyon wrote:

>>>> Hi Sudi,

>>>>

>>>>

>>>> On 22 March 2018 at 18:26, Sudakshina Das <sudi.das@arm.com> wrote:

>>>>> Hi

>>>>>

>>>>>

>>>>> On 22/03/18 16:52, Kyrill Tkachov wrote:

>>>>>>

>>>>>> On 22/03/18 16:20, Sudakshina Das wrote:

>>>>>>> Hi Kyrill

>>>>>>>

>>>>>>> On 22/03/18 16:08, Kyrill Tkachov wrote:

>>>>>>>> Hi Sudi,

>>>>>>>>

>>>>>>>> On 21/03/18 17:44, Sudakshina Das wrote:

>>>>>>>>> Hi

>>>>>>>>>

>>>>>>>>> The ICE in the bug report was happening because the macro

>>>>>>>>> USE_RETURN_INSN (FALSE) was returning different values at 

>>>>>>>>> different

>>>>>>>>> points in the compilation. This was internally occurring 

>>>>>>>>> because the

>>>>>>>>> function arm_compute_static_chain_stack_bytes () which was 

>>>>>>>>> dependent on

>>>>>>>>> arm_r3_live_at_start_p () was giving a different value after the

>>>>>>>>> cond_exec instructions were created in ce3 causing the liveness 

>>>>>>>>> of r3

>>>>>>>>> to escape up to the start block.

>>>>>>>>>

>>>>>>>>> The function arm_compute_static_chain_stack_bytes () should really

>>>>>>>>> only compute the value once during epilogue/prologue stage. 

>>>>>>>>> This pass

>>>>>>>>> introduces a new member 'static_chain_stack_bytes' to the target

>>>>>>>>> definition of the struct machine_function which gets calculated in

>>>>>>>>> expand_prologue and is the value that is returned by

>>>>>>>>> arm_compute_static_chain_stack_bytes () beyond that.

>>>>>>>>>

>>>>>>>>> Testing done: Bootstrapped and regtested on 

>>>>>>>>> arm-none-linux-gnueabihf

>>>>>>>>> and added the reported test to the testsuite.

>>>>>>>>>

>>>>>>>>> Is this ok for trunk?

>>>>>>>>>

>>>>>>>> Thanks for working on this.

>>>>>>>> I agree with the approach, I have a couple of comments inline.

>>>>>>>>

>>>>>>>>> Sudi

>>>>>>>>>

>>>>>>>>>

>>>>>>>>> ChangeLog entries:

>>>>>>>>>

>>>>>>>>> *** gcc/ChangeLog ***

>>>>>>>>>

>>>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>>>

>>>>>>>>>          PR target/84826

>>>>>>>>>          * config/arm/arm.h (machine_function): Add

>>>>>>>>>          static_chain_stack_bytes.

>>>>>>>>>          * config/arm/arm.c 

>>>>>>>>> (arm_compute_static_chain_stack_bytes):

>>>>>>>>>          Avoid re-computing once computed.

>>>>>>>>>          (arm_expand_prologue): Compute

>>>>>>>>> machine->static_chain_stack_bytes.

>>>>>>>>>          (arm_init_machine_status): Initialize

>>>>>>>>>          machine->static_chain_stack_bytes.

>>>>>>>>>

>>>>>>>>> *** gcc/testsuite/ChangeLog ***

>>>>>>>>>

>>>>>>>>> 2018-03-21  Sudakshina Das  <sudi.das@arm.com>

>>>>>>>>>

>>>>>>>>>          PR target/84826

>>>>>>>>>          * gcc.target/arm/pr84826.c: New test

>>>> The new test fails on

>>>> arm-none-linux-gnueabi

>>>> --with-mode thumb

>>>> --with-cpu cortex-a9

>>>> --with-fpu default

>>>> Dejagnu flags: -march=armv5t

>>>>

>>>> Because:

>>>> /gcc/testsuite/gcc.target/arm/pr84826.c: In function 'a':

>>>> /gcc/testsuite/gcc.target/arm/pr84826.c:15:1: sorry, unimplemented:

>>>> -fstack-check=specific for Thumb-1

>>>> compiler exited with status 1

>>>> FAIL: gcc.target/arm/pr84826.c (test for excess errors)

>>>>

>>>> You probably have to add a require-effective-target to skip the test

>>>> in such cases.

>>>

>>> Yeah, these tests need a

>>> { dg-require-effective-target supports_stack_clash_protection }

>>>

>>> A patch to add that is pre-approved.

>>> Sorry for missing it in review.

>>>

>>> Kyrill

>>>

>>

>> Hi Christophe and Kyrill

>>

>> How about the attached patch?

>> { dg-require-effective-target supports_stack_clash_protection } is not

>> enabled for any of ARM targets, so this is my work around for that. 

>> There is

>> a comment in target-supports.exp which makes me a little hesitant in

>> tinkering with the require effective target code.

>>

>> proc check_effective_target_supports_stack_clash_protection { } {

>>

>>    # Temporary until the target bits are fully ACK'd.

>> #  if { [istarget aarch*-*-*] } {

>> #       return 1

>> #  }

>>

>>     if { [istarget x86_64-*-*] || [istarget i?86-*-*]

>>           || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]

>>           || [istarget s390*-*-*] } {

>>         return 1

>>     }

>>   return 0

>> }

>>

>> I have opened a new PR 85005 which mentions this that is meant

>> for GCC 9 as part for a bigger cleanup and redesign of the stack

>> clash protection code on ARM backend.

>>

>>

> 

> Ok. Thanks for doing this.


Thanks and sorry about this!
Committed as r258805.

Sudi

> Kyrill

> 

>> *** gcc/testsuite/ChangeLog ***

>>

>> 2018-03-23  Sudakshina Das  <sudi.das@arm.com>

>>

>>     PR target/84826

>>     * gcc.target/arm/pr84826.c: Add dg directive.

>>

>> Thanks

>> Sudi

>>

>>>> Thanks,

>>>>

>>>> Christophe

>>>>

>>>>>>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

>>>>>>>> index bbf3937..2809112 100644

>>>>>>>> --- a/gcc/config/arm/arm.h

>>>>>>>> +++ b/gcc/config/arm/arm.h

>>>>>>>> @@ -1384,6 +1384,9 @@ typedef struct GTY(()) machine_function

>>>>>>>>      machine_mode thumb1_cc_mode;

>>>>>>>>      /* Set to 1 after arm_reorg has started.  */

>>>>>>>>      int after_arm_reorg;

>>>>>>>> +  /* The number of bytes used to store the static chain 

>>>>>>>> register on the

>>>>>>>> +     stack, above the stack frame.  */

>>>>>>>> +  int static_chain_stack_bytes;

>>>>>>>>    }

>>>>>>>>    machine_function;

>>>>>>>>    #endif

>>>>>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>>>>>>> index cb6ab81..bc31810 100644

>>>>>>>> --- a/gcc/config/arm/arm.c

>>>>>>>> +++ b/gcc/config/arm/arm.c

>>>>>>>> @@ -19392,6 +19392,11 @@ arm_r3_live_at_start_p (void)

>>>>>>>>    static int

>>>>>>>>    arm_compute_static_chain_stack_bytes (void)

>>>>>>>>    {

>>>>>>>> +  /* Once the value is updated from the init value of -1, do not

>>>>>>>> +     re-compute.  */

>>>>>>>> +  if (cfun->machine->static_chain_stack_bytes != -1)

>>>>>>>> +    return cfun->machine->static_chain_stack_bytes;

>>>>>>>> +

>>>>>>>>

>>>>>>>> My concern is that this approach caches the first value that is 

>>>>>>>> computed

>>>>>>>> for static_chain_stack_bytes.

>>>>>>>> I believe the layout frame code is called multiple times during 

>>>>>>>> register

>>>>>>>> allocation as it goes through

>>>>>>>> the motions and I think we want the last value it computes 

>>>>>>>> during reload

>>>>>>>>

>>>>>>>> How about we do something like:

>>>>>>>>     if (cfun->machine->static_chain_stack_bytes != -1

>>>>>>>> &&epilogue_completed)

>>>>>>>>       return cfun->machine->static_chain_stack_bytes;

>>>>>>>>

>>>>>>>> ?...

>>>>>>>>

>>>>>>>>    /* See the defining assertion in arm_expand_prologue.  */

>>>>>>>>      if (IS_NESTED (arm_current_func_type ())

>>>>>>>>          && ((TARGET_APCS_FRAME && frame_pointer_needed && 

>>>>>>>> TARGET_ARM)

>>>>>>>> @@ -21699,6 +21704,11 @@ arm_expand_prologue (void)

>>>>>>>>          emit_insn (gen_movsi (stack_pointer_rtx, r1));

>>>>>>>>        }

>>>>>>>>

>>>>>>>> +  /* Let's compute the static_chain_stack_bytes required and 

>>>>>>>> store it.

>>>>>>>> Right

>>>>>>>> +     now the value must the -1 as stored by 

>>>>>>>> arm_init_machine_status ().

>>>>>>>> */

>>>>>>>>

>>>>>>>> ... this comment would need to be tweaked as

>>>>>>>> cfun->machine->static_chain_stack_bytes may hold

>>>>>>>> an intermediate value computed in reload or some other point before

>>>>>>>> epilogue_completed.

>>>>>>>>

>>>>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>>>>> +

>>>>>>>

>>>>>>> Maybe I did not understand this completely, but my idea was that 

>>>>>>> I am

>>>>>>> initializing the value of cfun->machine->static_chain_stack_bytes 

>>>>>>> to be -1

>>>>>>> in arm_init_machine_status () and then it stays as -1 all 

>>>>>>> throughout reload

>>>>>>> and hence the function arm_compute_static_chain_stack_bytes () 

>>>>>>> will keep

>>>>>>> computing the value instead of returning the cached value. Only 

>>>>>>> during

>>>>>>> expand_prologue (which I assumed occurs much after reload), I 

>>>>>>> overwrite the

>>>>>>> initial -1 and after that any call to 

>>>>>>> arm_compute_static_chain_stack_bytes

>>>>>>> () would return this cached value.

>>>>>>>

>>>>>>> I did start out writing the patch with a epilogue_completed check 

>>>>>>> but

>>>>>>> realized that even during this stage 

>>>>>>> arm_compute_static_chain_stack_bytes ()

>>>>>>> was called several times and thus to avoid those re-computations, 

>>>>>>> (again

>>>>>>> assuming by this stage we already should have a fixed value) I 

>>>>>>> re-wrote it

>>>>>>> with the initialization to -1 approach.

>>>>>>>

>>>>>> Right, I had read the patch too quickly, sorry.

>>>>>> You perform the caching of cfun->machine->static_chain_stack_bytes in

>>>>>> arm_expand_prologue, not arm_compute_static_chain_stack_bytes.

>>>>>> That does give you the right semantics I think.

>>>>>>

>>>>>> The patch is ok then with a small typo fix:

>>>>>

>>>>> Thanks! Committed as r258777.

>>>>>

>>>>> Sudi

>>>>>

>>>>>

>>>>>> +  /* Let's compute the static_chain_stack_bytes required and 

>>>>>> store it.

>>>>>> Right

>>>>>> +     now the value must the -1 as stored by 

>>>>>> arm_init_machine_status ().

>>>>>> */

>>>>>>

>>>>>> s/the/be/

>>>>>>

>>>>>> +  cfun->machine->static_chain_stack_bytes

>>>>>> +    = arm_compute_static_chain_stack_bytes ();

>>>>>> +

>>>>>>

>>>>>>

>>>>>> Thanks,

>>>>>> Kyrill

>>>>>>

>>>

>>

>

Patch

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index bbf3937..2809112 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1384,6 +1384,9 @@  typedef struct GTY(()) machine_function
   machine_mode thumb1_cc_mode;
   /* Set to 1 after arm_reorg has started.  */
   int after_arm_reorg;
+  /* The number of bytes used to store the static chain register on the
+     stack, above the stack frame.  */
+  int static_chain_stack_bytes;
 }
 machine_function;
 #endif
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cb6ab81..bc31810 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19392,6 +19392,11 @@  arm_r3_live_at_start_p (void)
 static int
 arm_compute_static_chain_stack_bytes (void)
 {
+  /* Once the value is updated from the init value of -1, do not
+     re-compute.  */
+  if (cfun->machine->static_chain_stack_bytes != -1)
+    return cfun->machine->static_chain_stack_bytes;
+
   /* See the defining assertion in arm_expand_prologue.  */
   if (IS_NESTED (arm_current_func_type ())
       && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
@@ -21699,6 +21704,11 @@  arm_expand_prologue (void)
       emit_insn (gen_movsi (stack_pointer_rtx, r1));
     }
 
+  /* Let's compute the static_chain_stack_bytes required and store it.  Right
+     now the value must the -1 as stored by arm_init_machine_status ().  */
+  cfun->machine->static_chain_stack_bytes
+    = arm_compute_static_chain_stack_bytes ();
+
   /* The static chain register is the same as the IP register.  If it is
      clobbered when creating the frame, we need to save and restore it.  */
   clobber_ip = IS_NESTED (func_type)
@@ -24875,6 +24885,7 @@  arm_init_machine_status (void)
 #if ARM_FT_UNKNOWN != 0
   machine->func_type = ARM_FT_UNKNOWN;
 #endif
+  machine->static_chain_stack_bytes = -1;
   return machine;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/pr84826.c b/gcc/testsuite/gcc.target/arm/pr84826.c
new file mode 100644
index 0000000..c61c548
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr84826.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fstack-clash-protection" } */
+
+void d (void *);
+
+void a ()
+{
+  int b;
+  void bar (int c)
+  {
+    if (__builtin_expect (c, 0))
+      ++b;
+  }
+  d (bar);
+}