[ARM/FDPIC,v5,04/21,ARM] FDPIC: Add support for FDPIC for arm architecture

Message ID 20190515124006.25840-5-christophe.lyon@st.com
State New
Headers show
Series
  • FDPIC ABI for ARM
Related show

Commit Message

Christophe Lyon May 15, 2019, 12:39 p.m.
The FDPIC register is hard-coded to r9, as defined in the ABI.

We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the value associated with the target module.

When generating a symbol address, we have to take into account whether
it is a pointer to data or to a function, because different
relocations are needed.

2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>
	Mickaël Guêné <mickael.guene@st.com>

	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
	in FDPIC mode.
	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
	new function.
	* config/arm/arm.c (arm_option_override): Define pic register to
	FDPIC_REGNUM.
	(arm_function_ok_for_sibcall): Disable sibcall optimization if we
	have no decl or go through PLT.
	(arm_load_pic_register): Handle TARGET_FDPIC.
	(arm_is_segment_info_known): New function.
	(arm_pic_static_addr): Add support for FDPIC.
	(arm_load_function_descriptor): New function.
	(arm_assemble_integer): Add support for FDPIC.
	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
	Define. (FDPIC_REGNUM): New define.
	* config/arm/arm.md (call): Add support for FDPIC.
	(call_value): Likewise.
	(*restore_pic_register_after_call): New pattern.
	(untyped_call): Disable if FDPIC.
	(untyped_return): Likewise.
	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4

-- 
2.6.3

Comments

Richard Sandiford July 16, 2019, 11:58 a.m. | #1
Christophe Lyon <christophe.lyon@st.com> writes:
> The FDPIC register is hard-coded to r9, as defined in the ABI.

>

> We have to disable tailcall optimizations if we don't know if the

> target function is in the same module. If not, we have to set r9 to

> the value associated with the target module.

>

> When generating a symbol address, we have to take into account whether

> it is a pointer to data or to a function, because different

> relocations are needed.

>

> 2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>

> 	Mickaël Guêné <mickael.guene@st.com>

>

> 	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro

> 	in FDPIC mode.

> 	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare

> 	new function.

> 	* config/arm/arm.c (arm_option_override): Define pic register to

> 	FDPIC_REGNUM.

> 	(arm_function_ok_for_sibcall): Disable sibcall optimization if we

> 	have no decl or go through PLT.

> 	(arm_load_pic_register): Handle TARGET_FDPIC.

> 	(arm_is_segment_info_known): New function.

> 	(arm_pic_static_addr): Add support for FDPIC.

> 	(arm_load_function_descriptor): New function.

> 	(arm_assemble_integer): Add support for FDPIC.

> 	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):

> 	Define. (FDPIC_REGNUM): New define.

> 	* config/arm/arm.md (call): Add support for FDPIC.

> 	(call_value): Likewise.

> 	(*restore_pic_register_after_call): New pattern.

> 	(untyped_call): Disable if FDPIC.

> 	(untyped_return): Likewise.

> 	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

>

> Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4

>

> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c

> index 6e256ee..34695fa 100644

> --- a/gcc/config/arm/arm-c.c

> +++ b/gcc/config/arm/arm-c.c

> @@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)

>        builtin_define ("__ARM_EABI__");

>      }

>  

> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);

> +

>    def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);

>    def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);

>  

> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

> index 485bc68..272968a 100644

> --- a/gcc/config/arm/arm-protos.h

> +++ b/gcc/config/arm/arm-protos.h

> @@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);

>  extern int arm_const_double_inline_cost (rtx);

>  extern bool arm_const_double_by_parts (rtx);

>  extern bool arm_const_double_by_immediates (rtx);

> +extern rtx arm_load_function_descriptor (rtx funcdesc);

>  extern void arm_emit_call_insn (rtx, rtx, bool);

>  bool detect_cmse_nonsecure_call (tree);

>  extern const char *output_call (rtx *);

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 45abcd8..d9397b5 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -3485,6 +3485,15 @@ arm_option_override (void)

>    if (flag_pic && TARGET_VXWORKS_RTP)

>      arm_pic_register = 9;

>  

> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */

> +  if (TARGET_FDPIC)

> +    {

> +      arm_pic_register = FDPIC_REGNUM;

> +      if (! TARGET_ARM && ! TARGET_THUMB2)

> +	sorry ("FDPIC mode is supported on architecture versions that "

> +	       "support ARM or Thumb-2 only.");

> +    }

> +

>    if (arm_pic_register_string != NULL)

>      {

>        int pic_register = decode_reg_name (arm_pic_register_string);


Isn't this equivalent to rejecting Thumb-1?  I think that would be
clearer in both the condition and the error message.

How does this interact with arm_pic_data_is_text_relative?  Are both
values supported?

> @@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)

>    if (cfun->machine->sibcall_blocked)

>      return false;

>  

> +  if (TARGET_FDPIC)

> +    {

> +      /* In FDPIC, never tailcall something for which we have no decl:

> +	 the target function could be in a different module, requiring

> +	 a different FDPIC register value.  */

> +      if (decl == NULL)

> +	return false;

> +

> +      /* Don't tailcall if we go through the PLT since the FDPIC

> +	 register is then corrupted and we don't restore it after

> +	 static function calls.  */

> +      if (!targetm.binds_local_p (decl))

> +	return false;

> +    }

> +

>    /* Never tailcall something if we are generating code for Thumb-1.  */

>    if (TARGET_THUMB1)

>      return false;

> @@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>  {

>    rtx l1, labelno, pic_tmp, pic_rtx;

>  

> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)

> +  if (crtl->uses_pic_offset_table == 0

> +      || TARGET_SINGLE_PIC_BASE

> +      || TARGET_FDPIC)

>      return;

>  

>    gcc_assert (flag_pic);

> @@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>    emit_use (pic_reg);

>  }

>  

> +/* Try to determine whether an object, referenced via ORIG, will be

> +   placed in the text or data segment.  This is used in FDPIC mode, to

> +   decide which relocations to use when accessing ORIG.  IS_READONLY

> +   is set to true if ORIG is a read-only location, false otherwise.

> +   Return true if we could determine the location of ORIG, false

> +   otherwise.  IS_READONLY is valid only when we return true.  */


Maybe *IS_READONLY in both cases?

> +static bool

> +arm_is_segment_info_known (rtx orig, bool *is_readonly)

> +{

> +  bool res = false;

> +

> +  *is_readonly = false;

> +

> +  if (GET_CODE (orig) == LABEL_REF)

> +    {

> +      res = true;

> +      *is_readonly = true;

> +    }


Think this function would be easier to read with early returns.

> +  else if (SYMBOL_REF_P (orig))


...so "if" rather than "else if" here.

> +    {

> +      if (CONSTANT_POOL_ADDRESS_P (orig))

> +	{

> +	  res = true;

> +	  *is_readonly = true;

> +	}

> +      else if (SYMBOL_REF_LOCAL_P (orig)

> +	       && !SYMBOL_REF_EXTERNAL_P (orig)

> +	       && SYMBOL_REF_DECL (orig)

> +	       && (!DECL_P (SYMBOL_REF_DECL (orig))

> +		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))

> +	{

> +	  tree decl = SYMBOL_REF_DECL (orig);

> +	  tree init = (TREE_CODE (decl) == VAR_DECL)

> +	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

> +	    ? decl : 0;

> +	  int reloc = 0;

> +	  bool named_section, readonly;

> +

> +	  if (init && init != error_mark_node)

> +	    reloc = compute_reloc_for_constant (init);

> +

> +	  named_section = TREE_CODE (decl) == VAR_DECL

> +	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));


Here too I think it would be better to return false early.

How much variation do you support here for named sections?  E.g. can a
linker script really put SECTION_WRITE sections in the text segment?
Seems like there are some cases that could be handled.

(Just asking, not suggesting you should change anything.)

> +	  readonly = decl_readonly_section (decl, reloc);

> +

> +	  /* We don't know where the link script will put a named

> +	     section, so return false in such a case.  */

> +	  res = !named_section;

> +

> +	  if (!named_section)

> +	    *is_readonly = readonly;

> +	}

> +      else

> +	{

> +	  /* We don't know.  */

> +	  res = false;

> +	}

> +    }

> +  else

> +    gcc_unreachable ();

> +

> +  return res;

> +}

> +

>  /* Generate code to load the address of a static var when flag_pic is set.  */

>  static rtx_insn *

>  arm_pic_static_addr (rtx orig, rtx reg)

>  {

>    rtx l1, labelno, offset_rtx;

> +  rtx_insn *insn;

>  

>    gcc_assert (flag_pic);

>  

> -  /* We use an UNSPEC rather than a LABEL_REF because this label

> -     never appears in the code stream.  */

> -  labelno = GEN_INT (pic_labelno++);

> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> -  l1 = gen_rtx_CONST (VOIDmode, l1);

> +  bool is_readonly = false;

> +  bool info_known = false;

>  

> -  /* On the ARM the PC register contains 'dot + 8' at the time of the

> -     addition, on the Thumb it is 'dot + 4'.  */

> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> -                               UNSPEC_SYMBOL_OFFSET);

> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

> +  if (TARGET_FDPIC

> +      && SYMBOL_REF_P (orig)

> +      && !SYMBOL_REF_FUNCTION_P (orig))

> +      info_known = arm_is_segment_info_known (orig, &is_readonly);


Excess indendentation.  Feels like it might be slightly simpler
to handle SYMBOL_REF_FUNCTION_P in arm_is_segment_info_known,
but I guess the idea is that it might not then be clear whether
the caller is asking about a descriptor or the function itself.

>  

> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));

> +  if (TARGET_FDPIC

> +      && SYMBOL_REF_P (orig)

> +      && !SYMBOL_REF_FUNCTION_P (orig)

> +      && !info_known)

> +    {

> +      /* We don't know where orig is stored, so we have be

> +	 pessimistic and use a GOT relocation.  */

> +      rtx pat;

> +      rtx mem;

> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +

> +      pat = gen_calculate_pic_address (reg, pic_reg, orig);

> +

> +      /* Make the MEM as close to a constant as possible.  */

> +      mem = SET_SRC (pat);

> +      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));

> +      MEM_READONLY_P (mem) = 1;

> +      MEM_NOTRAP_P (mem) = 1;

> +

> +      insn = emit_insn (pat);


Think "pat = ..." onwards should be split out into a helper, since it's
a cut-&-paste of the code in legitimize_pic_address.

> +    }

> +  else if (TARGET_FDPIC

> +	   && SYMBOL_REF_P (orig)

> +	   && (SYMBOL_REF_FUNCTION_P (orig)

> +	       || (info_known && !is_readonly)))

> +    {

> +      /* We use the GOTOFF relocation.  */

> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +

> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);

> +      emit_insn (gen_movsi (reg, l1));

> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));

> +    }

> +  else

> +    {

> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use

> +	 PC-relative access.  */

> +      /* We use an UNSPEC rather than a LABEL_REF because this label

> +	 never appears in the code stream.  */

> +      labelno = GEN_INT (pic_labelno++);

> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> +      l1 = gen_rtx_CONST (VOIDmode, l1);

> +

> +      /* On the ARM the PC register contains 'dot + 8' at the time of the

> +	 addition, on the Thumb it is 'dot + 4'.  */

> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> +				   UNSPEC_SYMBOL_OFFSET);

> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

> +

> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,

> +						   labelno));

> +    }

> +

> +  return insn;

>  }

>  

>  /* Return nonzero if X is valid as an ARM state addressing register.  */

> @@ -16112,9 +16252,36 @@ get_jump_table_size (rtx_jump_table_data *insn)

>    return 0;

>  }

>  

> +/* Emit insns to load the function address from FUNCDESC (an FDPIC

> +   function descriptor) into a register and the GOT address into the

> +   FDPIC register, returning an rtx for the register holding the

> +   function address.  */

> +

> +rtx

> +arm_load_function_descriptor (rtx funcdesc)

> +{

> +  rtx fnaddr_reg = gen_reg_rtx (Pmode);

> +  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);

> +  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));

> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

> +

> +  emit_move_insn (fnaddr_reg, fnaddr);

> +  /* The ABI requires the entry point address to be loaded first, so

> +     prevent the load from being moved after that of the GOT

> +     address.  */


Do you mean that the move insn above has to come before the
pattern below?  If so, I think that should be enforced by making this...

> +  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

> +					gen_rtvec (2, pic_reg, gotaddr),

> +					UNSPEC_PIC_RESTORE);

> +  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);

> +  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);

> +  emit_insn (par);

> +

> +  return fnaddr_reg;

> +}

> +


...use fnaddr_reg.

Does the instruction actually use pic_reg?  We only get here for
non-symbolic addresses after all.

It seems simpler to make *restore_pic_register_after_call a named pattern
and use gen_restore_pic_register_after_call instead.

>  /* Return the maximum amount of padding that will be inserted before

>     label LABEL.  */

> -

>  static HOST_WIDE_INT

>  get_label_padding (rtx label)

>  {

> @@ -23069,9 +23236,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)

>  		  && (!SYMBOL_REF_LOCAL_P (x)

>  		      || (SYMBOL_REF_DECL (x)

>  			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))

> -	    fputs ("(GOT)", asm_out_file);

> +	    {

> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> +		fputs ("(GOTFUNCDESC)", asm_out_file);

> +	      else

> +		fputs ("(GOT)", asm_out_file);

> +	    }

>  	  else

> -	    fputs ("(GOTOFF)", asm_out_file);

> +	    {

> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> +		fputs ("(GOTOFFFUNCDESC)", asm_out_file);

> +	      else

> +		{

> +		  bool is_readonly;

> +

> +		  if (arm_is_segment_info_known (x, &is_readonly))

> +		    fputs ("(GOTOFF)", asm_out_file);

> +		  else

> +		    fputs ("(GOT)", asm_out_file);

> +		}

> +	    }

> +	}

> +

> +      /* For FDPIC we also have to mark symbol for .data section.  */

> +      if (TARGET_FDPIC

> +	  && NEED_GOT_RELOC

> +	  && flag_pic

> +	  && !making_const_table

> +	  && SYMBOL_REF_P (x))

> +	{

> +	  if (SYMBOL_REF_FUNCTION_P (x))

> +	    fputs ("(FUNCDESC)", asm_out_file);

>  	}

>        fputc ('\n', asm_out_file);

>        return true;


Do you expect to reach here for LABEL_REFs with TARGET_FDPIC?  The second
block of code tests for SYMBOL_REF_P but the first tests
SYMBOL_REF_FUNCTION_P without checking SYMBOL_REF_P first.

Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?
Is !flag_pic TARGET_FDPIC supported?

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

> index 0aecd03..9036255 100644

> --- a/gcc/config/arm/arm.md

> +++ b/gcc/config/arm/arm.md

> @@ -8127,6 +8127,23 @@

>      rtx callee, pat;

>      tree addr = MEM_EXPR (operands[0]);

>      

> +    /* Force FDPIC register (r9) before call.  */

> +    if (TARGET_FDPIC)

> +      {

> +	/* No need to update r9 if calling a static function.

> +	   In other words: set r9 for indirect or non-local calls.  */

> +	callee = XEXP (operands[0], 0);

> +	if (!SYMBOL_REF_P (callee)

> +	    || !SYMBOL_REF_LOCAL_P (callee)

> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))


IMO it would be better to calculate this once rather than repeat
it below.

> +	  {

> +	    emit_insn (gen_blockage ());


Why's the blockage needed?  Seems worth a comment.

> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));


Is this use keeping the register live for the call?  If so,
I think it'd be better to attach it to the CALL_INSN_FUNCTION_USAGE
instead.

> +	 }

> +      }

> +

>      /* In an untyped call, we can get NULL for operand 2.  */

>      if (operands[2] == NULL_RTX)

>        operands[2] = const0_rtx;

> @@ -8140,6 +8157,13 @@

>  	: !REG_P (callee))

>        XEXP (operands[0], 0) = force_reg (Pmode, callee);

>  

> +    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))

> +      {

> +	/* Indirect call: set r9 with FDPIC value of callee.  */

> +	XEXP (operands[0], 0)

> +	  = arm_load_function_descriptor (XEXP (operands[0], 0));

> +      }

> +

>      if (detect_cmse_nonsecure_call (addr))

>        {

>  	pat = gen_nonsecure_call_internal (operands[0], operands[1],


Redundant braces.

> @@ -8151,10 +8175,38 @@

>  	pat = gen_call_internal (operands[0], operands[1], operands[2]);

>  	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);

>        }

> +

> +    /* Restore FDPIC register (r9) after call.  */

> +    if (TARGET_FDPIC)

> +      {

> +	/* No need to update r9 if calling a static function.  */

> +	if (!SYMBOL_REF_P (callee)

> +	    || !SYMBOL_REF_LOCAL_P (callee)

> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

> +	  {

> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

> +	    emit_insn (gen_blockage ());

> +	  }

> +      }

>      DONE;

>    }"

>  )


What's the general assumption about the validity of r9?  Seems odd that
we need to load this value both before and after the call.

>  

> +(define_insn "*restore_pic_register_after_call"

> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")

> +		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]

> +	       UNSPEC_PIC_RESTORE)

> +	      (use (match_dup 1))

> +	      (clobber (match_dup 0))])

> +  ]

> +  ""

> +  "@

> +  mov\t%0, %1

> +  ldr\t%0, %1"

> +)

> +

>  (define_expand "call_internal"

>    [(parallel [(call (match_operand 0 "memory_operand" "")

>  	            (match_operand 1 "general_operand" ""))


Since operand 0 is significant after the instruction, I think this
should be:

(define_insn "*restore_pic_register_after_call"
  [(set (match_operand:SI 0 "s_register_operand" "+r,r")
	(unspec:SI [(match_dup 0)
		    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
		   UNSPEC_PIC_RESTORE))]
  ...

The (use (match_dup 1)) looks redundant, since the unspec itself
uses operand 1.

> @@ -8215,6 +8267,30 @@

>      rtx pat, callee;

>      tree addr = MEM_EXPR (operands[1]);

>      

> +    /* Force FDPIC register (r9) before call.  */

> +    if (TARGET_FDPIC)

> +      {

> +	/* No need to update the FDPIC register (r9) if calling a static function.

> +	   In other words: set r9 for indirect or non-local calls.  */

> +	callee = XEXP (operands[1], 0);

> +	if (!SYMBOL_REF_P (callee)

> +	    || !SYMBOL_REF_LOCAL_P (callee)

> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

> +	  {

> +	    rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

> +	    rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +	    rtx initial_fdpic_reg =

> +		get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);

> +

> +	    XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

> +		gen_rtvec (2, fdpic_reg, initial_fdpic_reg),

> +		UNSPEC_PIC_RESTORE);

> +	    XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);

> +	    XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);

> +	    emit_insn (par);

> +	  }

> +      }

> +


It's not obvious why this code is different from the call-without-value
case above, which doesn't use UNSPEC_PIC_RESTORE.  I think it should be
split out into a helper function that's used for both call and call_value.

I think it would also be good to have more comments about what
conditions the UNSPEC_PIC_RESTORE pattern is enforcing.

Thanks,
Richard
Christophe Lyon Aug. 20, 2019, 3:56 p.m. | #2
On 16/07/2019 13:58, Richard Sandiford wrote:
> Christophe Lyon <christophe.lyon@st.com> writes:

>> The FDPIC register is hard-coded to r9, as defined in the ABI.

>>

>> We have to disable tailcall optimizations if we don't know if the

>> target function is in the same module. If not, we have to set r9 to

>> the value associated with the target module.

>>

>> When generating a symbol address, we have to take into account whether

>> it is a pointer to data or to a function, because different

>> relocations are needed.

>>

>> 2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>

>> 	Mickaël Guêné <mickael.guene@st.com>

>>

>> 	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro

>> 	in FDPIC mode.

>> 	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare

>> 	new function.

>> 	* config/arm/arm.c (arm_option_override): Define pic register to

>> 	FDPIC_REGNUM.

>> 	(arm_function_ok_for_sibcall): Disable sibcall optimization if we

>> 	have no decl or go through PLT.

>> 	(arm_load_pic_register): Handle TARGET_FDPIC.

>> 	(arm_is_segment_info_known): New function.

>> 	(arm_pic_static_addr): Add support for FDPIC.

>> 	(arm_load_function_descriptor): New function.

>> 	(arm_assemble_integer): Add support for FDPIC.

>> 	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):

>> 	Define. (FDPIC_REGNUM): New define.

>> 	* config/arm/arm.md (call): Add support for FDPIC.

>> 	(call_value): Likewise.

>> 	(*restore_pic_register_after_call): New pattern.

>> 	(untyped_call): Disable if FDPIC.

>> 	(untyped_return): Likewise.

>> 	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

>>

>> Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4

>>

>> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c

>> index 6e256ee..34695fa 100644

>> --- a/gcc/config/arm/arm-c.c

>> +++ b/gcc/config/arm/arm-c.c

>> @@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)

>>         builtin_define ("__ARM_EABI__");

>>       }

>>   

>> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);

>> +

>>     def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);

>>     def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);

>>   

>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

>> index 485bc68..272968a 100644

>> --- a/gcc/config/arm/arm-protos.h

>> +++ b/gcc/config/arm/arm-protos.h

>> @@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);

>>   extern int arm_const_double_inline_cost (rtx);

>>   extern bool arm_const_double_by_parts (rtx);

>>   extern bool arm_const_double_by_immediates (rtx);

>> +extern rtx arm_load_function_descriptor (rtx funcdesc);

>>   extern void arm_emit_call_insn (rtx, rtx, bool);

>>   bool detect_cmse_nonsecure_call (tree);

>>   extern const char *output_call (rtx *);

>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>> index 45abcd8..d9397b5 100644

>> --- a/gcc/config/arm/arm.c

>> +++ b/gcc/config/arm/arm.c

>> @@ -3485,6 +3485,15 @@ arm_option_override (void)

>>     if (flag_pic && TARGET_VXWORKS_RTP)

>>       arm_pic_register = 9;

>>   

>> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */

>> +  if (TARGET_FDPIC)

>> +    {

>> +      arm_pic_register = FDPIC_REGNUM;

>> +      if (! TARGET_ARM && ! TARGET_THUMB2)

>> +	sorry ("FDPIC mode is supported on architecture versions that "

>> +	       "support ARM or Thumb-2 only.");

>> +    }

>> +

>>     if (arm_pic_register_string != NULL)

>>       {

>>         int pic_register = decode_reg_name (arm_pic_register_string);

> 

> Isn't this equivalent to rejecting Thumb-1?  I think that would be

> clearer in both the condition and the error message.

> 

> How does this interact with arm_pic_data_is_text_relative?  Are both

> values supported?

> 

>> @@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)

>>     if (cfun->machine->sibcall_blocked)

>>       return false;

>>   

>> +  if (TARGET_FDPIC)

>> +    {

>> +      /* In FDPIC, never tailcall something for which we have no decl:

>> +	 the target function could be in a different module, requiring

>> +	 a different FDPIC register value.  */

>> +      if (decl == NULL)

>> +	return false;

>> +

>> +      /* Don't tailcall if we go through the PLT since the FDPIC

>> +	 register is then corrupted and we don't restore it after

>> +	 static function calls.  */

>> +      if (!targetm.binds_local_p (decl))

>> +	return false;

>> +    }

>> +

>>     /* Never tailcall something if we are generating code for Thumb-1.  */

>>     if (TARGET_THUMB1)

>>       return false;

>> @@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>>   {

>>     rtx l1, labelno, pic_tmp, pic_rtx;

>>   

>> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)

>> +  if (crtl->uses_pic_offset_table == 0

>> +      || TARGET_SINGLE_PIC_BASE

>> +      || TARGET_FDPIC)

>>       return;

>>   

>>     gcc_assert (flag_pic);

>> @@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>>     emit_use (pic_reg);

>>   }

>>   

>> +/* Try to determine whether an object, referenced via ORIG, will be

>> +   placed in the text or data segment.  This is used in FDPIC mode, to

>> +   decide which relocations to use when accessing ORIG.  IS_READONLY

>> +   is set to true if ORIG is a read-only location, false otherwise.

>> +   Return true if we could determine the location of ORIG, false

>> +   otherwise.  IS_READONLY is valid only when we return true.  */

> 

> Maybe *IS_READONLY in both cases?

> 

>> +static bool

>> +arm_is_segment_info_known (rtx orig, bool *is_readonly)

>> +{

>> +  bool res = false;

>> +

>> +  *is_readonly = false;

>> +

>> +  if (GET_CODE (orig) == LABEL_REF)

>> +    {

>> +      res = true;

>> +      *is_readonly = true;

>> +    }

> 

> Think this function would be easier to read with early returns.

> 

>> +  else if (SYMBOL_REF_P (orig))

> 

> ...so "if" rather than "else if" here.

> 

>> +    {

>> +      if (CONSTANT_POOL_ADDRESS_P (orig))

>> +	{

>> +	  res = true;

>> +	  *is_readonly = true;

>> +	}

>> +      else if (SYMBOL_REF_LOCAL_P (orig)

>> +	       && !SYMBOL_REF_EXTERNAL_P (orig)

>> +	       && SYMBOL_REF_DECL (orig)

>> +	       && (!DECL_P (SYMBOL_REF_DECL (orig))

>> +		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))

>> +	{

>> +	  tree decl = SYMBOL_REF_DECL (orig);

>> +	  tree init = (TREE_CODE (decl) == VAR_DECL)

>> +	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

>> +	    ? decl : 0;

>> +	  int reloc = 0;

>> +	  bool named_section, readonly;

>> +

>> +	  if (init && init != error_mark_node)

>> +	    reloc = compute_reloc_for_constant (init);

>> +

>> +	  named_section = TREE_CODE (decl) == VAR_DECL

>> +	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

> 

> Here too I think it would be better to return false early.

> 

> How much variation do you support here for named sections?  E.g. can a

> linker script really put SECTION_WRITE sections in the text segment?

> Seems like there are some cases that could be handled.

> 

> (Just asking, not suggesting you should change anything.)

> 

>> +	  readonly = decl_readonly_section (decl, reloc);

>> +

>> +	  /* We don't know where the link script will put a named

>> +	     section, so return false in such a case.  */

>> +	  res = !named_section;

>> +

>> +	  if (!named_section)

>> +	    *is_readonly = readonly;

>> +	}

>> +      else

>> +	{

>> +	  /* We don't know.  */

>> +	  res = false;

>> +	}

>> +    }

>> +  else

>> +    gcc_unreachable ();

>> +

>> +  return res;

>> +}

>> +

>>   /* Generate code to load the address of a static var when flag_pic is set.  */

>>   static rtx_insn *

>>   arm_pic_static_addr (rtx orig, rtx reg)

>>   {

>>     rtx l1, labelno, offset_rtx;

>> +  rtx_insn *insn;

>>   

>>     gcc_assert (flag_pic);

>>   

>> -  /* We use an UNSPEC rather than a LABEL_REF because this label

>> -     never appears in the code stream.  */

>> -  labelno = GEN_INT (pic_labelno++);

>> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

>> -  l1 = gen_rtx_CONST (VOIDmode, l1);

>> +  bool is_readonly = false;

>> +  bool info_known = false;

>>   

>> -  /* On the ARM the PC register contains 'dot + 8' at the time of the

>> -     addition, on the Thumb it is 'dot + 4'.  */

>> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

>> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

>> -                               UNSPEC_SYMBOL_OFFSET);

>> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

>> +  if (TARGET_FDPIC

>> +      && SYMBOL_REF_P (orig)

>> +      && !SYMBOL_REF_FUNCTION_P (orig))

>> +      info_known = arm_is_segment_info_known (orig, &is_readonly);

> 

> Excess indendentation.  Feels like it might be slightly simpler

> to handle SYMBOL_REF_FUNCTION_P in arm_is_segment_info_known,

> but I guess the idea is that it might not then be clear whether

> the caller is asking about a descriptor or the function itself.

> 

>>   

>> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));

>> +  if (TARGET_FDPIC

>> +      && SYMBOL_REF_P (orig)

>> +      && !SYMBOL_REF_FUNCTION_P (orig)

>> +      && !info_known)

>> +    {

>> +      /* We don't know where orig is stored, so we have be

>> +	 pessimistic and use a GOT relocation.  */

>> +      rtx pat;

>> +      rtx mem;

>> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +

>> +      pat = gen_calculate_pic_address (reg, pic_reg, orig);

>> +

>> +      /* Make the MEM as close to a constant as possible.  */

>> +      mem = SET_SRC (pat);

>> +      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));

>> +      MEM_READONLY_P (mem) = 1;

>> +      MEM_NOTRAP_P (mem) = 1;

>> +

>> +      insn = emit_insn (pat);

> 

> Think "pat = ..." onwards should be split out into a helper, since it's

> a cut-&-paste of the code in legitimize_pic_address.

> 

>> +    }

>> +  else if (TARGET_FDPIC

>> +	   && SYMBOL_REF_P (orig)

>> +	   && (SYMBOL_REF_FUNCTION_P (orig)

>> +	       || (info_known && !is_readonly)))

>> +    {

>> +      /* We use the GOTOFF relocation.  */

>> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +

>> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);

>> +      emit_insn (gen_movsi (reg, l1));

>> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));

>> +    }

>> +  else

>> +    {

>> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use

>> +	 PC-relative access.  */

>> +      /* We use an UNSPEC rather than a LABEL_REF because this label

>> +	 never appears in the code stream.  */

>> +      labelno = GEN_INT (pic_labelno++);

>> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

>> +      l1 = gen_rtx_CONST (VOIDmode, l1);

>> +

>> +      /* On the ARM the PC register contains 'dot + 8' at the time of the

>> +	 addition, on the Thumb it is 'dot + 4'.  */

>> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

>> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

>> +				   UNSPEC_SYMBOL_OFFSET);

>> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

>> +

>> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,

>> +						   labelno));

>> +    }

>> +

>> +  return insn;

>>   }

>>   

>>   /* Return nonzero if X is valid as an ARM state addressing register.  */

>> @@ -16112,9 +16252,36 @@ get_jump_table_size (rtx_jump_table_data *insn)

>>     return 0;

>>   }

>>   

>> +/* Emit insns to load the function address from FUNCDESC (an FDPIC

>> +   function descriptor) into a register and the GOT address into the

>> +   FDPIC register, returning an rtx for the register holding the

>> +   function address.  */

>> +

>> +rtx

>> +arm_load_function_descriptor (rtx funcdesc)

>> +{

>> +  rtx fnaddr_reg = gen_reg_rtx (Pmode);

>> +  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);

>> +  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));

>> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

>> +

>> +  emit_move_insn (fnaddr_reg, fnaddr);

>> +  /* The ABI requires the entry point address to be loaded first, so

>> +     prevent the load from being moved after that of the GOT

>> +     address.  */

> 

> Do you mean that the move insn above has to come before the

> pattern below?  If so, I think that should be enforced by making this...

> 

>> +  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

>> +					gen_rtvec (2, pic_reg, gotaddr),

>> +					UNSPEC_PIC_RESTORE);

>> +  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);

>> +  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);

>> +  emit_insn (par);

>> +

>> +  return fnaddr_reg;

>> +}

>> +

> 

> ...use fnaddr_reg.

> 

> Does the instruction actually use pic_reg?  We only get here for

> non-symbolic addresses after all.

> 

> It seems simpler to make *restore_pic_register_after_call a named pattern

> and use gen_restore_pic_register_after_call instead.

> 

>>   /* Return the maximum amount of padding that will be inserted before

>>      label LABEL.  */

>> -

>>   static HOST_WIDE_INT

>>   get_label_padding (rtx label)

>>   {

>> @@ -23069,9 +23236,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)

>>   		  && (!SYMBOL_REF_LOCAL_P (x)

>>   		      || (SYMBOL_REF_DECL (x)

>>   			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))

>> -	    fputs ("(GOT)", asm_out_file);

>> +	    {

>> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

>> +		fputs ("(GOTFUNCDESC)", asm_out_file);

>> +	      else

>> +		fputs ("(GOT)", asm_out_file);

>> +	    }

>>   	  else

>> -	    fputs ("(GOTOFF)", asm_out_file);

>> +	    {

>> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

>> +		fputs ("(GOTOFFFUNCDESC)", asm_out_file);

>> +	      else

>> +		{

>> +		  bool is_readonly;

>> +

>> +		  if (arm_is_segment_info_known (x, &is_readonly))

>> +		    fputs ("(GOTOFF)", asm_out_file);

>> +		  else

>> +		    fputs ("(GOT)", asm_out_file);

>> +		}

>> +	    }

>> +	}

>> +

>> +      /* For FDPIC we also have to mark symbol for .data section.  */

>> +      if (TARGET_FDPIC

>> +	  && NEED_GOT_RELOC

>> +	  && flag_pic

>> +	  && !making_const_table

>> +	  && SYMBOL_REF_P (x))

>> +	{

>> +	  if (SYMBOL_REF_FUNCTION_P (x))

>> +	    fputs ("(FUNCDESC)", asm_out_file);

>>   	}

>>         fputc ('\n', asm_out_file);

>>         return true;

> 

> Do you expect to reach here for LABEL_REFs with TARGET_FDPIC?  The second

> block of code tests for SYMBOL_REF_P but the first tests

> SYMBOL_REF_FUNCTION_P without checking SYMBOL_REF_P first.

> 

> Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?

> Is !flag_pic TARGET_FDPIC supported?

> 

>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

>> index 0aecd03..9036255 100644

>> --- a/gcc/config/arm/arm.md

>> +++ b/gcc/config/arm/arm.md

>> @@ -8127,6 +8127,23 @@

>>       rtx callee, pat;

>>       tree addr = MEM_EXPR (operands[0]);

>>       

>> +    /* Force FDPIC register (r9) before call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update r9 if calling a static function.

>> +	   In other words: set r9 for indirect or non-local calls.  */

>> +	callee = XEXP (operands[0], 0);

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

> 

> IMO it would be better to calculate this once rather than repeat

> it below.

> 

>> +	  {

>> +	    emit_insn (gen_blockage ());

> 

> Why's the blockage needed?  Seems worth a comment.

> 

>> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

>> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

> 

> Is this use keeping the register live for the call?  If so,

> I think it'd be better to attach it to the CALL_INSN_FUNCTION_USAGE

> instead.

> 

>> +	 }

>> +      }

>> +

>>       /* In an untyped call, we can get NULL for operand 2.  */

>>       if (operands[2] == NULL_RTX)

>>         operands[2] = const0_rtx;

>> @@ -8140,6 +8157,13 @@

>>   	: !REG_P (callee))

>>         XEXP (operands[0], 0) = force_reg (Pmode, callee);

>>   

>> +    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))

>> +      {

>> +	/* Indirect call: set r9 with FDPIC value of callee.  */

>> +	XEXP (operands[0], 0)

>> +	  = arm_load_function_descriptor (XEXP (operands[0], 0));

>> +      }

>> +

>>       if (detect_cmse_nonsecure_call (addr))

>>         {

>>   	pat = gen_nonsecure_call_internal (operands[0], operands[1],

> 

> Redundant braces.

> 

>> @@ -8151,10 +8175,38 @@

>>   	pat = gen_call_internal (operands[0], operands[1], operands[2]);

>>   	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);

>>         }

>> +

>> +    /* Restore FDPIC register (r9) after call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update r9 if calling a static function.  */

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

>> +	  {

>> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

>> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

>> +	    emit_insn (gen_blockage ());

>> +	  }

>> +      }

>>       DONE;

>>     }"

>>   )

> 

> What's the general assumption about the validity of r9?  Seems odd that

> we need to load this value both before and after the call.

> 

>>   

>> +(define_insn "*restore_pic_register_after_call"

>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")

>> +		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]

>> +	       UNSPEC_PIC_RESTORE)

>> +	      (use (match_dup 1))

>> +	      (clobber (match_dup 0))])

>> +  ]

>> +  ""

>> +  "@

>> +  mov\t%0, %1

>> +  ldr\t%0, %1"

>> +)

>> +

>>   (define_expand "call_internal"

>>     [(parallel [(call (match_operand 0 "memory_operand" "")

>>   	            (match_operand 1 "general_operand" ""))

> 

> Since operand 0 is significant after the instruction, I think this

> should be:

> 

> (define_insn "*restore_pic_register_after_call"

>    [(set (match_operand:SI 0 "s_register_operand" "+r,r")

> 	(unspec:SI [(match_dup 0)

> 		    (match_operand:SI 1 "nonimmediate_operand" "r,m")]

> 		   UNSPEC_PIC_RESTORE))]

>    ...

> 

> The (use (match_dup 1)) looks redundant, since the unspec itself

> uses operand 1.

> 


When I try that, I have cases where the restore instruction is discarded, when the call happens just before function return. Since r9 is caller-saved, it should be restored but after dse2 the dumps say:
(insn (set (reg:SI 9 r9)
    (unspec:SI [
        (reg:SI 9 r9)
        (reg:SI 4 r4 [121])
      ] UNSPEC_PIC_RESTORE))
(expr_list:REG_UNUSED (reg:SI 9 r9) (nil))))

and this is later removed by cprop_hardreg (which says the exit block uses r4, sp, and lr: should I make it use r9?)

Thanks,

Christophe


>> @@ -8215,6 +8267,30 @@

>>       rtx pat, callee;

>>       tree addr = MEM_EXPR (operands[1]);

>>       

>> +    /* Force FDPIC register (r9) before call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update the FDPIC register (r9) if calling a static function.

>> +	   In other words: set r9 for indirect or non-local calls.  */

>> +	callee = XEXP (operands[1], 0);

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

>> +	  {

>> +	    rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

>> +	    rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    rtx initial_fdpic_reg =

>> +		get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);

>> +

>> +	    XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

>> +		gen_rtvec (2, fdpic_reg, initial_fdpic_reg),

>> +		UNSPEC_PIC_RESTORE);

>> +	    XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);

>> +	    XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);

>> +	    emit_insn (par);

>> +	  }

>> +      }

>> +

> 

> It's not obvious why this code is different from the call-without-value

> case above, which doesn't use UNSPEC_PIC_RESTORE.  I think it should be

> split out into a helper function that's used for both call and call_value.

> 

> I think it would also be good to have more comments about what

> conditions the UNSPEC_PIC_RESTORE pattern is enforcing.

> 

> Thanks,

> Richard

> .

>
Christophe Lyon Aug. 29, 2019, 3:34 p.m. | #3
On 16/07/2019 13:58, Richard Sandiford wrote:
> Christophe Lyon <christophe.lyon@st.com> writes:

>> The FDPIC register is hard-coded to r9, as defined in the ABI.

>>

>> We have to disable tailcall optimizations if we don't know if the

>> target function is in the same module. If not, we have to set r9 to

>> the value associated with the target module.

>>

>> When generating a symbol address, we have to take into account whether

>> it is a pointer to data or to a function, because different

>> relocations are needed.

>>

>> 2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>

>> 	Mickaël Guêné <mickael.guene@st.com>

>>

>> 	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro

>> 	in FDPIC mode.

>> 	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare

>> 	new function.

>> 	* config/arm/arm.c (arm_option_override): Define pic register to

>> 	FDPIC_REGNUM.

>> 	(arm_function_ok_for_sibcall): Disable sibcall optimization if we

>> 	have no decl or go through PLT.

>> 	(arm_load_pic_register): Handle TARGET_FDPIC.

>> 	(arm_is_segment_info_known): New function.

>> 	(arm_pic_static_addr): Add support for FDPIC.

>> 	(arm_load_function_descriptor): New function.

>> 	(arm_assemble_integer): Add support for FDPIC.

>> 	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):

>> 	Define. (FDPIC_REGNUM): New define.

>> 	* config/arm/arm.md (call): Add support for FDPIC.

>> 	(call_value): Likewise.

>> 	(*restore_pic_register_after_call): New pattern.

>> 	(untyped_call): Disable if FDPIC.

>> 	(untyped_return): Likewise.

>> 	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

>>

>> Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4

>>

>> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c

>> index 6e256ee..34695fa 100644

>> --- a/gcc/config/arm/arm-c.c

>> +++ b/gcc/config/arm/arm-c.c

>> @@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)

>>         builtin_define ("__ARM_EABI__");

>>       }

>>   

>> +  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);

>> +

>>     def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);

>>     def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);

>>   

>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

>> index 485bc68..272968a 100644

>> --- a/gcc/config/arm/arm-protos.h

>> +++ b/gcc/config/arm/arm-protos.h

>> @@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);

>>   extern int arm_const_double_inline_cost (rtx);

>>   extern bool arm_const_double_by_parts (rtx);

>>   extern bool arm_const_double_by_immediates (rtx);

>> +extern rtx arm_load_function_descriptor (rtx funcdesc);

>>   extern void arm_emit_call_insn (rtx, rtx, bool);

>>   bool detect_cmse_nonsecure_call (tree);

>>   extern const char *output_call (rtx *);

>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>> index 45abcd8..d9397b5 100644

>> --- a/gcc/config/arm/arm.c

>> +++ b/gcc/config/arm/arm.c

>> @@ -3485,6 +3485,15 @@ arm_option_override (void)

>>     if (flag_pic && TARGET_VXWORKS_RTP)

>>       arm_pic_register = 9;

>>   

>> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */

>> +  if (TARGET_FDPIC)

>> +    {

>> +      arm_pic_register = FDPIC_REGNUM;

>> +      if (! TARGET_ARM && ! TARGET_THUMB2)

>> +	sorry ("FDPIC mode is supported on architecture versions that "

>> +	       "support ARM or Thumb-2 only.");

>> +    }

>> +

>>     if (arm_pic_register_string != NULL)

>>       {

>>         int pic_register = decode_reg_name (arm_pic_register_string);

> 

> Isn't this equivalent to rejecting Thumb-1?  I think that would be

> clearer in both the condition and the error message.

> 

Right, fixed.

> How does this interact with arm_pic_data_is_text_relative?  Are both

> values supported?

It doesn't interact well... it only works with the default value.
Otherwise, there are compiler crashes.

> 

>> @@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)

>>     if (cfun->machine->sibcall_blocked)

>>       return false;

>>   

>> +  if (TARGET_FDPIC)

>> +    {

>> +      /* In FDPIC, never tailcall something for which we have no decl:

>> +	 the target function could be in a different module, requiring

>> +	 a different FDPIC register value.  */

>> +      if (decl == NULL)

>> +	return false;

>> +

>> +      /* Don't tailcall if we go through the PLT since the FDPIC

>> +	 register is then corrupted and we don't restore it after

>> +	 static function calls.  */

>> +      if (!targetm.binds_local_p (decl))

>> +	return false;

>> +    }

>> +

>>     /* Never tailcall something if we are generating code for Thumb-1.  */

>>     if (TARGET_THUMB1)

>>       return false;

>> @@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>>   {

>>     rtx l1, labelno, pic_tmp, pic_rtx;

>>   

>> -  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)

>> +  if (crtl->uses_pic_offset_table == 0

>> +      || TARGET_SINGLE_PIC_BASE

>> +      || TARGET_FDPIC)

>>       return;

>>   

>>     gcc_assert (flag_pic);

>> @@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>>     emit_use (pic_reg);

>>   }

>>   

>> +/* Try to determine whether an object, referenced via ORIG, will be

>> +   placed in the text or data segment.  This is used in FDPIC mode, to

>> +   decide which relocations to use when accessing ORIG.  IS_READONLY

>> +   is set to true if ORIG is a read-only location, false otherwise.

>> +   Return true if we could determine the location of ORIG, false

>> +   otherwise.  IS_READONLY is valid only when we return true.  */

> 

> Maybe *IS_READONLY in both cases?

> 

OK

>> +static bool

>> +arm_is_segment_info_known (rtx orig, bool *is_readonly)

>> +{

>> +  bool res = false;

>> +

>> +  *is_readonly = false;

>> +

>> +  if (GET_CODE (orig) == LABEL_REF)

>> +    {

>> +      res = true;

>> +      *is_readonly = true;

>> +    }

> 

> Think this function would be easier to read with early returns.

> 

Done

>> +  else if (SYMBOL_REF_P (orig))

> 

> ...so "if" rather than "else if" here.

> 

>> +    {

>> +      if (CONSTANT_POOL_ADDRESS_P (orig))

>> +	{

>> +	  res = true;

>> +	  *is_readonly = true;

>> +	}

>> +      else if (SYMBOL_REF_LOCAL_P (orig)

>> +	       && !SYMBOL_REF_EXTERNAL_P (orig)

>> +	       && SYMBOL_REF_DECL (orig)

>> +	       && (!DECL_P (SYMBOL_REF_DECL (orig))

>> +		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))

>> +	{

>> +	  tree decl = SYMBOL_REF_DECL (orig);

>> +	  tree init = (TREE_CODE (decl) == VAR_DECL)

>> +	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

>> +	    ? decl : 0;

>> +	  int reloc = 0;

>> +	  bool named_section, readonly;

>> +

>> +	  if (init && init != error_mark_node)

>> +	    reloc = compute_reloc_for_constant (init);

>> +

>> +	  named_section = TREE_CODE (decl) == VAR_DECL

>> +	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

> 

> Here too I think it would be better to return false early.

> 

OK

> How much variation do you support here for named sections?  E.g. can a

> linker script really put SECTION_WRITE sections in the text segment?

> Seems like there are some cases that could be handled.

> 

> (Just asking, not suggesting you should change anything.)

> 

I don't think we had such a use-case, we have tested with "common/usual"
linux packages.

>> +	  readonly = decl_readonly_section (decl, reloc);

>> +

>> +	  /* We don't know where the link script will put a named

>> +	     section, so return false in such a case.  */

>> +	  res = !named_section;

>> +

>> +	  if (!named_section)

>> +	    *is_readonly = readonly;

>> +	}

>> +      else

>> +	{

>> +	  /* We don't know.  */

>> +	  res = false;

>> +	}

>> +    }

>> +  else

>> +    gcc_unreachable ();

>> +

>> +  return res;

>> +}

>> +

>>   /* Generate code to load the address of a static var when flag_pic is set.  */

>>   static rtx_insn *

>>   arm_pic_static_addr (rtx orig, rtx reg)

>>   {

>>     rtx l1, labelno, offset_rtx;

>> +  rtx_insn *insn;

>>   

>>     gcc_assert (flag_pic);

>>   

>> -  /* We use an UNSPEC rather than a LABEL_REF because this label

>> -     never appears in the code stream.  */

>> -  labelno = GEN_INT (pic_labelno++);

>> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

>> -  l1 = gen_rtx_CONST (VOIDmode, l1);

>> +  bool is_readonly = false;

>> +  bool info_known = false;

>>   

>> -  /* On the ARM the PC register contains 'dot + 8' at the time of the

>> -     addition, on the Thumb it is 'dot + 4'.  */

>> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

>> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

>> -                               UNSPEC_SYMBOL_OFFSET);

>> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

>> +  if (TARGET_FDPIC

>> +      && SYMBOL_REF_P (orig)

>> +      && !SYMBOL_REF_FUNCTION_P (orig))

>> +      info_known = arm_is_segment_info_known (orig, &is_readonly);

> 

> Excess indendentation.  Feels like it might be slightly simpler

> to handle SYMBOL_REF_FUNCTION_P in arm_is_segment_info_known,

> but I guess the idea is that it might not then be clear whether

> the caller is asking about a descriptor or the function itself.

> 

Yes

>>   

>> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));

>> +  if (TARGET_FDPIC

>> +      && SYMBOL_REF_P (orig)

>> +      && !SYMBOL_REF_FUNCTION_P (orig)

>> +      && !info_known)

>> +    {

>> +      /* We don't know where orig is stored, so we have be

>> +	 pessimistic and use a GOT relocation.  */

>> +      rtx pat;

>> +      rtx mem;

>> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +

>> +      pat = gen_calculate_pic_address (reg, pic_reg, orig);

>> +

>> +      /* Make the MEM as close to a constant as possible.  */

>> +      mem = SET_SRC (pat);

>> +      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));

>> +      MEM_READONLY_P (mem) = 1;

>> +      MEM_NOTRAP_P (mem) = 1;

>> +

>> +      insn = emit_insn (pat);

> 

> Think "pat = ..." onwards should be split out into a helper, since it's

> a cut-&-paste of the code in legitimize_pic_address.

> 

OK, done, created calculate_pic_address_constant for this.

>> +    }

>> +  else if (TARGET_FDPIC

>> +	   && SYMBOL_REF_P (orig)

>> +	   && (SYMBOL_REF_FUNCTION_P (orig)

>> +	       || (info_known && !is_readonly)))

>> +    {

>> +      /* We use the GOTOFF relocation.  */

>> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +

>> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);

>> +      emit_insn (gen_movsi (reg, l1));

>> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));

>> +    }

>> +  else

>> +    {

>> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use

>> +	 PC-relative access.  */

>> +      /* We use an UNSPEC rather than a LABEL_REF because this label

>> +	 never appears in the code stream.  */

>> +      labelno = GEN_INT (pic_labelno++);

>> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

>> +      l1 = gen_rtx_CONST (VOIDmode, l1);

>> +

>> +      /* On the ARM the PC register contains 'dot + 8' at the time of the

>> +	 addition, on the Thumb it is 'dot + 4'.  */

>> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

>> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

>> +				   UNSPEC_SYMBOL_OFFSET);

>> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

>> +

>> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,

>> +						   labelno));

>> +    }

>> +

>> +  return insn;

>>   }

>>   

>>   /* Return nonzero if X is valid as an ARM state addressing register.  */

>> @@ -16112,9 +16252,36 @@ get_jump_table_size (rtx_jump_table_data *insn)

>>     return 0;

>>   }

>>   

>> +/* Emit insns to load the function address from FUNCDESC (an FDPIC

>> +   function descriptor) into a register and the GOT address into the

>> +   FDPIC register, returning an rtx for the register holding the

>> +   function address.  */

>> +

>> +rtx

>> +arm_load_function_descriptor (rtx funcdesc)

>> +{

>> +  rtx fnaddr_reg = gen_reg_rtx (Pmode);

>> +  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);

>> +  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));

>> +  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

>> +

>> +  emit_move_insn (fnaddr_reg, fnaddr);

>> +  /* The ABI requires the entry point address to be loaded first, so

>> +     prevent the load from being moved after that of the GOT

>> +     address.  */

> 

> Do you mean that the move insn above has to come before the

> pattern below?  If so, I think that should be enforced by making this...

> 

yes...

>> +  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

>> +					gen_rtvec (2, pic_reg, gotaddr),

>> +					UNSPEC_PIC_RESTORE);

>> +  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);

>> +  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);

>> +  emit_insn (par);

>> +

>> +  return fnaddr_reg;

>> +}

>> +

> 

> ...use fnaddr_reg.

but I updated the comment: this was related to a race condition,
but since we don't support lazy binding, this is not a problem.

> 

> Does the instruction actually use pic_reg?  We only get here for

> non-symbolic addresses after all.

> 

> It seems simpler to make *restore_pic_register_after_call a named pattern

> and use gen_restore_pic_register_after_call instead.

> 

Done

>>   /* Return the maximum amount of padding that will be inserted before

>>      label LABEL.  */

>> -

>>   static HOST_WIDE_INT

>>   get_label_padding (rtx label)

>>   {

>> @@ -23069,9 +23236,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)

>>   		  && (!SYMBOL_REF_LOCAL_P (x)

>>   		      || (SYMBOL_REF_DECL (x)

>>   			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))

>> -	    fputs ("(GOT)", asm_out_file);

>> +	    {

>> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

>> +		fputs ("(GOTFUNCDESC)", asm_out_file);

>> +	      else

>> +		fputs ("(GOT)", asm_out_file);

>> +	    }

>>   	  else

>> -	    fputs ("(GOTOFF)", asm_out_file);

>> +	    {

>> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

>> +		fputs ("(GOTOFFFUNCDESC)", asm_out_file);

>> +	      else

>> +		{

>> +		  bool is_readonly;

>> +

>> +		  if (arm_is_segment_info_known (x, &is_readonly))

>> +		    fputs ("(GOTOFF)", asm_out_file);

>> +		  else

>> +		    fputs ("(GOT)", asm_out_file);

>> +		}

>> +	    }

>> +	}

>> +

>> +      /* For FDPIC we also have to mark symbol for .data section.  */

>> +      if (TARGET_FDPIC

>> +	  && NEED_GOT_RELOC

>> +	  && flag_pic

>> +	  && !making_const_table

>> +	  && SYMBOL_REF_P (x))

>> +	{

>> +	  if (SYMBOL_REF_FUNCTION_P (x))

>> +	    fputs ("(FUNCDESC)", asm_out_file);

>>   	}

>>         fputc ('\n', asm_out_file);

>>         return true;

> 

> Do you expect to reach here for LABEL_REFs with TARGET_FDPIC?  The second

No. I reached here with LABEL_REFs only when forcing arm_pic_data_is_text_relative

> block of code tests for SYMBOL_REF_P but the first tests

> SYMBOL_REF_FUNCTION_P without checking SYMBOL_REF_P first.

> 

> Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?

No.

> Is !flag_pic TARGET_FDPIC supported?

No; flag_pic is false when we use -mno-fdpic, so we revert to the "usual" abi then

> 

>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

>> index 0aecd03..9036255 100644

>> --- a/gcc/config/arm/arm.md

>> +++ b/gcc/config/arm/arm.md

>> @@ -8127,6 +8127,23 @@

>>       rtx callee, pat;

>>       tree addr = MEM_EXPR (operands[0]);

>>       

>> +    /* Force FDPIC register (r9) before call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update r9 if calling a static function.

>> +	   In other words: set r9 for indirect or non-local calls.  */

>> +	callee = XEXP (operands[0], 0);

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

> 

> IMO it would be better to calculate this once rather than repeat

> it below.

> 


I've re-checked and changed this...

>> +	  {

>> +	    emit_insn (gen_blockage ());

> 

> Why's the blockage needed?  Seems worth a comment.

> 

>> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

>> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

> 

> Is this use keeping the register live for the call?  If so,

> I think it'd be better to attach it to the CALL_INSN_FUNCTION_USAGE

> instead.

> 

So... I've now removed the blockage (unneeded), and no longer force r9 before calls.
It's only restored after calls, even after calling static/same-module functions because they can clobber r9.

>> +	 }

>> +      }

>> +

>>       /* In an untyped call, we can get NULL for operand 2.  */

>>       if (operands[2] == NULL_RTX)

>>         operands[2] = const0_rtx;

>> @@ -8140,6 +8157,13 @@

>>   	: !REG_P (callee))

>>         XEXP (operands[0], 0) = force_reg (Pmode, callee);

>>   

>> +    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))

>> +      {

>> +	/* Indirect call: set r9 with FDPIC value of callee.  */

>> +	XEXP (operands[0], 0)

>> +	  = arm_load_function_descriptor (XEXP (operands[0], 0));

>> +      }

>> +

>>       if (detect_cmse_nonsecure_call (addr))

>>         {

>>   	pat = gen_nonsecure_call_internal (operands[0], operands[1],

> 

> Redundant braces.

> 

OK

>> @@ -8151,10 +8175,38 @@

>>   	pat = gen_call_internal (operands[0], operands[1], operands[2]);

>>   	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);

>>         }

>> +

>> +    /* Restore FDPIC register (r9) after call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update r9 if calling a static function.  */

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

>> +	  {

>> +	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));

>> +	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));

>> +	    emit_insn (gen_blockage ());

>> +	  }

>> +      }

>>       DONE;

>>     }"

>>   )

> 

> What's the general assumption about the validity of r9?  Seems odd that

> we need to load this value both before and after the call.

> 

Right, now re-loaded only after the call.

>>   

>> +(define_insn "*restore_pic_register_after_call"

>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")

>> +		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]

>> +	       UNSPEC_PIC_RESTORE)

>> +	      (use (match_dup 1))

>> +	      (clobber (match_dup 0))])

>> +  ]

>> +  ""

>> +  "@

>> +  mov\t%0, %1

>> +  ldr\t%0, %1"

>> +)

>> +

>>   (define_expand "call_internal"

>>     [(parallel [(call (match_operand 0 "memory_operand" "")

>>   	            (match_operand 1 "general_operand" ""))

> 

> Since operand 0 is significant after the instruction, I think this

> should be:

> 

> (define_insn "*restore_pic_register_after_call"

>    [(set (match_operand:SI 0 "s_register_operand" "+r,r")

> 	(unspec:SI [(match_dup 0)

> 		    (match_operand:SI 1 "nonimmediate_operand" "r,m")]

> 		   UNSPEC_PIC_RESTORE))]

>    ...

> 

> The (use (match_dup 1)) looks redundant, since the unspec itself

> uses operand 1.

> 

When I try that, I have cases where the restore instruction is discarded, when the call happens just before function return. Since r9 is caller-saved, it should be restored but after dse2 the dumps say:
(insn (set (reg:SI 9 r9)
    (unspec:SI [
        (reg:SI 9 r9)
        (reg:SI 4 r4 [121])
      ] UNSPEC_PIC_RESTORE))
(expr_list:REG_UNUSED (reg:SI 9 r9) (nil))))

and this is later removed by cprop_hardreg (which says the exit block uses r4, sp, and lr: should I make it use r9?)

>> @@ -8215,6 +8267,30 @@

>>       rtx pat, callee;

>>       tree addr = MEM_EXPR (operands[1]);

>>       

>> +    /* Force FDPIC register (r9) before call.  */

>> +    if (TARGET_FDPIC)

>> +      {

>> +	/* No need to update the FDPIC register (r9) if calling a static function.

>> +	   In other words: set r9 for indirect or non-local calls.  */

>> +	callee = XEXP (operands[1], 0);

>> +	if (!SYMBOL_REF_P (callee)

>> +	    || !SYMBOL_REF_LOCAL_P (callee)

>> +	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))

>> +	  {

>> +	    rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));

>> +	    rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

>> +	    rtx initial_fdpic_reg =

>> +		get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);

>> +

>> +	    XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,

>> +		gen_rtvec (2, fdpic_reg, initial_fdpic_reg),

>> +		UNSPEC_PIC_RESTORE);

>> +	    XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);

>> +	    XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);

>> +	    emit_insn (par);

>> +	  }

>> +      }

>> +

> 

> It's not obvious why this code is different from the call-without-value

> case above, which doesn't use UNSPEC_PIC_RESTORE.  I think it should be

> split out into a helper function that's used for both call and call_value.

> 

Now reworked/removed.

> I think it would also be good to have more comments about what

> conditions the UNSPEC_PIC_RESTORE pattern is enforcing.

> 

I think the use-case is now clearer?

> Thanks,

> Richard

> .

>
From 25134e559c8e014774d24db6cac6ca6b3b56a4b0 Mon Sep 17 00:00:00 2001
From: Christophe Lyon <christophe.lyon@linaro.org>

Date: Thu, 8 Feb 2018 11:10:51 +0100
Subject: [ARM/FDPIC v6 04/24] [ARM] FDPIC: Add support for FDPIC for arm
 architecture
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The FDPIC register is hard-coded to r9, as defined in the ABI.

We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the value associated with the target module.

When generating a symbol address, we have to take into account whether
it is a pointer to data or to a function, because different
relocations are needed.

2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>
	Mickaël Guêné <mickael.guene@st.com>

	gcc/
	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
	in FDPIC mode.
	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
	new function.
	* config/arm/arm.c (arm_option_override): Define pic register to
	FDPIC_REGNUM.
	(arm_function_ok_for_sibcall): Disable sibcall optimization if we
	have no decl or go through PLT.
	(calculate_pic_address_constant): New function.
	(legitimize_pic_address): Call calculate_pic_address_constant.
	(arm_load_pic_register): Handle TARGET_FDPIC.
	(arm_is_segment_info_known): New function.
	(arm_pic_static_addr): Add support for FDPIC.
	(arm_load_function_descriptor): New function.
	(arm_emit_call_insn): Add support for FDPIC.
	(arm_assemble_integer): Add support for FDPIC.
	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
	Define. (FDPIC_REGNUM): New define.
	* config/arm/arm.md (call): Add support for FDPIC.
	(call_value): Likewise.
	(restore_pic_register_after_call): New pattern.
	(untyped_call): Disable if FDPIC.
	(untyped_return): Likewise.
	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

	gcc/testsuite/
	* gcc.target/arm/fp16-aapcs-2.c: Adjust scan-assembler-times.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.

Change-Id: I1e96d260074ab7b75d36cdff5d34ad898f35c66f

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define ("__ARM_EABI__");
     }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
   def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 485bc68..272968a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
 extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45abcd8..ea6ea37 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3485,6 +3485,14 @@ arm_option_override (void)
   if (flag_pic && TARGET_VXWORKS_RTP)
     arm_pic_register = 9;
 
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */
+  if (TARGET_FDPIC)
+    {
+      arm_pic_register = FDPIC_REGNUM;
+      if (TARGET_THUMB1)
+	sorry ("FDPIC mode is not supported in Thumb-1 mode.");
+    }
+
   if (arm_pic_register_string != NULL)
     {
       int pic_register = decode_reg_name (arm_pic_register_string);
@@ -7295,6 +7303,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (cfun->machine->sibcall_blocked)
     return false;
 
+  if (TARGET_FDPIC)
+    {
+      /* In FDPIC, never tailcall something for which we have no decl:
+	 the target function could be in a different module, requiring
+	 a different FDPIC register value.  */
+      if (decl == NULL)
+	return false;
+
+      /* Don't tailcall if we go through the PLT since the FDPIC
+	 register is then corrupted and we don't restore it after
+	 static function calls.  */
+      if (!targetm.binds_local_p (decl))
+	return false;
+    }
+
   /* Never tailcall something if we are generating code for Thumb-1.  */
   if (TARGET_THUMB1)
     return false;
@@ -7501,6 +7524,24 @@ require_pic_register (rtx pic_reg, bool compute_now)
     }
 }
 
+/* Generate insns to calculate the address of ORIG in pic mode.  */
+static rtx_insn *
+calculate_pic_address_constant (rtx reg, rtx pic_reg, rtx orig)
+{
+  rtx pat;
+  rtx mem;
+
+  pat = gen_calculate_pic_address (reg, pic_reg, orig);
+
+  /* Make the MEM as close to a constant as possible.  */
+  mem = SET_SRC (pat);
+  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+  MEM_READONLY_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+
+  return emit_insn (pat);
+}
+
 /* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
    created to hold the result of the load.  If not NULL, PIC_REG indicates
    which register to use as PIC register, otherwise it is decided by register
@@ -7545,24 +7586,13 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
 	insn = arm_pic_static_addr (orig, reg);
       else
 	{
-	  rtx pat;
-	  rtx mem;
-
 	  /* If this function doesn't have a pic register, create one now.  */
 	  require_pic_register (pic_reg, compute_now);
 
 	  if (pic_reg == NULL_RTX)
 	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
-
-	  /* Make the MEM as close to a constant as possible.  */
-	  mem = SET_SRC (pat);
-	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
-	  MEM_READONLY_P (mem) = 1;
-	  MEM_NOTRAP_P (mem) = 1;
-
-	  insn = emit_insn (pat);
+	  insn = calculate_pic_address_constant (reg, pic_reg, orig);
 	}
 
       /* Put a REG_EQUAL note on this insn, so that it can be optimized
@@ -7711,7 +7741,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
   rtx l1, labelno, pic_tmp, pic_rtx;
 
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
+  if (crtl->uses_pic_offset_table == 0
+      || TARGET_SINGLE_PIC_BASE
+      || TARGET_FDPIC)
     return;
 
   gcc_assert (flag_pic);
@@ -7780,28 +7812,132 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
   emit_use (pic_reg);
 }
 
+/* Try to determine whether an object, referenced via ORIG, will be
+   placed in the text or data segment.  This is used in FDPIC mode, to
+   decide which relocations to use when accessing ORIG.  *IS_READONLY
+   is set to true if ORIG is a read-only location, false otherwise.
+   Return true if we could determine the location of ORIG, false
+   otherwise.  *IS_READONLY is valid only when we return true.  */
+static bool
+arm_is_segment_info_known (rtx orig, bool *is_readonly)
+{
+  *is_readonly = false;
+
+  if (GET_CODE (orig) == LABEL_REF)
+    {
+      *is_readonly = true;
+      return true;
+    }
+
+  if (SYMBOL_REF_P (orig))
+    {
+      if (CONSTANT_POOL_ADDRESS_P (orig))
+	{
+	  *is_readonly = true;
+	  return true;
+	}
+      else if (SYMBOL_REF_LOCAL_P (orig)
+	       && !SYMBOL_REF_EXTERNAL_P (orig)
+	       && SYMBOL_REF_DECL (orig)
+	       && (!DECL_P (SYMBOL_REF_DECL (orig))
+		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
+	{
+	  tree decl = SYMBOL_REF_DECL (orig);
+	  tree init = (TREE_CODE (decl) == VAR_DECL)
+	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
+	    ? decl : 0;
+	  int reloc = 0;
+	  bool named_section, readonly;
+
+	  if (init && init != error_mark_node)
+	    reloc = compute_reloc_for_constant (init);
+
+	  named_section = TREE_CODE (decl) == VAR_DECL
+	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));
+	  readonly = decl_readonly_section (decl, reloc);
+
+	  /* We don't know where the link script will put a named
+	     section, so return false in such a case.  */
+	  if (named_section)
+	    return false;
+
+	  *is_readonly = readonly;
+	  return true;
+	}
+      else
+	{
+	  /* We don't know.  */
+	  return false;
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  return false;
+}
+
 /* Generate code to load the address of a static var when flag_pic is set.  */
 static rtx_insn *
 arm_pic_static_addr (rtx orig, rtx reg)
 {
   rtx l1, labelno, offset_rtx;
+  rtx_insn *insn;
 
   gcc_assert (flag_pic);
 
-  /* We use an UNSPEC rather than a LABEL_REF because this label
-     never appears in the code stream.  */
-  labelno = GEN_INT (pic_labelno++);
-  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  l1 = gen_rtx_CONST (VOIDmode, l1);
+  bool is_readonly = false;
+  bool info_known = false;
+
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig))
+    info_known = arm_is_segment_info_known (orig, &is_readonly);
+
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig)
+      && !info_known)
+    {
+      /* We don't know where orig is stored, so we have be
+	 pessimistic and use a GOT relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      insn = calculate_pic_address_constant (reg, pic_reg, orig);
+    }
+  else if (TARGET_FDPIC
+	   && SYMBOL_REF_P (orig)
+	   && (SYMBOL_REF_FUNCTION_P (orig)
+	       || (info_known && !is_readonly)))
+    {
+      /* We use the GOTOFF relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
+      emit_insn (gen_movsi (reg, l1));
+      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
+    }
+  else
+    {
+      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
+	 PC-relative access.  */
+      /* We use an UNSPEC rather than a LABEL_REF because this label
+	 never appears in the code stream.  */
+      labelno = GEN_INT (pic_labelno++);
+      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
+      l1 = gen_rtx_CONST (VOIDmode, l1);
+
+      /* On the ARM the PC register contains 'dot + 8' at the time of the
+	 addition, on the Thumb it is 'dot + 4'.  */
+      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
+      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
+				   UNSPEC_SYMBOL_OFFSET);
+      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
 
-  /* On the ARM the PC register contains 'dot + 8' at the time of the
-     addition, on the Thumb it is 'dot + 4'.  */
-  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
-  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
-                               UNSPEC_SYMBOL_OFFSET);
-  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
+						   labelno));
+    }
 
-  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
+  return insn;
 }
 
 /* Return nonzero if X is valid as an ARM state addressing register.  */
@@ -8510,7 +8646,7 @@ load_tls_operand (rtx x, rtx reg)
 static rtx_insn *
 arm_call_tls_get_addr (rtx x, rtx reg, rtx *valuep, int reloc)
 {
-  rtx label, labelno, sum;
+  rtx label, labelno = NULL_RTX, sum;
 
   gcc_assert (reloc != TLS_DESCSEQ);
   start_sequence ();
@@ -16112,9 +16248,32 @@ get_jump_table_size (rtx_jump_table_data *insn)
   return 0;
 }
 
+/* Emit insns to load the function address from FUNCDESC (an FDPIC
+   function descriptor) into a register and the GOT address into the
+   FDPIC register, returning an rtx for the register holding the
+   function address.  */
+
+rtx
+arm_load_function_descriptor (rtx funcdesc)
+{
+  rtx fnaddr_reg = gen_reg_rtx (Pmode);
+  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
+
+  emit_move_insn (fnaddr_reg, fnaddr);
+
+  /* The ABI requires the entry point address to be loaded first, but
+     since we cannot support lazy binding for lack of atomic load of
+     two 32-bits values, we do not need to bother to prevent the
+     previous load from being moved after that of the GOT address.  */
+  emit_insn (gen_restore_pic_register_after_call (pic_reg, gotaddr));
+
+  return fnaddr_reg;
+}
+
 /* Return the maximum amount of padding that will be inserted before
    label LABEL.  */
-
 static HOST_WIDE_INT
 get_label_padding (rtx label)
 {
@@ -18249,6 +18408,12 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
+  if (TARGET_FDPIC)
+    {
+      rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+      use_reg (&CALL_INSN_FUNCTION_USAGE (insn), fdpic_reg);
+    }
+
   if (TARGET_AAPCS_BASED)
     {
       /* For AAPCS, IP and CC can be clobbered by veneers inserted by the
@@ -23069,9 +23234,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
 		  && (!SYMBOL_REF_LOCAL_P (x)
 		      || (SYMBOL_REF_DECL (x)
 			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
-	    fputs ("(GOT)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTFUNCDESC)", asm_out_file);
+	      else
+		fputs ("(GOT)", asm_out_file);
+	    }
 	  else
-	    fputs ("(GOTOFF)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTOFFFUNCDESC)", asm_out_file);
+	      else
+		{
+		  bool is_readonly;
+
+		  if (arm_is_segment_info_known (x, &is_readonly))
+		    fputs ("(GOTOFF)", asm_out_file);
+		  else
+		    fputs ("(GOT)", asm_out_file);
+		}
+	    }
+	}
+
+      /* For FDPIC we also have to mark symbol for .data section.  */
+      if (TARGET_FDPIC
+	  && NEED_GOT_RELOC
+	  && flag_pic
+	  && !making_const_table
+	  && SYMBOL_REF_P (x))
+	{
+	  if (SYMBOL_REF_FUNCTION_P (x))
+	    fputs ("(FUNCDESC)", asm_out_file);
 	}
       fputc ('\n', asm_out_file);
       return true;
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 4866e1e..7b50ef5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -892,6 +892,9 @@ extern int arm_arch_cmse;
    Pascal), so the following is not true.  */
 #define STATIC_CHAIN_REGNUM	12
 
+/* r9 is the FDPIC register (base register for GOT and FUNCDESC accesses).  */
+#define FDPIC_REGNUM		9
+
 /* Define this to be where the real frame pointer is if it is not possible to
    work out the offset between the frame pointer and the automatic variables
    until after register allocation has taken place.  FRAME_POINTER_REGNUM
@@ -1948,6 +1951,10 @@ extern unsigned arm_pic_register;
    data addresses in memory.  */
 #define PIC_OFFSET_TABLE_REGNUM arm_pic_register
 
+/* For FDPIC, the FDPIC register is call-clobbered (otherwise PLT
+   entries would need to handle saving and restoring it).  */
+#define PIC_OFFSET_TABLE_REG_CALL_CLOBBERED TARGET_FDPIC
+
 /* We can't directly access anything that contains a symbol,
    nor can we indirect via the constant pool.  One exception is
    UNSPEC_TLS, which is always PIC.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..328d32d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8140,6 +8140,11 @@
 	: !REG_P (callee))
       XEXP (operands[0], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[0], 0)
+	  = arm_load_function_descriptor (XEXP (operands[0], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_internal (operands[0], operands[1],
@@ -8151,10 +8156,35 @@
 	pat = gen_call_internal (operands[0], operands[1], operands[2]);
 	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg =
+	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
 
+(define_insn "restore_pic_register_after_call"
+  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
+		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]
+	       UNSPEC_PIC_RESTORE)
+	      (use (match_dup 1))
+	      (clobber (match_dup 0))])
+  ]
+  ""
+  "@
+  mov\t%0, %1
+  ldr\t%0, %1"
+)
+
 (define_expand "call_internal"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 	            (match_operand 1 "general_operand" ""))
@@ -8228,6 +8258,11 @@
 	: !REG_P (callee))
       XEXP (operands[1], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[1], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[1], 0)
+	  = arm_load_function_descriptor (XEXP (operands[1], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_value_internal (operands[0], operands[1],
@@ -8240,6 +8275,18 @@
 				       operands[2], operands[3]);
 	arm_emit_call_insn (pat, XEXP (operands[1], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg =
+	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
@@ -8582,7 +8629,7 @@
 		    (const_int 0))
 	      (match_operand 1 "" "")
 	      (match_operand 2 "" "")])]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
@@ -8649,7 +8696,7 @@
 (define_expand "untyped_return"
   [(match_operand:BLK 0 "memory_operand" "")
    (match_operand 1 "" "")]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 174bcc5..bda35d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -89,6 +89,7 @@
   UNSPEC_SP_SET		; Represent the setting of stack protector's canary
   UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
 			; against the guard.
+  UNSPEC_PIC_RESTORE	; Use to restore fdpic register
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
index 4753e36..51a76fc 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
@@ -17,5 +17,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
index 41c7ab7..ae65fb8 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
@@ -16,5 +16,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
-- 
2.6.3
Richard Sandiford Sept. 2, 2019, 4:12 p.m. | #4
Sorry for the slow reply.

Christophe Lyon <christophe.lyon@st.com> writes:
> On 16/07/2019 13:58, Richard Sandiford wrote:

>> Christophe Lyon <christophe.lyon@st.com> writes:

>>> +(define_insn "*restore_pic_register_after_call"

>>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")

>>> +		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]

>>> +	       UNSPEC_PIC_RESTORE)

>>> +	      (use (match_dup 1))

>>> +	      (clobber (match_dup 0))])

>>> +  ]

>>> +  ""

>>> +  "@

>>> +  mov\t%0, %1

>>> +  ldr\t%0, %1"

>>> +)

>>> +

>>>   (define_expand "call_internal"

>>>     [(parallel [(call (match_operand 0 "memory_operand" "")

>>>   	            (match_operand 1 "general_operand" ""))

>> 

>> Since operand 0 is significant after the instruction, I think this

>> should be:

>> 

>> (define_insn "*restore_pic_register_after_call"

>>    [(set (match_operand:SI 0 "s_register_operand" "+r,r")

>> 	(unspec:SI [(match_dup 0)

>> 		    (match_operand:SI 1 "nonimmediate_operand" "r,m")]

>> 		   UNSPEC_PIC_RESTORE))]

>>    ...

>> 

>> The (use (match_dup 1)) looks redundant, since the unspec itself

>> uses operand 1.

>> 

> When I try that, I have cases where the restore instruction is discarded, when the call happens just before function return. Since r9 is caller-saved, it should be restored but after dse2 the dumps say:

> (insn (set (reg:SI 9 r9)

>     (unspec:SI [

>         (reg:SI 9 r9)

>         (reg:SI 4 r4 [121])

>       ] UNSPEC_PIC_RESTORE))

> (expr_list:REG_UNUSED (reg:SI 9 r9) (nil))))

>

> and this is later removed by cprop_hardreg (which says the exit block uses r4, sp, and lr: should I make it use r9?)


But if it's caller-saved (i.e. call-clobbered), function A shouldn't
need to restore r9 after a call unless A needs the value of r9 for
something.  I.e. A shouldn't need to restore r9 for A's own caller,
because the caller should be doing that iself.

So if r9 is caller-saved and not referenced between the call and
function exit, deleting the restore sounds like the right thing to do.

Richard
Christophe Lyon Sept. 2, 2019, 8:04 p.m. | #5
On Mon, 2 Sep 2019 at 18:12, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>

> Sorry for the slow reply.

>

> Christophe Lyon <christophe.lyon@st.com> writes:

> > On 16/07/2019 13:58, Richard Sandiford wrote:

> >> Christophe Lyon <christophe.lyon@st.com> writes:

> >>> +(define_insn "*restore_pic_register_after_call"

> >>> +  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")

> >>> +                  (match_operand:SI 1 "nonimmediate_operand" "r,m")]

> >>> +          UNSPEC_PIC_RESTORE)

> >>> +         (use (match_dup 1))

> >>> +         (clobber (match_dup 0))])

> >>> +  ]

> >>> +  ""

> >>> +  "@

> >>> +  mov\t%0, %1

> >>> +  ldr\t%0, %1"

> >>> +)

> >>> +

> >>>   (define_expand "call_internal"

> >>>     [(parallel [(call (match_operand 0 "memory_operand" "")

> >>>                 (match_operand 1 "general_operand" ""))

> >>

> >> Since operand 0 is significant after the instruction, I think this

> >> should be:

> >>

> >> (define_insn "*restore_pic_register_after_call"

> >>    [(set (match_operand:SI 0 "s_register_operand" "+r,r")

> >>      (unspec:SI [(match_dup 0)

> >>                  (match_operand:SI 1 "nonimmediate_operand" "r,m")]

> >>                 UNSPEC_PIC_RESTORE))]

> >>    ...

> >>

> >> The (use (match_dup 1)) looks redundant, since the unspec itself

> >> uses operand 1.

> >>

> > When I try that, I have cases where the restore instruction is discarded, when the call happens just before function return. Since r9 is caller-saved, it should be restored but after dse2 the dumps say:

> > (insn (set (reg:SI 9 r9)

> >     (unspec:SI [

> >         (reg:SI 9 r9)

> >         (reg:SI 4 r4 [121])

> >       ] UNSPEC_PIC_RESTORE))

> > (expr_list:REG_UNUSED (reg:SI 9 r9) (nil))))

> >

> > and this is later removed by cprop_hardreg (which says the exit block uses r4, sp, and lr: should I make it use r9?)

>

> But if it's caller-saved (i.e. call-clobbered), function A shouldn't

> need to restore r9 after a call unless A needs the value of r9 for

> something.  I.e. A shouldn't need to restore r9 for A's own caller,

> because the caller should be doing that iself.

>

> So if r9 is caller-saved and not referenced between the call and

> function exit, deleting the restore sounds like the right thing to do.

>


Of course! I should have found that myself: I tried this change before I removed
an "optimization" we had that avoided restoring  r9 before calling
functions in the same module... thus breaking the ABI.

Since the previous patch I send didn't have this "optimization", the
above now works.
New patch attached.

Thanks!

Christophe


> Richard
commit 7b606c39834c5fefbde30a3131f9b123d02e7491
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Thu Feb 8 11:10:51 2018 +0100

    [ARM] FDPIC: Add support for FDPIC for arm architecture
    
    The FDPIC register is hard-coded to r9, as defined in the ABI.
    
    We have to disable tailcall optimizations if we don't know if the
    target function is in the same module. If not, we have to set r9 to
    the value associated with the target module.
    
    When generating a symbol address, we have to take into account whether
    it is a pointer to data or to a function, because different
    relocations are needed.
    
    2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>
    	Mickaël Guêné <mickael.guene@st.com>
    
    	gcc/
    	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
    	in FDPIC mode.
    	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
    	new function.
    	* config/arm/arm.c (arm_option_override): Define pic register to
    	FDPIC_REGNUM.
    	(arm_function_ok_for_sibcall): Disable sibcall optimization if we
    	have no decl or go through PLT.
    	(calculate_pic_address_constant): New function.
    	(legitimize_pic_address): Call calculate_pic_address_constant.
    	(arm_load_pic_register): Handle TARGET_FDPIC.
    	(arm_is_segment_info_known): New function.
    	(arm_pic_static_addr): Add support for FDPIC.
    	(arm_load_function_descriptor): New function.
    	(arm_emit_call_insn): Add support for FDPIC.
    	(arm_assemble_integer): Add support for FDPIC.
    	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
    	Define. (FDPIC_REGNUM): New define.
    	* config/arm/arm.md (call): Add support for FDPIC.
    	(call_value): Likewise.
    	(restore_pic_register_after_call): New pattern.
    	(untyped_call): Disable if FDPIC.
    	(untyped_return): Likewise.
    	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
    
    	gcc/testsuite/
    	* gcc.target/arm/fp16-aapcs-2.c: Adjust scan-assembler-times.
    	* gcc.target/arm/fp16-aapcs-4.c: Likewise.
    
    Change-Id: I1e96d260074ab7b75d36cdff5d34ad898f35c66f

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define ("__ARM_EABI__");
     }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
   def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 485bc68..272968a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
 extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45abcd8..ea6ea37 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3485,6 +3485,14 @@ arm_option_override (void)
   if (flag_pic && TARGET_VXWORKS_RTP)
     arm_pic_register = 9;
 
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */
+  if (TARGET_FDPIC)
+    {
+      arm_pic_register = FDPIC_REGNUM;
+      if (TARGET_THUMB1)
+	sorry ("FDPIC mode is not supported in Thumb-1 mode.");
+    }
+
   if (arm_pic_register_string != NULL)
     {
       int pic_register = decode_reg_name (arm_pic_register_string);
@@ -7295,6 +7303,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (cfun->machine->sibcall_blocked)
     return false;
 
+  if (TARGET_FDPIC)
+    {
+      /* In FDPIC, never tailcall something for which we have no decl:
+	 the target function could be in a different module, requiring
+	 a different FDPIC register value.  */
+      if (decl == NULL)
+	return false;
+
+      /* Don't tailcall if we go through the PLT since the FDPIC
+	 register is then corrupted and we don't restore it after
+	 static function calls.  */
+      if (!targetm.binds_local_p (decl))
+	return false;
+    }
+
   /* Never tailcall something if we are generating code for Thumb-1.  */
   if (TARGET_THUMB1)
     return false;
@@ -7501,6 +7524,24 @@ require_pic_register (rtx pic_reg, bool compute_now)
     }
 }
 
+/* Generate insns to calculate the address of ORIG in pic mode.  */
+static rtx_insn *
+calculate_pic_address_constant (rtx reg, rtx pic_reg, rtx orig)
+{
+  rtx pat;
+  rtx mem;
+
+  pat = gen_calculate_pic_address (reg, pic_reg, orig);
+
+  /* Make the MEM as close to a constant as possible.  */
+  mem = SET_SRC (pat);
+  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+  MEM_READONLY_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+
+  return emit_insn (pat);
+}
+
 /* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
    created to hold the result of the load.  If not NULL, PIC_REG indicates
    which register to use as PIC register, otherwise it is decided by register
@@ -7545,24 +7586,13 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
 	insn = arm_pic_static_addr (orig, reg);
       else
 	{
-	  rtx pat;
-	  rtx mem;
-
 	  /* If this function doesn't have a pic register, create one now.  */
 	  require_pic_register (pic_reg, compute_now);
 
 	  if (pic_reg == NULL_RTX)
 	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
-
-	  /* Make the MEM as close to a constant as possible.  */
-	  mem = SET_SRC (pat);
-	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
-	  MEM_READONLY_P (mem) = 1;
-	  MEM_NOTRAP_P (mem) = 1;
-
-	  insn = emit_insn (pat);
+	  insn = calculate_pic_address_constant (reg, pic_reg, orig);
 	}
 
       /* Put a REG_EQUAL note on this insn, so that it can be optimized
@@ -7711,7 +7741,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
   rtx l1, labelno, pic_tmp, pic_rtx;
 
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
+  if (crtl->uses_pic_offset_table == 0
+      || TARGET_SINGLE_PIC_BASE
+      || TARGET_FDPIC)
     return;
 
   gcc_assert (flag_pic);
@@ -7780,28 +7812,132 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
   emit_use (pic_reg);
 }
 
+/* Try to determine whether an object, referenced via ORIG, will be
+   placed in the text or data segment.  This is used in FDPIC mode, to
+   decide which relocations to use when accessing ORIG.  *IS_READONLY
+   is set to true if ORIG is a read-only location, false otherwise.
+   Return true if we could determine the location of ORIG, false
+   otherwise.  *IS_READONLY is valid only when we return true.  */
+static bool
+arm_is_segment_info_known (rtx orig, bool *is_readonly)
+{
+  *is_readonly = false;
+
+  if (GET_CODE (orig) == LABEL_REF)
+    {
+      *is_readonly = true;
+      return true;
+    }
+
+  if (SYMBOL_REF_P (orig))
+    {
+      if (CONSTANT_POOL_ADDRESS_P (orig))
+	{
+	  *is_readonly = true;
+	  return true;
+	}
+      else if (SYMBOL_REF_LOCAL_P (orig)
+	       && !SYMBOL_REF_EXTERNAL_P (orig)
+	       && SYMBOL_REF_DECL (orig)
+	       && (!DECL_P (SYMBOL_REF_DECL (orig))
+		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
+	{
+	  tree decl = SYMBOL_REF_DECL (orig);
+	  tree init = (TREE_CODE (decl) == VAR_DECL)
+	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
+	    ? decl : 0;
+	  int reloc = 0;
+	  bool named_section, readonly;
+
+	  if (init && init != error_mark_node)
+	    reloc = compute_reloc_for_constant (init);
+
+	  named_section = TREE_CODE (decl) == VAR_DECL
+	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));
+	  readonly = decl_readonly_section (decl, reloc);
+
+	  /* We don't know where the link script will put a named
+	     section, so return false in such a case.  */
+	  if (named_section)
+	    return false;
+
+	  *is_readonly = readonly;
+	  return true;
+	}
+      else
+	{
+	  /* We don't know.  */
+	  return false;
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  return false;
+}
+
 /* Generate code to load the address of a static var when flag_pic is set.  */
 static rtx_insn *
 arm_pic_static_addr (rtx orig, rtx reg)
 {
   rtx l1, labelno, offset_rtx;
+  rtx_insn *insn;
 
   gcc_assert (flag_pic);
 
-  /* We use an UNSPEC rather than a LABEL_REF because this label
-     never appears in the code stream.  */
-  labelno = GEN_INT (pic_labelno++);
-  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  l1 = gen_rtx_CONST (VOIDmode, l1);
+  bool is_readonly = false;
+  bool info_known = false;
+
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig))
+    info_known = arm_is_segment_info_known (orig, &is_readonly);
+
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig)
+      && !info_known)
+    {
+      /* We don't know where orig is stored, so we have be
+	 pessimistic and use a GOT relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      insn = calculate_pic_address_constant (reg, pic_reg, orig);
+    }
+  else if (TARGET_FDPIC
+	   && SYMBOL_REF_P (orig)
+	   && (SYMBOL_REF_FUNCTION_P (orig)
+	       || (info_known && !is_readonly)))
+    {
+      /* We use the GOTOFF relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
+      emit_insn (gen_movsi (reg, l1));
+      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
+    }
+  else
+    {
+      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
+	 PC-relative access.  */
+      /* We use an UNSPEC rather than a LABEL_REF because this label
+	 never appears in the code stream.  */
+      labelno = GEN_INT (pic_labelno++);
+      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
+      l1 = gen_rtx_CONST (VOIDmode, l1);
+
+      /* On the ARM the PC register contains 'dot + 8' at the time of the
+	 addition, on the Thumb it is 'dot + 4'.  */
+      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
+      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
+				   UNSPEC_SYMBOL_OFFSET);
+      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
 
-  /* On the ARM the PC register contains 'dot + 8' at the time of the
-     addition, on the Thumb it is 'dot + 4'.  */
-  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
-  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
-                               UNSPEC_SYMBOL_OFFSET);
-  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
+						   labelno));
+    }
 
-  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
+  return insn;
 }
 
 /* Return nonzero if X is valid as an ARM state addressing register.  */
@@ -8510,7 +8646,7 @@ load_tls_operand (rtx x, rtx reg)
 static rtx_insn *
 arm_call_tls_get_addr (rtx x, rtx reg, rtx *valuep, int reloc)
 {
-  rtx label, labelno, sum;
+  rtx label, labelno = NULL_RTX, sum;
 
   gcc_assert (reloc != TLS_DESCSEQ);
   start_sequence ();
@@ -16112,9 +16248,32 @@ get_jump_table_size (rtx_jump_table_data *insn)
   return 0;
 }
 
+/* Emit insns to load the function address from FUNCDESC (an FDPIC
+   function descriptor) into a register and the GOT address into the
+   FDPIC register, returning an rtx for the register holding the
+   function address.  */
+
+rtx
+arm_load_function_descriptor (rtx funcdesc)
+{
+  rtx fnaddr_reg = gen_reg_rtx (Pmode);
+  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
+
+  emit_move_insn (fnaddr_reg, fnaddr);
+
+  /* The ABI requires the entry point address to be loaded first, but
+     since we cannot support lazy binding for lack of atomic load of
+     two 32-bits values, we do not need to bother to prevent the
+     previous load from being moved after that of the GOT address.  */
+  emit_insn (gen_restore_pic_register_after_call (pic_reg, gotaddr));
+
+  return fnaddr_reg;
+}
+
 /* Return the maximum amount of padding that will be inserted before
    label LABEL.  */
-
 static HOST_WIDE_INT
 get_label_padding (rtx label)
 {
@@ -18249,6 +18408,12 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
+  if (TARGET_FDPIC)
+    {
+      rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+      use_reg (&CALL_INSN_FUNCTION_USAGE (insn), fdpic_reg);
+    }
+
   if (TARGET_AAPCS_BASED)
     {
       /* For AAPCS, IP and CC can be clobbered by veneers inserted by the
@@ -23069,9 +23234,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
 		  && (!SYMBOL_REF_LOCAL_P (x)
 		      || (SYMBOL_REF_DECL (x)
 			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
-	    fputs ("(GOT)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTFUNCDESC)", asm_out_file);
+	      else
+		fputs ("(GOT)", asm_out_file);
+	    }
 	  else
-	    fputs ("(GOTOFF)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTOFFFUNCDESC)", asm_out_file);
+	      else
+		{
+		  bool is_readonly;
+
+		  if (arm_is_segment_info_known (x, &is_readonly))
+		    fputs ("(GOTOFF)", asm_out_file);
+		  else
+		    fputs ("(GOT)", asm_out_file);
+		}
+	    }
+	}
+
+      /* For FDPIC we also have to mark symbol for .data section.  */
+      if (TARGET_FDPIC
+	  && NEED_GOT_RELOC
+	  && flag_pic
+	  && !making_const_table
+	  && SYMBOL_REF_P (x))
+	{
+	  if (SYMBOL_REF_FUNCTION_P (x))
+	    fputs ("(FUNCDESC)", asm_out_file);
 	}
       fputc ('\n', asm_out_file);
       return true;
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 4866e1e..7b50ef5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -892,6 +892,9 @@ extern int arm_arch_cmse;
    Pascal), so the following is not true.  */
 #define STATIC_CHAIN_REGNUM	12
 
+/* r9 is the FDPIC register (base register for GOT and FUNCDESC accesses).  */
+#define FDPIC_REGNUM		9
+
 /* Define this to be where the real frame pointer is if it is not possible to
    work out the offset between the frame pointer and the automatic variables
    until after register allocation has taken place.  FRAME_POINTER_REGNUM
@@ -1948,6 +1951,10 @@ extern unsigned arm_pic_register;
    data addresses in memory.  */
 #define PIC_OFFSET_TABLE_REGNUM arm_pic_register
 
+/* For FDPIC, the FDPIC register is call-clobbered (otherwise PLT
+   entries would need to handle saving and restoring it).  */
+#define PIC_OFFSET_TABLE_REG_CALL_CLOBBERED TARGET_FDPIC
+
 /* We can't directly access anything that contains a symbol,
    nor can we indirect via the constant pool.  One exception is
    UNSPEC_TLS, which is always PIC.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..b0ebb13 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8140,6 +8140,11 @@
 	: !REG_P (callee))
       XEXP (operands[0], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[0], 0)
+	  = arm_load_function_descriptor (XEXP (operands[0], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_internal (operands[0], operands[1],
@@ -8151,10 +8156,33 @@
 	pat = gen_call_internal (operands[0], operands[1], operands[2]);
 	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg =
+	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
 
+(define_insn "restore_pic_register_after_call"
+  [(set (match_operand:SI 0 "s_register_operand" "+r,r")
+        (unspec:SI [(match_dup 0)
+                    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
+                   UNSPEC_PIC_RESTORE))]
+  ""
+  "@
+  mov\t%0, %1
+  ldr\t%0, %1"
+)
+
 (define_expand "call_internal"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 	            (match_operand 1 "general_operand" ""))
@@ -8228,6 +8256,11 @@
 	: !REG_P (callee))
       XEXP (operands[1], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[1], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[1], 0)
+	  = arm_load_function_descriptor (XEXP (operands[1], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_value_internal (operands[0], operands[1],
@@ -8240,6 +8273,18 @@
 				       operands[2], operands[3]);
 	arm_emit_call_insn (pat, XEXP (operands[1], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg =
+	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
@@ -8582,7 +8627,7 @@
 		    (const_int 0))
 	      (match_operand 1 "" "")
 	      (match_operand 2 "" "")])]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
@@ -8649,7 +8694,7 @@
 (define_expand "untyped_return"
   [(match_operand:BLK 0 "memory_operand" "")
    (match_operand 1 "" "")]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 174bcc5..bda35d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -89,6 +89,7 @@
   UNSPEC_SP_SET		; Represent the setting of stack protector's canary
   UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
 			; against the guard.
+  UNSPEC_PIC_RESTORE	; Use to restore fdpic register
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
index 4753e36..51a76fc 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
@@ -17,5 +17,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
index 41c7ab7..ae65fb8 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
@@ -16,5 +16,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
Richard Sandiford Sept. 3, 2019, 8:39 a.m. | #6
Christophe Lyon <christophe.lyon@linaro.org> writes:
> @@ -3485,6 +3485,14 @@ arm_option_override (void)

>    if (flag_pic && TARGET_VXWORKS_RTP)

>      arm_pic_register = 9;

>  

> +  /* If in FDPIC mode then force arm_pic_register to be r9.  */

> +  if (TARGET_FDPIC)

> +    {

> +      arm_pic_register = FDPIC_REGNUM;

> +      if (TARGET_THUMB1)

> +	sorry ("FDPIC mode is not supported in Thumb-1 mode.");


Should be no "." at the end.

> +    }

> +

>    if (arm_pic_register_string != NULL)

>      {

>        int pic_register = decode_reg_name (arm_pic_register_string);

> [...]

> @@ -7295,6 +7303,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)

>    if (cfun->machine->sibcall_blocked)

>      return false;

>  

> +  if (TARGET_FDPIC)

> +    {

> +      /* In FDPIC, never tailcall something for which we have no decl:

> +	 the target function could be in a different module, requiring

> +	 a different FDPIC register value.  */

> +      if (decl == NULL)

> +	return false;

> +

> +      /* Don't tailcall if we go through the PLT since the FDPIC

> +	 register is then corrupted and we don't restore it after

> +	 static function calls.  */

> +      if (!targetm.binds_local_p (decl))

> +	return false;

> +    }

> +

>    /* Never tailcall something if we are generating code for Thumb-1.  */

>    if (TARGET_THUMB1)

>      return false;


Is this still needed after you removed the optimisation to avoid
restoring r9?  (Not really a review comment, just curious.)

> [...]

> @@ -7780,28 +7812,132 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

>    emit_use (pic_reg);

>  }

>  

> +/* Try to determine whether an object, referenced via ORIG, will be

> +   placed in the text or data segment.  This is used in FDPIC mode, to

> +   decide which relocations to use when accessing ORIG.  *IS_READONLY

> +   is set to true if ORIG is a read-only location, false otherwise.

> +   Return true if we could determine the location of ORIG, false

> +   otherwise.  *IS_READONLY is valid only when we return true.  */

> +static bool

> +arm_is_segment_info_known (rtx orig, bool *is_readonly)

> +{

> +  *is_readonly = false;

> +

> +  if (GET_CODE (orig) == LABEL_REF)

> +    {

> +      *is_readonly = true;

> +      return true;

> +    }

> +

> +  if (SYMBOL_REF_P (orig))

> +    {

> +      if (CONSTANT_POOL_ADDRESS_P (orig))

> +	{

> +	  *is_readonly = true;

> +	  return true;

> +	}

> +      else if (SYMBOL_REF_LOCAL_P (orig)

> +	       && !SYMBOL_REF_EXTERNAL_P (orig)

> +	       && SYMBOL_REF_DECL (orig)

> +	       && (!DECL_P (SYMBOL_REF_DECL (orig))

> +		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))


This can just be an "if".

> +	{

> +	  tree decl = SYMBOL_REF_DECL (orig);

> +	  tree init = (TREE_CODE (decl) == VAR_DECL)

> +	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

> +	    ? decl : 0;

> +	  int reloc = 0;

> +	  bool named_section, readonly;

> +

> +	  if (init && init != error_mark_node)

> +	    reloc = compute_reloc_for_constant (init);

> +

> +	  named_section = TREE_CODE (decl) == VAR_DECL

> +	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

> +	  readonly = decl_readonly_section (decl, reloc);

> +

> +	  /* We don't know where the link script will put a named

> +	     section, so return false in such a case.  */

> +	  if (named_section)

> +	    return false;

> +

> +	  *is_readonly = readonly;

> +	  return true;

> +	}

> +      else

> +	{

> +	  /* We don't know.  */

> +	  return false;

> +	}

> +    }

> +  else

> +    gcc_unreachable ();

> +

> +  return false;


Then this can end with:

      /* We don't know.  */
      return false;
    }

  gcc_unreachable ();
}

> +}

> +

>  /* Generate code to load the address of a static var when flag_pic is set.  */

>  static rtx_insn *

>  arm_pic_static_addr (rtx orig, rtx reg)

>  {

>    rtx l1, labelno, offset_rtx;

> +  rtx_insn *insn;

>  

>    gcc_assert (flag_pic);

>  

> -  /* We use an UNSPEC rather than a LABEL_REF because this label

> -     never appears in the code stream.  */

> -  labelno = GEN_INT (pic_labelno++);

> -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> -  l1 = gen_rtx_CONST (VOIDmode, l1);

> +  bool is_readonly = false;

> +  bool info_known = false;

> +

> +  if (TARGET_FDPIC

> +      && SYMBOL_REF_P (orig)

> +      && !SYMBOL_REF_FUNCTION_P (orig))

> +    info_known = arm_is_segment_info_known (orig, &is_readonly);

> +

> +  if (TARGET_FDPIC

> +      && SYMBOL_REF_P (orig)

> +      && !SYMBOL_REF_FUNCTION_P (orig)

> +      && !info_known)

> +    {

> +      /* We don't know where orig is stored, so we have be

> +	 pessimistic and use a GOT relocation.  */

> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +

> +      insn = calculate_pic_address_constant (reg, pic_reg, orig);

> +    }

> +  else if (TARGET_FDPIC

> +	   && SYMBOL_REF_P (orig)

> +	   && (SYMBOL_REF_FUNCTION_P (orig)

> +	       || (info_known && !is_readonly)))


The info_known check is redundant here.  I think it's actually clearer
without, since it's then more obvious that the final "else" is handling:

  !SYMBOL_REF_FUNCTION_P (orig) && is_readonly

(Initially I misread the condition and was wondering why it was safe to
drop to the "else" when "!info_known".  But it doesn't do that of course.)

> +    {

> +      /* We use the GOTOFF relocation.  */

> +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +

> +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);

> +      emit_insn (gen_movsi (reg, l1));

> +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));

> +    }

> +  else

> +    {

> +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use

> +	 PC-relative access.  */

> +      /* We use an UNSPEC rather than a LABEL_REF because this label

> +	 never appears in the code stream.  */

> +      labelno = GEN_INT (pic_labelno++);

> +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> +      l1 = gen_rtx_CONST (VOIDmode, l1);

> +

> +      /* On the ARM the PC register contains 'dot + 8' at the time of the

> +	 addition, on the Thumb it is 'dot + 4'.  */

> +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> +				   UNSPEC_SYMBOL_OFFSET);

> +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

>  

> -  /* On the ARM the PC register contains 'dot + 8' at the time of the

> -     addition, on the Thumb it is 'dot + 4'.  */

> -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> -                               UNSPEC_SYMBOL_OFFSET);

> -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

> +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,

> +						   labelno));

> +    }

>  

> -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));

> +  return insn;

>  }

>  

>  /* Return nonzero if X is valid as an ARM state addressing register.  */

> @@ -8510,7 +8646,7 @@ load_tls_operand (rtx x, rtx reg)

>  static rtx_insn *

>  arm_call_tls_get_addr (rtx x, rtx reg, rtx *valuep, int reloc)

>  {

> -  rtx label, labelno, sum;

> +  rtx label, labelno = NULL_RTX, sum;

>  

>    gcc_assert (reloc != TLS_DESCSEQ);

>    start_sequence ();


Looks like this might be a stray change (not mentioned in the changelog).

> [...]

> @@ -23069,9 +23234,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)

>  		  && (!SYMBOL_REF_LOCAL_P (x)

>  		      || (SYMBOL_REF_DECL (x)

>  			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))

> -	    fputs ("(GOT)", asm_out_file);

> +	    {

> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> +		fputs ("(GOTFUNCDESC)", asm_out_file);

> +	      else

> +		fputs ("(GOT)", asm_out_file);

> +	    }

>  	  else

> -	    fputs ("(GOTOFF)", asm_out_file);

> +	    {

> +	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> +		fputs ("(GOTOFFFUNCDESC)", asm_out_file);

> +	      else

> +		{

> +		  bool is_readonly;

> +

> +		  if (arm_is_segment_info_known (x, &is_readonly))

> +		    fputs ("(GOTOFF)", asm_out_file);

> +		  else

> +		    fputs ("(GOT)", asm_out_file);

> +		}


It looks like this changes behaviour for non-FDPIC.  Is that intentional?
Or should it be:

		  if (!TARGET_FDPIC
		      || arm_is_segment_info_known (x, &is_readonly))

?

> +	    }

> +	}

> +

> +      /* For FDPIC we also have to mark symbol for .data section.  */

> +      if (TARGET_FDPIC

> +	  && NEED_GOT_RELOC

> +	  && flag_pic

> +	  && !making_const_table

> +	  && SYMBOL_REF_P (x))

> +	{

> +	  if (SYMBOL_REF_FUNCTION_P (x))

> +	    fputs ("(FUNCDESC)", asm_out_file);

>  	}

>        fputc ('\n', asm_out_file);

>        return true;


Given:

> > Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?

> No.

>

> > Is !flag_pic TARGET_FDPIC supported?

> No; flag_pic is false when we use -mno-fdpic, so we revert to the "usual" abi then


the flag_pic and NEED_GOT_RELOC checks look redundant.

Might as well put the SYMBOL_REF_FUNCTION_P (x) in the main "if"
statement rather than split it out.

> [...]

> @@ -8151,10 +8156,33 @@

>  	pat = gen_call_internal (operands[0], operands[1], operands[2]);

>  	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);

>        }

> +

> +    /* Restore FDPIC register (r9) after call.  */

> +    if (TARGET_FDPIC)

> +      {

> +	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +	rtx initial_fdpic_reg =

> +	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);


Formatting nit: "=" should be on the next line.

> +

> +	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,

> +							initial_fdpic_reg));

> +      }

> +

>      DONE;

>    }"

>  )

>  

> [...]

> @@ -8240,6 +8273,18 @@

>  				       operands[2], operands[3]);

>  	arm_emit_call_insn (pat, XEXP (operands[1], 0), false);

>        }

> +

> +    /* Restore FDPIC register (r9) after call.  */

> +    if (TARGET_FDPIC)

> +      {

> +	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> +	rtx initial_fdpic_reg =

> +	    get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);


Same here.

Looks good otherwise, thanks.

Richard
Christophe Lyon Sept. 4, 2019, 7:58 p.m. | #7
On Tue, 3 Sep 2019 at 10:40, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>

> Christophe Lyon <christophe.lyon@linaro.org> writes:

> > @@ -3485,6 +3485,14 @@ arm_option_override (void)

> >    if (flag_pic && TARGET_VXWORKS_RTP)

> >      arm_pic_register = 9;

> >

> > +  /* If in FDPIC mode then force arm_pic_register to be r9.  */

> > +  if (TARGET_FDPIC)

> > +    {

> > +      arm_pic_register = FDPIC_REGNUM;

> > +      if (TARGET_THUMB1)

> > +     sorry ("FDPIC mode is not supported in Thumb-1 mode.");

>

> Should be no "." at the end.

Done

>

> > +    }

> > +

> >    if (arm_pic_register_string != NULL)

> >      {

> >        int pic_register = decode_reg_name (arm_pic_register_string);

> > [...]

> > @@ -7295,6 +7303,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)

> >    if (cfun->machine->sibcall_blocked)

> >      return false;

> >

> > +  if (TARGET_FDPIC)

> > +    {

> > +      /* In FDPIC, never tailcall something for which we have no decl:

> > +      the target function could be in a different module, requiring

> > +      a different FDPIC register value.  */

> > +      if (decl == NULL)

> > +     return false;

> > +

> > +      /* Don't tailcall if we go through the PLT since the FDPIC

> > +      register is then corrupted and we don't restore it after

> > +      static function calls.  */

> > +      if (!targetm.binds_local_p (decl))

> > +     return false;

> > +    }

> > +

> >    /* Never tailcall something if we are generating code for Thumb-1.  */

> >    if (TARGET_THUMB1)

> >      return false;

>

> Is this still needed after you removed the optimisation to avoid

> restoring r9?  (Not really a review comment, just curious.)

You are right: it's not needed anymore, I forgot to remove it.

>

> > [...]

> > @@ -7780,28 +7812,132 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)

> >    emit_use (pic_reg);

> >  }

> >

> > +/* Try to determine whether an object, referenced via ORIG, will be

> > +   placed in the text or data segment.  This is used in FDPIC mode, to

> > +   decide which relocations to use when accessing ORIG.  *IS_READONLY

> > +   is set to true if ORIG is a read-only location, false otherwise.

> > +   Return true if we could determine the location of ORIG, false

> > +   otherwise.  *IS_READONLY is valid only when we return true.  */

> > +static bool

> > +arm_is_segment_info_known (rtx orig, bool *is_readonly)

> > +{

> > +  *is_readonly = false;

> > +

> > +  if (GET_CODE (orig) == LABEL_REF)

> > +    {

> > +      *is_readonly = true;

> > +      return true;

> > +    }

> > +

> > +  if (SYMBOL_REF_P (orig))

> > +    {

> > +      if (CONSTANT_POOL_ADDRESS_P (orig))

> > +     {

> > +       *is_readonly = true;

> > +       return true;

> > +     }

> > +      else if (SYMBOL_REF_LOCAL_P (orig)

> > +            && !SYMBOL_REF_EXTERNAL_P (orig)

> > +            && SYMBOL_REF_DECL (orig)

> > +            && (!DECL_P (SYMBOL_REF_DECL (orig))

> > +                || !DECL_COMMON (SYMBOL_REF_DECL (orig))))

>

> This can just be an "if".

>

Done

> > +     {

> > +       tree decl = SYMBOL_REF_DECL (orig);

> > +       tree init = (TREE_CODE (decl) == VAR_DECL)

> > +         ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

> > +         ? decl : 0;

> > +       int reloc = 0;

> > +       bool named_section, readonly;

> > +

> > +       if (init && init != error_mark_node)

> > +         reloc = compute_reloc_for_constant (init);

> > +

> > +       named_section = TREE_CODE (decl) == VAR_DECL

> > +         && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

> > +       readonly = decl_readonly_section (decl, reloc);

> > +

> > +       /* We don't know where the link script will put a named

> > +          section, so return false in such a case.  */

> > +       if (named_section)

> > +         return false;

> > +

> > +       *is_readonly = readonly;

> > +       return true;

> > +     }

> > +      else

> > +     {

> > +       /* We don't know.  */

> > +       return false;

> > +     }

> > +    }

> > +  else

> > +    gcc_unreachable ();

> > +

> > +  return false;

>

> Then this can end with:

>

>       /* We don't know.  */

>       return false;

>     }

>

>   gcc_unreachable ();

> }

>

OK

> > +}

> > +

> >  /* Generate code to load the address of a static var when flag_pic is set.  */

> >  static rtx_insn *

> >  arm_pic_static_addr (rtx orig, rtx reg)

> >  {

> >    rtx l1, labelno, offset_rtx;

> > +  rtx_insn *insn;

> >

> >    gcc_assert (flag_pic);

> >

> > -  /* We use an UNSPEC rather than a LABEL_REF because this label

> > -     never appears in the code stream.  */

> > -  labelno = GEN_INT (pic_labelno++);

> > -  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> > -  l1 = gen_rtx_CONST (VOIDmode, l1);

> > +  bool is_readonly = false;

> > +  bool info_known = false;

> > +

> > +  if (TARGET_FDPIC

> > +      && SYMBOL_REF_P (orig)

> > +      && !SYMBOL_REF_FUNCTION_P (orig))

> > +    info_known = arm_is_segment_info_known (orig, &is_readonly);

> > +

> > +  if (TARGET_FDPIC

> > +      && SYMBOL_REF_P (orig)

> > +      && !SYMBOL_REF_FUNCTION_P (orig)

> > +      && !info_known)

> > +    {

> > +      /* We don't know where orig is stored, so we have be

> > +      pessimistic and use a GOT relocation.  */

> > +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> > +

> > +      insn = calculate_pic_address_constant (reg, pic_reg, orig);

> > +    }

> > +  else if (TARGET_FDPIC

> > +        && SYMBOL_REF_P (orig)

> > +        && (SYMBOL_REF_FUNCTION_P (orig)

> > +            || (info_known && !is_readonly)))

>

> The info_known check is redundant here.  I think it's actually clearer

> without, since it's then more obvious that the final "else" is handling:

>

>   !SYMBOL_REF_FUNCTION_P (orig) && is_readonly

>

> (Initially I misread the condition and was wondering why it was safe to

> drop to the "else" when "!info_known".  But it doesn't do that of course.)

>

OK, done

> > +    {

> > +      /* We use the GOTOFF relocation.  */

> > +      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> > +

> > +      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);

> > +      emit_insn (gen_movsi (reg, l1));

> > +      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));

> > +    }

> > +  else

> > +    {

> > +      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use

> > +      PC-relative access.  */

> > +      /* We use an UNSPEC rather than a LABEL_REF because this label

> > +      never appears in the code stream.  */

> > +      labelno = GEN_INT (pic_labelno++);

> > +      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);

> > +      l1 = gen_rtx_CONST (VOIDmode, l1);

> > +

> > +      /* On the ARM the PC register contains 'dot + 8' at the time of the

> > +      addition, on the Thumb it is 'dot + 4'.  */

> > +      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> > +      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> > +                                UNSPEC_SYMBOL_OFFSET);

> > +      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

> >

> > -  /* On the ARM the PC register contains 'dot + 8' at the time of the

> > -     addition, on the Thumb it is 'dot + 4'.  */

> > -  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);

> > -  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),

> > -                               UNSPEC_SYMBOL_OFFSET);

> > -  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);

> > +      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,

> > +                                                labelno));

> > +    }

> >

> > -  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));

> > +  return insn;

> >  }

> >

> >  /* Return nonzero if X is valid as an ARM state addressing register.  */

> > @@ -8510,7 +8646,7 @@ load_tls_operand (rtx x, rtx reg)

> >  static rtx_insn *

> >  arm_call_tls_get_addr (rtx x, rtx reg, rtx *valuep, int reloc)

> >  {

> > -  rtx label, labelno, sum;

> > +  rtx label, labelno = NULL_RTX, sum;

> >

> >    gcc_assert (reloc != TLS_DESCSEQ);

> >    start_sequence ();

>

> Looks like this might be a stray change (not mentioned in the changelog).


Right, it belongs to patch 10/21 "Implement TLS support".
The initialization avoids a warning about possibly non-initialized variable.

>

> > [...]

> > @@ -23069,9 +23234,37 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)

> >                 && (!SYMBOL_REF_LOCAL_P (x)

> >                     || (SYMBOL_REF_DECL (x)

> >                         ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))

> > -         fputs ("(GOT)", asm_out_file);

> > +         {

> > +           if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> > +             fputs ("(GOTFUNCDESC)", asm_out_file);

> > +           else

> > +             fputs ("(GOT)", asm_out_file);

> > +         }

> >         else

> > -         fputs ("(GOTOFF)", asm_out_file);

> > +         {

> > +           if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))

> > +             fputs ("(GOTOFFFUNCDESC)", asm_out_file);

> > +           else

> > +             {

> > +               bool is_readonly;

> > +

> > +               if (arm_is_segment_info_known (x, &is_readonly))

> > +                 fputs ("(GOTOFF)", asm_out_file);

> > +               else

> > +                 fputs ("(GOT)", asm_out_file);

> > +             }

>

> It looks like this changes behaviour for non-FDPIC.  Is that intentional?

> Or should it be:

>

>                   if (!TARGET_FDPIC

>                       || arm_is_segment_info_known (x, &is_readonly))

>

> ?

You are right, it was not intentional.
Unfortunately, it does not introduce a regression on
arm-none-linux-gnueabi[hf] with the GCC testsuite.
Fixed.

>

> > +         }

> > +     }

> > +

> > +      /* For FDPIC we also have to mark symbol for .data section.  */

> > +      if (TARGET_FDPIC

> > +       && NEED_GOT_RELOC

> > +       && flag_pic

> > +       && !making_const_table

> > +       && SYMBOL_REF_P (x))

> > +     {

> > +       if (SYMBOL_REF_FUNCTION_P (x))

> > +         fputs ("(FUNCDESC)", asm_out_file);

> >       }

> >        fputc ('\n', asm_out_file);

> >        return true;

>

> Given:

>

> > > Can NEED_GOT_RELOC or flag_pic be false for TARGET_FDPIC?

> > No.

> >

> > > Is !flag_pic TARGET_FDPIC supported?

> > No; flag_pic is false when we use -mno-fdpic, so we revert to the "usual" abi then

>

> the flag_pic and NEED_GOT_RELOC checks look redundant.

>

> Might as well put the SYMBOL_REF_FUNCTION_P (x) in the main "if"

> statement rather than split it out.

>

OK

> > [...]

> > @@ -8151,10 +8156,33 @@

> >       pat = gen_call_internal (operands[0], operands[1], operands[2]);

> >       arm_emit_call_insn (pat, XEXP (operands[0], 0), false);

> >        }

> > +

> > +    /* Restore FDPIC register (r9) after call.  */

> > +    if (TARGET_FDPIC)

> > +      {

> > +     rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> > +     rtx initial_fdpic_reg =

> > +         get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);

>

> Formatting nit: "=" should be on the next line.

OK

>

> > +

> > +     emit_insn (gen_restore_pic_register_after_call (fdpic_reg,

> > +                                                     initial_fdpic_reg));

> > +      }

> > +

> >      DONE;

> >    }"

> >  )

> >

> > [...]

> > @@ -8240,6 +8273,18 @@

> >                                      operands[2], operands[3]);

> >       arm_emit_call_insn (pat, XEXP (operands[1], 0), false);

> >        }

> > +

> > +    /* Restore FDPIC register (r9) after call.  */

> > +    if (TARGET_FDPIC)

> > +      {

> > +     rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);

> > +     rtx initial_fdpic_reg =

> > +         get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);

>

> Same here.

OK

>

> Looks good otherwise, thanks.

>

Here is an updated version, thanks!

> Richard
commit d257f25ea571139c80a93124ff91894921d46fa0
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Thu Feb 8 11:10:51 2018 +0100

    [ARM] FDPIC: Add support for FDPIC for arm architecture
    
    The FDPIC register is hard-coded to r9, as defined in the ABI.
    
    We have to disable tailcall optimizations if we don't know if the
    target function is in the same module. If not, we have to set r9 to
    the value associated with the target module.
    
    When generating a symbol address, we have to take into account whether
    it is a pointer to data or to a function, because different
    relocations are needed.
    
    2019-XX-XX  Christophe Lyon  <christophe.lyon@st.com>
    	Mickaël Guêné <mickael.guene@st.com>
    
    	gcc/
    	* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
    	in FDPIC mode.
    	* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
    	new function.
    	* config/arm/arm.c (arm_option_override): Define pic register to
    	FDPIC_REGNUM.
    	(arm_function_ok_for_sibcall): Disable sibcall optimization if we
    	have no decl or go through PLT.
    	(calculate_pic_address_constant): New function.
    	(legitimize_pic_address): Call calculate_pic_address_constant.
    	(arm_load_pic_register): Handle TARGET_FDPIC.
    	(arm_is_segment_info_known): New function.
    	(arm_pic_static_addr): Add support for FDPIC.
    	(arm_load_function_descriptor): New function.
    	(arm_emit_call_insn): Add support for FDPIC.
    	(arm_assemble_integer): Add support for FDPIC.
    	* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
    	Define. (FDPIC_REGNUM): New define.
    	* config/arm/arm.md (call): Add support for FDPIC.
    	(call_value): Likewise.
    	(restore_pic_register_after_call): New pattern.
    	(untyped_call): Disable if FDPIC.
    	(untyped_return): Likewise.
    	* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.
    
    	gcc/testsuite/
    	* gcc.target/arm/fp16-aapcs-2.c: Adjust scan-assembler-times.
    	* gcc.target/arm/fp16-aapcs-4.c: Likewise.
    
    Change-Id: I1e96d260074ab7b75d36cdff5d34ad898f35c66f

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define ("__ARM_EABI__");
     }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
   def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 485bc68..272968a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
 extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45abcd8..541b139 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3485,6 +3485,14 @@ arm_option_override (void)
   if (flag_pic && TARGET_VXWORKS_RTP)
     arm_pic_register = 9;
 
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */
+  if (TARGET_FDPIC)
+    {
+      arm_pic_register = FDPIC_REGNUM;
+      if (TARGET_THUMB1)
+	sorry ("FDPIC mode is not supported in Thumb-1 mode");
+    }
+
   if (arm_pic_register_string != NULL)
     {
       int pic_register = decode_reg_name (arm_pic_register_string);
@@ -7295,6 +7303,15 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   if (cfun->machine->sibcall_blocked)
     return false;
 
+  if (TARGET_FDPIC)
+    {
+      /* In FDPIC, never tailcall something for which we have no decl:
+	 the target function could be in a different module, requiring
+	 a different FDPIC register value.  */
+      if (decl == NULL)
+	return false;
+    }
+
   /* Never tailcall something if we are generating code for Thumb-1.  */
   if (TARGET_THUMB1)
     return false;
@@ -7501,6 +7518,24 @@ require_pic_register (rtx pic_reg, bool compute_now)
     }
 }
 
+/* Generate insns to calculate the address of ORIG in pic mode.  */
+static rtx_insn *
+calculate_pic_address_constant (rtx reg, rtx pic_reg, rtx orig)
+{
+  rtx pat;
+  rtx mem;
+
+  pat = gen_calculate_pic_address (reg, pic_reg, orig);
+
+  /* Make the MEM as close to a constant as possible.  */
+  mem = SET_SRC (pat);
+  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+  MEM_READONLY_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+
+  return emit_insn (pat);
+}
+
 /* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
    created to hold the result of the load.  If not NULL, PIC_REG indicates
    which register to use as PIC register, otherwise it is decided by register
@@ -7545,24 +7580,13 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
 	insn = arm_pic_static_addr (orig, reg);
       else
 	{
-	  rtx pat;
-	  rtx mem;
-
 	  /* If this function doesn't have a pic register, create one now.  */
 	  require_pic_register (pic_reg, compute_now);
 
 	  if (pic_reg == NULL_RTX)
 	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
-
-	  /* Make the MEM as close to a constant as possible.  */
-	  mem = SET_SRC (pat);
-	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
-	  MEM_READONLY_P (mem) = 1;
-	  MEM_NOTRAP_P (mem) = 1;
-
-	  insn = emit_insn (pat);
+	  insn = calculate_pic_address_constant (reg, pic_reg, orig);
 	}
 
       /* Put a REG_EQUAL note on this insn, so that it can be optimized
@@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
   rtx l1, labelno, pic_tmp, pic_rtx;
 
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
+  if (crtl->uses_pic_offset_table == 0
+      || TARGET_SINGLE_PIC_BASE
+      || TARGET_FDPIC)
     return;
 
   gcc_assert (flag_pic);
@@ -7780,28 +7806,128 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
   emit_use (pic_reg);
 }
 
+/* Try to determine whether an object, referenced via ORIG, will be
+   placed in the text or data segment.  This is used in FDPIC mode, to
+   decide which relocations to use when accessing ORIG.  *IS_READONLY
+   is set to true if ORIG is a read-only location, false otherwise.
+   Return true if we could determine the location of ORIG, false
+   otherwise.  *IS_READONLY is valid only when we return true.  */
+static bool
+arm_is_segment_info_known (rtx orig, bool *is_readonly)
+{
+  *is_readonly = false;
+
+  if (GET_CODE (orig) == LABEL_REF)
+    {
+      *is_readonly = true;
+      return true;
+    }
+
+  if (SYMBOL_REF_P (orig))
+    {
+      if (CONSTANT_POOL_ADDRESS_P (orig))
+	{
+	  *is_readonly = true;
+	  return true;
+	}
+      if (SYMBOL_REF_LOCAL_P (orig)
+	  && !SYMBOL_REF_EXTERNAL_P (orig)
+	  && SYMBOL_REF_DECL (orig)
+	  && (!DECL_P (SYMBOL_REF_DECL (orig))
+	      || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
+	{
+	  tree decl = SYMBOL_REF_DECL (orig);
+	  tree init = (TREE_CODE (decl) == VAR_DECL)
+	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
+	    ? decl : 0;
+	  int reloc = 0;
+	  bool named_section, readonly;
+
+	  if (init && init != error_mark_node)
+	    reloc = compute_reloc_for_constant (init);
+
+	  named_section = TREE_CODE (decl) == VAR_DECL
+	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));
+	  readonly = decl_readonly_section (decl, reloc);
+
+	  /* We don't know where the link script will put a named
+	     section, so return false in such a case.  */
+	  if (named_section)
+	    return false;
+
+	  *is_readonly = readonly;
+	  return true;
+	}
+      else
+	/* We don't know.  */
+	return false;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Generate code to load the address of a static var when flag_pic is set.  */
 static rtx_insn *
 arm_pic_static_addr (rtx orig, rtx reg)
 {
   rtx l1, labelno, offset_rtx;
+  rtx_insn *insn;
 
   gcc_assert (flag_pic);
 
-  /* We use an UNSPEC rather than a LABEL_REF because this label
-     never appears in the code stream.  */
-  labelno = GEN_INT (pic_labelno++);
-  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  l1 = gen_rtx_CONST (VOIDmode, l1);
+  bool is_readonly = false;
+  bool info_known = false;
 
-  /* On the ARM the PC register contains 'dot + 8' at the time of the
-     addition, on the Thumb it is 'dot + 4'.  */
-  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
-  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
-                               UNSPEC_SYMBOL_OFFSET);
-  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig))
+    info_known = arm_is_segment_info_known (orig, &is_readonly);
 
-  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig)
+      && !info_known)
+    {
+      /* We don't know where orig is stored, so we have be
+	 pessimistic and use a GOT relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      insn = calculate_pic_address_constant (reg, pic_reg, orig);
+    }
+  else if (TARGET_FDPIC
+	   && SYMBOL_REF_P (orig)
+	   && (SYMBOL_REF_FUNCTION_P (orig)
+	       || !is_readonly))
+    {
+      /* We use the GOTOFF relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
+      emit_insn (gen_movsi (reg, l1));
+      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
+    }
+  else
+    {
+      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
+	 PC-relative access.  */
+      /* We use an UNSPEC rather than a LABEL_REF because this label
+	 never appears in the code stream.  */
+      labelno = GEN_INT (pic_labelno++);
+      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
+      l1 = gen_rtx_CONST (VOIDmode, l1);
+
+      /* On the ARM the PC register contains 'dot + 8' at the time of the
+	 addition, on the Thumb it is 'dot + 4'.  */
+      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
+      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
+				   UNSPEC_SYMBOL_OFFSET);
+      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+
+      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
+						   labelno));
+    }
+
+  return insn;
 }
 
 /* Return nonzero if X is valid as an ARM state addressing register.  */
@@ -16112,9 +16238,32 @@ get_jump_table_size (rtx_jump_table_data *insn)
   return 0;
 }
 
+/* Emit insns to load the function address from FUNCDESC (an FDPIC
+   function descriptor) into a register and the GOT address into the
+   FDPIC register, returning an rtx for the register holding the
+   function address.  */
+
+rtx
+arm_load_function_descriptor (rtx funcdesc)
+{
+  rtx fnaddr_reg = gen_reg_rtx (Pmode);
+  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
+
+  emit_move_insn (fnaddr_reg, fnaddr);
+
+  /* The ABI requires the entry point address to be loaded first, but
+     since we cannot support lazy binding for lack of atomic load of
+     two 32-bits values, we do not need to bother to prevent the
+     previous load from being moved after that of the GOT address.  */
+  emit_insn (gen_restore_pic_register_after_call (pic_reg, gotaddr));
+
+  return fnaddr_reg;
+}
+
 /* Return the maximum amount of padding that will be inserted before
    label LABEL.  */
-
 static HOST_WIDE_INT
 get_label_padding (rtx label)
 {
@@ -18249,6 +18398,12 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
+  if (TARGET_FDPIC)
+    {
+      rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+      use_reg (&CALL_INSN_FUNCTION_USAGE (insn), fdpic_reg);
+    }
+
   if (TARGET_AAPCS_BASED)
     {
       /* For AAPCS, IP and CC can be clobbered by veneers inserted by the
@@ -23069,10 +23224,36 @@ arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
 		  && (!SYMBOL_REF_LOCAL_P (x)
 		      || (SYMBOL_REF_DECL (x)
 			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
-	    fputs ("(GOT)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTFUNCDESC)", asm_out_file);
+	      else
+		fputs ("(GOT)", asm_out_file);
+	    }
 	  else
-	    fputs ("(GOTOFF)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTOFFFUNCDESC)", asm_out_file);
+	      else
+		{
+		  bool is_readonly;
+
+		  if (!TARGET_FDPIC
+		      || arm_is_segment_info_known (x, &is_readonly))
+		    fputs ("(GOTOFF)", asm_out_file);
+		  else
+		    fputs ("(GOT)", asm_out_file);
+		}
+	    }
 	}
+
+      /* For FDPIC we also have to mark symbol for .data section.  */
+      if (TARGET_FDPIC
+	  && !making_const_table
+	  && SYMBOL_REF_P (x)
+	  && SYMBOL_REF_FUNCTION_P (x))
+	fputs ("(FUNCDESC)", asm_out_file);
+
       fputc ('\n', asm_out_file);
       return true;
     }
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 4866e1e..7b50ef5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -892,6 +892,9 @@ extern int arm_arch_cmse;
    Pascal), so the following is not true.  */
 #define STATIC_CHAIN_REGNUM	12
 
+/* r9 is the FDPIC register (base register for GOT and FUNCDESC accesses).  */
+#define FDPIC_REGNUM		9
+
 /* Define this to be where the real frame pointer is if it is not possible to
    work out the offset between the frame pointer and the automatic variables
    until after register allocation has taken place.  FRAME_POINTER_REGNUM
@@ -1948,6 +1951,10 @@ extern unsigned arm_pic_register;
    data addresses in memory.  */
 #define PIC_OFFSET_TABLE_REGNUM arm_pic_register
 
+/* For FDPIC, the FDPIC register is call-clobbered (otherwise PLT
+   entries would need to handle saving and restoring it).  */
+#define PIC_OFFSET_TABLE_REG_CALL_CLOBBERED TARGET_FDPIC
+
 /* We can't directly access anything that contains a symbol,
    nor can we indirect via the constant pool.  One exception is
    UNSPEC_TLS, which is always PIC.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..6d26e9e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8140,6 +8140,11 @@
 	: !REG_P (callee))
       XEXP (operands[0], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[0], 0)
+	  = arm_load_function_descriptor (XEXP (operands[0], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_internal (operands[0], operands[1],
@@ -8151,10 +8156,33 @@
 	pat = gen_call_internal (operands[0], operands[1], operands[2]);
 	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg
+	    = get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
 
+(define_insn "restore_pic_register_after_call"
+  [(set (match_operand:SI 0 "s_register_operand" "+r,r")
+        (unspec:SI [(match_dup 0)
+                    (match_operand:SI 1 "nonimmediate_operand" "r,m")]
+                   UNSPEC_PIC_RESTORE))]
+  ""
+  "@
+  mov\t%0, %1
+  ldr\t%0, %1"
+)
+
 (define_expand "call_internal"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 	            (match_operand 1 "general_operand" ""))
@@ -8228,6 +8256,11 @@
 	: !REG_P (callee))
       XEXP (operands[1], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[1], 0)))
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[1], 0)
+	  = arm_load_function_descriptor (XEXP (operands[1], 0));
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_value_internal (operands[0], operands[1],
@@ -8240,6 +8273,18 @@
 				       operands[2], operands[3]);
 	arm_emit_call_insn (pat, XEXP (operands[1], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	rtx initial_fdpic_reg
+	    = get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	emit_insn (gen_restore_pic_register_after_call (fdpic_reg,
+							initial_fdpic_reg));
+      }
+
     DONE;
   }"
 )
@@ -8582,7 +8627,7 @@
 		    (const_int 0))
 	      (match_operand 1 "" "")
 	      (match_operand 2 "" "")])]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
@@ -8649,7 +8694,7 @@
 (define_expand "untyped_return"
   [(match_operand:BLK 0 "memory_operand" "")
    (match_operand 1 "" "")]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 174bcc5..bda35d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -89,6 +89,7 @@
   UNSPEC_SP_SET		; Represent the setting of stack protector's canary
   UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
 			; against the guard.
+  UNSPEC_PIC_RESTORE	; Use to restore fdpic register
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
index 4753e36..51a76fc 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-2.c
@@ -17,5 +17,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
index 41c7ab7..ae65fb8 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-4.c
@@ -16,5 +16,5 @@ F (__fp16 a, __fp16 b, __fp16 c)
 }
 
 /* { dg-final { scan-assembler-times {mov\tr[0-9]+, r[0-2]} 3 } }  */
-/* { dg-final { scan-assembler-times {mov\tr1, r0} 1 } }  */
+/* { dg-final { scan-assembler-times {mov\tr1, r[03]} 1 } }  */
 /* { dg-final { scan-assembler-times {mov\tr0, r[0-9]+} 2 } }  */
Richard Sandiford Sept. 5, 2019, 8:03 a.m. | #8
Christophe Lyon <christophe.lyon@linaro.org> writes:
> +  if (SYMBOL_REF_P (orig))

> +    {

> +      if (CONSTANT_POOL_ADDRESS_P (orig))

> +	{

> +	  *is_readonly = true;

> +	  return true;

> +	}

> +      if (SYMBOL_REF_LOCAL_P (orig)

> +	  && !SYMBOL_REF_EXTERNAL_P (orig)

> +	  && SYMBOL_REF_DECL (orig)

> +	  && (!DECL_P (SYMBOL_REF_DECL (orig))

> +	      || !DECL_COMMON (SYMBOL_REF_DECL (orig))))

> +	{

> +	  tree decl = SYMBOL_REF_DECL (orig);

> +	  tree init = (TREE_CODE (decl) == VAR_DECL)

> +	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)

> +	    ? decl : 0;

> +	  int reloc = 0;

> +	  bool named_section, readonly;

> +

> +	  if (init && init != error_mark_node)

> +	    reloc = compute_reloc_for_constant (init);

> +

> +	  named_section = TREE_CODE (decl) == VAR_DECL

> +	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));

> +	  readonly = decl_readonly_section (decl, reloc);

> +

> +	  /* We don't know where the link script will put a named

> +	     section, so return false in such a case.  */

> +	  if (named_section)

> +	    return false;

> +

> +	  *is_readonly = readonly;

> +	  return true;

> +	}

> +      else

> +	/* We don't know.  */

> +	return false;


Nit: no need for the "else".

OK with that changes, thanks.

Richard

Patch

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@  arm_cpu_builtins (struct cpp_reader* pfile)
       builtin_define ("__ARM_EABI__");
     }
 
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);
+
   def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
   def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
 
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 485bc68..272968a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -139,6 +139,7 @@  extern int arm_max_const_double_inline_cost (void);
 extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
 extern void arm_emit_call_insn (rtx, rtx, bool);
 bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45abcd8..d9397b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3485,6 +3485,15 @@  arm_option_override (void)
   if (flag_pic && TARGET_VXWORKS_RTP)
     arm_pic_register = 9;
 
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */
+  if (TARGET_FDPIC)
+    {
+      arm_pic_register = FDPIC_REGNUM;
+      if (! TARGET_ARM && ! TARGET_THUMB2)
+	sorry ("FDPIC mode is supported on architecture versions that "
+	       "support ARM or Thumb-2 only.");
+    }
+
   if (arm_pic_register_string != NULL)
     {
       int pic_register = decode_reg_name (arm_pic_register_string);
@@ -7295,6 +7304,21 @@  arm_function_ok_for_sibcall (tree decl, tree exp)
   if (cfun->machine->sibcall_blocked)
     return false;
 
+  if (TARGET_FDPIC)
+    {
+      /* In FDPIC, never tailcall something for which we have no decl:
+	 the target function could be in a different module, requiring
+	 a different FDPIC register value.  */
+      if (decl == NULL)
+	return false;
+
+      /* Don't tailcall if we go through the PLT since the FDPIC
+	 register is then corrupted and we don't restore it after
+	 static function calls.  */
+      if (!targetm.binds_local_p (decl))
+	return false;
+    }
+
   /* Never tailcall something if we are generating code for Thumb-1.  */
   if (TARGET_THUMB1)
     return false;
@@ -7711,7 +7735,9 @@  arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
   rtx l1, labelno, pic_tmp, pic_rtx;
 
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
+  if (crtl->uses_pic_offset_table == 0
+      || TARGET_SINGLE_PIC_BASE
+      || TARGET_FDPIC)
     return;
 
   gcc_assert (flag_pic);
@@ -7780,28 +7806,142 @@  arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
   emit_use (pic_reg);
 }
 
+/* Try to determine whether an object, referenced via ORIG, will be
+   placed in the text or data segment.  This is used in FDPIC mode, to
+   decide which relocations to use when accessing ORIG.  IS_READONLY
+   is set to true if ORIG is a read-only location, false otherwise.
+   Return true if we could determine the location of ORIG, false
+   otherwise.  IS_READONLY is valid only when we return true.  */
+static bool
+arm_is_segment_info_known (rtx orig, bool *is_readonly)
+{
+  bool res = false;
+
+  *is_readonly = false;
+
+  if (GET_CODE (orig) == LABEL_REF)
+    {
+      res = true;
+      *is_readonly = true;
+    }
+  else if (SYMBOL_REF_P (orig))
+    {
+      if (CONSTANT_POOL_ADDRESS_P (orig))
+	{
+	  res = true;
+	  *is_readonly = true;
+	}
+      else if (SYMBOL_REF_LOCAL_P (orig)
+	       && !SYMBOL_REF_EXTERNAL_P (orig)
+	       && SYMBOL_REF_DECL (orig)
+	       && (!DECL_P (SYMBOL_REF_DECL (orig))
+		   || !DECL_COMMON (SYMBOL_REF_DECL (orig))))
+	{
+	  tree decl = SYMBOL_REF_DECL (orig);
+	  tree init = (TREE_CODE (decl) == VAR_DECL)
+	    ? DECL_INITIAL (decl) : (TREE_CODE (decl) == CONSTRUCTOR)
+	    ? decl : 0;
+	  int reloc = 0;
+	  bool named_section, readonly;
+
+	  if (init && init != error_mark_node)
+	    reloc = compute_reloc_for_constant (init);
+
+	  named_section = TREE_CODE (decl) == VAR_DECL
+	    && lookup_attribute ("section", DECL_ATTRIBUTES (decl));
+	  readonly = decl_readonly_section (decl, reloc);
+
+	  /* We don't know where the link script will put a named
+	     section, so return false in such a case.  */
+	  res = !named_section;
+
+	  if (!named_section)
+	    *is_readonly = readonly;
+	}
+      else
+	{
+	  /* We don't know.  */
+	  res = false;
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  return res;
+}
+
 /* Generate code to load the address of a static var when flag_pic is set.  */
 static rtx_insn *
 arm_pic_static_addr (rtx orig, rtx reg)
 {
   rtx l1, labelno, offset_rtx;
+  rtx_insn *insn;
 
   gcc_assert (flag_pic);
 
-  /* We use an UNSPEC rather than a LABEL_REF because this label
-     never appears in the code stream.  */
-  labelno = GEN_INT (pic_labelno++);
-  l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
-  l1 = gen_rtx_CONST (VOIDmode, l1);
+  bool is_readonly = false;
+  bool info_known = false;
 
-  /* On the ARM the PC register contains 'dot + 8' at the time of the
-     addition, on the Thumb it is 'dot + 4'.  */
-  offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
-  offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
-                               UNSPEC_SYMBOL_OFFSET);
-  offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig))
+      info_known = arm_is_segment_info_known (orig, &is_readonly);
 
-  return emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno));
+  if (TARGET_FDPIC
+      && SYMBOL_REF_P (orig)
+      && !SYMBOL_REF_FUNCTION_P (orig)
+      && !info_known)
+    {
+      /* We don't know where orig is stored, so we have be
+	 pessimistic and use a GOT relocation.  */
+      rtx pat;
+      rtx mem;
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      pat = gen_calculate_pic_address (reg, pic_reg, orig);
+
+      /* Make the MEM as close to a constant as possible.  */
+      mem = SET_SRC (pat);
+      gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+      MEM_READONLY_P (mem) = 1;
+      MEM_NOTRAP_P (mem) = 1;
+
+      insn = emit_insn (pat);
+    }
+  else if (TARGET_FDPIC
+	   && SYMBOL_REF_P (orig)
+	   && (SYMBOL_REF_FUNCTION_P (orig)
+	       || (info_known && !is_readonly)))
+    {
+      /* We use the GOTOFF relocation.  */
+      rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+
+      rtx l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig), UNSPEC_PIC_SYM);
+      emit_insn (gen_movsi (reg, l1));
+      insn = emit_insn (gen_addsi3 (reg, reg, pic_reg));
+    }
+  else
+    {
+      /* Not FDPIC, not SYMBOL_REF_P or readonly: we can use
+	 PC-relative access.  */
+      /* We use an UNSPEC rather than a LABEL_REF because this label
+	 never appears in the code stream.  */
+      labelno = GEN_INT (pic_labelno++);
+      l1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, labelno), UNSPEC_PIC_LABEL);
+      l1 = gen_rtx_CONST (VOIDmode, l1);
+
+      /* On the ARM the PC register contains 'dot + 8' at the time of the
+	 addition, on the Thumb it is 'dot + 4'.  */
+      offset_rtx = plus_constant (Pmode, l1, TARGET_ARM ? 8 : 4);
+      offset_rtx = gen_rtx_UNSPEC (Pmode, gen_rtvec (2, orig, offset_rtx),
+				   UNSPEC_SYMBOL_OFFSET);
+      offset_rtx = gen_rtx_CONST (Pmode, offset_rtx);
+
+      insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx,
+						   labelno));
+    }
+
+  return insn;
 }
 
 /* Return nonzero if X is valid as an ARM state addressing register.  */
@@ -16112,9 +16252,36 @@  get_jump_table_size (rtx_jump_table_data *insn)
   return 0;
 }
 
+/* Emit insns to load the function address from FUNCDESC (an FDPIC
+   function descriptor) into a register and the GOT address into the
+   FDPIC register, returning an rtx for the register holding the
+   function address.  */
+
+rtx
+arm_load_function_descriptor (rtx funcdesc)
+{
+  rtx fnaddr_reg = gen_reg_rtx (Pmode);
+  rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode, funcdesc, 4));
+  rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+  emit_move_insn (fnaddr_reg, fnaddr);
+  /* The ABI requires the entry point address to be loaded first, so
+     prevent the load from being moved after that of the GOT
+     address.  */
+  XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
+					gen_rtvec (2, pic_reg, gotaddr),
+					UNSPEC_PIC_RESTORE);
+  XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, gotaddr);
+  XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, pic_reg);
+  emit_insn (par);
+
+  return fnaddr_reg;
+}
+
 /* Return the maximum amount of padding that will be inserted before
    label LABEL.  */
-
 static HOST_WIDE_INT
 get_label_padding (rtx label)
 {
@@ -23069,9 +23236,37 @@  arm_assemble_integer (rtx x, unsigned int size, int aligned_p)
 		  && (!SYMBOL_REF_LOCAL_P (x)
 		      || (SYMBOL_REF_DECL (x)
 			  ? DECL_WEAK (SYMBOL_REF_DECL (x)) : 0))))
-	    fputs ("(GOT)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTFUNCDESC)", asm_out_file);
+	      else
+		fputs ("(GOT)", asm_out_file);
+	    }
 	  else
-	    fputs ("(GOTOFF)", asm_out_file);
+	    {
+	      if (TARGET_FDPIC && SYMBOL_REF_FUNCTION_P (x))
+		fputs ("(GOTOFFFUNCDESC)", asm_out_file);
+	      else
+		{
+		  bool is_readonly;
+
+		  if (arm_is_segment_info_known (x, &is_readonly))
+		    fputs ("(GOTOFF)", asm_out_file);
+		  else
+		    fputs ("(GOT)", asm_out_file);
+		}
+	    }
+	}
+
+      /* For FDPIC we also have to mark symbol for .data section.  */
+      if (TARGET_FDPIC
+	  && NEED_GOT_RELOC
+	  && flag_pic
+	  && !making_const_table
+	  && SYMBOL_REF_P (x))
+	{
+	  if (SYMBOL_REF_FUNCTION_P (x))
+	    fputs ("(FUNCDESC)", asm_out_file);
 	}
       fputc ('\n', asm_out_file);
       return true;
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 4866e1e..7b50ef5 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -892,6 +892,9 @@  extern int arm_arch_cmse;
    Pascal), so the following is not true.  */
 #define STATIC_CHAIN_REGNUM	12
 
+/* r9 is the FDPIC register (base register for GOT and FUNCDESC accesses).  */
+#define FDPIC_REGNUM		9
+
 /* Define this to be where the real frame pointer is if it is not possible to
    work out the offset between the frame pointer and the automatic variables
    until after register allocation has taken place.  FRAME_POINTER_REGNUM
@@ -1948,6 +1951,10 @@  extern unsigned arm_pic_register;
    data addresses in memory.  */
 #define PIC_OFFSET_TABLE_REGNUM arm_pic_register
 
+/* For FDPIC, the FDPIC register is call-clobbered (otherwise PLT
+   entries would need to handle saving and restoring it).  */
+#define PIC_OFFSET_TABLE_REG_CALL_CLOBBERED TARGET_FDPIC
+
 /* We can't directly access anything that contains a symbol,
    nor can we indirect via the constant pool.  One exception is
    UNSPEC_TLS, which is always PIC.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0aecd03..9036255 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8127,6 +8127,23 @@ 
     rtx callee, pat;
     tree addr = MEM_EXPR (operands[0]);
     
+    /* Force FDPIC register (r9) before call.  */
+    if (TARGET_FDPIC)
+      {
+	/* No need to update r9 if calling a static function.
+	   In other words: set r9 for indirect or non-local calls.  */
+	callee = XEXP (operands[0], 0);
+	if (!SYMBOL_REF_P (callee)
+	    || !SYMBOL_REF_LOCAL_P (callee)
+	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
+	  {
+	    emit_insn (gen_blockage ());
+	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
+	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));
+	 }
+      }
+
     /* In an untyped call, we can get NULL for operand 2.  */
     if (operands[2] == NULL_RTX)
       operands[2] = const0_rtx;
@@ -8140,6 +8157,13 @@ 
 	: !REG_P (callee))
       XEXP (operands[0], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC && !SYMBOL_REF_P (XEXP (operands[0], 0)))
+      {
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[0], 0)
+	  = arm_load_function_descriptor (XEXP (operands[0], 0));
+      }
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_internal (operands[0], operands[1],
@@ -8151,10 +8175,38 @@ 
 	pat = gen_call_internal (operands[0], operands[1], operands[2]);
 	arm_emit_call_insn (pat, XEXP (operands[0], 0), false);
       }
+
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	/* No need to update r9 if calling a static function.  */
+	if (!SYMBOL_REF_P (callee)
+	    || !SYMBOL_REF_LOCAL_P (callee)
+	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
+	  {
+	    rtx pic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	    emit_move_insn (pic_reg, get_hard_reg_initial_val (Pmode, FDPIC_REGNUM));
+	    emit_insn (gen_rtx_USE (VOIDmode, pic_reg));
+	    emit_insn (gen_blockage ());
+	  }
+      }
     DONE;
   }"
 )
 
+(define_insn "*restore_pic_register_after_call"
+  [(parallel [(unspec [(match_operand:SI 0 "s_register_operand" "=r,r")
+		       (match_operand:SI 1 "nonimmediate_operand" "r,m")]
+	       UNSPEC_PIC_RESTORE)
+	      (use (match_dup 1))
+	      (clobber (match_dup 0))])
+  ]
+  ""
+  "@
+  mov\t%0, %1
+  ldr\t%0, %1"
+)
+
 (define_expand "call_internal"
   [(parallel [(call (match_operand 0 "memory_operand" "")
 	            (match_operand 1 "general_operand" ""))
@@ -8215,6 +8267,30 @@ 
     rtx pat, callee;
     tree addr = MEM_EXPR (operands[1]);
     
+    /* Force FDPIC register (r9) before call.  */
+    if (TARGET_FDPIC)
+      {
+	/* No need to update the FDPIC register (r9) if calling a static function.
+	   In other words: set r9 for indirect or non-local calls.  */
+	callee = XEXP (operands[1], 0);
+	if (!SYMBOL_REF_P (callee)
+	    || !SYMBOL_REF_LOCAL_P (callee)
+	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
+	  {
+	    rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+	    rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	    rtx initial_fdpic_reg =
+		get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	    XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
+		gen_rtvec (2, fdpic_reg, initial_fdpic_reg),
+		UNSPEC_PIC_RESTORE);
+	    XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
+	    XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
+	    emit_insn (par);
+	  }
+      }
+
     /* In an untyped call, we can get NULL for operand 2.  */
     if (operands[3] == 0)
       operands[3] = const0_rtx;
@@ -8228,6 +8304,14 @@ 
 	: !REG_P (callee))
       XEXP (operands[1], 0) = force_reg (Pmode, callee);
 
+    if (TARGET_FDPIC
+	&& !SYMBOL_REF_P (XEXP (operands[1], 0)))
+      {
+	/* Indirect call: set r9 with FDPIC value of callee.  */
+	XEXP (operands[1], 0)
+	  = arm_load_function_descriptor (XEXP (operands[1], 0));
+      }
+
     if (detect_cmse_nonsecure_call (addr))
       {
 	pat = gen_nonsecure_call_value_internal (operands[0], operands[1],
@@ -8240,6 +8324,28 @@ 
 				       operands[2], operands[3]);
 	arm_emit_call_insn (pat, XEXP (operands[1], 0), false);
       }
+    /* Restore FDPIC register (r9) after call.  */
+    if (TARGET_FDPIC)
+      {
+	/* No need to update r9 if calling a static function.  */
+	if (!SYMBOL_REF_P (callee)
+	    || !SYMBOL_REF_LOCAL_P (callee)
+	    || arm_is_long_call_p (SYMBOL_REF_DECL (callee)))
+	  {
+	    rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+	    rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	    rtx initial_fdpic_reg =
+		get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	    XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
+		gen_rtvec (2, fdpic_reg, initial_fdpic_reg),
+		UNSPEC_PIC_RESTORE);
+	    XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
+	    XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
+	    emit_insn (par);
+	  }
+      }
+
     DONE;
   }"
 )
@@ -8582,7 +8688,7 @@ 
 		    (const_int 0))
 	      (match_operand 1 "" "")
 	      (match_operand 2 "" "")])]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
@@ -8649,7 +8755,7 @@ 
 (define_expand "untyped_return"
   [(match_operand:BLK 0 "memory_operand" "")
    (match_operand 1 "" "")]
-  "TARGET_EITHER"
+  "TARGET_EITHER && !TARGET_FDPIC"
   "
   {
     int i;
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 174bcc5..bda35d5 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -89,6 +89,7 @@ 
   UNSPEC_SP_SET		; Represent the setting of stack protector's canary
   UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
 			; against the guard.
+  UNSPEC_PIC_RESTORE	; Use to restore fdpic register
 ])
 
 (define_c_enum "unspec" [