RFC: [PATCH] x86: Add -mzero-caller-saved-regs=[skip|used|all]

Message ID 20180926181029.GA20898@intel.com
State New

Commit Message

H.J. Lu Sept. 26, 2018, 6:10 p.m.
Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
zero_caller_saved_regs("skip|used|all") function attribute:

1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

Don't zero caller-saved integer registers upon function return.

2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

Zero used caller-saved integer registers upon function return.

3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

Zero all caller-saved integer registers upon function return.
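
As a rough illustration of the attribute form, the sketch below tags
individual functions with each of the three modes.  It assumes a compiler
built with this patch.  The HAVE_ZERO_CALLER_SAVED_REGS macro, the ZERO_REGS
wrapper and all function names are hypothetical; they are only there so the
file still builds on compilers that do not know the attribute.

```c
#include <assert.h>

/* Hypothetical feature macro: define it only when building with a
   compiler that carries this patch.  Without it the attribute is
   compiled out, so the file builds everywhere.  */
#ifdef HAVE_ZERO_CALLER_SAVED_REGS
# define ZERO_REGS(mode) __attribute__ ((zero_caller_saved_regs (mode)))
#else
# define ZERO_REGS(mode)
#endif

/* Zero only the caller-saved integer registers this function used.  */
ZERO_REGS ("used") int
add_secret (int key, int salt)
{
  return key + salt;
}

/* Zero every caller-saved integer register on return.  */
ZERO_REGS ("all") int
mix_secret (int key)
{
  return key * 31 + 7;
}

/* Opt out, even if -mzero-caller-saved-regs=used or =all is given on
   the command line: the attribute takes precedence over the option.  */
ZERO_REGS ("skip") int
hot_path (int x)
{
  return x + 1;
}

/* The attribute must not change the values the functions return.  */
void
self_check (void)
{
  assert (add_secret (2, 3) == 5);
  assert (mix_secret (2) == 69);
  assert (hot_path (1) == 2);
}
```

Return values are unaffected because a register that is live at function
exit, such as the one holding the return value, is never zeroed.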

Tested on i686 and x86-64 with bootstrapping GCC trunk and
-mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all
enabled by default.
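
The difference between "used" and "all" comes down to which registers are
treated as live at return.  The following is a simplified model of that
rule, not the patch's code: it uses a toy five-register machine, and the
type, function and table names are invented for the example.  A register is
kept if it is fixed, is not caller-saved, is live at exit, or (unless the
mode is "all") was never used in the function; everything else is zeroed.

```c
#include <assert.h>
#include <stdbool.h>

enum zero_mode { ZERO_SKIP, ZERO_USED, ZERO_ALL };

/* Per-register facts for a toy machine (hypothetical layout).  */
struct reg_info
{
  bool fixed;      /* Reserved register, e.g. the stack pointer.  */
  bool call_used;  /* Caller-saved.  */
  bool live_out;   /* Live at function exit, e.g. the return value.  */
  bool ever_used;  /* Touched anywhere in the function.  */
};

/* Return a bitmask of the registers that would be zeroed on return.  */
unsigned int
regs_to_zero (const struct reg_info *regs, unsigned int nregs,
              enum zero_mode mode)
{
  unsigned int mask = 0;

  if (mode == ZERO_SKIP)
    return 0;

  assert (nregs <= 32);
  for (unsigned int i = 0; i < nregs; i++)
    {
      bool keep = (regs[i].fixed
                   || !regs[i].call_used
                   || regs[i].live_out
                   || (mode != ZERO_ALL && !regs[i].ever_used));
      if (!keep)
        mask |= 1u << i;
    }
  return mask;
}

/* Sample function: r0 holds the return value, r1 is a used scratch
   register, r2 is an unused scratch register, r3 is callee-saved and
   r4 is the stack pointer.  */
const struct reg_info toy_regs[5] = {
  { false, true,  true,  true  },
  { false, true,  false, true  },
  { false, true,  false, false },
  { false, false, false, true  },
  { true,  true,  false, true  },
};
```

With this table, "used" selects only r1, while "all" selects r1 and r2; the
return-value register, the callee-saved register and the stack pointer are
left alone in every mode.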

gcc/

	* config/i386/i386-opts.h (zero_caller_saved_regs): New enum.
	* config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):
	Renamed to ...
	(ix86_split_simple_return_internal): This.
	* config/i386/i386.c (ix86_set_zero_caller_saved_regs_type): New
	function.
	(ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.
	(ix86_expand_prologue): Replace gen_prologue_use with
	gen_pro_epilogue_use.
	(ix86_expand_epilogue): Replace gen_simple_return_pop_internal
	with ix86_split_simple_return_internal.  Replace
	gen_simple_return_internal with ix86_split_simple_return_internal.
	(ix86_find_live_outgoing_regs): New function.
	(ix86_split_simple_return_pop_internal): Removed.
	(ix86_split_simple_return_internal): New function.
	(ix86_handle_fndecl_attribute): Support zero_caller_saved_regs
	attribute.
	(ix86_attribute_table): Add zero_caller_saved_regs.
	* config/i386/i386.h (machine_function): Add
	zero_caller_saved_regs_type and live_outgoing_regs.
	(TARGET_POP_SCRATCH_REGISTER): New.
	* config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.
	(UNSPECV_PROLOGUE_USE): Renamed to ...
	(UNSPECV_PRO_EPILOGUE_USE): This.
	(prologue_use): Renamed to ...
	(pro_epilogue_use): This.
	(simple_return_internal): Changed to define_insn_and_split.
	(simple_return_internal_1): New pattern.
	(simple_return_pop_internal): Replace
	ix86_split_simple_return_pop_internal with
	ix86_split_simple_return_internal.  Always call
	ix86_split_simple_return_internal if epilogue_completed is
	true.
	(simple_return_pop_internal_1): New pattern.
	(Epilogue deallocator to pop peepholes): Enabled only if
	TARGET_POP_SCRATCH_REGISTER is true.
	* config/i386/i386.opt (mzero-caller-saved-regs=): New option.
	* doc/extend.texi: Document zero_caller_saved_regs attribute.
	* doc/invoke.texi: Document -mzero-caller-saved-regs=.

gcc/testsuite/

	* gcc.target/i386/zero-scratch-regs-1.c: New test.
	* gcc.target/i386/zero-scratch-regs-2.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-3.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-4.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-5.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-6.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-7.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-8.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-9.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-10.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-11.c: Likewise.
	* gcc.target/i386/zero-scratch-regs-12.c: Likewise.
---
 gcc/config/i386/i386-opts.h                   |   7 +
 gcc/config/i386/i386-protos.h                 |   2 +-
 gcc/config/i386/i386.c                        | 245 ++++++++++++++++--
 gcc/config/i386/i386.h                        |  13 +
 gcc/config/i386/i386.md                       |  54 +++-
 gcc/config/i386/i386.opt                      |  17 ++
 gcc/doc/extend.texi                           |   8 +
 gcc/doc/invoke.texi                           |  12 +-
 .../gcc.target/i386/zero-scratch-regs-1.c     |  10 +
 .../gcc.target/i386/zero-scratch-regs-10.c    |  19 ++
 .../gcc.target/i386/zero-scratch-regs-11.c    |  39 +++
 .../gcc.target/i386/zero-scratch-regs-12.c    |  39 +++
 .../gcc.target/i386/zero-scratch-regs-2.c     |  17 ++
 .../gcc.target/i386/zero-scratch-regs-3.c     |  10 +
 .../gcc.target/i386/zero-scratch-regs-4.c     |  12 +
 .../gcc.target/i386/zero-scratch-regs-5.c     |  18 ++
 .../gcc.target/i386/zero-scratch-regs-6.c     |  12 +
 .../gcc.target/i386/zero-scratch-regs-7.c     |  11 +
 .../gcc.target/i386/zero-scratch-regs-8.c     |  17 ++
 .../gcc.target/i386/zero-scratch-regs-9.c     |  13 +
 20 files changed, 538 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

-- 
2.17.1

Comments

Richard Biener Sept. 27, 2018, 10:52 a.m. | #1
On Wed, Sep 26, 2018 at 8:11 PM H.J. Lu <hongjiu.lu@intel.com> wrote:
>

> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

> zero_caller_saved_regs("skip|used|all") function attribute:

>

> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

>

> Don't zero caller-saved integer registers upon function return.

>

> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

>

> Zero used caller-saved integer registers upon function return.

>

> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

>

> Zero all caller-saved integer registers upon function return.

>

> Tested on i686 and x86-64 with bootstrapping GCC trunk and

> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

> enabled by default.


Can this be done in a target-independent way?

Richard.

> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h

> index 46366cbfa72..7f9a92e7e5b 100644

> --- a/gcc/config/i386/i386-opts.h

> +++ b/gcc/config/i386/i386-opts.h

> @@ -119,4 +119,11 @@ enum indirect_branch {

>    indirect_branch_thunk_extern

>  };

>

> +enum zero_caller_saved_regs {

> +  zero_caller_saved_regs_unset = 0,

> +  zero_caller_saved_regs_skip,

> +  zero_caller_saved_regs_used,

> +  zero_caller_saved_regs_all

> +};

> +

>  #endif

> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h

> index d1d59633dc0..a92f34a48b1 100644

> --- a/gcc/config/i386/i386-protos.h

> +++ b/gcc/config/i386/i386-protos.h

> @@ -310,7 +310,7 @@ extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);

>  extern const char * ix86_output_indirect_jmp (rtx call_op);

>  extern const char * ix86_output_function_return (bool long_p);

>  extern const char * ix86_output_indirect_function_return (rtx ret_op);

> -extern void ix86_split_simple_return_pop_internal (rtx);

> +extern void ix86_split_simple_return_internal (rtx);

>  extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,

>                                                 machine_mode mode);

>  extern int ix86_min_insn_size (rtx_insn *);

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c

> index ef72219f165..359062e6f44 100644

> --- a/gcc/config/i386/i386.c

> +++ b/gcc/config/i386/i386.c

> @@ -5561,6 +5561,40 @@ ix86_set_func_type (tree fndecl)

>      }

>  }

>

> +/* Set the zero_caller_saved_regs_type field from the function FNDECL.  */

> +

> +static void

> +ix86_set_zero_caller_saved_regs_type (tree fndecl)

> +{

> +  if (cfun->machine->zero_caller_saved_regs_type

> +      == zero_caller_saved_regs_unset)

> +    {

> +      tree attr = lookup_attribute ("zero_caller_saved_regs",

> +                                   DECL_ATTRIBUTES (fndecl));

> +      if (attr != NULL)

> +       {

> +         tree args = TREE_VALUE (attr);

> +         if (args == NULL)

> +           gcc_unreachable ();

> +         tree cst = TREE_VALUE (args);

> +         if (strcmp (TREE_STRING_POINTER (cst), "skip") == 0)

> +           cfun->machine->zero_caller_saved_regs_type

> +             = zero_caller_saved_regs_skip;

> +         else if (strcmp (TREE_STRING_POINTER (cst), "used") == 0)

> +           cfun->machine->zero_caller_saved_regs_type

> +             = zero_caller_saved_regs_used;

> +         else if (strcmp (TREE_STRING_POINTER (cst), "all") == 0)

> +           cfun->machine->zero_caller_saved_regs_type

> +             = zero_caller_saved_regs_all;

> +         else

> +           gcc_unreachable ();

> +       }

> +      else

> +       cfun->machine->zero_caller_saved_regs_type

> +         = ix86_zero_caller_saved_regs;

> +    }

> +}

> +

>  /* Set the indirect_branch_type field from the function FNDECL.  */

>

>  static void

> @@ -5661,6 +5695,7 @@ ix86_set_current_function (tree fndecl)

>         {

>           ix86_set_func_type (fndecl);

>           ix86_set_indirect_branch_type (fndecl);

> +         ix86_set_zero_caller_saved_regs_type (fndecl);

>         }

>        return;

>      }

> @@ -5682,6 +5717,7 @@ ix86_set_current_function (tree fndecl)

>

>    ix86_set_func_type (fndecl);

>    ix86_set_indirect_branch_type (fndecl);

> +  ix86_set_zero_caller_saved_regs_type (fndecl);

>

>    tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);

>    if (new_tree == NULL_TREE)

> @@ -13542,7 +13578,7 @@ ix86_expand_prologue (void)

>        insn = emit_insn (gen_set_got (pic));

>        RTX_FRAME_RELATED_P (insn) = 1;

>        add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);

> -      emit_insn (gen_prologue_use (pic));

> +      emit_insn (gen_pro_epilogue_use (pic));

>        /* Deleting already emmitted SET_GOT if exist and allocated to

>          REAL_PIC_OFFSET_TABLE_REGNUM.  */

>        ix86_elim_entry_set_got (pic);

> @@ -13571,7 +13607,7 @@ ix86_expand_prologue (void)

>       Further, prevent alloca modifications to the stack pointer from being

>       combined with prologue modifications.  */

>    if (TARGET_SEH)

> -    emit_insn (gen_prologue_use (stack_pointer_rtx));

> +    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));

>  }

>

>  /* Emit code to restore REG using a POP insn.  */

> @@ -14289,7 +14325,7 @@ ix86_expand_epilogue (int style)

>           emit_jump_insn (gen_simple_return_indirect_internal (ecx));

>         }

>        else

> -       emit_jump_insn (gen_simple_return_pop_internal (popc));

> +       ix86_split_simple_return_internal (popc);

>      }

>    else if (!m->call_ms2sysv || !restore_stub_is_tail)

>      {

> @@ -14316,7 +14352,7 @@ ix86_expand_epilogue (int style)

>           emit_jump_insn (gen_simple_return_indirect_internal (ecx));

>         }

>        else

> -       emit_jump_insn (gen_simple_return_internal ());

> +       ix86_split_simple_return_internal (NULL_RTX);

>      }

>

>    /* Restore the state back to the state from the prologue,

> @@ -28402,37 +28438,169 @@ ix86_output_indirect_function_return (rtx ret_op)

>      return "%!jmp\t%A0";

>  }

>

> -/* Split simple return with popping POPC bytes from stack to indirect

> -   branch with stack adjustment .  */

> +/* Find general registers which are live at the exit of basic block BB

> +   and set their corresponding bits in LIVE_OUTGOING_REGS.  */

> +

> +static void

> +ix86_find_live_outgoing_regs (basic_block bb,

> +                             unsigned int &live_outgoing_regs)

> +{

> +  bitmap live_out = df_get_live_out (bb);

> +

> +  bool zero_all = (cfun->machine->zero_caller_saved_regs_type

> +                  == zero_caller_saved_regs_all);

> +

> +  unsigned int regno;

> +

> +  /* Check for live outgoing registers.  */

> +  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)

> +    {

> +      /* Only zero general registers.  */

> +      if (!GENERAL_REGNO_P (regno))

> +       continue;

> +

> +      int i = regno;

> +      if (i >= FIRST_REX_INT_REG)

> +       i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);

> +

> +      /* No need to check it again if it is live.  */

> +      if ((live_outgoing_regs & (1 << i)))

> +       continue;

> +

> +      /* A register is considered LIVE if

> +        1. It is a fixed register.

> +        2. It isn't a caller-saved register.

> +        3. It is a live outgoing register.

> +        4. It is never used in the function and we don't zero all

> +           caller-saved registers.

> +       */

> +      if (fixed_regs[regno]

> +         || !call_used_regs[regno]

> +         || REGNO_REG_SET_P (live_out, regno)

> +         || (!zero_all && !df_regs_ever_live_p (regno)))

> +       live_outgoing_regs |= 1 << i;

> +    }

> +}

> +

> +/* Split simple return with popping POPC bytes from stack, if POPC

> +   isn't NULL_RTX, and zero caller-saved general registers if needed.

> +   When popping POPC bytes from stack for -mfunction-return=, convert

> +   return to indirect branch with stack adjustment.  */

>

>  void

> -ix86_split_simple_return_pop_internal (rtx popc)

> +ix86_split_simple_return_internal (rtx popc)

>  {

> -  struct machine_function *m = cfun->machine;

> -  rtx ecx = gen_rtx_REG (SImode, CX_REG);

> -  rtx_insn *insn;

> +  /* No need to zero caller-saved registers in main ().  Don't zero

> +     caller-saved registers if __builtin_eh_return is called since it

> +     isn't a normal function return.  */

> +  if ((cfun->machine->zero_caller_saved_regs_type

> +       != zero_caller_saved_regs_skip)

> +      && !crtl->calls_eh_return

> +      && cfun->machine->func_type == TYPE_NORMAL

> +      && !MAIN_NAME_P (DECL_NAME (current_function_decl)))

> +    {

> +      unsigned int &live_outgoing_regs

> +       = cfun->machine->live_outgoing_regs;

>

> -  /* There is no "pascal" calling convention in any 64bit ABI.  */

> -  gcc_assert (!TARGET_64BIT);

> +      if (live_outgoing_regs == 0)

> +       {

> +         edge e;

> +         edge_iterator ei;

>

> -  insn = emit_insn (gen_pop (ecx));

> -  m->fs.cfa_offset -= UNITS_PER_WORD;

> -  m->fs.sp_offset -= UNITS_PER_WORD;

> +         /* ECX register is used for return with pop.  */

> +         if (popc != NULL_RTX

> +             && (cfun->machine->function_return_type

> +                 != indirect_branch_keep))

> +           live_outgoing_regs = 1 << CX_REG;

>

> -  rtx x = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);

> -  x = gen_rtx_SET (stack_pointer_rtx, x);

> -  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);

> -  add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (ecx, pc_rtx));

> -  RTX_FRAME_RELATED_P (insn) = 1;

> +         FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)

> +           {

> +             ix86_find_live_outgoing_regs (e->src,

> +                                           live_outgoing_regs);

> +           }

> +       }

>

> -  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);

> -  x = gen_rtx_SET (stack_pointer_rtx, x);

> -  insn = emit_insn (x);

> -  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);

> -  RTX_FRAME_RELATED_P (insn) = 1;

> +      rtx zero = NULL_RTX;

> +

> +      unsigned int regno;

> +

> +      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)

> +       {

> +         if (!GENERAL_REGNO_P (regno))

> +           continue;

> +

> +         int i = regno;

> +         if (i >= FIRST_REX_INT_REG)

> +           i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);

> +         if ((live_outgoing_regs & (1 << i)))

> +           continue;

> +

> +         /* Zero out dead caller-saved register.  We only need to zero

> +            the lower 32 bits.  */

> +         rtx reg = gen_rtx_REG (SImode, regno);

> +         if (zero == NULL_RTX)

> +           {

> +             zero = reg;

> +             rtx tmp = gen_rtx_SET (reg, const0_rtx);

> +             if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())

> +               {

> +                 rtx clob = gen_rtx_CLOBBER (VOIDmode,

> +                                             gen_rtx_REG (CCmode,

> +                                                          FLAGS_REG));

> +                 tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,

> +                                                              tmp,

> +                                                              clob));

> +               }

> +             emit_insn (tmp);

> +           }

> +         else

> +           emit_move_insn (reg, zero);

> +

> +         /* Mark it in use  */

> +         emit_insn (gen_pro_epilogue_use (reg));

> +       }

> +    }

> +

> +  if (popc)

> +    {

> +      if (cfun->machine->function_return_type != indirect_branch_keep)

> +       {

> +         struct machine_function *m = cfun->machine;

> +         rtx ecx = gen_rtx_REG (SImode, CX_REG);

> +         rtx_insn *insn;

> +

> +         /* There is no "pascal" calling convention in any 64bit ABI.  */

> +         gcc_assert (!TARGET_64BIT);

> +

> +         insn = emit_insn (gen_pop (ecx));

> +         m->fs.cfa_offset -= UNITS_PER_WORD;

> +         m->fs.sp_offset -= UNITS_PER_WORD;

> +

> +         rtx x = plus_constant (Pmode, stack_pointer_rtx,

> +                                UNITS_PER_WORD);

> +         x = gen_rtx_SET (stack_pointer_rtx, x);

> +         add_reg_note (insn, REG_CFA_ADJUST_CFA, x);

> +         add_reg_note (insn, REG_CFA_REGISTER,

> +                       gen_rtx_SET (ecx, pc_rtx));

> +         RTX_FRAME_RELATED_P (insn) = 1;

>

> -  /* Now return address is in ECX.  */

> -  emit_jump_insn (gen_simple_return_indirect_internal (ecx));

> +         x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);

> +         x = gen_rtx_SET (stack_pointer_rtx, x);

> +         insn = emit_insn (x);

> +         add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (x));

> +         RTX_FRAME_RELATED_P (insn) = 1;

> +

> +         /* Mark ECX in use  */

> +         emit_insn (gen_pro_epilogue_use (ecx));

> +

> +         /* Now return address is in ECX.  */

> +         emit_jump_insn (gen_simple_return_indirect_internal (ecx));

> +       }

> +      else

> +       emit_jump_insn (gen_simple_return_pop_internal_1 (popc));

> +    }

> +  else

> +    emit_jump_insn (gen_simple_return_internal_1 ());

>  }

>

>  /* Output the assembly for a call instruction.  */

> @@ -40798,6 +40966,27 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,

>         }

>      }

>

> +  if (is_attribute_p ("zero_caller_saved_regs", name))

> +    {

> +      tree cst = TREE_VALUE (args);

> +      if (TREE_CODE (cst) != STRING_CST)

> +       {

> +         warning (OPT_Wattributes,

> +                  "%qE attribute requires a string constant argument",

> +                  name);

> +         *no_add_attrs = true;

> +       }

> +      else if (strcmp (TREE_STRING_POINTER (cst), "skip") != 0

> +              && strcmp (TREE_STRING_POINTER (cst), "used") != 0

> +              && strcmp (TREE_STRING_POINTER (cst), "all") != 0)

> +       {

> +         warning (OPT_Wattributes,

> +                  "argument to %qE attribute is not (skip|used|all)",

> +                  name);

> +         *no_add_attrs = true;

> +       }

> +    }

> +

>    return NULL_TREE;

>  }

>

> @@ -45099,6 +45288,8 @@ static const struct attribute_spec ix86_attribute_table[] =

>      ix86_handle_fndecl_attribute, NULL },

>    { "indirect_return", 0, 0, false, true, true, false,

>      NULL, NULL },

> +  { "zero_caller_saved_regs", 1, 1, true, false, false, false,

> +    ix86_handle_fndecl_attribute, NULL },

>

>    /* End element.  */

>    { NULL, 0, 0, false, false, false, false, NULL, NULL }

> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h

> index 6445ee5d50a..60deec0a496 100644

> --- a/gcc/config/i386/i386.h

> +++ b/gcc/config/i386/i386.h

> @@ -2715,6 +2715,10 @@ struct GTY(()) machine_function {

>       the "interrupt" or "no_caller_saved_registers" attribute.  */

>    BOOL_BITFIELD no_caller_saved_registers : 1;

>

> +  /* How to clear caller-saved general registers upon function

> +     return.  */

> +  ENUM_BITFIELD(zero_caller_saved_regs) zero_caller_saved_regs_type : 3;

> +

>    /* If true, there is register available for argument passing.  This

>       is used only in ix86_function_ok_for_sibcall by 32-bit to determine

>       if there is scratch register available for indirect sibcall.  In

> @@ -2742,6 +2746,9 @@ struct GTY(()) machine_function {

>    /* If true, ENDBR is queued at function entrance.  */

>    BOOL_BITFIELD endbr_queued_at_entrance : 1;

>

> +  /* Registers live at exit.  */

> +  unsigned int live_outgoing_regs;

> +

>    /* The largest alignment, in bytes, of stack slot actually used.  */

>    unsigned int max_used_stack_alignment;

>

> @@ -2841,6 +2848,12 @@ extern void debug_dispatch_window (int);

>    (ix86_indirect_branch_register \

>     || cfun->machine->indirect_branch_type != indirect_branch_keep)

>

> +#define TARGET_POP_SCRATCH_REGISTER \

> +  (TARGET_64BIT \

> +   || (cfun->machine->zero_caller_saved_regs_type \

> +       == zero_caller_saved_regs_skip) \

> +   || cfun->machine->function_return_type == indirect_branch_keep)

> +

>  #define IX86_HLE_ACQUIRE (1 << 16)

>  #define IX86_HLE_RELEASE (1 << 17)

>

> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md

> index 86f2c032e1b..cf8faacb7e3 100644

> --- a/gcc/config/i386/i386.md

> +++ b/gcc/config/i386/i386.md

> @@ -183,6 +183,8 @@

>    UNSPEC_PDEP

>    UNSPEC_PEXT

>

> +  UNSPEC_SIMPLE_RETURN

> +

>    ;; IRET support

>    UNSPEC_INTERRUPT_RETURN

>  ])

> @@ -193,7 +195,7 @@

>    UNSPECV_STACK_PROBE

>    UNSPECV_PROBE_STACK_RANGE

>    UNSPECV_ALIGN

> -  UNSPECV_PROLOGUE_USE

> +  UNSPECV_PRO_EPILOGUE_USE

>    UNSPECV_SPLIT_STACK_RETURN

>    UNSPECV_CLD

>    UNSPECV_NOPS

> @@ -12997,8 +12999,8 @@

>

>  ;; As USE insns aren't meaningful after reload, this is used instead

>  ;; to prevent deleting instructions setting registers for PIC code

> -(define_insn "prologue_use"

> -  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]

> +(define_insn "pro_epilogue_use"

> +  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]

>    ""

>    ""

>    [(set_attr "length" "0")])

> @@ -13039,10 +13041,23 @@

>      }

>  })

>

> -(define_insn "simple_return_internal"

> +(define_insn_and_split "simple_return_internal"

>    [(simple_return)]

>    "reload_completed"

>    "* return ix86_output_function_return (false);"

> +  "&& epilogue_completed"

> +  [(const_int 0)]

> +  "ix86_split_simple_return_internal (NULL_RTX); DONE;"

> +  [(set_attr "length" "1")

> +   (set_attr "atom_unit" "jeu")

> +   (set_attr "length_immediate" "0")

> +   (set_attr "modrm" "0")])

> +

> +(define_insn "simple_return_internal_1"

> +  [(simple_return)

> +   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]

> +  "reload_completed"

> +  "* return ix86_output_function_return (false);"

>    [(set_attr "length" "1")

>     (set_attr "atom_unit" "jeu")

>     (set_attr "length_immediate" "0")

> @@ -13075,9 +13090,21 @@

>     (use (match_operand:SI 0 "const_int_operand"))]

>    "reload_completed"

>    "%!ret\t%0"

> -  "&& cfun->machine->function_return_type != indirect_branch_keep"

> +  "&& (epilogue_completed

> +       || cfun->machine->function_return_type != indirect_branch_keep)"

>    [(const_int 0)]

> -  "ix86_split_simple_return_pop_internal (operands[0]); DONE;"

> +  "ix86_split_simple_return_internal (operands[0]); DONE;"

> +  [(set_attr "length" "3")

> +   (set_attr "atom_unit" "jeu")

> +   (set_attr "length_immediate" "2")

> +   (set_attr "modrm" "0")])

> +

> +(define_insn "simple_return_pop_internal_1"

> +  [(simple_return)

> +   (use (match_operand:SI 0 "const_int_operand"))

> +   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]

> +  "reload_completed"

> +  "%!ret\t%0"

>    [(set_attr "length" "3")

>     (set_attr "atom_unit" "jeu")

>     (set_attr "length_immediate" "2")

> @@ -18900,6 +18927,11 @@

>     (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])

>

>  ;; Convert epilogue deallocator to pop.

> +;; Don't do it when

> +;; -mfunction-return= -mzero-caller-saved-regs=

> +;; is used in 32-bit since return with stack pop needs to increment

> +;; stack register and scratch registers must be zeroed.  Pop scratch

> +;; register will load value from stack.

>  (define_peephole2

>    [(match_scratch:W 1 "r")

>     (parallel [(set (reg:P SP_REG)

> @@ -18908,6 +18940,7 @@

>               (clobber (reg:CC FLAGS_REG))

>               (clobber (mem:BLK (scratch)))])]

>    "(TARGET_SINGLE_POP || optimize_insn_for_size_p ())

> +   && TARGET_POP_SCRATCH_REGISTER

>     && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"

>    [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))

>               (clobber (mem:BLK (scratch)))])])

> @@ -18923,6 +18956,7 @@

>               (clobber (reg:CC FLAGS_REG))

>               (clobber (mem:BLK (scratch)))])]

>    "(TARGET_DOUBLE_POP || optimize_insn_for_size_p ())

> +   && TARGET_POP_SCRATCH_REGISTER

>     && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"

>    [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))

>               (clobber (mem:BLK (scratch)))])

> @@ -18936,6 +18970,7 @@

>               (clobber (reg:CC FLAGS_REG))

>               (clobber (mem:BLK (scratch)))])]

>    "optimize_insn_for_size_p ()

> +   && TARGET_POP_SCRATCH_REGISTER

>     && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"

>    [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))

>               (clobber (mem:BLK (scratch)))])

> @@ -18948,7 +18983,8 @@

>                    (plus:P (reg:P SP_REG)

>                            (match_operand:P 0 "const_int_operand")))

>               (clobber (reg:CC FLAGS_REG))])]

> -  "INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"

> +  "TARGET_POP_SCRATCH_REGISTER

> +   && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"

>    [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])

>

>  ;; Two pops case is tricky, since pop causes dependency

> @@ -18960,7 +18996,8 @@

>                    (plus:P (reg:P SP_REG)

>                            (match_operand:P 0 "const_int_operand")))

>               (clobber (reg:CC FLAGS_REG))])]

> -  "INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"

> +  "TARGET_POP_SCRATCH_REGISTER

> +   && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"

>    [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))

>     (set (match_dup 2) (mem:W (post_inc:P (reg:P SP_REG))))])

>

> @@ -18971,6 +19008,7 @@

>                            (match_operand:P 0 "const_int_operand")))

>               (clobber (reg:CC FLAGS_REG))])]

>    "optimize_insn_for_size_p ()

> +   && TARGET_POP_SCRATCH_REGISTER

>     && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"

>    [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))

>     (set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])

> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt

> index e7fbf9b6f99..da9b442ecbf 100644

> --- a/gcc/config/i386/i386.opt

> +++ b/gcc/config/i386/i386.opt

> @@ -1063,3 +1063,20 @@ Support WAITPKG built-in functions and code generation.

>  mcldemote

>  Target Report Mask(ISA_CLDEMOTE) Var(ix86_isa_flags2) Save

>  Support CLDEMOTE built-in functions and code generation.

> +

> +mzero-caller-saved-regs=

> +Target Report RejectNegative Joined Enum(zero_caller_saved_regs) Var(ix86_zero_caller_saved_regs) Init(zero_caller_saved_regs_skip)

> +Clear caller-saved general registers upon function return.

> +

> +Enum

> +Name(zero_caller_saved_regs) Type(enum zero_caller_saved_regs)

> +Known choices of clearing caller-saved general registers upon function return (for use with the -mzero-caller-saved-regs= option):

> +

> +EnumValue

> +Enum(zero_caller_saved_regs) String(skip) Value(zero_caller_saved_regs_skip)

> +

> +EnumValue

> +Enum(zero_caller_saved_regs) String(used) Value(zero_caller_saved_regs_used)

> +

> +EnumValue

> +Enum(zero_caller_saved_regs) String(all) Value(zero_caller_saved_regs_all)

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi

> index cfe6a8e5bb8..023f6155e58 100644

> --- a/gcc/doc/extend.texi

> +++ b/gcc/doc/extend.texi

> @@ -5931,6 +5931,14 @@ The @code{indirect_return} attribute can be applied to a function,

>  as well as variable or type of function pointer to inform the

>  compiler that the function may return via indirect branch.

>

> +@item zero_caller_saved_regs("@var{choice}")

> +@cindex @code{zero_caller_saved_regs} function attribute, x86

> +On x86 targets, the @code{zero_caller_saved_regs} attribute causes the

> +compiler to zero caller-saved integer registers at function return with

> +@var{choice}.  @samp{skip} doesn't zero caller-saved integer registers.

> +@samp{used} zeros caller-saved integer registers which are used in

> +function.  @samp{all} zeros all caller-saved integer registers.

> +

>  @end table

>

>  On the x86, the inliner does not inline a

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

> index 7ef4e7a449b..796477452d5 100644

> --- a/gcc/doc/invoke.texi

> +++ b/gcc/doc/invoke.texi

> @@ -1307,7 +1307,7 @@ See RS/6000 and PowerPC Options.

>  -mstack-protector-guard-symbol=@var{symbol} @gol

>  -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol

>  -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol

> --mindirect-branch-register}

> +-mindirect-branch-register -mzero-caller-saved-regs=@var{choice}}

>

>  @emph{x86 Windows Options}

>  @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol

> @@ -28459,6 +28459,16 @@ not be reachable in the large code model.

>  @opindex -mindirect-branch-register

>  Force indirect call and jump via register.

>

> +@item -mzero-caller-saved-regs=@var{choice}

> +@opindex -mzero-caller-saved-regs

> +Zero caller-saved integer registers at function return with @var{choice}.

> +The default is @samp{skip}, which doesn't zero caller-saved integer

> +registers.  @samp{used} zeros caller-saved integer registers which are

> +used in function.  @samp{all} zeros all caller-saved integer registers.

> +You can control this behavior for a specific function by using the

> +function attribute @code{zero_caller_saved_regs}.

> +@xref{Function Attributes}.

> +

>  @end table

>

>  These @samp{-m} switches are supported in addition to the above

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c

> new file mode 100644

> index 00000000000..08533500eff

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c

> @@ -0,0 +1,10 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */

> +

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */

> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c

> new file mode 100644

> index 00000000000..961bb720cb2

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c

> @@ -0,0 +1,19 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */

> +

> +extern int foo (int) __attribute__ ((zero_caller_saved_regs("all")));

> +

> +int

> +foo (int x)

> +{

> +  return x;

> +}

> +

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c

> new file mode 100644

> index 00000000000..677c5b3d9fd

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c

> @@ -0,0 +1,39 @@

> +/* { dg-do run { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */

> +

> +struct S { int i; };

> +__attribute__((const, noinline, noclone))

> +struct S foo (int x)

> +{

> +  struct S s;

> +  s.i = x;

> +  return s;

> +}

> +

> +int a[2048], b[2048], c[2048], d[2048];

> +struct S e[2048];

> +

> +__attribute__((noinline, noclone)) void

> +bar (void)

> +{

> +  int i;

> +  for (i = 0; i < 1024; i++)

> +    {

> +      e[i] = foo (i);

> +      a[i+2] = a[i] + a[i+1];

> +      b[10] = b[10] + i;

> +      c[i] = c[2047 - i];

> +      d[i] = d[i + 1];

> +    }

> +}

> +

> +int

> +main ()

> +{

> +  int i;

> +  bar ();

> +  for (i = 0; i < 1024; i++)

> +    if (e[i].i != i)

> +      __builtin_abort ();

> +  return 0;

> +}

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c

> new file mode 100644

> index 00000000000..26e48d56179

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c

> @@ -0,0 +1,39 @@

> +/* { dg-do run { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */

> +

> +struct S { int i; };

> +__attribute__((const, noinline, noclone))

> +struct S foo (int x)

> +{

> +  struct S s;

> +  s.i = x;

> +  return s;

> +}

> +

> +int a[2048], b[2048], c[2048], d[2048];

> +struct S e[2048];

> +

> +__attribute__((noinline, noclone)) void

> +bar (void)

> +{

> +  int i;

> +  for (i = 0; i < 1024; i++)

> +    {

> +      e[i] = foo (i);

> +      a[i+2] = a[i] + a[i+1];

> +      b[10] = b[10] + i;

> +      c[i] = c[2047 - i];

> +      d[i] = d[i + 1];

> +    }

> +}

> +

> +int

> +main ()

> +{

> +  int i;

> +  bar ();

> +  for (i = 0; i < 1024; i++)

> +    if (e[i].i != i)

> +      __builtin_abort ();

> +  return 0;

> +}

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c

> new file mode 100644

> index 00000000000..cc402ad605c

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c

> @@ -0,0 +1,17 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */

> +

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c

> new file mode 100644

> index 00000000000..ed75361d545

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c

> @@ -0,0 +1,10 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */

> +

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */

> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c

> new file mode 100644

> index 00000000000..83e2c4efcf2

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c

> @@ -0,0 +1,12 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */

> +

> +extern void foo (void) __attribute__ ((zero_caller_saved_regs("used")));

> +

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */

> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c

> new file mode 100644

> index 00000000000..ef902d5311a

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c

> @@ -0,0 +1,18 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */

> +

> +__attribute__ ((zero_caller_saved_regs("all")))

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c

> new file mode 100644

> index 00000000000..91e54b5403e

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c

> @@ -0,0 +1,12 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */

> +

> +extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));

> +

> +void

> +foo (void)

> +{

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */

> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c

> new file mode 100644

> index 00000000000..5e21de9bca5

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c

> @@ -0,0 +1,11 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */

> +

> +int

> +foo (int x)

> +{

> +  return x;

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c

> new file mode 100644

> index 00000000000..27fd9e48640

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c

> @@ -0,0 +1,17 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */

> +

> +int

> +foo (int x)

> +{

> +  return x;

> +}

> +

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */

> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */

> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

> new file mode 100644

> index 00000000000..dee849d9e5e

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

> @@ -0,0 +1,13 @@

> +/* { dg-do compile { target *-*-linux* } } */

> +/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */

> +

> +extern int foo (int) __attribute__ ((zero_caller_saved_regs("used")));

> +

> +int

> +foo (int x)

> +{

> +  return x;

> +}

> +

> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */

> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */

> --

> 2.17.1

>
H.J. Lu Sept. 27, 2018, 12:51 p.m. | #2
On Thu, Sep 27, 2018 at 3:52 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 26, 2018 at 8:11 PM H.J. Lu <hongjiu.lu@intel.com> wrote:

>>

>> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

>> zero_caller_saved_regs("skip|used|all") function attribute:

>>

>> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

>>

>> Don't zero caller-saved integer registers upon function return.

>>

>> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

>>

>> Zero used caller-saved integer registers upon function return.

>>

>> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

>>

>> Zero all caller-saved integer registers upon function return.

>>

>> Tested on i686 and x86-64 with bootstrapping GCC trunk and

>> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

>> enabled by default.

>

> Can this be done in a target-independent way?


Finding the live outgoing integer registers can be done in a
target-independent way.  But zeroing caller-saved registers should be done
when the epilogue is generated, which is target-dependent.
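
To make the split concrete, here is a minimal sketch of the
target-independent half: choosing which registers to zero from the set of
used registers and the set of live outgoing registers.  It is only a model
of the behavior described in this thread, not the code in the patch.  The
register numbering, REG_BIT, CALL_CLOBBERED_REGS, and regs_to_zero are all
hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register numbering, for illustration only.  These are
   not GCC's internal register numbers.  */
enum reg
{
  REG_AX, REG_DX, REG_CX, REG_SI, REG_DI,
  REG_R8, REG_R9, REG_R10, REG_R11,
  REG_BX  /* Callee-saved; included to show that it is never zeroed.  */
};

#define REG_BIT(r) ((uint32_t) 1 << (r))

/* Integer registers that a call may clobber in the x86-64 System V ABI.  */
#define CALL_CLOBBERED_REGS \
  (REG_BIT (REG_AX) | REG_BIT (REG_DX) | REG_BIT (REG_CX) \
   | REG_BIT (REG_SI) | REG_BIT (REG_DI) | REG_BIT (REG_R8) \
   | REG_BIT (REG_R9) | REG_BIT (REG_R10) | REG_BIT (REG_R11))

enum zero_mode { ZERO_SKIP, ZERO_USED, ZERO_ALL };

/* Return the set of registers to zero at function return.  USED is the
   set of registers the function touched; LIVE_OUTGOING is the set that
   carries the return value and must therefore be preserved.  */
uint32_t
regs_to_zero (enum zero_mode mode, uint32_t used, uint32_t live_outgoing)
{
  uint32_t candidates = 0;

  switch (mode)
    {
    case ZERO_SKIP:
      break;
    case ZERO_USED:
      candidates = CALL_CLOBBERED_REGS & used;
      break;
    case ZERO_ALL:
      candidates = CALL_CLOBBERED_REGS;
      break;
    default:
      assert (!"unknown zero_mode");
    }

  /* Never zero a register that holds the return value.  */
  return candidates & ~live_outgoing;
}
```

For "int foo (int x) { return x; }" on x86-64, the model gives only the
%edi bit with "used" and every call-clobbered register except %eax with
"all", which agrees with what zero-scratch-regs-7.c and
zero-scratch-regs-8.c scan for.  Emitting the actual xor and mov
instructions is the part that has to live in the target's epilogue code.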

> Richard.

>

>> gcc/

>>

>>         * config/i386/i386-opts.h (zero_caller_saved_regs): New enum.

>>         * config/i386/i386-protos.h (ix86_split_simple_return_pop_internal):

>>         Renamed to ...

>>         (ix86_split_simple_return_internal): This.

>>         * config/i386/i386.c (ix86_set_zero_caller_saved_regs_type): New

>>         function.

>>         (ix86_set_current_function): Call ix86_set_zero_caller_saved_regs_type.

>>         (ix86_expand_prologue): Replace gen_prologue_use with

>>         gen_pro_epilogue_use.

>>         (ix86_expand_epilogue): Replace gen_simple_return_pop_internal

>>         with ix86_split_simple_return_internal.  Replace

>>         gen_simple_return_internal with ix86_split_simple_return_internal.

>>         (ix86_find_live_outgoing_regs): New function.

>>         (ix86_split_simple_return_pop_internal): Removed.

>>         (ix86_split_simple_return_internal): New function.

>>         (ix86_handle_fndecl_attribute): Support zero_caller_saved_regs

>>         attribute.

>>         (ix86_attribute_table): Add zero_caller_saved_regs.

>>         * config/i386/i386.h (machine_function): Add

>>         zero_caller_saved_regs_type and live_outgoing_regs.

>>         (TARGET_POP_SCRATCH_REGISTER): New.

>>         * config/i386/i386.md (UNSPEC_SIMPLE_RETURN): New UNSPEC.

>>         (UNSPECV_PROLOGUE_USE): Renamed to ...

>>         (UNSPECV_PRO_EPILOGUE_USE): This.

>>         (prologue_use): Renamed to ...

>>         (pro_epilogue_use): This.

>>         (simple_return_internal): Changed to define_insn_and_split.

>>         (simple_return_internal_1): New pattern.

>>         (simple_return_pop_internal): Replace

>>         ix86_split_simple_return_pop_internal with

>>         ix86_split_simple_return_internal.  Always call

>>         ix86_split_simple_return_internal if epilogue_completed is

>>         true.

>>         (simple_return_pop_internal_1): New pattern.

>>         (Epilogue deallocator to pop peepholes): Enabled only if

>>         TARGET_POP_SCRATCH_REGISTER is true.

>>         * config/i386/i386.opt (mzero-caller-saved-regs=): New option.

>>         * doc/extend.texi: Document zero_caller_saved_regs attribute.

>>         * doc/invoke.texi: Document -mzero-caller-saved-regs=.

>>

>> gcc/testsuite/

>>

>>         * gcc.target/i386/zero-scratch-regs-1.c: New test.

>>         * gcc.target/i386/zero-scratch-regs-2.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-3.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-4.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-5.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-6.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-7.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-8.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-9.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-10.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-11.c: Likewise.

>>         * gcc.target/i386/zero-scratch-regs-12.c: Likewise.




-- 
H.J.
Szabolcs Nagy Sept. 27, 2018, 1:08 p.m. | #3
On 26/09/18 19:10, H.J. Lu wrote:
> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

> zero_caller_saved_regs("skip|used|all") function attribute:

> 

> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

> 

> Don't zero caller-saved integer registers upon function return.

> 

> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

> 

> Zero used caller-saved integer registers upon function return.

> 

> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

> 

> Zero all caller-saved integer registers upon function return.

> 

> Tested on i686 and x86-64 with bootstrapping GCC trunk and

> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

> enabled by default.

> 


from this description and the documentation it's
not clear to me what this tries to achieve.

is it trying to prevent information leak?
or some pcs hack the caller may rely on?

if it's for information leak then i'd expect such
attribute to be used on crypto code.. however i'd
expect crypto code to use simd registers as well,
so integer only cleaning needs explanation.
H.J. Lu Sept. 27, 2018, 1:14 p.m. | #4
On Thu, Sep 27, 2018 at 6:08 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> On 26/09/18 19:10, H.J. Lu wrote:

>>

>> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

>> zero_caller_saved_regs("skip|used|all") function attribute:

>>

>> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

>>

>> Don't zero caller-saved integer registers upon function return.

>>

>> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

>>

>> Zero used caller-saved integer registers upon function return.

>>

>> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

>>

>> Zero all caller-saved integer registers upon function return.

>>

>> Tested on i686 and x86-64 with bootstrapping GCC trunk and

>> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

>> enabled by default.

>>

>

> from this description and the documentation it's

> not clear to me what this tries to achieve.

>

> is it trying to prevent information leak?

> or some pcs hack the caller may rely on?

>

> if it's for information leak then i'd expect such

> attribute to be used on crypto code.. however i'd

> expect crypto code to use simd registers as well,

> so integer only cleaning needs explanation.


The target usage is in the Linux kernel.

-- 
H.J.
Rich Felker Sept. 27, 2018, 2:01 p.m. | #5
On Wed, Sep 26, 2018 at 11:10:29AM -0700, H.J. Lu wrote:
> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
> zero_caller_saved_regs("skip|used|all") function attribute:


Minor nit, but could this be named -mzero-call-clobbered-regs?
"Caller-saved" is a misnomer and inconsistent with other gcc usage.
For example -fcall-used-[reg] documents it as "an allocable register
that is clobbered by function calls". This is a piece of terminology
I've been trying to get fixed in educational materials so it would be
nice to avoid a new instance of it in gcc.

Rich
Richard Biener Sept. 27, 2018, 2:57 p.m. | #6
On Thu, Sep 27, 2018 at 3:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Thu, Sep 27, 2018 at 6:08 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:

> > On 26/09/18 19:10, H.J. Lu wrote:

> >>

> >> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

> >> zero_caller_saved_regs("skip|used|all") function attribute:

> >>

> >> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

> >>

> >> Don't zero caller-saved integer registers upon function return.

> >>

> >> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

> >>

> >> Zero used caller-saved integer registers upon function return.

> >>

> >> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

> >>

> >> Zero all caller-saved integer registers upon function return.

> >>

> >> Tested on i686 and x86-64 with bootstrapping GCC trunk and

> >> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

> >> enabled by default.

> >>

> >

> > from this description and the documentation it's

> > not clear to me what this tries to achieve.

> >

> > is it trying to prevent information leak?

> > or some pcs hack the caller may rely on?

> >

> > if it's for information leak then i'd expect such

> > attribute to be used on crypto code.. however i'd

> > expect crypto code to use simd registers as well,

> > so integer only cleaning needs explanation.

>

> The target usage is in Linux kernel.


Maybe still somehow encode that in the option, since it otherwise raises
expectations that are not met?
-mzero-call-clobbered-regs=used-int|all-int|skip|used-simd|used-fp, etc.?
And sorry() on unimplemented ones?  Or simply also zero non-integer
regs the same way?  I suppose there isn't something like vzeroupper that
zeros all SIMD regs completely?

Richard.

> --

> H.J.
Florian Weimer Sept. 27, 2018, 7:24 p.m. | #7
* H. J. Lu:

> +@item zero_caller_saved_regs("@var{choice}")

> +@cindex @code{zero_caller_saved_regs} function attribute, x86

> +On x86 targets, the @code{zero_caller_saved_regs} attribute causes the

> +compiler to zero caller-saved integer registers at function return with

> +@var{choice}.  @samp{skip} doesn't zero caller-saved integer registers.

> +@samp{used} zeros caller-saved integer registers which are used in

> +function.  @samp{all} zeros all caller-saved integer registers.


Perhaps “according to @var{choice}:”.  And say that the default for the
attribute is controlled by @option{-mzero-caller-saved-regs}?

(Maybe “skip” should be none?)

I assume we can check for this using __has_attribute?  We would use this
in the implementation of explicit_bzero in glibc.
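
A minimal sketch of such a check, assuming the attribute keeps the name
proposed in this RFC (it may well be renamed before anything is
committed).  ZERO_REGS_ON_RETURN and wipe_buffer are hypothetical names
used only for illustration; this is not glibc's explicit_bzero.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_attribute simply report "not supported".  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Use the attribute proposed in this RFC only when the compiler
   recognizes it; otherwise expand to nothing.  */
#if __has_attribute (zero_caller_saved_regs)
# define ZERO_REGS_ON_RETURN \
  __attribute__ ((zero_caller_saved_regs ("used")))
#else
# define ZERO_REGS_ON_RETURN
#endif

/* Hypothetical helper: clear LEN bytes at BUF through a volatile
   pointer so the stores are not optimized away, and, where supported,
   zero the used call-clobbered integer registers on return.  */
ZERO_REGS_ON_RETURN
void
wipe_buffer (void *buf, size_t len)
{
  volatile unsigned char *p = buf;

  assert (buf != NULL || len == 0);
  while (len--)
    *p++ = 0;
}
```

With a compiler that does not know the attribute, the macro expands to
nothing and the function still clears the buffer; only the register
zeroing is lost.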

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

> index 7ef4e7a449b..796477452d5 100644

> --- a/gcc/doc/invoke.texi

> +++ b/gcc/doc/invoke.texi


> @@ -28459,6 +28459,16 @@ not be reachable in the large code model.

>  @opindex -mindirect-branch-register

>  Force indirect call and jump via register.


> +@item -mzero-caller-saved-regs=@var{choice}

> +@opindex -mzero-caller-saved-regs

> +Zero caller-saved integer registers at function return with @var{choice}.

> +The default is @samp{skip}, which doesn't zero caller-saved integer

> +registers.  @samp{used} zeros caller-saved integer registers which are

> +used in function.  @samp{all} zeros all caller-saved integer registers.

> +You can control this behavior for a specific function by using the

> +function attribute @code{zero_caller_saved_regs}.

> +@xref{Function Attributes}.


See above regarding “with @var{choice}”.

Thanks,
Florian
H.J. Lu Sept. 27, 2018, 7:30 p.m. | #8
On Thu, Sep 27, 2018 at 7:57 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Sep 27, 2018 at 3:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:

>>

>> On Thu, Sep 27, 2018 at 6:08 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:

>> > On 26/09/18 19:10, H.J. Lu wrote:

>> >>

>> >> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

>> >> zero_caller_saved_regs("skip|used|all") function attribute:

>> >>

>> >> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

>> >>

>> >> Don't zero caller-saved integer registers upon function return.

>> >>

>> >> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

>> >>

>> >> Zero used caller-saved integer registers upon function return.

>> >>

>> >> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

>> >>

>> >> Zero all caller-saved integer registers upon function return.

>> >>

>> >> Tested on i686 and x86-64 with bootstrapping GCC trunk and

>> >> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

>> >> enabled by default.

>> >>

>> >

>> > from this description and the documentation it's

>> > not clear to me what this tries to achieve.

>> >

>> > is it trying to prevent information leak?

>> > or some pcs hack the caller may rely on?

>> >

>> > if it's for information leak then i'd expect such

>> > attribute to be used on crypto code.. however i'd

>> > expect crypto code to use simd registers as well,

>> > so integer only cleaning needs explanation.

>>

>> The target usage is in Linux kernel.

>

> Maybe still somehow encode that in the option since it otherwise raises

> expectations that are not met?

> -mzero-call-clobbered-regs=used-int|all-int|skip|used-simd|used-fp,etc.?

> and sorry() on unimplemented ones?  Or simply zero also non-integer

> regs the same way?  I suppose

> there isn't sth like vzeroupper that zeros all SIMD regs and completely?

>


The problem with SIMD regs is how to cover different ISAs, widths and
number of SIMD regs.  We need to generate at least 3 different code paths
to clear SSE, AVX and AVX512 registers.
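
As a rough illustration of why three paths are needed, the sketch below
tabulates how many vector registers there are and how wide they are for
each ISA level.  The enum, struct, and function names are hypothetical and
are not part of the patch; the register counts and widths are the
architectural ones.

```c
#include <assert.h>

enum simd_isa { SIMD_NONE, SIMD_SSE, SIMD_AVX, SIMD_AVX512 };

struct simd_zero_plan
{
  unsigned int nregs;       /* Number of vector registers to clear.  */
  unsigned int width_bits;  /* Width of each register in bits.  */
};

/* Pick the widest ISA enabled at compile time, using the compiler's
   predefined macros.  */
enum simd_isa
compile_time_simd_isa (void)
{
#if defined (__AVX512F__)
  return SIMD_AVX512;
#elif defined (__AVX__)
  return SIMD_AVX;
#elif defined (__SSE__)
  return SIMD_SSE;
#else
  return SIMD_NONE;
#endif
}

/* Describe what a zeroing sequence would have to cover for ISA.  */
struct simd_zero_plan
simd_zero_plan_for (enum simd_isa isa, int is_64bit)
{
  struct simd_zero_plan plan = { 0, 0 };

  assert (isa <= SIMD_AVX512);
  switch (isa)
    {
    case SIMD_NONE:
      break;
    case SIMD_SSE:     /* xmm0-xmm7, or xmm0-xmm15 in 64-bit mode.  */
      plan.nregs = is_64bit ? 16 : 8;
      plan.width_bits = 128;
      break;
    case SIMD_AVX:     /* ymm0-ymm7, or ymm0-ymm15 in 64-bit mode.  */
      plan.nregs = is_64bit ? 16 : 8;
      plan.width_bits = 256;
      break;
    case SIMD_AVX512:  /* zmm0-zmm7, or zmm0-zmm31 in 64-bit mode.  */
      plan.nregs = is_64bit ? 32 : 8;
      plan.width_bits = 512;
      break;
    }
  return plan;
}
```

A function compiled for one ISA level cannot assume the wider registers
exist, so each level needs its own zeroing sequence.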

-- 
H.J.
H.J. Lu Sept. 27, 2018, 7:47 p.m. | #9
On Thu, Sep 27, 2018 at 12:24 PM, Florian Weimer <fweimer@redhat.com> wrote:
> * H. J. Lu:

>

>> +@item zero_caller_saved_regs("@var{choice}")

>> +@cindex @code{zero_caller_saved_regs} function attribute, x86

>> +On x86 targets, the @code{zero_caller_saved_regs} attribute causes the

>> +compiler to zero caller-saved integer registers at function return with

>> +@var{choice}.  @samp{skip} doesn't zero caller-saved integer registers.

>> +@samp{used} zeros caller-saved integer registers which are used in

>> +function.  @samp{all} zeros all caller-saved integer registers.

>

> Perhaps “according to @var{choice}:”.  And say that the default for the

> attribute is controlled by @option{-mzero-caller-saved-regs}?


Sure.

> (Maybe “skip” should be none?)


I have no strong opinion here.

> I assume we can check for this use __has_attribute?  We would use this


Yes.

> in the implementation of explicit_bzero in glibc.


Good to know.

>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

>> index 7ef4e7a449b..796477452d5 100644

>> --- a/gcc/doc/invoke.texi

>> +++ b/gcc/doc/invoke.texi

>

>> @@ -28459,6 +28459,16 @@ not be reachable in the large code model.

>>  @opindex -mindirect-branch-register

>>  Force indirect call and jump via register.

>

>> +@item -mzero-caller-saved-regs=@var{choice}

>> +@opindex -mzero-caller-saved-regs

>> +Zero caller-saved integer registers at function return with @var{choice}.

>> +The default is @samp{skip}, which doesn't zero caller-saved integer

>> +registers.  @samp{used} zeros caller-saved integer registers which are

>> +used in function.  @samp{all} zeros all caller-saved integer registers.

>> +You can control this behavior for a specific function by using the

>> +function attribute @code{zero_caller_saved_regs}.

>> +@xref{Function Attributes}.

>

> See above regarding “with @var{choice}”.


I will update it.

Thanks.


-- 
H.J.
H.J. Lu Sept. 27, 2018, 7:50 p.m. | #10
On Thu, Sep 27, 2018 at 7:01 AM, Rich Felker <dalias@libc.org> wrote:
> On Wed, Sep 26, 2018 at 11:10:29AM -0700, H.J. Lu wrote:
>> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
>> zero_caller_saved_regs("skip|used|all") function attribute:
>
> Minor nit, but could this be named -mzero-call-clobbered-regs?
> "Caller-saved" is a misnomer and inconsistent with other gcc usage.
> For example -fcall-used-[reg] documents it as "an allocable register
> that is clobbered by function calls". This is a piece of terminology
> I've been trying to get fixed in educational materials so it would be
> nice to avoid a new instance of it in gcc.

I picked caller-saved since this terminology is used in x86 psABIs.
I have no strong opinion here.  We can decide on the name later
after we decide that we want this feature in GCC.


-- 
H.J.
H.J. Lu Oct. 31, 2018, 7:42 p.m. | #11
On Thu, Sep 27, 2018 at 7:58 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>

> On Thu, Sep 27, 2018 at 3:16 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> >

> > On Thu, Sep 27, 2018 at 6:08 AM, Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:

> > > On 26/09/18 19:10, H.J. Lu wrote:

> > >>

> > >> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and

> > >> zero_caller_saved_regs("skip|used|all") function attribute:

> > >>

> > >> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")

> > >>

> > >> Don't zero caller-saved integer registers upon function return.

> > >>

> > >> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")

> > >>

> > >> Zero used caller-saved integer registers upon function return.

> > >>

> > >> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")

> > >>

> > >> Zero all caller-saved integer registers upon function return.

> > >>

> > >> Tested on i686 and x86-64 with bootstrapping GCC trunk and

> > >> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all

> > >> enabled by default.

> > >>

> > >

> > > from this description and the documentation it's

> > > not clear to me what this tries to achieve.

> > >

> > > is it trying to prevent information leak?

> > > or some pcs hack the caller may rely on?

> > >

> > > if it's for information leak then i'd expect such

> > > attribute to be used on crypto code.. however i'd

> > > expect crypto code to use simd registers as well,

> > > so integer only cleaning needs explanation.

> >

> > The target usage is in Linux kernel.

>

> Maybe still somehow encode that in the option since it otherwise raises

> expectations that are not met?

> -mzero-call-clobbered-regs=used-int|all-int|skip|used-simd|used-fp,etc.?

> and sorry() on unimplemented ones?  Or simply zero also non-integer

> regs the same way?  I suppose

> there isn't sth like vzeroupper that zeros all SIMD regs and completely?

>


Here is the updated patch to zero caller-saved vector registers.   I don't
mind a different option name if it is preferred.  I may be able to create
some generic utility functions which can be used by other backends.  But
actual implementation must be target specific.

Any comments?

Thanks.

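For illustration only (this is not code from the patch): a plain-C sketch
of how the three choices select registers to zero.  It models the rule in
the patch's ix86_find_live_outgoing_regs, where a register is left alone
if it is fixed, is not call-clobbered, is live at function exit, or (for
"used") was never used in the function.  The struct, enum, and function
names and the bit numbering are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum zero_choice { ZERO_SKIP, ZERO_USED, ZERO_ALL };

/* One bit per integer register; the bit numbering is hypothetical.  */
struct reg_sets
{
  unsigned int fixed;          /* Never allocatable, e.g. stack pointer.  */
  unsigned int call_clobbered; /* "Caller-saved" in psABI terms.  */
  unsigned int live_at_exit;   /* E.g. holds the return value.  */
  unsigned int ever_used;      /* Touched somewhere in the function.  */
};

/* Return the mask of registers that would be zeroed at return.  */
unsigned int
regs_to_zero (enum zero_choice choice, const struct reg_sets *s)
{
  assert (s != NULL);

  if (choice == ZERO_SKIP)
    return 0;

  /* Only dead, non-fixed, call-clobbered registers are candidates.  */
  unsigned int mask = s->call_clobbered & ~s->fixed & ~s->live_at_exit;

  /* "used" further restricts to registers the function touched.  */
  if (choice == ZERO_USED)
    mask &= s->ever_used;

  return mask;
}
```

With three call-clobbered registers (bits 0 to 2), bit 0 holding the
return value and only bits 0 and 1 used, "used" selects bit 1 alone while
"all" selects bits 1 and 2.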

-- 
H.J.
H.J. Lu Nov. 29, 2018, 11:14 p.m. | #12
On Wed, Oct 31, 2018 at 12:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> Here is the updated patch to zero caller-saved vector registers.  I don't
> mind a different option name if it is preferred.  I may be able to create
> some generic utility functions which can be used by other backends.  But
> actual implementation must be target specific.
>
> Any comments?

PING.

https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02079.html

-- 
H.J.
H.J. Lu Dec. 21, 2018, 12:41 p.m. | #13
On Thu, Nov 29, 2018 at 3:14 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Wed, Oct 31, 2018 at 12:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> > Here is the updated patch to zero caller-saved vector registers.  I don't
> > mind a different option name if it is preferred.  I may be able to create
> > some generic utility functions which can be used by other backends.  But
> > actual implementation must be target specific.
> >
> > Any comments?
>
> PING.
>
> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02079.html

PING.

-- 
H.J.

Patch

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 46366cbfa72..7f9a92e7e5b 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -119,4 +119,11 @@  enum indirect_branch {
   indirect_branch_thunk_extern
 };
 
+enum zero_caller_saved_regs {
+  zero_caller_saved_regs_unset = 0,
+  zero_caller_saved_regs_skip,
+  zero_caller_saved_regs_used,
+  zero_caller_saved_regs_all
+};
+
 #endif
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index d1d59633dc0..a92f34a48b1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -310,7 +310,7 @@  extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
 extern const char * ix86_output_indirect_jmp (rtx call_op);
 extern const char * ix86_output_function_return (bool long_p);
 extern const char * ix86_output_indirect_function_return (rtx ret_op);
-extern void ix86_split_simple_return_pop_internal (rtx);
+extern void ix86_split_simple_return_internal (rtx);
 extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 						machine_mode mode);
 extern int ix86_min_insn_size (rtx_insn *);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ef72219f165..359062e6f44 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5561,6 +5561,40 @@  ix86_set_func_type (tree fndecl)
     }
 }
 
+/* Set the zero_caller_saved_regs_type field from the function FNDECL.  */
+
+static void
+ix86_set_zero_caller_saved_regs_type (tree fndecl)
+{
+  if (cfun->machine->zero_caller_saved_regs_type
+      == zero_caller_saved_regs_unset)
+    {
+      tree attr = lookup_attribute ("zero_caller_saved_regs",
+				    DECL_ATTRIBUTES (fndecl));
+      if (attr != NULL)
+	{
+	  tree args = TREE_VALUE (attr);
+	  if (args == NULL)
+	    gcc_unreachable ();
+	  tree cst = TREE_VALUE (args);
+	  if (strcmp (TREE_STRING_POINTER (cst), "skip") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_skip;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "used") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_used;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "all") == 0)
+	    cfun->machine->zero_caller_saved_regs_type
+	      = zero_caller_saved_regs_all;
+	  else
+	    gcc_unreachable ();
+	}
+      else
+	cfun->machine->zero_caller_saved_regs_type
+	  = ix86_zero_caller_saved_regs;
+    }
+}
+
 /* Set the indirect_branch_type field from the function FNDECL.  */
 
 static void
@@ -5661,6 +5695,7 @@  ix86_set_current_function (tree fndecl)
 	{
 	  ix86_set_func_type (fndecl);
 	  ix86_set_indirect_branch_type (fndecl);
+	  ix86_set_zero_caller_saved_regs_type (fndecl);
 	}
       return;
     }
@@ -5682,6 +5717,7 @@  ix86_set_current_function (tree fndecl)
 
   ix86_set_func_type (fndecl);
   ix86_set_indirect_branch_type (fndecl);
+  ix86_set_zero_caller_saved_regs_type (fndecl);
 
   tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
   if (new_tree == NULL_TREE)
@@ -13542,7 +13578,7 @@  ix86_expand_prologue (void)
       insn = emit_insn (gen_set_got (pic));
       RTX_FRAME_RELATED_P (insn) = 1;
       add_reg_note (insn, REG_CFA_FLUSH_QUEUE, NULL_RTX);
-      emit_insn (gen_prologue_use (pic));
+      emit_insn (gen_pro_epilogue_use (pic));
       /* Deleting already emmitted SET_GOT if exist and allocated to
 	 REAL_PIC_OFFSET_TABLE_REGNUM.  */
       ix86_elim_entry_set_got (pic);
@@ -13571,7 +13607,7 @@  ix86_expand_prologue (void)
      Further, prevent alloca modifications to the stack pointer from being
      combined with prologue modifications.  */
   if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    emit_insn (gen_pro_epilogue_use (stack_pointer_rtx));
 }
 
 /* Emit code to restore REG using a POP insn.  */
@@ -14289,7 +14325,7 @@  ix86_expand_epilogue (int style)
 	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
 	}
       else
-	emit_jump_insn (gen_simple_return_pop_internal (popc));
+	ix86_split_simple_return_internal (popc);
     }
   else if (!m->call_ms2sysv || !restore_stub_is_tail)
     {
@@ -14316,7 +14352,7 @@  ix86_expand_epilogue (int style)
 	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
 	}
       else
-	emit_jump_insn (gen_simple_return_internal ());
+	ix86_split_simple_return_internal (NULL_RTX);
     }
 
   /* Restore the state back to the state from the prologue,
@@ -28402,37 +28438,169 @@  ix86_output_indirect_function_return (rtx ret_op)
     return "%!jmp\t%A0";
 }
 
-/* Split simple return with popping POPC bytes from stack to indirect
-   branch with stack adjustment .  */
+/* Find general registers which are live at the exit of basic block BB
+   and set their corresponding bits in LIVE_OUTGOING_REGS.  */
+
+static void
+ix86_find_live_outgoing_regs (basic_block bb,
+			      unsigned int &live_outgoing_regs)
+{
+  bitmap live_out = df_get_live_out (bb);
+
+  bool zero_all = (cfun->machine->zero_caller_saved_regs_type
+		   == zero_caller_saved_regs_all);
+
+  unsigned int regno;
+
+  /* Check for live outgoing registers.  */
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      /* Only zero general registers.  */
+      if (!GENERAL_REGNO_P (regno))
+	continue;
+
+      int i = regno;
+      if (i >= FIRST_REX_INT_REG)
+	i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
+
+      /* No need to check it again if it is live.  */
+      if ((live_outgoing_regs & (1 << i)))
+	continue;
+
+      /* A register is considered LIVE if
+	 1. It is a fixed register.
+	 2. It isn't a caller-saved register.
+	 3. It is a live outgoing register.
+	 4. It is never used in the function and we don't zero all
+	    caller-saved registers.
+       */
+      if (fixed_regs[regno]
+	  || !call_used_regs[regno]
+	  || REGNO_REG_SET_P (live_out, regno)
+	  || (!zero_all && !df_regs_ever_live_p (regno)))
+	live_outgoing_regs |= 1 << i;
+    }
+}
+
+/* Split simple return with popping POPC bytes from stack, if POPC
+   isn't NULL_RTX, and zero caller-saved general registers if needed.
+   When popping POPC bytes from stack for -mfunction-return=, convert
+   return to indirect branch with stack adjustment.  */
 
 void
-ix86_split_simple_return_pop_internal (rtx popc)
+ix86_split_simple_return_internal (rtx popc)
 {
-  struct machine_function *m = cfun->machine;
-  rtx ecx = gen_rtx_REG (SImode, CX_REG);
-  rtx_insn *insn;
+  /* No need to zero caller-saved registers in main ().  Don't zero
+     caller-saved registers if __builtin_eh_return is called since it
+     isn't a normal function return.  */
+  if ((cfun->machine->zero_caller_saved_regs_type
+       != zero_caller_saved_regs_skip)
+      && !crtl->calls_eh_return
+      && cfun->machine->func_type == TYPE_NORMAL
+      && !MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    {
+      unsigned int &live_outgoing_regs
+	= cfun->machine->live_outgoing_regs;
 
-  /* There is no "pascal" calling convention in any 64bit ABI.  */
-  gcc_assert (!TARGET_64BIT);
+      if (live_outgoing_regs == 0)
+	{
+	  edge e;
+	  edge_iterator ei;
 
-  insn = emit_insn (gen_pop (ecx));
-  m->fs.cfa_offset -= UNITS_PER_WORD;
-  m->fs.sp_offset -= UNITS_PER_WORD;
+	  /* ECX register is used for return with pop.  */
+	  if (popc != NULL_RTX
+	      && (cfun->machine->function_return_type
+		  != indirect_branch_keep))
+	    live_outgoing_regs = 1 << CX_REG;
 
-  rtx x = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
-  x = gen_rtx_SET (stack_pointer_rtx, x);
-  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
-  add_reg_note (insn, REG_CFA_REGISTER, gen_rtx_SET (ecx, pc_rtx));
-  RTX_FRAME_RELATED_P (insn) = 1;
+	  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	    {
+	      ix86_find_live_outgoing_regs (e->src,
+					    live_outgoing_regs);
+	    }
+	}
 
-  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
-  x = gen_rtx_SET (stack_pointer_rtx, x);
-  insn = emit_insn (x);
-  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
-  RTX_FRAME_RELATED_P (insn) = 1;
+      rtx zero = NULL_RTX;
+
+      unsigned int regno;
+
+      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+	{
+	  if (!GENERAL_REGNO_P (regno))
+	    continue;
+
+	  int i = regno;
+	  if (i >= FIRST_REX_INT_REG)
+	    i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
+	  if ((live_outgoing_regs & (1 << i)))
+	    continue;
+
+	  /* Zero out dead caller-saved register.  We only need to zero
+	     the lower 32 bits.  */
+	  rtx reg = gen_rtx_REG (SImode, regno);
+	  if (zero == NULL_RTX)
+	    {
+	      zero = reg;
+	      rtx tmp = gen_rtx_SET (reg, const0_rtx);
+	      if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+		{
+		  rtx clob = gen_rtx_CLOBBER (VOIDmode,
+					      gen_rtx_REG (CCmode,
+							   FLAGS_REG));
+		  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
+							       tmp,
+							       clob));
+		}
+	      emit_insn (tmp);
+	    }
+	  else
+	    emit_move_insn (reg, zero);
+
+	  /* Mark it in use.  */
+	  emit_insn (gen_pro_epilogue_use (reg));
+	}
+    }
+
+  if (popc)
+    {
+      if (cfun->machine->function_return_type != indirect_branch_keep)
+	{
+	  struct machine_function *m = cfun->machine;
+	  rtx ecx = gen_rtx_REG (SImode, CX_REG);
+	  rtx_insn *insn;
+
+	  /* There is no "pascal" calling convention in any 64bit ABI.  */
+	  gcc_assert (!TARGET_64BIT);
+
+	  insn = emit_insn (gen_pop (ecx));
+	  m->fs.cfa_offset -= UNITS_PER_WORD;
+	  m->fs.sp_offset -= UNITS_PER_WORD;
+
+	  rtx x = plus_constant (Pmode, stack_pointer_rtx,
+				 UNITS_PER_WORD);
+	  x = gen_rtx_SET (stack_pointer_rtx, x);
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA, x);
+	  add_reg_note (insn, REG_CFA_REGISTER,
+			gen_rtx_SET (ecx, pc_rtx));
+	  RTX_FRAME_RELATED_P (insn) = 1;
 
-  /* Now return address is in ECX.  */
-  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
+	  x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, popc);
+	  x = gen_rtx_SET (stack_pointer_rtx, x);
+	  insn = emit_insn (x);
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (x));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+
+	  /* Mark ECX in use.  */
+	  emit_insn (gen_pro_epilogue_use (ecx));
+
+	  /* Now return address is in ECX.  */
+	  emit_jump_insn (gen_simple_return_indirect_internal (ecx));
+	}
+      else
+	emit_jump_insn (gen_simple_return_pop_internal_1 (popc));
+    }
+  else
+    emit_jump_insn (gen_simple_return_internal_1 ());
 }
 
 /* Output the assembly for a call instruction.  */
@@ -40798,6 +40966,27 @@  ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
 	}
     }
 
+  if (is_attribute_p ("zero_caller_saved_regs", name))
+    {
+      tree cst = TREE_VALUE (args);
+      if (TREE_CODE (cst) != STRING_CST)
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute requires a string constant argument",
+		   name);
+	  *no_add_attrs = true;
+	}
+      else if (strcmp (TREE_STRING_POINTER (cst), "skip") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "used") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "all") != 0)
+	{
+	  warning (OPT_Wattributes,
+		   "argument to %qE attribute is not (skip|used|all)",
+		   name);
+	  *no_add_attrs = true;
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -45099,6 +45288,8 @@  static const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_fndecl_attribute, NULL },
   { "indirect_return", 0, 0, false, true, true, false,
     NULL, NULL },
+  { "zero_caller_saved_regs", 1, 1, true, false, false, false,
+    ix86_handle_fndecl_attribute, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 6445ee5d50a..60deec0a496 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2715,6 +2715,10 @@  struct GTY(()) machine_function {
      the "interrupt" or "no_caller_saved_registers" attribute.  */
   BOOL_BITFIELD no_caller_saved_registers : 1;
 
+  /* How to clear caller-saved general registers upon function
+     return.  */
+  ENUM_BITFIELD(zero_caller_saved_regs) zero_caller_saved_regs_type : 3;
+
   /* If true, there is register available for argument passing.  This
      is used only in ix86_function_ok_for_sibcall by 32-bit to determine
      if there is scratch register available for indirect sibcall.  In
@@ -2742,6 +2746,9 @@  struct GTY(()) machine_function {
   /* If true, ENDBR is queued at function entrance.  */
   BOOL_BITFIELD endbr_queued_at_entrance : 1;
 
+  /* Registers live at exit.  */
+  unsigned int live_outgoing_regs;
+
   /* The largest alignment, in bytes, of stack slot actually used.  */
   unsigned int max_used_stack_alignment;
 
@@ -2841,6 +2848,12 @@  extern void debug_dispatch_window (int);
   (ix86_indirect_branch_register \
    || cfun->machine->indirect_branch_type != indirect_branch_keep)
 
+#define TARGET_POP_SCRATCH_REGISTER \
+  (TARGET_64BIT \
+   || (cfun->machine->zero_caller_saved_regs_type \
+       == zero_caller_saved_regs_skip) \
+   || cfun->machine->function_return_type == indirect_branch_keep)
+
 #define IX86_HLE_ACQUIRE (1 << 16)
 #define IX86_HLE_RELEASE (1 << 17)
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 86f2c032e1b..cf8faacb7e3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -183,6 +183,8 @@ 
   UNSPEC_PDEP
   UNSPEC_PEXT
 
+  UNSPEC_SIMPLE_RETURN
+
   ;; IRET support
   UNSPEC_INTERRUPT_RETURN
 ])
@@ -193,7 +195,7 @@ 
   UNSPECV_STACK_PROBE
   UNSPECV_PROBE_STACK_RANGE
   UNSPECV_ALIGN
-  UNSPECV_PROLOGUE_USE
+  UNSPECV_PRO_EPILOGUE_USE
   UNSPECV_SPLIT_STACK_RETURN
   UNSPECV_CLD
   UNSPECV_NOPS
@@ -12997,8 +12999,8 @@ 
 
 ;; As USE insns aren't meaningful after reload, this is used instead
 ;; to prevent deleting instructions setting registers for PIC code
-(define_insn "prologue_use"
-  [(unspec_volatile [(match_operand 0)] UNSPECV_PROLOGUE_USE)]
+(define_insn "pro_epilogue_use"
+  [(unspec_volatile [(match_operand 0)] UNSPECV_PRO_EPILOGUE_USE)]
   ""
   ""
   [(set_attr "length" "0")])
@@ -13039,10 +13041,23 @@ 
     }
 })
 
-(define_insn "simple_return_internal"
+(define_insn_and_split "simple_return_internal"
   [(simple_return)]
   "reload_completed"
   "* return ix86_output_function_return (false);"
+  "&& epilogue_completed"
+  [(const_int 0)]
+  "ix86_split_simple_return_internal (NULL_RTX); DONE;"
+  [(set_attr "length" "1")
+   (set_attr "atom_unit" "jeu")
+   (set_attr "length_immediate" "0")
+   (set_attr "modrm" "0")])
+
+(define_insn "simple_return_internal_1"
+  [(simple_return)
+   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
+  "reload_completed"
+  "* return ix86_output_function_return (false);"
   [(set_attr "length" "1")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "0")
@@ -13075,9 +13090,21 @@ 
    (use (match_operand:SI 0 "const_int_operand"))]
   "reload_completed"
   "%!ret\t%0"
-  "&& cfun->machine->function_return_type != indirect_branch_keep"
+  "&& (epilogue_completed
+       || cfun->machine->function_return_type != indirect_branch_keep)"
   [(const_int 0)]
-  "ix86_split_simple_return_pop_internal (operands[0]); DONE;"
+  "ix86_split_simple_return_internal (operands[0]); DONE;"
+  [(set_attr "length" "3")
+   (set_attr "atom_unit" "jeu")
+   (set_attr "length_immediate" "2")
+   (set_attr "modrm" "0")])
+
+(define_insn "simple_return_pop_internal_1"
+  [(simple_return)
+   (use (match_operand:SI 0 "const_int_operand"))
+   (unspec [(const_int 0)] UNSPEC_SIMPLE_RETURN)]
+  "reload_completed"
+  "%!ret\t%0"
   [(set_attr "length" "3")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "2")
@@ -18900,6 +18927,11 @@ 
    (set (mem:W (pre_dec:P (reg:P SP_REG))) (match_dup 1))])
 
 ;; Convert epilogue deallocator to pop.
+;; Don't do it when both -mfunction-return= and
+;; -mzero-caller-saved-regs= are used in 32-bit mode, since a return
+;; with stack pop needs to increment the stack pointer and scratch
+;; registers must be zeroed.  Popping into a scratch register would
+;; load a value from the stack into it.
 (define_peephole2
   [(match_scratch:W 1 "r")
    (parallel [(set (reg:P SP_REG)
@@ -18908,6 +18940,7 @@ 
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "(TARGET_SINGLE_POP || optimize_insn_for_size_p ())
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])])
@@ -18923,6 +18956,7 @@ 
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "(TARGET_DOUBLE_POP || optimize_insn_for_size_p ())
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])
@@ -18936,6 +18970,7 @@ 
 	      (clobber (reg:CC FLAGS_REG))
 	      (clobber (mem:BLK (scratch)))])]
   "optimize_insn_for_size_p ()
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(parallel [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
 	      (clobber (mem:BLK (scratch)))])
@@ -18948,7 +18983,8 @@ 
 		   (plus:P (reg:P SP_REG)
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
+  "TARGET_POP_SCRATCH_REGISTER
+   && INTVAL (operands[0]) == GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])
 
 ;; Two pops case is tricky, since pop causes dependency
@@ -18960,7 +18996,8 @@ 
 		   (plus:P (reg:P SP_REG)
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
+  "TARGET_POP_SCRATCH_REGISTER
+   && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
    (set (match_dup 2) (mem:W (post_inc:P (reg:P SP_REG))))])
 
@@ -18971,6 +19008,7 @@ 
 			   (match_operand:P 0 "const_int_operand")))
 	      (clobber (reg:CC FLAGS_REG))])]
   "optimize_insn_for_size_p ()
+   && TARGET_POP_SCRATCH_REGISTER
    && INTVAL (operands[0]) == 2*GET_MODE_SIZE (word_mode)"
   [(set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))
    (set (match_dup 1) (mem:W (post_inc:P (reg:P SP_REG))))])
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index e7fbf9b6f99..da9b442ecbf 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1063,3 +1063,20 @@  Support WAITPKG built-in functions and code generation.
 mcldemote
 Target Report Mask(ISA_CLDEMOTE) Var(ix86_isa_flags2) Save
 Support CLDEMOTE built-in functions and code generation.
+
+mzero-caller-saved-regs=
+Target Report RejectNegative Joined Enum(zero_caller_saved_regs) Var(ix86_zero_caller_saved_regs) Init(zero_caller_saved_regs_skip)
+Clear caller-saved general registers upon function return.
+
+Enum
+Name(zero_caller_saved_regs) Type(enum zero_caller_saved_regs)
+Known choices of clearing caller-saved general registers upon function return (for use with the -mzero-caller-saved-regs= option):
+
+EnumValue
+Enum(zero_caller_saved_regs) String(skip) Value(zero_caller_saved_regs_skip)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(used) Value(zero_caller_saved_regs_used)
+
+EnumValue
+Enum(zero_caller_saved_regs) String(all) Value(zero_caller_saved_regs_all)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cfe6a8e5bb8..023f6155e58 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5931,6 +5931,14 @@  The @code{indirect_return} attribute can be applied to a function,
 as well as variable or type of function pointer to inform the
 compiler that the function may return via indirect branch.
 
+@item zero_caller_saved_regs("@var{choice}")
+@cindex @code{zero_caller_saved_regs} function attribute, x86
+On x86 targets, the @code{zero_caller_saved_regs} attribute causes the
+compiler to zero caller-saved integer registers at function return
+according to @var{choice}.  @samp{skip} doesn't zero any registers.
+@samp{used} zeros the caller-saved integer registers that are used in
+the function.  @samp{all} zeros all caller-saved integer registers.
+
 @end table
 
 On the x86, the inliner does not inline a
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7ef4e7a449b..796477452d5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1307,7 +1307,7 @@  See RS/6000 and PowerPC Options.
 -mstack-protector-guard-symbol=@var{symbol} @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
 -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol
--mindirect-branch-register}
+-mindirect-branch-register -mzero-caller-saved-regs=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -28459,6 +28459,16 @@  not be reachable in the large code model.
 @opindex -mindirect-branch-register
 Force indirect call and jump via register.
 
+@item -mzero-caller-saved-regs=@var{choice}
+@opindex -mzero-caller-saved-regs
+Zero caller-saved integer registers at function return according to
+@var{choice}.  The default is @samp{skip}, which doesn't zero any
+registers.  @samp{used} zeros the caller-saved integer registers that are
+used in the function.  @samp{all} zeros all caller-saved integer registers.
+You can control this behavior for a specific function by using the
+function attribute @code{zero_caller_saved_regs}.
+@xref{Function Attributes}.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 00000000000..08533500eff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 00000000000..961bb720cb2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_caller_saved_regs("all")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 00000000000..677c5b3d9fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 00000000000..26e48d56179
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@ 
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 00000000000..cc402ad605c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 00000000000..ed75361d545
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 00000000000..83e2c4efcf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 00000000000..ef902d5311a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+__attribute__ ((zero_caller_saved_regs("all")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 00000000000..91e54b5403e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_caller_saved_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 00000000000..5e21de9bca5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 00000000000..27fd9e48640
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=all" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 00000000000..dee849d9e5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -mzero-caller-saved-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_caller_saved_regs("used")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */