[1/2] Add new RTX instruction class FILLER_INSN

Message ID gkry2nbyiiu.fsf@arm.com
State New

Commit Message

Andrea Corallo July 22, 2020, 10:02 a.m.
Hi all,

I'd like to submit the following two patches implementing a new AArch64
specific back-end pass that helps optimize branch-dense code, which can
be a bottleneck for performance on some Arm cores.  This is achieved by
padding out the branch-dense sections of the instruction stream with
nops.

The original patch was already posted some time ago:

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html

This follow-up splits the work into two patches as suggested, rebases on
master, and implements the suggestions from the first code review.

This first patch adds a new RTX instruction class, FILLER_INSN, which is
whitelisted so that NOPs can be placed outside of a basic block.  This
allows padding after unconditional branches, where the nops are never
executed, so that any performance gained from diluting branches is not
paid straight back by executing the padding itself.

It was deemed that a new RTX class was less invasive than modifying
the behavior of standard UNSPEC nops.

1/2 is a requirement for 2/2.  Please see the cover letter of the latter
for more details on the pass itself.

Regards

  Andrea

gcc/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
	* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
	basic blocks.
	* coretypes.h: New rtx class.
	* emit-rtl.c (emit_filler_after): New function.
	* rtl.def (FILLER_INSN): New rtl define.
	* rtl.h (rtx_filler_insn): Define new structure.
	(FILLER_INSN_P): New macro.
	(is_a_helper <rtx_filler_insn *>::test): New test helper for
	rtx_filler_insn.
	(emit_filler_after): New extern.
	* target-insns.def: Add target insn definition.

Comments

Richard Biener via Gcc-patches July 22, 2020, 12:24 p.m. | #1
On Wed, Jul 22, 2020 at 12:03 PM Andrea Corallo <andrea.corallo@arm.com> wrote:
>
> Hi all,
>
> I'd like to submit the following two patches implementing a new AArch64
> specific back-end pass that helps optimize branch-dense code, which can
> be a bottleneck for performance on some Arm cores.  This is achieved by
> padding out the branch-dense sections of the instruction stream with
> nops.
>
> The original patch was already posted some time ago:
>
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html
>
> This follows up splitting as suggested in two patches, rebasing on
> master and implementing the suggestions of the first code review.
>
> This first patch implements the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block.  This is to allow padding after unconditional
> branches.  This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of
> nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.
>
> 1/2 is requirement for 2/2.  Please see this the cover letter of this last
> for more details on the pass itself.

I wonder if such effect of instructions on the pipeline can be modeled
in the DFA and thus whether the scheduler could issue (always ready)
NOPs?

I also wonder whether such optimization is better suited for the assembler
which should know instruction lengths and alignment in a more precise
way and also would know whether extra nops make immediates too large
for pc relative things like short branches or section anchor accesses
(or whatever else)?

Richard.

Richard Earnshaw (lists) July 22, 2020, 1:16 p.m. | #2
On 22/07/2020 13:24, Richard Biener via Gcc-patches wrote:
> On Wed, Jul 22, 2020 at 12:03 PM Andrea Corallo <andrea.corallo@arm.com> wrote:
>>
>> Hi all,
>>
>> I'd like to submit the following two patches implementing a new AArch64
>> specific back-end pass that helps optimize branch-dense code, which can
>> be a bottleneck for performance on some Arm cores.  This is achieved by
>> padding out the branch-dense sections of the instruction stream with
>> nops.
>>
>> The original patch was already posted some time ago:
>>
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg200721.html
>>
>> This follows up splitting as suggested in two patches, rebasing on
>> master and implementing the suggestions of the first code review.
>>
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>>
>> 1/2 is requirement for 2/2.  Please see this the cover letter of this last
>> for more details on the pass itself.
>
> I wonder if such effect of instructions on the pipeline can be modeled
> in the DFA and thus whether the scheduler could issue (always ready)
> NOPs?
>
> I also wonder whether such optimization is better suited for the assembler
> which should know instruction lengths and alignment in a more precise
> way and also would know whether extra nops make immediates too large
> for pc relative things like short branches or section anchor accesses
> (or whatever else)?


No, the assembler should never spontaneously insert instructions.  That
breaks the branch range calculations that the compiler relies upon.
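
As a rough illustration of the range concern, the sketch below models a
PC-relative branch whose immediate is a signed count of 4-byte
instructions.  The function names and the immediate widths passed in are
assumptions chosen for illustration; this is not GCC or binutils code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a branch encodes its target as a signed
// IMM_BITS-bit count of 4-byte instructions relative to the branch.
inline bool branch_in_range (std::int64_t byte_offset, int imm_bits)
{
  // Targets must be instruction-aligned in this model.
  if (byte_offset % 4 != 0)
    return false;
  const std::int64_t limit = std::int64_t{1} << (imm_bits - 1);
  const std::int64_t words = byte_offset / 4;
  return words >= -limit && words < limit;
}

// Offset of a forward target after PAD_NOPS 4-byte nops have been
// inserted somewhere between the branch and that target.
inline std::int64_t offset_after_padding (std::int64_t byte_offset,
                                          int pad_nops)
{
  return byte_offset + 4 * static_cast<std::int64_t> (pad_nops);
}
```

With a 14-bit immediate, branch_in_range (32764, 14) holds, but a single
extra nop makes branch_in_range (offset_after_padding (32764, 1), 14)
false, which is the kind of surprise the compiler cannot account for if
the padding is added behind its back.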

R.

Andrea Corallo July 22, 2020, 2:51 p.m. | #3
Richard Biener <richard.guenther@gmail.com> writes:

> I wonder if such effect of instructions on the pipeline can be modeled
> in the DFA and thus whether the scheduler could issue (always ready)
> NOPs?


I might be wrong, but the DFA model reasons in terms of the instructions
executed along a given execution path.  This pass, by contrast, takes
into account the 'footprint' of the program's branches in memory, which
is what some microarchitectures are sensitive to.
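
A minimal sketch of what "footprint in memory" could mean, assuming
(hypothetically) that density is measured as the number of branches per
aligned group of instructions in layout order.  The window size and all
names are illustrative assumptions; the actual heuristic lives in 2/2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the branches in each aligned group of WINDOW_INSNS instructions.
// IS_BRANCH[i] is true when the i-th instruction in memory order is a
// branch, regardless of whether any execution path ever reaches it.
// WINDOW_INSNS must be nonzero.
std::vector<int>
branches_per_window (const std::vector<bool> &is_branch,
                     std::size_t window_insns)
{
  const std::size_t n_windows
    = (is_branch.size () + window_insns - 1) / window_insns;
  std::vector<int> counts (n_windows, 0);
  for (std::size_t i = 0; i < is_branch.size (); ++i)
    if (is_branch[i])
      ++counts[i / window_insns];
  return counts;
}
```

The input is purely positional, which is the contrast with a DFA
description: nothing here depends on which instructions are executed.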

  Andrea
Joseph Myers July 22, 2020, 6:41 p.m. | #4
New insn types should be documented in rtl.texi (I think in the "Insns" 
section).

-- 
Joseph S. Myers
joseph@codesourcery.com
Segher Boessenkool July 24, 2020, 9:18 p.m. | #5
Hi Andrea,

On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
> This first patch implements the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block.  This is to allow padding after unconditional
> branches.  This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of
> nops.

> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.


So I wonder if this cannot be done with some kind of NOTE, instead?

> +	      /* Allow nops after branches, via FILLER_INSN.  */
> +	      bool fail = true;
> +	      subrtx_iterator::array_type array;
> +	      FOR_EACH_SUBRTX (iter, array, x, ALL)
> +		{
> +		  const_rtx rtx = *iter;
> +		  if (GET_CODE (rtx) == FILLER_INSN)
> +		    {
> +		      fail = false;
> +		      break;
> +		    }
> +		}
> +	      if (fail)
> +		fatal_insn ("insn outside basic block", x);


This stops checking after finding a FILLER_INSN as first insn.  Is that
on purpose?

> +/* Make an insn of code FILLER_INSN to
> +   pad out the instruction stream.
> +   PATTERN should be from gen_filler_insn ().
> +   AFTER will typically be an unconditional
> +   branch at the end of a basic block.  */


Because it only allows one particular insn pattern, it should be pretty
easy to use a NOTE for this, or even a normal INSN where the outside-of-BB
test can then just look at its pattern.


As you see, I really do not like to have another RTX class, without very
well defined semantics even.  Not without first being shown no
alternatives are acceptable, anyway :-)


Segher
Eric Botcazou July 26, 2020, 6:19 p.m. | #6
> As you see, I really do not like to have another RTX class, without very
> well defined semantics even.  Not without first being shown no
> alternatives are acceptable, anyway :-)


Seconded.

-- 
Eric Botcazou
Andrea Corallo July 28, 2020, 7:29 p.m. | #7
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi Andrea,
>
> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>
> So I wonder if this cannot be done with some kind of NOTE, instead?
>
>> +	      /* Allow nops after branches, via FILLER_INSN.  */
>> +	      bool fail = true;
>> +	      subrtx_iterator::array_type array;
>> +	      FOR_EACH_SUBRTX (iter, array, x, ALL)
>> +		{
>> +		  const_rtx rtx = *iter;
>> +		  if (GET_CODE (rtx) == FILLER_INSN)
>> +		    {
>> +		      fail = false;
>> +		      break;
>> +		    }
>> +		}
>> +	      if (fail)
>> +		fatal_insn ("insn outside basic block", x);
>
> This stops checking after finding a FILLER_INSN as first insn.  Is that
> on purpose?


I guess that after the first one only other fillers are expected, but
yes, I missed this while reviving the patch.  I agree it should be
improved.

>> +/* Make an insn of code FILLER_INSN to
>> +   pad out the instruction stream.
>> +   PATTERN should be from gen_filler_insn ().
>> +   AFTER will typically be an unconditional
>> +   branch at the end of a basic block.  */
>
> Because it only allows one particular insn pattern, it should be pretty
> easy to use a NOTE for this, or even a normal INSN where the ouside-of-BB
> test can then just look at its pattern.
>
> As you see, I really do not like to have another RTX class, without very
> well defined semantics even.  Not without first being shown no
> alternatives are acceptable, anyway :-)


I see I see :)  I really don't have any strong opinion about this.  I'll
be happy to rework the patch in this direction if this is the outcome
suggested by the code review.

Thanks!

  Andrea
Andrea Corallo Aug. 19, 2020, 9:13 a.m. | #8
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi Andrea,
>
> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>
> So I wonder if this cannot be done with some kind of NOTE, instead?


Hi Segher,

I was having a look into reworking this using an insn note as (IIUC)
suggested.  The idea is appealing but looking into insn-notes.def I've
found the following comment:

"We are slowly removing the concept of insn-chain notes from the
compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
If you think you need one, look for other ways to express what you
mean, such as register notes or bits in the basic-block structure."

Would it still be justified to proceed this way in this case?  The other
option would be to add the information to the basic block or to
struct rtx_jump_insn.

My GCC experience is far from sufficient to have a well-formed opinion on
this, but I'd probably bet on struct rtx_jump_insn as the better option.

Any thoughts?

Thanks!

  Andrea
Richard Sandiford Aug. 19, 2020, 10:52 a.m. | #9
Andrea Corallo <andrea.corallo@arm.com> writes:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> Hi Andrea,
>>
>> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>>> This first patch implements the addition of a new RTX instruction class
>>> FILLER_INSN, which has been white listed to allow placement of NOPs
>>> outside of a basic block.  This is to allow padding after unconditional
>>> branches.  This is favorable so that any performance gained from
>>> diluting branches is not paid straight back via excessive eating of
>>> nops.
>>
>>> It was deemed that a new RTX class was less invasive than modifying
>>> behavior in regards to standard UNSPEC nops.
>>
>> So I wonder if this cannot be done with some kind of NOTE, instead?
>>
>
> Hi Segher,
>
> I was having a look into reworking this using an insn note as (IIUC)
> suggested.  The idea is appealing but looking into insn-notes.def I've
> found the following comment:
>
> "We are slowly removing the concept of insn-chain notes from the
> compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
> If you think you need one, look for other ways to express what you
> mean, such as register notes or bits in the basic-block structure."
>
> Would still be justificated in this case to proceed this way?  The other
> option would be to add the information into the basic-block or into
> struct rtx_jump_insn.
>
> My GCC experience is far from sufficient for having a formed opinion on
> this, I'd probably bet on struct rtx_jump_insn as the better option.


Adding it to the basic block structure wouldn't work because we need
this information to survive until asm output time, and the cfg doesn't
last that long.  (Would be nice if it did, but that's a whole new can
of worms.)

Using REG_NOTES on the jump might be OK.  I guess the note value could
be the length in bytes.  shorten_branches would then need to look for
these notes and add the associated length after adding the length of
the insn itself.  There would then need to be some hook that final.c
can call to emit nops of the given length.
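
The length accounting described above can be sketched as follows.  This
is a self-contained toy model, not GCC code: ModelInsn, pad_bytes and
compute_addresses are hypothetical names, and pad_bytes stands in for
the value that such a note on the jump might carry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an insn as seen by a length-computing pass.
struct ModelInsn
{
  int length;     // Bytes occupied by the insn itself.
  int pad_bytes;  // Bytes of nop padding requested after it, 0 if none.
};

// Return the start address of every insn, followed by the total size of
// the sequence.  Padding is added after the length of the insn itself,
// so every later address (and hence every branch distance) includes it.
std::vector<int>
compute_addresses (const std::vector<ModelInsn> &insns)
{
  std::vector<int> addresses;
  addresses.reserve (insns.size () + 1);
  int pc = 0;
  for (const ModelInsn &insn : insns)
    {
      addresses.push_back (pc);
      pc += insn.length + insn.pad_bytes;
    }
  addresses.push_back (pc);
  return addresses;
}
```

For three 4-byte insns where the second requests 8 bytes of padding,
the addresses come out as 0, 4 and 16, with a total size of 20 bytes.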

I guess there's also the option of representing this in the same way
as a delayed branch sequence, which is to make the jump insn pattern:

  (sequence [(normal jump insn)
             (delayed insn 1)
             ...])

The members of the sequence are full insns, rather than just patterns.
For this use case, the delayed insns would all be nops.

However, not much is prepared to handle the sequence representation
before the normal pass_machine_reorg position.  (The main dbr pass
itself is pass_delay_slots, but some targets run dbr within
pass_machine_reorg instead.)  There again, it isn't worth doing
layout optimisations earlier than pass_machine_reorg anyway.

Thanks,
Richard
Segher Boessenkool Aug. 19, 2020, 4:51 p.m. | #10
[ Please don't post new patch series as replies to old ]

On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
> This first patch implements the addition of a new RTX instruction class
> FILLER_INSN, which has been white listed to allow placement of NOPs
> outside of a basic block.  This is to allow padding after unconditional
> branches.  This is favorable so that any performance gained from
> diluting branches is not paid straight back via excessive eating of
> nops.
>
> It was deemed that a new RTX class was less invasive than modifying
> behavior in regards to standard UNSPEC nops.


Deemed, by whom?  There are several people very against it, too.  You
need to modify only one simple behaviour (maybe in a handful of places),
making a new RTX class for that is excessive.

> 	* cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
> 	* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
> 	basic blocks.
> 	* coretypes.h: New rtx class.


coretypes.h is not a new RTX class?  :-)  Maybe:

	* coretypes.h (rtx_filler_insn): New struct.

> @@ -3033,7 +3034,20 @@ rtl_verify_bb_layout (void)
>  	      break;
>
>  	    default:
> -	      fatal_insn ("insn outside basic block", x);
> +	      /* Allow nops after branches, via FILLER_INSN.  */
> +	      bool fail = true;
> +	      subrtx_iterator::array_type array;
> +	      FOR_EACH_SUBRTX (iter, array, x, ALL)
> +		{
> +		  const_rtx rtx = *iter;
> +		  if (GET_CODE (rtx) == FILLER_INSN)
> +		    {
> +		      fail = false;
> +		      break;
> +		    }
> +		}
> +	      if (fail)
> +		fatal_insn ("insn outside basic block", x);
>  	    }
>  	}


It wouldn't be hard to allow some existing RTL here.  Maybe something
with CODE_FOR_filler_nop or similar (after you recog () it).

It still allows anything after leading filler insns, btw; you could get
rid of the "fail" variable altogether, just call fatal_insn as soon as
you see some unexpected RTX code.
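
A minimal sketch of that fail-fast shape, using a toy model rather than
real RTL.  ModelCode and first_invalid_outside_bb are hypothetical
names, and returning an index stands in for calling fatal_insn on the
offending insn.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the codes that may appear between basic blocks.
enum class ModelCode { Note, Barrier, FillerNop, Insn };

// Walk everything found outside a basic block and report the index of
// the first entry that is not allowed there, or -1 if all are fine.
// There is no "fail" flag: the walk stops at the first unexpected code,
// so a leading filler cannot hide a later stray insn.
int
first_invalid_outside_bb (const std::vector<ModelCode> &codes)
{
  for (std::size_t i = 0; i < codes.size (); ++i)
    switch (codes[i])
      {
      case ModelCode::Note:
      case ModelCode::Barrier:
      case ModelCode::FillerNop:
        break;
      default:
        return static_cast<int> (i);
      }
  return -1;
}
```

A barrier followed only by fillers is accepted, while a filler followed
by an ordinary insn is rejected at index 1.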

> +  rtx_insn* i = make_insn_raw (pattern);


rtx_insn *i = ...


Segher
Segher Boessenkool Aug. 19, 2020, 5:28 p.m. | #11
On Wed, Aug 19, 2020 at 11:13:40AM +0200, Andrea Corallo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > So I wonder if this cannot be done with some kind of NOTE, instead?
>
> I was having a look into reworking this using an insn note as (IIUC)
> suggested.  The idea is appealing but looking into insn-notes.def I've
> found the following comment:
>
> "We are slowly removing the concept of insn-chain notes from the
> compiler.  Adding new codes to this file is STRONGLY DISCOURAGED.
> If you think you need one, look for other ways to express what you
> mean, such as register notes or bits in the basic-block structure."


That is from 2004.  Since then 9 note types have been removed, but 7
new types added.  (There are 18 insn note types now).

> Would still be justificated in this case to proceed this way?


Yes, it is a lesser evil imho.

> The other
> option would be to add the information into the basic-block or into
> struct rtx_jump_insn.


Or just look at the insn code, define a "filler-nop" insn, allow those
after BBs?  Any reason that wouldn't work?


Segher
Andrea Corallo Aug. 19, 2020, 5:47 p.m. | #12
Segher Boessenkool <segher@kernel.crashing.org> writes:

> [ Please don't post new patch series as replies to old ]
>
> On Wed, Jul 22, 2020 at 12:02:33PM +0200, Andrea Corallo wrote:
>> This first patch implements the addition of a new RTX instruction class
>> FILLER_INSN, which has been white listed to allow placement of NOPs
>> outside of a basic block.  This is to allow padding after unconditional
>> branches.  This is favorable so that any performance gained from
>> diluting branches is not paid straight back via excessive eating of
>> nops.
>>
>> It was deemed that a new RTX class was less invasive than modifying
>> behavior in regards to standard UNSPEC nops.
>
> Deemed, by whom?  There are several people very against it, too.  You
> need to modify only one simple behaviour (maybe in a handful of places),
> making a new RTX class for that is excessive.


Hi Segher,

That's understood and agreed.  I haven't posted any new patch on this;
the quoted mail is an old one.  With today's mail I just wanted to
discuss how to proceed in this direction.

Thanks for your feedback!

  Andrea

Patch

From 475bbb3984ed133b020b344eebc2d4d3bf8ce52f Mon Sep 17 00:00:00 2001
From: Andrea Corallo <andrea.corallo@arm.com>
Date: Thu, 16 Jul 2020 09:21:38 +0100
Subject: [PATCH 1/2] Add new RTX instruction class FILLER_INSN

gcc/ChangeLog

2020-07-17  Andrea Corallo  <andrea.corallo@arm.com>
	    Carey Williams  <carey.williams@arm.com>

	* cfgbuild.c (inside_basic_block_p): Handle FILLER_INSN.
	* cfgrtl.c (rtl_verify_bb_layout): Whitelist FILLER_INSN outside
	basic blocks.
	* coretypes.h: New rtx class.
	* emit-rtl.c (emit_filler_after): New function.
	* rtl.def (FILLER_INSN): New rtl define.
	* rtl.h (rtx_filler_insn): Define new structure.
	(FILLER_INSN_P): New macro.
	(is_a_helper <rtx_filler_insn *>::test): New test helper for
	rtx_filler_insn.
	(emit_filler_after): New extern.
	* target-insns.def: Add target insn definition.
---
 gcc/cfgbuild.c       |  1 +
 gcc/cfgrtl.c         | 16 +++++++++++++++-
 gcc/coretypes.h      |  1 +
 gcc/emit-rtl.c       | 14 ++++++++++++++
 gcc/rtl.def          |  4 ++++
 gcc/rtl.h            | 23 +++++++++++++++++++++++
 gcc/target-insns.def |  1 +
 7 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 478afa5fe91c..07cb06afba07 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -58,6 +58,7 @@  inside_basic_block_p (const rtx_insn *insn)
 
     case JUMP_TABLE_DATA:
     case BARRIER:
+    case FILLER_INSN:
     case NOTE:
       return false;
 
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 827e84a44ddd..02139aaa268d 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -61,6 +61,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "tree-pass.h"
 #include "print-rtl.h"
+#include "rtl-iter.h"
 
 /* Disable warnings about missing quoting in GCC diagnostics.  */
 #if __GNUC__ >= 10
@@ -3033,7 +3034,20 @@  rtl_verify_bb_layout (void)
 	      break;
 
 	    default:
-	      fatal_insn ("insn outside basic block", x);
+	      /* Allow nops after branches, via FILLER_INSN.  */
+	      bool fail = true;
+	      subrtx_iterator::array_type array;
+	      FOR_EACH_SUBRTX (iter, array, x, ALL)
+		{
+		  const_rtx rtx = *iter;
+		  if (GET_CODE (rtx) == FILLER_INSN)
+		    {
+		      fail = false;
+		      break;
+		    }
+		}
+	      if (fail)
+		fatal_insn ("insn outside basic block", x);
 	    }
 	}
 
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 6b6cfcdf210d..5c6633a815c5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -84,6 +84,7 @@  struct rtx_def;
     struct rtx_call_insn;       /* CALL_P (X) */
     struct rtx_jump_table_data; /* JUMP_TABLE_DATA_P (X) */
     struct rtx_barrier;         /* BARRIER_P (X) */
+    struct rtx_filler_insn;     /* FILLER_INSN_P (X) */
     struct rtx_code_label;      /* LABEL_P (X) */
     struct rtx_note;            /* NOTE_P (X) */
 
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index f9b0e9714d9e..76f25c011b2a 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -4746,6 +4746,20 @@  emit_barrier_after (rtx_insn *after)
   return insn;
 }
 
+/* Make an insn of code FILLER_INSN to
+   pad out the instruction stream.
+   PATTERN should be from gen_filler_insn ().
+   AFTER will typically be an unconditional
+   branch at the end of a basic block.  */
+
+rtx_insn *
+emit_filler_after (rtx pattern, rtx_insn *after)
+{
+  rtx_insn* i = make_insn_raw (pattern);
+  add_insn_after_nobb (i, after);
+  return i;
+}
+
 /* Emit the label LABEL after the insn AFTER.  */
 
 rtx_insn *
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 9754333eafba..0e0040eaa1cf 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -328,6 +328,10 @@  DEF_RTL_EXPR(RETURN, "return", "", RTX_EXTRA)
    conditional jumps.  */
 DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA)
 
+/* Special filler type, used to pad the instruction stream.  */
+
+DEF_RTL_EXPR(FILLER_INSN, "filler_insn", "", RTX_INSN)
+
 /* Special for EH return from subroutine.  */
 
 DEF_RTL_EXPR(EH_RETURN, "eh_return", "", RTX_EXTRA)
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 0872cc408eb1..60abd609007f 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -674,6 +674,17 @@  struct GTY(()) rtx_barrier : public rtx_insn
      from rtl.def.  */
 };
 
+struct GTY(()) rtx_filler_insn : public rtx_insn
+{
+  /* No extra fields, but adds the invariant:
+       FILLER_INSN_P (X) aka (GET_CODE (X) == FILLER_INSN)
+     i.e. a marker that indicates the INSN stream should be padded.
+
+     This is an instance of:
+       DEF_RTL_EXPR(FILLER_INSN, "filler_insn", "", RTX_INSN)
+     from rtl.def.  */
+};
+
 struct GTY(()) rtx_code_label : public rtx_insn
 {
   /* No extra fields, but adds the invariant:
@@ -865,6 +876,9 @@  struct GTY(()) rtvec_def {
 /* Predicate yielding nonzero iff X is a barrier insn.  */
 #define BARRIER_P(X) (GET_CODE (X) == BARRIER)
 
+/* Predicate yielding nonzero iff X is a filler insn.  */
+#define FILLER_INSN_P(X) (GET_CODE (X) == FILLER_INSN)
+
 /* Predicate yielding nonzero iff X is a data for a jump table.  */
 #define JUMP_TABLE_DATA_P(INSN) (GET_CODE (INSN) == JUMP_TABLE_DATA)
 
@@ -970,6 +984,14 @@  is_a_helper <rtx_barrier *>::test (rtx rt)
   return BARRIER_P (rt);
 }
 
+template <>
+template <>
+inline bool
+is_a_helper <rtx_filler_insn *>::test (rtx rt)
+{
+  return FILLER_INSN_P (rt);
+}
+
 template <>
 template <>
 inline bool
@@ -3300,6 +3322,7 @@  extern rtx_insn *emit_debug_insn_after (rtx, rtx_insn *);
 extern rtx_insn *emit_debug_insn_after_noloc (rtx, rtx_insn *);
 extern rtx_insn *emit_debug_insn_after_setloc (rtx, rtx_insn *, location_t);
 extern rtx_barrier *emit_barrier_after (rtx_insn *);
+extern rtx_insn *emit_filler_after (rtx, rtx_insn *);
 extern rtx_insn *emit_label_after (rtx_insn *, rtx_insn *);
 extern rtx_note *emit_note_after (enum insn_note, rtx_insn *);
 extern rtx_insn *emit_insn (rtx);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 4d7eb92cf68c..dfdf0273edc1 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -94,6 +94,7 @@  DEF_TARGET_INSN (sibcall_epilogue, (void))
 DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 				 rtx opt4))
 DEF_TARGET_INSN (simple_return, (void))
+DEF_TARGET_INSN (filler_insn, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
-- 
2.17.1