[1/2] i386: Generate lfence with load/indirect branch/ret [CVE-2020-0551]

Message ID 20200310160528.303613-2-hjl.tools@gmail.com
State New
Series
  • x86: Add assembler mitigation for CVE-2020-0551

Commit Message

H.J. Lu March 10, 2020, 4:05 p.m.
Add 3 command-line options to generate lfence for load, indirect near
branch and ret to help mitigate:

https://nvd.nist.gov/vuln/detail/CVE-2020-0551

1. -mlfence-after-load=[no|yes]:
  -mlfence-after-load=yes generates lfence after load instructions.
2. -mlfence-before-indirect-branch=[none|all|memory|register]:
  a. -mlfence-before-indirect-branch=all generates lfence before indirect
  near branches via register and a warning before indirect near branches
  via memory.
  b. -mlfence-before-indirect-branch=memory issues a warning before
  indirect near branches via memory.
  c. -mlfence-before-indirect-branch=register generates lfence before
  indirect near branches via register.
Note that lfence won't be generated before indirect near branches via
register with -mlfence-after-load=yes since lfence will be generated
after loading branch target register.
3. -mlfence-before-ret=[none|or|not]
  a. -mlfence-before-ret=or generates or with lfence before ret.
  b. -mlfence-before-ret=not generates not with lfence before ret.

A warning will be issued and lfence won't be generated before indirect
near branch and ret if the previous item is a prefix or a constant
directive, which may be used to hardcode an instruction, since there
is no clear instruction boundary.
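
The rules above for indirect near branches can be read as a small decision
function.  The following is only a sketch under stated assumptions: the
enum and function names are hypothetical, and the logic is a model of the
behaviour described in this message, not the gas code itself.

```c
#include <stdbool.h>

/* Hypothetical model of the behaviour described above.  The names are
   illustrative and are not the identifiers used in tc-i386.c.  */
enum branch_mode { MODE_NONE, MODE_REGISTER, MODE_MEMORY, MODE_ALL };
enum operand_kind { OPERAND_REGISTER, OPERAND_MEMORY };
enum action { ACT_NOTHING, ACT_LFENCE, ACT_WARN_MEMORY, ACT_WARN_SKIPPED };

/* prev_is_prefix_or_directive: the previous item in the same section was
   a prefix or a constant directive, so the instruction boundary is not
   clear.  */
static enum action
indirect_branch_action (enum branch_mode mode, enum operand_kind operand,
			bool lfence_after_load,
			bool prev_is_prefix_or_directive)
{
  if (mode == MODE_NONE)
    return ACT_NOTHING;

  /* Branches via memory only ever get a warning, never an lfence.  */
  if (operand == OPERAND_MEMORY)
    return mode == MODE_REGISTER ? ACT_NOTHING : ACT_WARN_MEMORY;

  /* Branch via register.  With -mlfence-after-load=yes the load of the
     target register is already followed by lfence.  */
  if (lfence_after_load || mode == MODE_MEMORY)
    return ACT_NOTHING;

  /* No clear instruction boundary: warn and skip the lfence.  */
  if (prev_is_prefix_or_directive)
    return ACT_WARN_SKIPPED;

  return ACT_LFENCE;
}
```

For example, under this model `all` with a register operand yields
ACT_LFENCE, while the same branch with -mlfence-after-load=yes yields
ACT_NOTHING.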

	* config/tc-i386.c (lfence_after_load): New.
	(lfence_before_indirect_branch_kind): New.
	(lfence_before_indirect_branch): New.
	(lfence_before_ret_kind): New.
	(lfence_before_ret): New.
	(last_insn): New.
	(load_insn_p): New.
	(insert_lfence_after): New.
	(insert_lfence_before): New.
	(md_assemble): Call insert_lfence_before and insert_lfence_after.
	Set last_insn.
	(OPTION_MLFENCE_AFTER_LOAD): New.
	(OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH): New.
	(OPTION_MLFENCE_BEFORE_RET): New.
	(md_longopts): Add -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(md_parse_option): Handle -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(md_show_usage): Display -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(i386_cons_align): New.
	* config/tc-i386.h (i386_cons_align): New.
	(md_cons_align): New.
	* doc/c-i386.texi: Document -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
---
 gas/ChangeLog        |  28 ++++
 gas/config/tc-i386.c | 368 ++++++++++++++++++++++++++++++++++++++++++-
 gas/doc/c-i386.texi  |  43 +++++
 3 files changed, 438 insertions(+), 1 deletion(-)

-- 
2.24.1

Comments

Jan Beulich March 11, 2020, 10:55 a.m. | #1
On 10.03.2020 17:05, H.J. Lu wrote:
> @@ -4311,6 +4333,291 @@ optimize_encoding (void)
>      }
>  }
>
> +/* Return non-zero for load instruction.  */
> +
> +static int
> +load_insn_p (void)
> +{
> +  unsigned int dest;
> +  int any_vex_p = is_any_vex_encoding (&i.tm);
> +
> +  if (!any_vex_p)
> +    {
> +      /* lea  */
> +      if (i.tm.base_opcode == 0x8d)
> +	return 0;

Also include INVLPG, CLFLUSH etc, and maybe some prefetches here?
(I'll mention the LEA-like MPX insns as well, but I think I can
predict your reply.)

> +      /* pop  */
> +      if ((i.tm.base_opcode & 0xfffffff8) == 0x58

Mind using ~7 instead?

> +	  || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
> +	return 1;

What about segment register POPs, POPF, POPA, ENTER, and LEAVE?

> +      /* movs, cmps, lods, scas.  */
> +      if ((i.tm.base_opcode >= 0xa4 && i.tm.base_opcode <= 0xa7)
> +	  || (i.tm.base_opcode >= 0xac && i.tm.base_opcode <= 0xaf))

This can be had with a single comparison:

      if ((i.tm.base_opcode | 0xb) == 0xaf)

> +	return 1;
> +
> +      /* outs */
> +      if (i.tm.base_opcode == 0x6e || i.tm.base_opcode == 0x6f)

And here:

      if ((i.tm.base_opcode | 1) == 0x6f)

Similar folding of comparisons may also be desirable further down.
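
The folded forms suggested above (including the ~7 mask for POP) can be
checked exhaustively against the originals.  This is a minimal sketch;
the function names are made up for illustration, and only the
expressions themselves come from the patch and the review.

```c
#include <stdbool.h>

/* movs, cmps, lods, scas: range form from the patch.  */
static bool
string_op_ranges (unsigned int op)
{
  return (op >= 0xa4 && op <= 0xa7) || (op >= 0xac && op <= 0xaf);
}

/* Same test as a single comparison: bits 0, 1 and 3 are "don't care".  */
static bool
string_op_folded (unsigned int op)
{
  return (op | 0xb) == 0xaf;
}

/* outs: pair form and folded form.  */
static bool
outs_pair (unsigned int op)
{
  return op == 0x6e || op == 0x6f;
}

static bool
outs_folded (unsigned int op)
{
  return (op | 1) == 0x6f;
}

/* pop %reg: explicit 32-bit mask versus ~7.  */
static bool
pop_reg_mask (unsigned int op)
{
  return (op & 0xfffffff8) == 0x58;
}

static bool
pop_reg_not7 (unsigned int op)
{
  return (op & ~7u) == 0x58;
}

/* Return true if every folded form agrees with its original for all
   opcode values up to 0xffff.  */
static bool
folds_agree (void)
{
  for (unsigned int op = 0; op <= 0xffff; op++)
    if (string_op_ranges (op) != string_op_folded (op)
	|| outs_pair (op) != outs_folded (op)
	|| pop_reg_mask (op) != pop_reg_not7 (op))
      return false;
  return true;
}
```

The ~7 form also avoids baking the width of base_opcode into the mask.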

Also, what about XLATB? What about implicit memory accesses done by
e.g. segment register loads? As to AMD-specific insns with implicit
memory operands (often accessed through rAX), should the doc
perhaps mention they're intentionally not covered?

> +	return 1;
> +    }
> +
> +  /* No memory operand.  */
> +  if (!i.mem_operands)
> +    return 0;
> +
> +  if (any_vex_p)
> +    {
> +      /* vldmxcsr.  */
> +      if (i.tm.base_opcode == 0xae
> +	  && i.tm.opcode_modifier.vex
> +	  && i.tm.opcode_modifier.vexopcode == VEX0F
> +	  && i.tm.extension_opcode == 2)
> +	return 1;
> +    }
> +  else
> +    {
> +      /* test, not, neg, mul, imul, div, idiv.  */
> +      if ((i.tm.base_opcode == 0xf6 || i.tm.base_opcode == 0xf7)
> +	  && i.tm.extension_opcode != 1)
> +	return 1;
> +
> +      /* inc, dec.  */
> +      if ((i.tm.base_opcode == 0xfe || i.tm.base_opcode == 0xff)
> +	  && i.tm.extension_opcode <= 1)
> +	return 1;
> +
> +      /* add, or, adc, sbb, and, sub, xor, cmp.  */
> +      if (i.tm.base_opcode >= 0x80 && i.tm.base_opcode <= 0x83)
> +	return 1;
> +
> +      /* bt, bts, btr, btc.  */
> +      if (i.tm.base_opcode == 0xfba
> +	  && (i.tm.extension_opcode >= 4 && i.tm.extension_opcode <= 7))
> +	return 1;
> +
> +      /* rol, ror, rcl, rcr, shl/sal, shr, sar. */
> +      if ((i.tm.base_opcode == 0xc0
> +	   || i.tm.base_opcode == 0xc1
> +	   || (i.tm.base_opcode >= 0xd0 && i.tm.base_opcode <= 0xd3))
> +	  && i.tm.extension_opcode != 6)
> +	return 1;
> +
> +      /* cmpxchg8b, cmpxchg16b, xrstors.  */
> +      if (i.tm.base_opcode == 0xfc7
> +	  && (i.tm.extension_opcode == 1 || i.tm.extension_opcode == 3))
> +	return 1;
> +
> +      /* fxrstor, ldmxcsr, xrstor.  */
> +      if (i.tm.base_opcode == 0xfae
> +	  && (i.tm.extension_opcode == 1
> +	      || i.tm.extension_opcode == 2
> +	      || i.tm.extension_opcode == 5))
> +	return 1;
> +
> +      /* lgdt, lidt, lmsw.  */
> +      if (i.tm.base_opcode == 0xf01
> +	  && (i.tm.extension_opcode == 2
> +	      || i.tm.extension_opcode == 3
> +	      || i.tm.extension_opcode == 6))
> +	return 1;
> +
> +      /* vmptrld */
> +      if (i.tm.base_opcode == 0xfc7
> +	  && i.tm.extension_opcode == 6)
> +	return 1;
> +
> +      /* Check for x87 instructions.  */
> +      if (i.tm.base_opcode >= 0xd8 && i.tm.base_opcode <= 0xdf)
> +	{
> +	  /* Skip fst, fstp, fstenv, fstcw.  */
> +	  if (i.tm.base_opcode == 0xd9
> +	      && (i.tm.extension_opcode == 2
> +		  || i.tm.extension_opcode == 3
> +		  || i.tm.extension_opcode == 6
> +		  || i.tm.extension_opcode == 7))
> +	    return 0;
> +
> +	  /* Skip fisttp, fist, fistp, fstp.  */
> +	  if (i.tm.base_opcode == 0xdb
> +	      && (i.tm.extension_opcode == 1
> +		  || i.tm.extension_opcode == 2
> +		  || i.tm.extension_opcode == 3
> +		  || i.tm.extension_opcode == 7))
> +	    return 0;
> +
> +	  /* Skip fisttp, fst, fstp, fsave, fstsw.  */
> +	  if (i.tm.base_opcode == 0xdd
> +	      && (i.tm.extension_opcode == 1
> +		  || i.tm.extension_opcode == 2
> +		  || i.tm.extension_opcode == 3
> +		  || i.tm.extension_opcode == 6
> +		  || i.tm.extension_opcode == 7))
> +	    return 0;
> +
> +	  /* Skip fisttp, fist, fistp, fbstp, fistp.  */
> +	  if (i.tm.base_opcode == 0xdf
> +	      && (i.tm.extension_opcode == 1
> +		  || i.tm.extension_opcode == 2
> +		  || i.tm.extension_opcode == 3
> +		  || i.tm.extension_opcode == 6
> +		  || i.tm.extension_opcode == 7))
> +	    return 0;
> +
> +	  return 1;
> +	}
> +    }
> +
> +  dest = i.operands - 1;
> +
> +  /* Check fake imm8 operand and 3 source operands.  */
> +  if ((i.tm.opcode_modifier.immext
> +       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)
> +      && i.types[dest].bitfield.imm8)
> +    dest--;
> +
> +  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */
> +  if (!any_vex_p
> +      && (i.tm.base_opcode == 0x0
> +	  || i.tm.base_opcode == 0x1
> +	  || i.tm.base_opcode == 0x8
> +	  || i.tm.base_opcode == 0x9
> +	  || i.tm.base_opcode == 0x10
> +	  || i.tm.base_opcode == 0x11
> +	  || i.tm.base_opcode == 0x18
> +	  || i.tm.base_opcode == 0x19
> +	  || i.tm.base_opcode == 0x20
> +	  || i.tm.base_opcode == 0x21
> +	  || i.tm.base_opcode == 0x28
> +	  || i.tm.base_opcode == 0x29
> +	  || i.tm.base_opcode == 0x30
> +	  || i.tm.base_opcode == 0x31
> +	  || i.tm.base_opcode == 0x38
> +	  || i.tm.base_opcode == 0x39
> +	  || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)
> +	  || i.tm.base_opcode == 0xfc0
> +	  || i.tm.base_opcode == 0xfc1))
> +    return 1;

Don't quite a few of these fit very well with ...

> +  /* Check for load instruction.  */
> +  return (i.types[dest].bitfield.class != ClassNone
> +	  || i.types[dest].bitfield.instance == Accum);

... this generic expression? It would seem to me that only TEST
and XCHG need special casing, for allowing either operand order.
Same seems to apply to quite a few of the special cases in the
big "else" block further up, and even its if() [vldmxcsr] part.

> +static void
> +insert_lfence_before (void)
> +{
> +  char *p;
> +
> +  if (i.tm.base_opcode == 0xff
> +      && (i.tm.extension_opcode == 2 || i.tm.extension_opcode == 4))

Also exclude VEX- and alike encoded insn here again?

> +    {
> +      /* Insert lfence before indirect branch if needed.  */
> +
> +      if (lfence_before_indirect_branch == lfence_branch_none)
> +	return;
> +
> +      if (i.operands != 1)
> +	abort ();
> +
> +      if (i.reg_operands == 1)
> +	{
> +	  /* Indirect branch via register.  Don't insert lfence with
> +	     -mlfence-after-load=yes.  */
> +	  if (lfence_after_load
> +	      || lfence_before_indirect_branch == lfence_branch_memory)
> +	    return;
> +	}
> +      else if (i.mem_operands == 1
> +	       && lfence_before_indirect_branch != lfence_branch_register)
> +	{
> +	  as_warn (_("indirect branch `%s` over memory should be avoided"),
> +		   i.tm.name);

Perhaps drop "branch" and replace "over memory" by "with memory operand"?

> +	  return;
> +	}
> +      else
> +	return;
> +
> +      if (last_insn.kind != last_insn_other
> +	  && last_insn.seg == now_seg)
> +	{
> +	  as_warn_where (last_insn.file, last_insn.line,
> +			 _("`%s` skips -mlfence-before-indirect-branch on `%s`"),
> +			 last_insn.name, i.tm.name);
> +	  return;
> +	}
> +
> +      p = frag_more (3);
> +      *p++ = 0xf;
> +      *p++ = 0xae;
> +      *p = 0xe8;
> +      return;
> +    }
> +
> +  /* Output orl/notl and lfence before ret.  */

May I suggest to either drop the insn suffixes here (and below),
or make them correctly reflect the code below (which may also
produce q- or w-suffixed insns)?

> +  if (lfence_before_ret != lfence_before_ret_none
> +      && (i.tm.base_opcode == 0xc2
> +	  || i.tm.base_opcode == 0xc3
> +	  || i.tm.base_opcode == 0xca
> +	  || i.tm.base_opcode == 0xcb))
> +    {
> +      if (last_insn.kind != last_insn_other
> +	  && last_insn.seg == now_seg)
> +	{
> +	  as_warn_where (last_insn.file, last_insn.line,
> +			 _("`%s` skips -mlfence-before-ret on `%s`"),
> +			 last_insn.name, i.tm.name);
> +	  return;
> +	}
> +      if (lfence_before_ret == lfence_before_ret_or)
> +	{
> +	  /* orl: 0x830c2400.  */
> +	  p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> +	  if (flag_code == CODE_64BIT)
> +	    *p++ = 0x48;

Shouldn't this depend on RET's operand size? Likewise wouldn't you
also need to insert 0x66/0x67 in certain cases?

> +	  *p++ = 0x83;
> +	  *p++ = 0xc;
> +	  *p++ = 0x24;
> +	  *p++ = 0x0;
> +	}
> +      else
> +	{
> +	  p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> +	  /* notl: 0xf71424.  */
> +	  if (flag_code == CODE_64BIT)
> +	    *p++ = 0x48;
> +	  *p++ = 0xf7;
> +	  *p++ = 0x14;
> +	  *p++ = 0x24;
> +	  if (flag_code == CODE_64BIT)
> +	    *p++ = 0x48;
> +	  /* notl: 0xf71424.  */
> +	  *p++ = 0xf7;
> +	  *p++ = 0x14;
> +	  *p++ = 0x24;

When reading the description I was wondering about the use of NOT.
I think the doc should mention that it's _two_ NOTs that get inserted,
as this is even more growth of code size than the OR variant. Is
there a performance reason for having this extra, more expensive (in
terms of code size) variant? Or is it rather because of the OR
variant clobbering EFLAGS (which ought to be called out in the doc)?
In which case - was it considered to use e.g. SHL with an immediate
of zero, thus having smaller code _and_ untouched EFLAGS (but of
course requiring at least an 80186, albeit the addressing mode
used requires a 386 anyway, which you don't seem to be checking
anywhere)?
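
To make the size comparison concrete, here is a rough sketch that builds
the three candidate sequences and reports their lengths.  The OR and NOT
byte values are the ones in the quoted hunk; the SHL encoding
(C1 /4 ib with the same (%[re]sp) addressing bytes) is my assumption of
what the suggestion would look like and is not part of the patch.  The
sketch also assumes the three lfence bytes follow the or/not, as the
frag_more sizes imply.  All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* RET_SHL models the suggestion above; it is not in the patch.  */
enum ret_variant { RET_OR, RET_NOT, RET_SHL };

/* Write the bytes placed before ret into buf (at least 11 bytes) and
   return how many were written.  code64 selects the REX.W prefix.  */
static size_t
emit_before_ret (enum ret_variant variant, int code64, unsigned char *buf)
{
  static const unsigned char or_insn[] = { 0x83, 0x0c, 0x24, 0x00 };
  static const unsigned char not_insn[] = { 0xf7, 0x14, 0x24 };
  static const unsigned char shl_insn[] = { 0xc1, 0x24, 0x24, 0x00 };
  static const unsigned char lfence[] = { 0x0f, 0xae, 0xe8 };
  const unsigned char *insn;
  size_t insn_len, n = 0;
  int repeat = 1;

  switch (variant)
    {
    case RET_OR:
      insn = or_insn;
      insn_len = sizeof or_insn;
      break;
    case RET_NOT:
      /* NOT is emitted twice so the return address is restored.  */
      insn = not_insn;
      insn_len = sizeof not_insn;
      repeat = 2;
      break;
    default:
      insn = shl_insn;
      insn_len = sizeof shl_insn;
      break;
    }

  for (int k = 0; k < repeat; k++)
    {
      if (code64)
	buf[n++] = 0x48;
      memcpy (buf + n, insn, insn_len);
      n += insn_len;
    }

  memcpy (buf + n, lfence, sizeof lfence);
  return n + sizeof lfence;
}

/* Convenience wrapper that returns only the sequence length.  */
static size_t
before_ret_size (enum ret_variant variant, int code64)
{
  unsigned char buf[16];
  return emit_before_ret (variant, code64, buf);
}
```

Under these assumptions the 64-bit sizes are 8 bytes for OR, 11 for the
two NOTs and 8 for SHL, and 7, 9 and 7 bytes respectively otherwise.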

Also I guess the last comment above would better move two lines up?

> @@ -12668,6 +12986,41 @@ md_parse_option (int c, const char *arg)
>          as_fatal (_("invalid -mfence-as-lock-add= option: `%s'"), arg);
>        break;
>
> +    case OPTION_MLFENCE_AFTER_LOAD:
> +      if (strcasecmp (arg, "yes") == 0)
> +	lfence_after_load = 1;
> +      else if (strcasecmp (arg, "no") == 0)
> +	lfence_after_load = 0;
> +      else
> +        as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
> +      break;
> +
> +    case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
> +      if (strcasecmp (arg, "all") == 0)
> +	lfence_before_indirect_branch = lfence_branch_all;

I wonder whether this shouldn't also enable a safe lfence_before_ret
mode (i.e. not the OR one), for RET also being an indirect branch. Of
course care would need to be taken to avoid clobbering an already set
lfence_before_ret mode.
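
One way to get the "don't clobber an already set mode" part right,
independent of option order, would be to remember whether
-mlfence-before-ret= was given explicitly.  A minimal sketch, assuming a
separate flag for that; every name here is hypothetical, and strcmp
stands in for the strcasecmp used in the patch to keep the example
portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum branch_mode { BRANCH_NONE, BRANCH_REGISTER, BRANCH_MEMORY, BRANCH_ALL };
enum ret_mode { RET_MODE_NONE, RET_MODE_NOT, RET_MODE_OR };

struct lfence_options
{
  enum branch_mode branch;
  enum ret_mode ret;
  bool ret_explicit;	/* Set once -mlfence-before-ret= was seen.  */
};

/* Handle -mlfence-before-ret=ARG.  Return false for an invalid ARG.  */
static bool
parse_before_ret (struct lfence_options *o, const char *arg)
{
  if (strcmp (arg, "none") == 0)
    o->ret = RET_MODE_NONE;
  else if (strcmp (arg, "not") == 0)
    o->ret = RET_MODE_NOT;
  else if (strcmp (arg, "or") == 0)
    o->ret = RET_MODE_OR;
  else
    return false;
  o->ret_explicit = true;
  return true;
}

/* Handle -mlfence-before-indirect-branch=ARG.  */
static bool
parse_before_indirect_branch (struct lfence_options *o, const char *arg)
{
  if (strcmp (arg, "all") == 0)
    {
      o->branch = BRANCH_ALL;
      /* ret is an indirect branch too: imply the flag-preserving NOT
	 variant, but never override an explicit choice.  */
      if (!o->ret_explicit)
	o->ret = RET_MODE_NOT;
    }
  else if (strcmp (arg, "memory") == 0)
    o->branch = BRANCH_MEMORY;
  else if (strcmp (arg, "register") == 0)
    o->branch = BRANCH_REGISTER;
  else if (strcmp (arg, "none") == 0)
    o->branch = BRANCH_NONE;
  else
    return false;
  return true;
}

/* Apply the two options in either order and report the resulting ret
   mode.  ret_arg may be NULL when -mlfence-before-ret= is absent.  */
static enum ret_mode
ret_mode_for (const char *branch_arg, const char *ret_arg, bool ret_first)
{
  struct lfence_options o = { BRANCH_NONE, RET_MODE_NONE, false };

  if (ret_first && ret_arg != NULL)
    parse_before_ret (&o, ret_arg);
  parse_before_indirect_branch (&o, branch_arg);
  if (!ret_first && ret_arg != NULL)
    parse_before_ret (&o, ret_arg);
  return o.ret;
}
```

With this, `all` alone implies the NOT variant, while an explicit `or`
or `none` survives regardless of which option comes first.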

> @@ -13254,6 +13616,10 @@ i386_cons_align (int ignore ATTRIBUTE_UNUSED)
>        last_insn.kind = last_insn_directive;
>        last_insn.name = "constant directive";
>        last_insn.file = as_where (&last_insn.line);
> +      if (lfence_before_ret != lfence_before_ret_none)
> +	as_warn (_("constant directive skips -mlfence-before-ret"));
> +      if (lfence_before_indirect_branch != lfence_branch_none)
> +	as_warn (_("constant directive skips -mlfence-before-indirect-branch"));

Could these be folded into a single warning, to avoid getting overly
verbose?
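
Something along these lines is what I have in mind: at most one message
per constant directive, naming whichever options are in effect.  This is
only an illustration with a hypothetical helper and message text; it
formats into a static buffer instead of calling as_warn so that it
stands alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return the single warning to issue for a constant directive, or NULL
   when neither option is active.  The result points to a static buffer
   that is overwritten by the next call.  */
static const char *
skip_warning (bool before_ret, bool before_branch)
{
  static char msg[96];

  if (!before_ret && !before_branch)
    return NULL;

  snprintf (msg, sizeof msg, "constant directive skips %s%s%s",
	    before_ret ? "-mlfence-before-ret" : "",
	    before_ret && before_branch ? " and " : "",
	    before_branch ? "-mlfence-before-indirect-branch" : "");
  return msg;
}
```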

Jan
Fangrui Song via Binutils March 11, 2020, 4:17 p.m. | #2
On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 10.03.2020 17:05, H.J. Lu wrote:
> > @@ -4311,6 +4333,291 @@ optimize_encoding (void)
> >      }
> >  }
> >
> > +/* Return non-zero for load instruction.  */
> > +
> > +static int
> > +load_insn_p (void)
> > +{
> > +  unsigned int dest;
> > +  int any_vex_p = is_any_vex_encoding (&i.tm);
> > +
> > +  if (!any_vex_p)
> > +    {
> > +      /* lea  */
> > +      if (i.tm.base_opcode == 0x8d)
> > +     return 0;
>
> Also include INVLPG, CLFLUSH etc, and maybe some prefetches here?
> (I'll mention the LEA-like MPX insns as well, but I think I can
> predict your reply.)

Hongtao, can you look into it?

> > +      /* pop  */
> > +      if ((i.tm.base_opcode & 0xfffffff8) == 0x58
>
> Mind using ~7 instead?

Changed.

> > +       || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
> > +     return 1;
>
> What about segment register POPs, POPF, POPA, ENTER, and LEAVE?

We decided that ENTER and LEAVE are safe.  Hongtao, can you look into others?

> > +      /* movs, cmps, lods, scas.  */
> > +      if ((i.tm.base_opcode >= 0xa4 && i.tm.base_opcode <= 0xa7)
> > +       || (i.tm.base_opcode >= 0xac && i.tm.base_opcode <= 0xaf))
>
> This can be had with a single comparison:
>
>       if ((i.tm.base_opcode | 0xb) == 0xaf)

Changed.

> > +     return 1;
> > +
> > +      /* outs */
> > +      if (i.tm.base_opcode == 0x6e || i.tm.base_opcode == 0x6f)
>
> And here:
>
>       if ((i.tm.base_opcode | 1) == 0x6f)
>
> Similar folding of comparisons may also be desirable further down.

Changed.

> Also, what about XLATB? What about implicit memory accesses done by


Hongtao, can you look into it?

> e.g. segment register loads? As to AMD-specific insns with implicit
> memory operands (often accessed through rAX), should the doc
> perhaps mention they're intentionally not covered?

Yes, AMD specific insns are skipped.  Hongtao, can you look into it?

> > +     return 1;
> > +    }
> > +
> > +  /* No memory operand.  */
> > +  if (!i.mem_operands)
> > +    return 0;
> > +
> > +  if (any_vex_p)
> > +    {
> > +      /* vldmxcsr.  */
> > +      if (i.tm.base_opcode == 0xae
> > +       && i.tm.opcode_modifier.vex
> > +       && i.tm.opcode_modifier.vexopcode == VEX0F
> > +       && i.tm.extension_opcode == 2)
> > +     return 1;
> > +    }
> > +  else
> > +    {
> > +      /* test, not, neg, mul, imul, div, idiv.  */
> > +      if ((i.tm.base_opcode == 0xf6 || i.tm.base_opcode == 0xf7)
> > +       && i.tm.extension_opcode != 1)
> > +     return 1;
> > +
> > +      /* inc, dec.  */
> > +      if ((i.tm.base_opcode == 0xfe || i.tm.base_opcode == 0xff)
> > +       && i.tm.extension_opcode <= 1)
> > +     return 1;
> > +
> > +      /* add, or, adc, sbb, and, sub, xor, cmp.  */
> > +      if (i.tm.base_opcode >= 0x80 && i.tm.base_opcode <= 0x83)
> > +     return 1;
> > +
> > +      /* bt, bts, btr, btc.  */
> > +      if (i.tm.base_opcode == 0xfba
> > +       && (i.tm.extension_opcode >= 4 && i.tm.extension_opcode <= 7))
> > +     return 1;
> > +
> > +      /* rol, ror, rcl, rcr, shl/sal, shr, sar. */
> > +      if ((i.tm.base_opcode == 0xc0
> > +        || i.tm.base_opcode == 0xc1
> > +        || (i.tm.base_opcode >= 0xd0 && i.tm.base_opcode <= 0xd3))
> > +       && i.tm.extension_opcode != 6)
> > +     return 1;
> > +
> > +      /* cmpxchg8b, cmpxchg16b, xrstors.  */
> > +      if (i.tm.base_opcode == 0xfc7
> > +       && (i.tm.extension_opcode == 1 || i.tm.extension_opcode == 3))
> > +     return 1;
> > +
> > +      /* fxrstor, ldmxcsr, xrstor.  */
> > +      if (i.tm.base_opcode == 0xfae
> > +       && (i.tm.extension_opcode == 1
> > +           || i.tm.extension_opcode == 2
> > +           || i.tm.extension_opcode == 5))
> > +     return 1;
> > +
> > +      /* lgdt, lidt, lmsw.  */
> > +      if (i.tm.base_opcode == 0xf01
> > +       && (i.tm.extension_opcode == 2
> > +           || i.tm.extension_opcode == 3
> > +           || i.tm.extension_opcode == 6))
> > +     return 1;
> > +
> > +      /* vmptrld */
> > +      if (i.tm.base_opcode == 0xfc7
> > +       && i.tm.extension_opcode == 6)
> > +     return 1;
> > +
> > +      /* Check for x87 instructions.  */
> > +      if (i.tm.base_opcode >= 0xd8 && i.tm.base_opcode <= 0xdf)
> > +     {
> > +       /* Skip fst, fstp, fstenv, fstcw.  */
> > +       if (i.tm.base_opcode == 0xd9
> > +           && (i.tm.extension_opcode == 2
> > +               || i.tm.extension_opcode == 3
> > +               || i.tm.extension_opcode == 6
> > +               || i.tm.extension_opcode == 7))
> > +         return 0;
> > +
> > +       /* Skip fisttp, fist, fistp, fstp.  */
> > +       if (i.tm.base_opcode == 0xdb
> > +           && (i.tm.extension_opcode == 1
> > +               || i.tm.extension_opcode == 2
> > +               || i.tm.extension_opcode == 3
> > +               || i.tm.extension_opcode == 7))
> > +         return 0;
> > +
> > +       /* Skip fisttp, fst, fstp, fsave, fstsw.  */
> > +       if (i.tm.base_opcode == 0xdd
> > +           && (i.tm.extension_opcode == 1
> > +               || i.tm.extension_opcode == 2
> > +               || i.tm.extension_opcode == 3
> > +               || i.tm.extension_opcode == 6
> > +               || i.tm.extension_opcode == 7))
> > +         return 0;
> > +
> > +       /* Skip fisttp, fist, fistp, fbstp, fistp.  */
> > +       if (i.tm.base_opcode == 0xdf
> > +           && (i.tm.extension_opcode == 1
> > +               || i.tm.extension_opcode == 2
> > +               || i.tm.extension_opcode == 3
> > +               || i.tm.extension_opcode == 6
> > +               || i.tm.extension_opcode == 7))
> > +         return 0;
> > +
> > +       return 1;
> > +     }
> > +    }
> > +
> > +  dest = i.operands - 1;
> > +
> > +  /* Check fake imm8 operand and 3 source operands.  */
> > +  if ((i.tm.opcode_modifier.immext
> > +       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)
> > +      && i.types[dest].bitfield.imm8)
> > +    dest--;
> > +
> > +  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */
> > +  if (!any_vex_p
> > +      && (i.tm.base_opcode == 0x0
> > +       || i.tm.base_opcode == 0x1
> > +       || i.tm.base_opcode == 0x8
> > +       || i.tm.base_opcode == 0x9
> > +       || i.tm.base_opcode == 0x10
> > +       || i.tm.base_opcode == 0x11
> > +       || i.tm.base_opcode == 0x18
> > +       || i.tm.base_opcode == 0x19
> > +       || i.tm.base_opcode == 0x20
> > +       || i.tm.base_opcode == 0x21
> > +       || i.tm.base_opcode == 0x28
> > +       || i.tm.base_opcode == 0x29
> > +       || i.tm.base_opcode == 0x30
> > +       || i.tm.base_opcode == 0x31
> > +       || i.tm.base_opcode == 0x38
> > +       || i.tm.base_opcode == 0x39
> > +       || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)
> > +       || i.tm.base_opcode == 0xfc0
> > +       || i.tm.base_opcode == 0xfc1))
> > +    return 1;
>
> Don't quite a few of these fit very well with ...
>

Changed.

> > +  /* Check for load instruction.  */
> > +  return (i.types[dest].bitfield.class != ClassNone
> > +       || i.types[dest].bitfield.instance == Accum);
>
> ... this generic expression? It would seem to me that only TEST
> and XCHG need special casing, for allowing either operand order.
> Same seems to apply to quite a few of the special cases in the
> big "else" block further up, and even its if() [vldmxcsr] part.

Hongtao, can you look into it?

> > +static void
> > +insert_lfence_before (void)
> > +{
> > +  char *p;
> > +
> > +  if (i.tm.base_opcode == 0xff
> > +      && (i.tm.extension_opcode == 2 || i.tm.extension_opcode == 4))
>
> Also exclude VEX- and alike encoded insn here again?

I changed to:

static void
insert_lfence_before (void)
{
  char *p;

  if (is_any_vex_encoding (&i.tm))
    return;

> > +    {
> > +      /* Insert lfence before indirect branch if needed.  */
> > +
> > +      if (lfence_before_indirect_branch == lfence_branch_none)
> > +     return;
> > +
> > +      if (i.operands != 1)
> > +     abort ();
> > +
> > +      if (i.reg_operands == 1)
> > +     {
> > +       /* Indirect branch via register.  Don't insert lfence with
> > +          -mlfence-after-load=yes.  */
> > +       if (lfence_after_load
> > +           || lfence_before_indirect_branch == lfence_branch_memory)
> > +         return;
> > +     }
> > +      else if (i.mem_operands == 1
> > +            && lfence_before_indirect_branch != lfence_branch_register)
> > +     {
> > +       as_warn (_("indirect branch `%s` over memory should be avoided"),
> > +                i.tm.name);
>
> Perhaps drop "branch" and replace "over memory" by "with memory operand"?

Changed.

> > +       return;
> > +     }
> > +      else
> > +     return;
> > +
> > +      if (last_insn.kind != last_insn_other
> > +       && last_insn.seg == now_seg)
> > +     {
> > +       as_warn_where (last_insn.file, last_insn.line,
> > +                      _("`%s` skips -mlfence-before-indirect-branch on `%s`"),
> > +                      last_insn.name, i.tm.name);
> > +       return;
> > +     }
> > +
> > +      p = frag_more (3);
> > +      *p++ = 0xf;
> > +      *p++ = 0xae;
> > +      *p = 0xe8;
> > +      return;
> > +    }
> > +
> > +  /* Output orl/notl and lfence before ret.  */
>
> May I suggest to either drop the insn suffixes here (and below),
> or make them correctly reflect the code below (which may also
> produce q- or w-suffixed insns)?

Changed.

> > +  if (lfence_before_ret != lfence_before_ret_none
> > +      && (i.tm.base_opcode == 0xc2
> > +       || i.tm.base_opcode == 0xc3
> > +       || i.tm.base_opcode == 0xca
> > +       || i.tm.base_opcode == 0xcb))
> > +    {
> > +      if (last_insn.kind != last_insn_other
> > +       && last_insn.seg == now_seg)
> > +     {
> > +       as_warn_where (last_insn.file, last_insn.line,
> > +                      _("`%s` skips -mlfence-before-ret on `%s`"),
> > +                      last_insn.name, i.tm.name);
> > +       return;
> > +     }
> > +      if (lfence_before_ret == lfence_before_ret_or)
> > +     {
> > +       /* orl: 0x830c2400.  */
> > +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> > +       if (flag_code == CODE_64BIT)
> > +         *p++ = 0x48;
>
> Shouldn't this depend on RET's operand size? Likewise wouldn't you
> also need to insert 0x66/0x67 in certain cases?

Hongtao, can you look into it?

> > +       *p++ = 0x83;
> > +       *p++ = 0xc;
> > +       *p++ = 0x24;
> > +       *p++ = 0x0;
> > +     }
> > +      else
> > +     {
> > +       p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> > +       /* notl: 0xf71424.  */
> > +       if (flag_code == CODE_64BIT)
> > +         *p++ = 0x48;
> > +       *p++ = 0xf7;
> > +       *p++ = 0x14;
> > +       *p++ = 0x24;
> > +       if (flag_code == CODE_64BIT)
> > +         *p++ = 0x48;
> > +       /* notl: 0xf71424.  */
> > +       *p++ = 0xf7;
> > +       *p++ = 0x14;
> > +       *p++ = 0x24;
>
> When reading the description I was wondering about the use of NOT.
> I think the doc should mention that it's _two_ NOTs that get inserted,
> as this is even more growth of code size than the OR variant. Is
> there a performance reason for having this extra, more expensive (in
> terms of code size) variant? Or is it rather because of the OR
> variant clobbering EFLAGS (which ought to be called out in the doc)?
> In which case - was it considered to use e.g. SHL with an immediate
> of zero, thus having smaller code _and_ untouched EFLAGS (but of
> course requiring at least an 80186, albeit the addressing mode
> used requires a 386 anyway, which you don't seem to be checking
> anywhere)?

This is a very good suggestion.  I will talk to our people.  In the meantime,
I'd like to keep it as is since this version has been tested extensively.
We can change it to SHL 0 later.

> Also I guess the last comment above would better move two lines up?


Changed.

> > @@ -12668,6 +12986,41 @@ md_parse_option (int c, const char *arg)
> >          as_fatal (_("invalid -mfence-as-lock-add= option: `%s'"), arg);
> >        break;
> >
> > +    case OPTION_MLFENCE_AFTER_LOAD:
> > +      if (strcasecmp (arg, "yes") == 0)
> > +     lfence_after_load = 1;
> > +      else if (strcasecmp (arg, "no") == 0)
> > +     lfence_after_load = 0;
> > +      else
> > +        as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
> > +      break;
> > +
> > +    case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
> > +      if (strcasecmp (arg, "all") == 0)
> > +     lfence_before_indirect_branch = lfence_branch_all;
>
> I wonder whether this shouldn't also enable a safe lfence_before_ret
> mode (i.e. not the OR one), for RET also being an indirect branch. Of
> course care would need to be taken to avoid clobbering an already set
> lfence_before_ret mode.

Hongtao, can you look into it?

> > @@ -13254,6 +13616,10 @@ i386_cons_align (int ignore ATTRIBUTE_UNUSED)
> >        last_insn.kind = last_insn_directive;
> >        last_insn.name = "constant directive";
> >        last_insn.file = as_where (&last_insn.line);
> > +      if (lfence_before_ret != lfence_before_ret_none)
> > +     as_warn (_("constant directive skips -mlfence-before-ret"));
> > +      if (lfence_before_indirect_branch != lfence_branch_none)
> > +     as_warn (_("constant directive skips -mlfence-before-indirect-branch"));
>
> Could these be folded into a single warning, to avoid getting overly
> verbose?
>

Changed.

This is the patch I am checking in.

-- 
H.J.
From c4b77b8bd61dfdd17d51f4e1c9178234e1139fd6 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 10 Mar 2020 08:29:57 -0700
Subject: [PATCH] i386: Generate lfence with load/indirect branch/ret
 [CVE-2020-0551]

Add 3 command-line options to generate lfence for load, indirect near
branch and ret to help mitigate:

https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00334.html
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-0551

1. -mlfence-after-load=[no|yes]:
  -mlfence-after-load=yes generates lfence after load instructions.
2. -mlfence-before-indirect-branch=[none|all|memory|register]:
  a. -mlfence-before-indirect-branch=all generates lfence before indirect
  near branches via register and a warning before indirect near branches
  via memory.
  b. -mlfence-before-indirect-branch=memory issues a warning before
  indirect near branches via memory.
  c. -mlfence-before-indirect-branch=register generates lfence before
  indirect near branches via register.
Note that lfence won't be generated before indirect near branches via
register with -mlfence-after-load=yes since lfence will be generated
after loading branch target register.
3. -mlfence-before-ret=[none|or|not]
  a. -mlfence-before-ret=or generates or with lfence before ret.
  b. -mlfence-before-ret=not generates not with lfence before ret.

A warning will be issued and lfence won't be generated before indirect
near branch and ret if the previous item is a prefix or a constant
directive, which may be used to hardcode an instruction, since there
is no clear instruction boundary.

	* config/tc-i386.c (lfence_after_load): New.
	(lfence_before_indirect_branch_kind): New.
	(lfence_before_indirect_branch): New.
	(lfence_before_ret_kind): New.
	(lfence_before_ret): New.
	(last_insn): New.
	(load_insn_p): New.
	(insert_lfence_after): New.
	(insert_lfence_before): New.
	(md_assemble): Call insert_lfence_before and insert_lfence_after.
	Set last_insn.
	(OPTION_MLFENCE_AFTER_LOAD): New.
	(OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH): New.
	(OPTION_MLFENCE_BEFORE_RET): New.
	(md_longopts): Add -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(md_parse_option): Handle -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(md_show_usage): Display -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
	(i386_cons_align): New.
	* config/tc-i386.h (i386_cons_align): New.
	(md_cons_align): New.
	* doc/c-i386.texi: Document -mlfence-after-load=,
	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
---
 gas/ChangeLog        |  28 ++++
 gas/config/tc-i386.c | 366 ++++++++++++++++++++++++++++++++++++++++++-
 gas/doc/c-i386.texi  |  43 +++++
 3 files changed, 436 insertions(+), 1 deletion(-)

diff --git a/gas/ChangeLog b/gas/ChangeLog
index 836cb5c6d9..d581cc3d47 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,3 +1,31 @@
+2020-03-10  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* config/tc-i386.c (lfence_after_load): New.
+	(lfence_before_indirect_branch_kind): New.
+	(lfence_before_indirect_branch): New.
+	(lfence_before_ret_kind): New.
+	(lfence_before_ret): New.
+	(last_insn): New.
+	(load_insn_p): New.
+	(insert_lfence_after): New.
+	(insert_lfence_before): New.
+	(md_assemble): Call insert_lfence_before and insert_lfence_after.
+	Set last_insn.
+	(OPTION_MLFENCE_AFTER_LOAD): New.
+	(OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH): New.
+	(OPTION_MLFENCE_BEFORE_RET): New.
+	(md_longopts): Add -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(md_parse_option): Handle -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(md_show_usage): Display -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(i386_cons_align): New.
+	* config/tc-i386.h (i386_cons_align): New.
+	(md_cons_align): New.
+	* doc/c-i386.texi: Document -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+
 2020-03-10  Alan Modra  <amodra@gmail.com>
 
 	* config/tc-csky.c (get_operand_value): Rewrite 1 << 31 expressions
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index b020f39c86..09063f784b 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,7 +629,29 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;
 
-/* Type of the previous instruction.  */
+/* 1 if lfence should be inserted after every load.  */
+static int lfence_after_load = 0;
+
+/* Non-zero if lfence should be inserted before indirect branch.  */
+static enum lfence_before_indirect_branch_kind
+  {
+    lfence_branch_none = 0,
+    lfence_branch_register,
+    lfence_branch_memory,
+    lfence_branch_all
+  }
+lfence_before_indirect_branch;
+
+/* Non-zero if lfence should be inserted before ret.  */
+static enum lfence_before_ret_kind
+  {
+    lfence_before_ret_none = 0,
+    lfence_before_ret_not,
+    lfence_before_ret_or
+  }
+lfence_before_ret;
+
+/* Type of the previous instruction: .byte or prefix.  */
 static struct
   {
     segT seg;
@@ -4311,6 +4333,283 @@ optimize_encoding (void)
     }
 }
 
+/* Return non-zero for load instruction.  */
+
+static int
+load_insn_p (void)
+{
+  unsigned int dest;
+  int any_vex_p = is_any_vex_encoding (&i.tm);
+  unsigned int base_opcode = i.tm.base_opcode | 1;
+
+  if (!any_vex_p)
+    {
+      /* lea  */
+      if (i.tm.base_opcode == 0x8d)
+	return 0;
+
+      /* pop  */
+      if ((i.tm.base_opcode & ~7) == 0x58
+	  || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+	return 1;
+
+      /* movs, cmps, lods, scas.  */
+      if ((i.tm.base_opcode | 0xb) == 0xaf)
+	return 1;
+
+      /* outs */
+      if (base_opcode == 0x6f)
+	return 1;
+    }
+
+  /* No memory operand.  */
+  if (!i.mem_operands)
+    return 0;
+
+  if (any_vex_p)
+    {
+      /* vldmxcsr.  */
+      if (i.tm.base_opcode == 0xae
+	  && i.tm.opcode_modifier.vex
+	  && i.tm.opcode_modifier.vexopcode == VEX0F
+	  && i.tm.extension_opcode == 2)
+	return 1;
+    }
+  else
+    {
+      /* test, not, neg, mul, imul, div, idiv.  */
+      if ((i.tm.base_opcode == 0xf6 || i.tm.base_opcode == 0xf7)
+	  && i.tm.extension_opcode != 1)
+	return 1;
+
+      /* inc, dec.  */
+      if (base_opcode == 0xff && i.tm.extension_opcode <= 1)
+	return 1;
+
+      /* add, or, adc, sbb, and, sub, xor, cmp.  */
+      if (i.tm.base_opcode >= 0x80 && i.tm.base_opcode <= 0x83)
+	return 1;
+
+      /* bt, bts, btr, btc.  */
+      if (i.tm.base_opcode == 0xfba
+	  && (i.tm.extension_opcode >= 4 && i.tm.extension_opcode <= 7))
+	return 1;
+
+      /* rol, ror, rcl, rcr, shl/sal, shr, sar. */
+      if ((base_opcode == 0xc1
+	   || (i.tm.base_opcode >= 0xd0 && i.tm.base_opcode <= 0xd3))
+	  && i.tm.extension_opcode != 6)
+	return 1;
+
+      /* cmpxchg8b, cmpxchg16b, xrstors.  */
+      if (i.tm.base_opcode == 0xfc7
+	  && (i.tm.extension_opcode == 1 || i.tm.extension_opcode == 3))
+	return 1;
+
+      /* fxrstor, ldmxcsr, xrstor.  */
+      if (i.tm.base_opcode == 0xfae
+	  && (i.tm.extension_opcode == 1
+	      || i.tm.extension_opcode == 2
+	      || i.tm.extension_opcode == 5))
+	return 1;
+
+      /* lgdt, lidt, lmsw.  */
+      if (i.tm.base_opcode == 0xf01
+	  && (i.tm.extension_opcode == 2
+	      || i.tm.extension_opcode == 3
+	      || i.tm.extension_opcode == 6))
+	return 1;
+
+      /* vmptrld */
+      if (i.tm.base_opcode == 0xfc7
+	  && i.tm.extension_opcode == 6)
+	return 1;
+
+      /* Check for x87 instructions.  */
+      if (i.tm.base_opcode >= 0xd8 && i.tm.base_opcode <= 0xdf)
+	{
+	  /* Skip fst, fstp, fstenv, fstcw.  */
+	  if (i.tm.base_opcode == 0xd9
+	      && (i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fist, fistp, fstp.  */
+	  if (i.tm.base_opcode == 0xdb
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fst, fstp, fsave, fstsw.  */
+	  if (i.tm.base_opcode == 0xdd
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fist, fistp, fbstp, fistp.  */
+	  if (i.tm.base_opcode == 0xdf
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  return 1;
+	}
+    }
+
+  dest = i.operands - 1;
+
+  /* Check fake imm8 operand and 3 source operands.  */
+  if ((i.tm.opcode_modifier.immext
+       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)
+      && i.types[dest].bitfield.imm8)
+    dest--;
+
+  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */
+  if (!any_vex_p
+      && (base_opcode == 0x1
+	  || base_opcode == 0x9
+	  || base_opcode == 0x11
+	  || base_opcode == 0x19
+	  || base_opcode == 0x21
+	  || base_opcode == 0x29
+	  || base_opcode == 0x31
+	  || base_opcode == 0x39
+	  || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)
+	  || base_opcode == 0xfc1))
+    return 1;
+
+  /* Check for load instruction.  */
+  return (i.types[dest].bitfield.class != ClassNone
+	  || i.types[dest].bitfield.instance == Accum);
+}
+
+/* Output lfence, 0xfaee8, after instruction.  */
+
+static void
+insert_lfence_after (void)
+{
+  if (lfence_after_load && load_insn_p ())
+    {
+      char *p = frag_more (3);
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+    }
+}
+
+/* Output lfence, 0xfaee8, before instruction.  */
+
+static void
+insert_lfence_before (void)
+{
+  char *p;
+
+  if (is_any_vex_encoding (&i.tm))
+    return;
+
+  if (i.tm.base_opcode == 0xff
+      && (i.tm.extension_opcode == 2 || i.tm.extension_opcode == 4))
+    {
+      /* Insert lfence before indirect branch if needed.  */
+
+      if (lfence_before_indirect_branch == lfence_branch_none)
+	return;
+
+      if (i.operands != 1)
+	abort ();
+
+      if (i.reg_operands == 1)
+	{
+	  /* Indirect branch via register.  Don't insert lfence with
+	     -mlfence-after-load=yes.  */
+	  if (lfence_after_load
+	      || lfence_before_indirect_branch == lfence_branch_memory)
+	    return;
+	}
+      else if (i.mem_operands == 1
+	       && lfence_before_indirect_branch != lfence_branch_register)
+	{
+	  as_warn (_("indirect `%s` with memory operand should be avoided"),
+		   i.tm.name);
+	  return;
+	}
+      else
+	return;
+
+      if (last_insn.kind != last_insn_other
+	  && last_insn.seg == now_seg)
+	{
+	  as_warn_where (last_insn.file, last_insn.line,
+			 _("`%s` skips -mlfence-before-indirect-branch on `%s`"),
+			 last_insn.name, i.tm.name);
+	  return;
+	}
+
+      p = frag_more (3);
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+      return;
+    }
+
+  /* Output or/not and lfence before ret.  */
+  if (lfence_before_ret != lfence_before_ret_none
+      && (i.tm.base_opcode == 0xc2
+	  || i.tm.base_opcode == 0xc3
+	  || i.tm.base_opcode == 0xca
+	  || i.tm.base_opcode == 0xcb))
+    {
+      if (last_insn.kind != last_insn_other
+	  && last_insn.seg == now_seg)
+	{
+	  as_warn_where (last_insn.file, last_insn.line,
+			 _("`%s` skips -mlfence-before-ret on `%s`"),
+			 last_insn.name, i.tm.name);
+	  return;
+	}
+      if (lfence_before_ret == lfence_before_ret_or)
+	{
+	  /* orl: 0x830c2400.  */
+	  p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  *p++ = 0x83;
+	  *p++ = 0xc;
+	  *p++ = 0x24;
+	  *p++ = 0x0;
+	}
+      else
+	{
+	  p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
+	  /* notl: 0xf71424.  */
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  *p++ = 0xf7;
+	  *p++ = 0x14;
+	  *p++ = 0x24;
+	  /* notl: 0xf71424.  */
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  *p++ = 0xf7;
+	  *p++ = 0x14;
+	  *p++ = 0x24;
+	}
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+    }
+}
+
 /* This is the guts of the machine-dependent assembler.  LINE points to a
    machine dependent instruction.  This function is supposed to emit
    the frags/bytes it assembles to.  */
@@ -4628,9 +4927,13 @@ md_assemble (char *line)
   if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
+  insert_lfence_before ();
+
   /* We are ready to output the insn.  */
   output_insn ();
 
+  insert_lfence_after ();
+
   last_insn.seg = now_seg;
 
   if (i.tm.opcode_modifier.isprefix)
@@ -12250,6 +12553,9 @@ const char *md_shortopts = "qnO::";
 #define OPTION_MALIGN_BRANCH_PREFIX_SIZE (OPTION_MD_BASE + 28)
 #define OPTION_MALIGN_BRANCH (OPTION_MD_BASE + 29)
 #define OPTION_MBRANCHES_WITH_32B_BOUNDARIES (OPTION_MD_BASE + 30)
+#define OPTION_MLFENCE_AFTER_LOAD (OPTION_MD_BASE + 31)
+#define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
+#define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
 
 struct option md_longopts[] =
 {
@@ -12289,6 +12595,10 @@ struct option md_longopts[] =
   {"malign-branch-prefix-size", required_argument, NULL, OPTION_MALIGN_BRANCH_PREFIX_SIZE},
   {"malign-branch", required_argument, NULL, OPTION_MALIGN_BRANCH},
   {"mbranches-within-32B-boundaries", no_argument, NULL, OPTION_MBRANCHES_WITH_32B_BOUNDARIES},
+  {"mlfence-after-load", required_argument, NULL, OPTION_MLFENCE_AFTER_LOAD},
+  {"mlfence-before-indirect-branch", required_argument, NULL,
+   OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH},
+  {"mlfence-before-ret", required_argument, NULL, OPTION_MLFENCE_BEFORE_RET},
   {"mamd64", no_argument, NULL, OPTION_MAMD64},
   {"mintel64", no_argument, NULL, OPTION_MINTEL64},
   {NULL, no_argument, NULL, 0}
@@ -12668,6 +12978,41 @@ md_parse_option (int c, const char *arg)
         as_fatal (_("invalid -mfence-as-lock-add= option: `%s'"), arg);
       break;
 
+    case OPTION_MLFENCE_AFTER_LOAD:
+      if (strcasecmp (arg, "yes") == 0)
+	lfence_after_load = 1;
+      else if (strcasecmp (arg, "no") == 0)
+	lfence_after_load = 0;
+      else
+        as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
+      break;
+
+    case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
+      if (strcasecmp (arg, "all") == 0)
+	lfence_before_indirect_branch = lfence_branch_all;
+      else if (strcasecmp (arg, "memory") == 0)
+	lfence_before_indirect_branch = lfence_branch_memory;
+      else if (strcasecmp (arg, "register") == 0)
+	lfence_before_indirect_branch = lfence_branch_register;
+      else if (strcasecmp (arg, "none") == 0)
+	lfence_before_indirect_branch = lfence_branch_none;
+      else
+        as_fatal (_("invalid -mlfence-before-indirect-branch= option: `%s'"),
+		  arg);
+      break;
+
+    case OPTION_MLFENCE_BEFORE_RET:
+      if (strcasecmp (arg, "or") == 0)
+	lfence_before_ret = lfence_before_ret_or;
+      else if (strcasecmp (arg, "not") == 0)
+	lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "none") == 0)
+	lfence_before_ret = lfence_before_ret_none;
+      else
+        as_fatal (_("invalid -mlfence-before-ret= option: `%s'"),
+		  arg);
+      break;
+
     case OPTION_MRELAX_RELOCATIONS:
       if (strcasecmp (arg, "yes") == 0)
         generate_relax_relocations = 1;
@@ -13025,6 +13370,15 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
+  -mlfence-after-load=[no|yes] (default: no)\n\
+                          generate lfence after load\n"));
+  fprintf (stream, _("\
+  -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
+                          generate lfence before indirect near branch\n"));
+  fprintf (stream, _("\
+  -mlfence-before-ret=[none|or|not] (default: none)\n\
+                          generate lfence before ret\n"));
+  fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
   fprintf (stream, _("\
   -mintel64               accept only Intel64 ISA\n"));
@@ -13254,6 +13608,16 @@ i386_cons_align (int ignore ATTRIBUTE_UNUSED)
       last_insn.kind = last_insn_directive;
       last_insn.name = "constant directive";
       last_insn.file = as_where (&last_insn.line);
+      if (lfence_before_ret != lfence_before_ret_none)
+	{
+	  if (lfence_before_indirect_branch != lfence_branch_none)
+	    as_warn (_("constant directive skips -mlfence-before-ret "
+		       "and -mlfence-before-indirect-branch"));
+	  else
+	    as_warn (_("constant directive skips -mlfence-before-ret"));
+	}
+      else if (lfence_before_indirect_branch != lfence_branch_none)
+	as_warn (_("constant directive skips -mlfence-before-indirect-branch"));
     }
 }
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index c536759cb3..1dd99f91bb 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -464,6 +464,49 @@ on an instruction.  It is equivalent to
 @option{-malign-branch-prefix-size=5}.
 The default doesn't align branches.
 
+@cindex @samp{-mlfence-after-load=} option, i386
+@cindex @samp{-mlfence-after-load=} option, x86-64
+@item -mlfence-after-load=@var{no}
+@itemx -mlfence-after-load=@var{yes}
+These options control whether the assembler should generate lfence
+after load instructions.  @option{-mlfence-after-load=@var{yes}} will
+generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
+lfence, which is the default.
+
+@cindex @samp{-mlfence-before-indirect-branch=} option, i386
+@cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
+@item -mlfence-before-indirect-branch=@var{none}
+@item -mlfence-before-indirect-branch=@var{all}
+@item -mlfence-before-indirect-branch=@var{register}
+@itemx -mlfence-before-indirect-branch=@var{memory}
+These options control whether the assembler should generate lfence
+before indirect near branch instructions.
+@option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
+before indirect near branch via register and issue a warning before
+indirect near branch via memory.
+@option{-mlfence-before-indirect-branch=@var{register}} will generate
+lfence before indirect near branch via register.
+@option{-mlfence-before-indirect-branch=@var{memory}} will issue a
+warning before indirect near branch via memory.
+@option{-mlfence-before-indirect-branch=@var{none}} will not generate
+lfence nor issue warning, which is the default.  Note that lfence won't
+be generated before indirect near branch via register with
+@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+after loading branch target register.
+
+@cindex @samp{-mlfence-before-ret=} option, i386
+@cindex @samp{-mlfence-before-ret=} option, x86-64
+@item -mlfence-before-ret=@var{none}
+@itemx -mlfence-before-ret=@var{or}
+@itemx -mlfence-before-ret=@var{not}
+These options control whether the assembler should generate lfence
+before ret.  @option{-mlfence-before-ret=@var{or}} will generate an
+or instruction with lfence.
+@option{-mlfence-before-ret=@var{not}} will generate two not
+instructions with lfence.
+@option{-mlfence-before-ret=@var{none}} will not generate lfence,
+which is the default.
+
 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
 @item -mx86-used-note=@var{no}
Fangrui Song via Binutils March 25, 2020, 9:27 a.m. | #3
On Thu, Mar 12, 2020 at 12:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:

> >

> > On 10.03.2020 17:05, H.J. Lu wrote:

> > > @@ -4311,6 +4333,291 @@ optimize_encoding (void)

> > >      }

> > >  }

> > >

> > > +/* Return non-zero for load instruction.  */

> > > +

> > > +static int

> > > +load_insn_p (void)

> > > +{

> > > +  unsigned int dest;

> > > +  int any_vex_p = is_any_vex_encoding (&i.tm);

> > > +

> > > +  if (!any_vex_p)

> > > +    {

> > > +      /* lea  */

> > > +      if (i.tm.base_opcode == 0x8d)

> > > +     return 0;

> >

> > Also include INVLPG, CLFLUSH etc, and maybe some prefetches here?

> > (I'll mention the LEA-like MPX insns as well, but I think I can

> > predict your reply.)

>

> Hongtao, can you look into it?

>

> > > +      /* pop  */

> > > +      if ((i.tm.base_opcode & 0xfffffff8) == 0x58

> >

> > Mind using ~7 instead?

>

> Changed.

>

> > > +       || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))

> > > +     return 1;

> >

> > What about segment register POPs, POPF, POPA, ENTER, and LEAVE?

>

> We decided that ENTER and LEAVE are safe.  Hongtao, can you look into others?

>

> > > +      /* movs, cmps, lods, scas.  */

> > > +      if ((i.tm.base_opcode >= 0xa4 && i.tm.base_opcode <= 0xa7)

> > > +       || (i.tm.base_opcode >= 0xac && i.tm.base_opcode <= 0xaf))

> >

> > This can be had with a single comparison:

> >

> >       if ((i.tm.base_opcode | 0xb) == 0xaf)

>

> Changed.

>

> > > +     return 1;

> > > +

> > > +      /* outs */

> > > +      if (i.tm.base_opcode == 0x6e || i.tm.base_opcode == 0x6f)

> >

> > And here:

> >

> >       if ((i.tm.base_opcode | 1) == 0x6f)

> >

> > Similar folding of comparisons may also be desirable further down.

>

> Changed.

>

> > Also, what about XLATB? What about implicit memory accesses done by

>

> Hongtao, can you look into it?

>

> > e.g. segment register loads? As to AMD-specific insns with implicit

> > memory operands (often accessed through rAX), should the doc

> > perhaps mention they're intentionally not covered?

>

> Yes, AMD specific insns are skipped.  Hongtao, can you look into it?

>

> > > +     return 1;

> > > +    }

> > > +

> > > +  /* No memory operand.  */

> > > +  if (!i.mem_operands)

> > > +    return 0;

> > > +

> > > +  if (any_vex_p)

> > > +    {

> > > +      /* vldmxcsr.  */

> > > +      if (i.tm.base_opcode == 0xae

> > > +       && i.tm.opcode_modifier.vex

> > > +       && i.tm.opcode_modifier.vexopcode == VEX0F

> > > +       && i.tm.extension_opcode == 2)

> > > +     return 1;

> > > +    }

> > > +  else

> > > +    {

> > > +      /* test, not, neg, mul, imul, div, idiv.  */

> > > +      if ((i.tm.base_opcode == 0xf6 || i.tm.base_opcode == 0xf7)

> > > +       && i.tm.extension_opcode != 1)

> > > +     return 1;

> > > +

> > > +      /* inc, dec.  */

> > > +      if ((i.tm.base_opcode == 0xfe || i.tm.base_opcode == 0xff)

> > > +       && i.tm.extension_opcode <= 1)

> > > +     return 1;

> > > +

> > > +      /* add, or, adc, sbb, and, sub, xor, cmp.  */

> > > +      if (i.tm.base_opcode >= 0x80 && i.tm.base_opcode <= 0x83)

> > > +     return 1;

> > > +

> > > +      /* bt, bts, btr, btc.  */

> > > +      if (i.tm.base_opcode == 0xfba

> > > +       && (i.tm.extension_opcode >= 4 && i.tm.extension_opcode <= 7))

> > > +     return 1;

> > > +

> > > +      /* rol, ror, rcl, rcr, shl/sal, shr, sar. */

> > > +      if ((i.tm.base_opcode == 0xc0

> > > +        || i.tm.base_opcode == 0xc1

> > > +        || (i.tm.base_opcode >= 0xd0 && i.tm.base_opcode <= 0xd3))

> > > +       && i.tm.extension_opcode != 6)

> > > +     return 1;

> > > +

> > > +      /* cmpxchg8b, cmpxchg16b, xrstors.  */

> > > +      if (i.tm.base_opcode == 0xfc7

> > > +       && (i.tm.extension_opcode == 1 || i.tm.extension_opcode == 3))

> > > +     return 1;

> > > +

> > > +      /* fxrstor, ldmxcsr, xrstor.  */

> > > +      if (i.tm.base_opcode == 0xfae

> > > +       && (i.tm.extension_opcode == 1

> > > +           || i.tm.extension_opcode == 2

> > > +           || i.tm.extension_opcode == 5))

> > > +     return 1;

> > > +

> > > +      /* lgdt, lidt, lmsw.  */

> > > +      if (i.tm.base_opcode == 0xf01

> > > +       && (i.tm.extension_opcode == 2

> > > +           || i.tm.extension_opcode == 3

> > > +           || i.tm.extension_opcode == 6))

> > > +     return 1;

> > > +

> > > +      /* vmptrld */

> > > +      if (i.tm.base_opcode == 0xfc7

> > > +       && i.tm.extension_opcode == 6)

> > > +     return 1;

> > > +

> > > +      /* Check for x87 instructions.  */

> > > +      if (i.tm.base_opcode >= 0xd8 && i.tm.base_opcode <= 0xdf)

> > > +     {

> > > +       /* Skip fst, fstp, fstenv, fstcw.  */

> > > +       if (i.tm.base_opcode == 0xd9

> > > +           && (i.tm.extension_opcode == 2

> > > +               || i.tm.extension_opcode == 3

> > > +               || i.tm.extension_opcode == 6

> > > +               || i.tm.extension_opcode == 7))

> > > +         return 0;

> > > +

> > > +       /* Skip fisttp, fist, fistp, fstp.  */

> > > +       if (i.tm.base_opcode == 0xdb

> > > +           && (i.tm.extension_opcode == 1

> > > +               || i.tm.extension_opcode == 2

> > > +               || i.tm.extension_opcode == 3

> > > +               || i.tm.extension_opcode == 7))

> > > +         return 0;

> > > +

> > > +       /* Skip fisttp, fst, fstp, fsave, fstsw.  */

> > > +       if (i.tm.base_opcode == 0xdd

> > > +           && (i.tm.extension_opcode == 1

> > > +               || i.tm.extension_opcode == 2

> > > +               || i.tm.extension_opcode == 3

> > > +               || i.tm.extension_opcode == 6

> > > +               || i.tm.extension_opcode == 7))

> > > +         return 0;

> > > +

> > > +       /* Skip fisttp, fist, fistp, fbstp, fistp.  */

> > > +       if (i.tm.base_opcode == 0xdf

> > > +           && (i.tm.extension_opcode == 1

> > > +               || i.tm.extension_opcode == 2

> > > +               || i.tm.extension_opcode == 3

> > > +               || i.tm.extension_opcode == 6

> > > +               || i.tm.extension_opcode == 7))

> > > +         return 0;

> > > +

> > > +       return 1;

> > > +     }

> > > +    }

> > > +

> > > +  dest = i.operands - 1;

> > > +

> > > +  /* Check fake imm8 operand and 3 source operands.  */

> > > +  if ((i.tm.opcode_modifier.immext

> > > +       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)

> > > +      && i.types[dest].bitfield.imm8)

> > > +    dest--;

> > > +

> > > +  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */

> > > +  if (!any_vex_p

> > > +      && (i.tm.base_opcode == 0x0

> > > +       || i.tm.base_opcode == 0x1

> > > +       || i.tm.base_opcode == 0x8

> > > +       || i.tm.base_opcode == 0x9

> > > +       || i.tm.base_opcode == 0x10

> > > +       || i.tm.base_opcode == 0x11

> > > +       || i.tm.base_opcode == 0x18

> > > +       || i.tm.base_opcode == 0x19

> > > +       || i.tm.base_opcode == 0x20

> > > +       || i.tm.base_opcode == 0x21

> > > +       || i.tm.base_opcode == 0x28

> > > +       || i.tm.base_opcode == 0x29

> > > +       || i.tm.base_opcode == 0x30

> > > +       || i.tm.base_opcode == 0x31

> > > +       || i.tm.base_opcode == 0x38

> > > +       || i.tm.base_opcode == 0x39

> > > +       || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)

> > > +       || i.tm.base_opcode == 0xfc0

> > > +       || i.tm.base_opcode == 0xfc1))

> > > +    return 1;

> >

> > Don't quite a few of these fit very well with ...

> >

>

> Changed.

>

> > > +  /* Check for load instruction.  */

> > > +  return (i.types[dest].bitfield.class != ClassNone

> > > +       || i.types[dest].bitfield.instance == Accum);

> >

> > ... this generic expression? It would seem to me that only TEST

> > and XCHG need special casing, for allowing either operand order.

> > Same seems to apply to quite a few of the special cases in the

> > big "else" block further up, and even its if() [vldmxcsr] part.

>

> Hongtao, can you look into it?

>


Many instructions take a memory operand as both input and output, and
they also need to be handled. But they don't fit well into the generic
expression, because they either have only one operand or have the
memory operand in the destination position.
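
To illustrate the point, here is a toy model, not gas code. All names
in it (toy_insn, toy_generic_rule, toy_reads_memory and so on) are
hypothetical, and the "generic rule" is a simplification of the final
return expression in load_insn_p, assumed here to mean "the destination
operand is not memory". It shows that read-modify-write insns read
memory but are missed by that rule alone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in gas.  */
enum toy_access { TOY_READ, TOY_WRITE, TOY_READ_WRITE };

struct toy_insn
{
  const char *text;         /* AT&T syntax, destination operand last.  */
  int mem_is_dest;          /* Memory operand is the last operand.  */
  enum toy_access access;   /* What the insn does to its memory operand.  */
};

static const struct toy_insn toy_insns[] =
{
  { "movl (%rax), %ebx", 0, TOY_READ },
  { "movl %ebx, (%rax)", 1, TOY_WRITE },
  { "addl %ebx, (%rax)", 1, TOY_READ_WRITE },
  { "incl (%rax)",       1, TOY_READ_WRITE },
};

/* Simplified generic rule: a load is an insn whose destination is
   not the memory operand.  */
static int
toy_generic_rule (const struct toy_insn *t)
{
  return !t->mem_is_dest;
}

/* What the mitigation cares about: any insn that reads memory.  */
static int
toy_reads_memory (const struct toy_insn *t)
{
  return t->access != TOY_WRITE;
}

/* Count the insns that read memory but are missed by the generic
   rule alone.  */
static size_t
toy_count_missed (void)
{
  size_t n, missed = 0;

  for (n = 0; n < sizeof toy_insns / sizeof toy_insns[0]; n++)
    if (toy_reads_memory (&toy_insns[n])
        && !toy_generic_rule (&toy_insns[n]))
      missed++;
  return missed;
}
```

In this model the addl and incl rows are the ones the generic rule
misses, which is why such opcodes are listed explicitly.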

> > > +static void

> > > +insert_lfence_before (void)

> > > +{

> > > +  char *p;

> > > +

> > > +  if (i.tm.base_opcode == 0xff

> > > +      && (i.tm.extension_opcode == 2 || i.tm.extension_opcode == 4))

> >

> > Also exclude VEX- and alike encoded insn here again?

>

> I changed to:

>

> static void

> insert_lfence_before (void)

> {

>   char *p;

>

>   if (is_any_vex_encoding (&i.tm))

>     return;

>

> > > +    {

> > > +      /* Insert lfence before indirect branch if needed.  */

> > > +

> > > +      if (lfence_before_indirect_branch == lfence_branch_none)

> > > +     return;

> > > +

> > > +      if (i.operands != 1)

> > > +     abort ();

> > > +

> > > +      if (i.reg_operands == 1)

> > > +     {

> > > +       /* Indirect branch via register.  Don't insert lfence with

> > > +          -mlfence-after-load=yes.  */

> > > +       if (lfence_after_load

> > > +           || lfence_before_indirect_branch == lfence_branch_memory)

> > > +         return;

> > > +     }

> > > +      else if (i.mem_operands == 1

> > > +            && lfence_before_indirect_branch != lfence_branch_register)

> > > +     {

> > > +       as_warn (_("indirect branch `%s` over memory should be avoided"),

> > > +                i.tm.name);

> >

> > Perhaps drop "branch" and replace "over memory" by "with memory operand"?

>

> Changed.

>

> > > +       return;

> > > +     }

> > > +      else

> > > +     return;

> > > +

> > > +      if (last_insn.kind != last_insn_other

> > > +       && last_insn.seg == now_seg)

> > > +     {

> > > +       as_warn_where (last_insn.file, last_insn.line,

> > > +                      _("`%s` skips -mlfence-before-indirect-branch on `%s`"),

> > > +                      last_insn.name, i.tm.name);

> > > +       return;

> > > +     }

> > > +

> > > +      p = frag_more (3);

> > > +      *p++ = 0xf;

> > > +      *p++ = 0xae;

> > > +      *p = 0xe8;

> > > +      return;

> > > +    }

> > > +

> > > +  /* Output orl/notl and lfence before ret.  */

> >

> > May I suggest to either drop the insn suffixes here (and below),

> > or make them correctly reflect the code below (which may also

> > produce q- or w-suffixed insns)?

>

> Changed.

>

> > > +  if (lfence_before_ret != lfence_before_ret_none

> > > +      && (i.tm.base_opcode == 0xc2

> > > +       || i.tm.base_opcode == 0xc3

> > > +       || i.tm.base_opcode == 0xca

> > > +       || i.tm.base_opcode == 0xcb))

> > > +    {

> > > +      if (last_insn.kind != last_insn_other

> > > +       && last_insn.seg == now_seg)

> > > +     {

> > > +       as_warn_where (last_insn.file, last_insn.line,

> > > +                      _("`%s` skips -mlfence-before-ret on `%s`"),

> > > +                      last_insn.name, i.tm.name);

> > > +       return;

> > > +     }

> > > +      if (lfence_before_ret == lfence_before_ret_or)

> > > +     {

> > > +       /* orl: 0x830c2400.  */

> > > +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> > > +       if (flag_code == CODE_64BIT)

> > > +         *p++ = 0x48;

> >

> > Shouldn't this depend on RET's operand size? Likewise wouldn't you

> > also need to insert 0x66/0x67 in certain cases?

>

> Hongtao, can you look into it?

>


I suppose you mean OR's operand size?
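
For reference, a minimal sketch of the bytes the "or" path quoted below
writes. emit_or_lfence is a hypothetical stand-alone helper, not part
of the patch; it copies into a caller-supplied buffer instead of using
frag_more. The only mode-dependent byte is the 0x48 prefix, which is
selected purely by the 64-bit code flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: store the bytes that the
   -mlfence-before-ret=or path emits into BUF (at least 8 bytes) and
   return how many were written.  */
static size_t
emit_or_lfence (unsigned char *buf, int code_64bit)
{
  unsigned char *p = buf;

  if (code_64bit)
    *p++ = 0x48;    /* REX.W: 64-bit operand size.  */
  *p++ = 0x83;      /* Group 1 opcode, r/m operand with imm8.  */
  *p++ = 0x0c;      /* ModRM: mod=00, reg=/1 (or), rm=100 (SIB follows).  */
  *p++ = 0x24;      /* SIB: no index, base is the stack pointer.  */
  *p++ = 0x00;      /* imm8 = 0.  */
  *p++ = 0x0f;      /* lfence: 0f ae e8.  */
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - buf);
}
```

So 64-bit code gets 8 bytes (48 83 0c 24 00 0f ae e8) and other modes
get 7; no 0x66 or 0x67 prefix is ever produced by this path.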

> > > +       *p++ = 0x83;

> > > +       *p++ = 0xc;

> > > +       *p++ = 0x24;

> > > +       *p++ = 0x0;

> > > +     }

> > > +      else

> > > +     {

> > > +       p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);

> > > +       /* notl: 0xf71424.  */

> > > +       if (flag_code == CODE_64BIT)

> > > +         *p++ = 0x48;

> > > +       *p++ = 0xf7;

> > > +       *p++ = 0x14;

> > > +       *p++ = 0x24;

> > > +       if (flag_code == CODE_64BIT)

> > > +         *p++ = 0x48;

> > > +       /* notl: 0xf71424.  */

> > > +       *p++ = 0xf7;

> > > +       *p++ = 0x14;

> > > +       *p++ = 0x24;

> >

> > When reading the description I was wondering about the use of NOT.

> > I think the doc should mention that it's _two_ NOTs that get inserted,

> > as this is even more growth of code size than the OR variant. Is

> > there a performance reason for having this extra, more expensive (in

> > terms of code size) variant? Or is it rather because of the OR

> > variant clobbering EFLAGS (which ought to be called out in the doc)?

> > In which case - was it considered to use e.g. SHL with an immediate

> > of zero, thus having smaller code _and_ untouched EFLAGS (but of

> > course requiring at least an 80186, albeit the addressing mode

> > used requires a 386 anyway, which you don't seem to be checking

> > anywhere)?

>

> This is a very good suggestion.  I will talk to our people.  In meantime,

> I'd like to keep it as is since this version has been tested extensively.

> We can change it to SHL 0 later.

>

> > Also I guess the last comment above would better move two lines up?

>

> Changed.

>

> > > @@ -12668,6 +12986,41 @@ md_parse_option (int c, const char *arg)

> > >          as_fatal (_("invalid -mfence-as-lock-add= option: `%s'"), arg);

> > >        break;

> > >

> > > +    case OPTION_MLFENCE_AFTER_LOAD:

> > > +      if (strcasecmp (arg, "yes") == 0)

> > > +     lfence_after_load = 1;

> > > +      else if (strcasecmp (arg, "no") == 0)

> > > +     lfence_after_load = 0;

> > > +      else

> > > +        as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);

> > > +      break;

> > > +

> > > +    case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:

> > > +      if (strcasecmp (arg, "all") == 0)

> > > +     lfence_before_indirect_branch = lfence_branch_all;

> >

> > I wonder whether this shouldn't also enable a safe lfence_before_ret

> > mode (i.e. not the OR one), for RET also being an indirect branch. Of

> > course care would need to be taken to avoid clobbering an already set

> > lfence_before_ret mode.

>

> Hongtao, can you look into it?

>

> > > @@ -13254,6 +13616,10 @@ i386_cons_align (int ignore ATTRIBUTE_UNUSED)

> > >        last_insn.kind = last_insn_directive;

> > >        last_insn.name = "constant directive";

> > >        last_insn.file = as_where (&last_insn.line);

> > > +      if (lfence_before_ret != lfence_before_ret_none)

> > > +     as_warn (_("constant directive skips -mlfence-before-ret"));

> > > +      if (lfence_before_indirect_branch != lfence_branch_none)

> > > +     as_warn (_("constant directive skips -mlfence-before-indirect-branch"));

> >

> > Could these be folded into a single warning, to avoid getting overly

> > verbose?

> >

>

> Changed.

>

> This is the patch I am checking in.

>

> --

> H.J.


For the other parts, I'm working on fixing/changing them.

-- 
BR,
Hongtao
Jan Beulich March 25, 2020, 10:03 a.m. | #4
On 25.03.2020 10:27, Hongtao Liu wrote:
> On Thu, Mar 12, 2020 at 12:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:

>> On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:

>>> On 10.03.2020 17:05, H.J. Lu wrote:

>>>> +  dest = i.operands - 1;

>>>> +

>>>> +  /* Check fake imm8 operand and 3 source operands.  */

>>>> +  if ((i.tm.opcode_modifier.immext

>>>> +       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)

>>>> +      && i.types[dest].bitfield.imm8)

>>>> +    dest--;

>>>> +

>>>> +  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */

>>>> +  if (!any_vex_p

>>>> +      && (i.tm.base_opcode == 0x0

>>>> +       || i.tm.base_opcode == 0x1

>>>> +       || i.tm.base_opcode == 0x8

>>>> +       || i.tm.base_opcode == 0x9

>>>> +       || i.tm.base_opcode == 0x10

>>>> +       || i.tm.base_opcode == 0x11

>>>> +       || i.tm.base_opcode == 0x18

>>>> +       || i.tm.base_opcode == 0x19

>>>> +       || i.tm.base_opcode == 0x20

>>>> +       || i.tm.base_opcode == 0x21

>>>> +       || i.tm.base_opcode == 0x28

>>>> +       || i.tm.base_opcode == 0x29

>>>> +       || i.tm.base_opcode == 0x30

>>>> +       || i.tm.base_opcode == 0x31

>>>> +       || i.tm.base_opcode == 0x38

>>>> +       || i.tm.base_opcode == 0x39

>>>> +       || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)

>>>> +       || i.tm.base_opcode == 0xfc0

>>>> +       || i.tm.base_opcode == 0xfc1))

>>>> +    return 1;

>>>

>>> Don't quite a few of these fit very well with ...

>>>

>>

>> Changed.

>>

>>>> +  /* Check for load instruction.  */

>>>> +  return (i.types[dest].bitfield.class != ClassNone

>>>> +       || i.types[dest].bitfield.instance == Accum);

>>>

>>> ... this generic expression? It would seem to me that only TEST

>>> and XCHG need special casing, for allowing either operand order.

>>> Same seems to apply to quite a few of the special cases in the

>>> big "else" block further up, and even its if() [vldmxcsr] part.

>>

>> Hongtao, can you look into it?

>>

> 

> Many instruction take mem operand both as input and output, they also

> need to be handled. But they're not fitting well in generic

> expressions, because they either only 1 operand, or mem operand is in

> the dest place.


Well, my earlier reply wasn't quite precise enough, I think. Aiui
what you're after to exclude are insns only writing their memory
operand. With the pretty long list of excluded opcodes I wonder
whether this can be re-arranged to use a common pattern (memory
operand is destination) and only exclude the few ones which don't
also read this operand. Then again by using a few & and | the list
above could be shrunk significantly, and hence may no longer look
this odd (I notice the committed version has this reduced a little,
but not quite as much as would be possible).
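
As a rough illustration of the "& and |" idea, the sketch below compares the explicit opcode list from the quoted hunk against a masked form. It is a standalone sketch under stated assumptions: `is_alu_like_listed`, `is_alu_like_masked` and `count_mismatches` are hypothetical names, not functions from tc-i386.c, the opcode is passed as a plain integer instead of being read from `i.tm.base_opcode`, and the `!any_vex_p` precondition is left out.

```c
#include <stdbool.h>

/* Explicit form, mirroring the opcode list in the quoted hunk.  */
static bool
is_alu_like_listed (unsigned int op)
{
  return (op == 0x0 || op == 0x1 || op == 0x8 || op == 0x9
	  || op == 0x10 || op == 0x11 || op == 0x18 || op == 0x19
	  || op == 0x20 || op == 0x21 || op == 0x28 || op == 0x29
	  || op == 0x30 || op == 0x31 || op == 0x38 || op == 0x39
	  || (op >= 0x84 && op <= 0x87)
	  || op == 0xfc0 || op == 0xfc1);
}

/* Masked form (hypothetical rewrite, not the committed code).  */
static bool
is_alu_like_masked (unsigned int op)
{
  /* add, or, adc, sbb, and, sub, xor, cmp: only bits 0, 3, 4 and 5
     may be set, which gives exactly the 16 opcodes 0x00 ... 0x39.  */
  return ((op & ~0x39u) == 0
	  /* test, xchg: 0x84 ... 0x87.  */
	  || (op & ~0x3u) == 0x84
	  /* xadd: 0xfc0, 0xfc1.  */
	  || (op & ~0x1u) == 0xfc0);
}

/* Count the opcodes in [0, limit) on which the two forms disagree.  */
static unsigned int
count_mismatches (unsigned int limit)
{
  unsigned int op, bad = 0;

  for (op = 0; op < limit; op++)
    if (is_alu_like_listed (op) != is_alu_like_masked (op))
      bad++;
  return bad;
}
```

Calling `count_mismatches` over the opcode range of interest is one way to check that a masked rewrite accepts the same set as the list it replaces.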

>>>> +  if (lfence_before_ret != lfence_before_ret_none

>>>> +      && (i.tm.base_opcode == 0xc2

>>>> +       || i.tm.base_opcode == 0xc3

>>>> +       || i.tm.base_opcode == 0xca

>>>> +       || i.tm.base_opcode == 0xcb))

>>>> +    {

>>>> +      if (last_insn.kind != last_insn_other

>>>> +       && last_insn.seg == now_seg)

>>>> +     {

>>>> +       as_warn_where (last_insn.file, last_insn.line,

>>>> +                      _("`%s` skips -mlfence-before-ret on `%s`"),

>>>> +                      last_insn.name, i.tm.name);

>>>> +       return;

>>>> +     }

>>>> +      if (lfence_before_ret == lfence_before_ret_or)

>>>> +     {

>>>> +       /* orl: 0x830c2400.  */

>>>> +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

>>>> +       if (flag_code == CODE_64BIT)

>>>> +         *p++ = 0x48;

>>>

>>> Shouldn't this depend on RET's operand size? Likewise wouldn't you

>>> also need to insert 0x66/0x67 in certain cases?

>>

>> Hongtao, can you look into it?

> 

> I suppose you mean OR's operand size?


Not exactly - I mean RET's operand size ought to affect the one
chosen for OR.
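
To make the point concrete, here is a minimal sketch of an OR encoding whose operand size follows RET's. It assumes the prefix choice used by the follow-up patch in this thread (0x66 when RET carries a data size prefix, otherwise REX.W in 64-bit code); `emit_or_lfence` and its parameters are hypothetical names for illustration, and 16-bit code and the 0x67 address size prefix are deliberately not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Write "or $0, (%esp)" or "or $0, (%rsp)" followed by lfence into BUF
   (at least 8 bytes) and return the number of bytes written.
   RET_HAS_DATA16 says whether the RET being protected has a 0x66
   prefix; CODE64 says whether we are assembling 64-bit code.  */
static size_t
emit_or_lfence (uint8_t *buf, bool ret_has_data16, bool code64)
{
  size_t n = 0;

  /* Pick OR's operand size from RET's.  */
  if (ret_has_data16)
    buf[n++] = 0x66;
  else if (code64)
    buf[n++] = 0x48;

  /* or $0, (%sp): 0x83 /1 with a SIB byte and an imm8 of 0.  */
  buf[n++] = 0x83;
  buf[n++] = 0x0c;
  buf[n++] = 0x24;
  buf[n++] = 0x00;

  /* lfence.  */
  buf[n++] = 0x0f;
  buf[n++] = 0xae;
  buf[n++] = 0xe8;
  return n;
}
```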

Jan
Fangrui Song via Binutils March 26, 2020, 2:23 a.m. | #5
On Wed, Mar 25, 2020 at 6:03 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 25.03.2020 10:27, Hongtao Liu wrote:

> > On Thu, Mar 12, 2020 at 12:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> >> On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>> On 10.03.2020 17:05, H.J. Lu wrote:

> >>>> +  dest = i.operands - 1;

> >>>> +

> >>>> +  /* Check fake imm8 operand and 3 source operands.  */

> >>>> +  if ((i.tm.opcode_modifier.immext

> >>>> +       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)

> >>>> +      && i.types[dest].bitfield.imm8)

> >>>> +    dest--;

> >>>> +

> >>>> +  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */

> >>>> +  if (!any_vex_p

> >>>> +      && (i.tm.base_opcode == 0x0

> >>>> +       || i.tm.base_opcode == 0x1

> >>>> +       || i.tm.base_opcode == 0x8

> >>>> +       || i.tm.base_opcode == 0x9

> >>>> +       || i.tm.base_opcode == 0x10

> >>>> +       || i.tm.base_opcode == 0x11

> >>>> +       || i.tm.base_opcode == 0x18

> >>>> +       || i.tm.base_opcode == 0x19

> >>>> +       || i.tm.base_opcode == 0x20

> >>>> +       || i.tm.base_opcode == 0x21

> >>>> +       || i.tm.base_opcode == 0x28

> >>>> +       || i.tm.base_opcode == 0x29

> >>>> +       || i.tm.base_opcode == 0x30

> >>>> +       || i.tm.base_opcode == 0x31

> >>>> +       || i.tm.base_opcode == 0x38

> >>>> +       || i.tm.base_opcode == 0x39

> >>>> +       || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)

> >>>> +       || i.tm.base_opcode == 0xfc0

> >>>> +       || i.tm.base_opcode == 0xfc1))

> >>>> +    return 1;

> >>>

> >>> Don't quite a few of these fit very well with ...

> >>>

> >>

> >> Changed.

> >>

> >>>> +  /* Check for load instruction.  */

> >>>> +  return (i.types[dest].bitfield.class != ClassNone

> >>>> +       || i.types[dest].bitfield.instance == Accum);

> >>>

> >>> ... this generic expression? It would seem to me that only TEST

> >>> and XCHG need special casing, for allowing either operand order.

> >>> Same seems to apply to quite a few of the special cases in the

> >>> big "else" block further up, and even its if() [vldmxcsr] part.

> >>

> >> Hongtao, can you look into it?

> >>

> >

> > Many instruction take mem operand both as input and output, they also

> > need to be handled. But they're not fitting well in generic

> > expressions, because they either only 1 operand, or mem operand is in

> > the dest place.

>

> Well, my earlier reply wasn't quite precise enough, I think. Aiui

> what you're after to exclude are insns only writing their memory

> operand. With the pretty long list of excluded opcodes I wonder

> whether this can be re-arranged to use a common pattern (memory

> operand is destination) and only exclude the few ones which don't

> also read this operand. Then again by using a few & and | the list

> above could be shrunk significantly, and hence may no longer look

> this odd (I notice the committed version has this reduced a little,

> but not quite as much as would be possible).

>


Yes, understood.

> >>>> +  if (lfence_before_ret != lfence_before_ret_none

> >>>> +      && (i.tm.base_opcode == 0xc2

> >>>> +       || i.tm.base_opcode == 0xc3

> >>>> +       || i.tm.base_opcode == 0xca

> >>>> +       || i.tm.base_opcode == 0xcb))

> >>>> +    {

> >>>> +      if (last_insn.kind != last_insn_other

> >>>> +       && last_insn.seg == now_seg)

> >>>> +     {

> >>>> +       as_warn_where (last_insn.file, last_insn.line,

> >>>> +                      _("`%s` skips -mlfence-before-ret on `%s`"),

> >>>> +                      last_insn.name, i.tm.name);

> >>>> +       return;

> >>>> +     }

> >>>> +      if (lfence_before_ret == lfence_before_ret_or)

> >>>> +     {

> >>>> +       /* orl: 0x830c2400.  */

> >>>> +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> >>>> +       if (flag_code == CODE_64BIT)

> >>>> +         *p++ = 0x48;

> >>>

> >>> Shouldn't this depend on RET's operand size? Likewise wouldn't you

> >>> also need to insert 0x66/0x67 in certain cases?

> >>

> >> Hongtao, can you look into it?

> >

> > I suppose you mean OR's operand size?

>

> Not exactly - I mean RET's operand size ought to affect the one

> chosen for OR.

>

> Jan


> > I wonder whether this shouldn't also enable a safe lfence_before_ret

> > mode (i.e. not the OR one), for RET also being an indirect branch. Of

> > course care would need to be taken to avoid clobbering an already set

> > lfence_before_ret mode.


Also for this part, maybe I'll add some comments to indicate that
-mlfence-before-indirect-branch doesn't include ret. Otherwise it would
be weird for the user when the clobber happens. Is that OK with you?

-- 
BR,
Hongtao
Jan Beulich March 26, 2020, 9:12 a.m. | #6
On 26.03.2020 03:23, Hongtao Liu wrote:
> On Wed, Mar 25, 2020 at 6:03 PM Jan Beulich <jbeulich@suse.com> wrote:

>> On 25.03.2020 10:27, Hongtao Liu wrote:

>>> On Thu, Mar 12, 2020 at 12:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:

>>>> On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>> On 10.03.2020 17:05, H.J. Lu wrote:

>>>>>> +  if (lfence_before_ret != lfence_before_ret_none

>>>>>> +      && (i.tm.base_opcode == 0xc2

>>>>>> +       || i.tm.base_opcode == 0xc3

>>>>>> +       || i.tm.base_opcode == 0xca

>>>>>> +       || i.tm.base_opcode == 0xcb))

>>>>>> +    {

>>>>>> +      if (last_insn.kind != last_insn_other

>>>>>> +       && last_insn.seg == now_seg)

>>>>>> +     {

>>>>>> +       as_warn_where (last_insn.file, last_insn.line,

>>>>>> +                      _("`%s` skips -mlfence-before-ret on `%s`"),

>>>>>> +                      last_insn.name, i.tm.name);

>>>>>> +       return;

>>>>>> +     }

>>>>>> +      if (lfence_before_ret == lfence_before_ret_or)

>>>>>> +     {

>>>>>> +       /* orl: 0x830c2400.  */

>>>>>> +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

>>>>>> +       if (flag_code == CODE_64BIT)

>>>>>> +         *p++ = 0x48;

>>>>>

>>>>> Shouldn't this depend on RET's operand size? Likewise wouldn't you

>>>>> also need to insert 0x66/0x67 in certain cases?

>>>>

>>>> Hongtao, can you look into it?

>>>

>>> I suppose you mean OR's operand size?

>>

>> Not exactly - I mean RET's operand size ought to affect the one

>> chosen for OR.

>>

>> Jan

> 

>>> I wonder whether this shouldn't also enable a safe lfence_before_ret

>>> mode (i.e. not the OR one), for RET also being an indirect branch. Of

>>> course care would need to be taken to avoid clobbering an already set

>>> lfence_before_ret mode.

> 

> Also for this part, maybe i'll add some comments to indicate

> -mlfence-before-indirect-branch doesn't include ret. Orelse it would

> be weird for user when clobber happens, Is it ok for you?


Well, extending the description / comments to be more precise is one
solution, but only the 2nd best one. I continue to think that
there would better be an implication as the one suggested.
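
One possible shape for such an implication, sketched as standalone C rather than as a change to md_parse_option: remember whether -mlfence-before-ret was given explicitly, and let -mlfence-before-indirect-branch=all pick a default only when it was not. All names here (`struct lfence_opts`, `set_before_ret`, `set_branch_all`) are hypothetical, the choice of `shl` as the implied mode is taken from the follow-up patch in this thread, and plain `strcmp` is used where gas uses `strcasecmp`.

```c
#include <stdbool.h>
#include <string.h>

enum ret_mode { RET_NONE = 0, RET_NOT, RET_OR, RET_SHL };

struct lfence_opts
{
  bool branch_all;	/* -mlfence-before-indirect-branch=all seen.  */
  enum ret_mode ret;	/* Effective -mlfence-before-ret mode.  */
  bool ret_explicit;	/* -mlfence-before-ret= was given by the user.  */
};

/* Handle -mlfence-before-ret=ARG.  Returns false, leaving OPTS
   untouched, if ARG is not a known mode.  */
static bool
set_before_ret (struct lfence_opts *opts, const char *arg)
{
  enum ret_mode mode;

  if (strcmp (arg, "none") == 0)
    mode = RET_NONE;
  else if (strcmp (arg, "not") == 0)
    mode = RET_NOT;
  else if (strcmp (arg, "or") == 0)
    mode = RET_OR;
  else if (strcmp (arg, "shl") == 0)
    mode = RET_SHL;
  else
    return false;

  opts->ret = mode;
  opts->ret_explicit = true;
  return true;
}

/* Handle -mlfence-before-indirect-branch=all.  RET is treated as an
   indirect branch too, but an explicit user choice is not clobbered.  */
static void
set_branch_all (struct lfence_opts *opts)
{
  opts->branch_all = true;
  if (!opts->ret_explicit)
    opts->ret = RET_SHL;
}
```

With this arrangement the result does not depend on the order of the two options on the command line: an explicit -mlfence-before-ret always wins.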

Jan
Fangrui Song via Binutils April 16, 2020, 5:34 a.m. | #7
On Thu, Mar 26, 2020 at 5:12 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 26.03.2020 03:23, Hongtao Liu wrote:

> > On Wed, Mar 25, 2020 at 6:03 PM Jan Beulich <jbeulich@suse.com> wrote:

> >> On 25.03.2020 10:27, Hongtao Liu wrote:

> >>> On Thu, Mar 12, 2020 at 12:17 AM H.J. Lu <hjl.tools@gmail.com> wrote:

> >>>> On Wed, Mar 11, 2020 at 3:55 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>> On 10.03.2020 17:05, H.J. Lu wrote:

> >>>>>> +  if (lfence_before_ret != lfence_before_ret_none

> >>>>>> +      && (i.tm.base_opcode == 0xc2

> >>>>>> +       || i.tm.base_opcode == 0xc3

> >>>>>> +       || i.tm.base_opcode == 0xca

> >>>>>> +       || i.tm.base_opcode == 0xcb))

> >>>>>> +    {

> >>>>>> +      if (last_insn.kind != last_insn_other

> >>>>>> +       && last_insn.seg == now_seg)

> >>>>>> +     {

> >>>>>> +       as_warn_where (last_insn.file, last_insn.line,

> >>>>>> +                      _("`%s` skips -mlfence-before-ret on `%s`"),

> >>>>>> +                      last_insn.name, i.tm.name);

> >>>>>> +       return;

> >>>>>> +     }

> >>>>>> +      if (lfence_before_ret == lfence_before_ret_or)

> >>>>>> +     {

> >>>>>> +       /* orl: 0x830c2400.  */

> >>>>>> +       p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> >>>>>> +       if (flag_code == CODE_64BIT)

> >>>>>> +         *p++ = 0x48;

> >>>>>

> >>>>> Shouldn't this depend on RET's operand size? Likewise wouldn't you

> >>>>> also need to insert 0x66/0x67 in certain cases?

> >>>>

> >>>> Hongtao, can you look into it?

> >>>

> >>> I suppose you mean OR's operand size?

> >>

> >> Not exactly - I mean RET's operand size ought to affect the one

> >> chosen for OR.

> >>

> >> Jan

> >

> >>> I wonder whether this shouldn't also enable a safe lfence_before_ret

> >>> mode (i.e. not the OR one), for RET also being an indirect branch. Of

> >>> course care would need to be taken to avoid clobbering an already set

> >>> lfence_before_ret mode.

> >

> > Also for this part, maybe i'll add some comments to indicate

> > -mlfence-before-indirect-branch doesn't include ret. Orelse it would

> > be weird for user when clobber happens, Is it ok for you?

>

> Well, extending the description / comments to be more precise is one

> solution, but only the the 2nd best one. I continue to think that

> there would better be an implication as the one suggested.

>

> Jan


Apologies for the delayed response.
I tried to re-arrange this to use a common pattern (memory operand is
destination) and only exclude those which don't also read this
operand. But it turns out there are still a lot of such instructions,
including all mov instructions, store instructions for i387 and CET,
extract instructions, vgather instructions, vscatter instructions,
convert instructions and so on, so I didn't re-arrange them.
The other requests are addressed by the updated patch, which also handles
REP CMPS/SCAS specially, since they set EFLAGS, which affects control
flow behavior.

  1. No load for INVPCID, implicit load for POP/POPF/POPA/XLATB.
  2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
  ret's.
  3. Adjust -mlfence-after-load=[yes|no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none|all]
  equal the original -mlfence-after-load=[no|yes].
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.
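
The three -mlfence-after-load modes in item 3 can be sketched as a small decision function. This is an illustration only, following the behavior of insert_lfence_after in the patch below as I read it; `decide_after_load`, `is_cmps_or_scas` and the `enum after_load_action` values are hypothetical names, and whether an instruction is a load is passed in as a flag instead of being computed by load_insn_p.

```c
#include <stdbool.h>

enum load_mode { LOAD_NONE = 0, LOAD_GENERAL, LOAD_ALL };

enum after_load_action
{
  ACT_NOTHING,		/* No lfence, no warning.  */
  ACT_LFENCE,		/* Emit lfence.  */
  ACT_SKIP_AND_WARN,	/* Warn that lfence is skipped.  */
  ACT_LFENCE_AND_WARN	/* Emit lfence, warn about changed flags.  */
};

/* cmps is 0xa6/0xa7 and scas is 0xae/0xaf; OR-ing in bit 0 folds the
   byte and word/dword forms together.  */
static bool
is_cmps_or_scas (unsigned int base_opcode)
{
  return (base_opcode | 0x1) == 0xa7 || (base_opcode | 0x1) == 0xaf;
}

static enum after_load_action
decide_after_load (enum load_mode mode, bool is_load,
		   unsigned int base_opcode, bool has_rep_prefix)
{
  if (mode == LOAD_NONE || !is_load)
    return ACT_NOTHING;

  /* REP CMPS/SCAS set EFLAGS, so only "all" puts an lfence there.  */
  if (has_rep_prefix && is_cmps_or_scas (base_opcode))
    return mode == LOAD_GENERAL ? ACT_SKIP_AND_WARN : ACT_LFENCE_AND_WARN;

  return ACT_LFENCE;
}
```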

gas/Changelog:
        * config/tc-i386.c (lfence_after_load_kind): New.
        (lfence_before_ret_shl): Change from lfence_before_ret_not.
        (load_insn_p): No load for INVPCID, implicit load for
        POP/POPA/POPF/XLATB.
        (insert_lfence_after): Insert lfence under
        -mlfence-after-load=[general|all], issue a warning when
        encountering REP CMPS/SCAS.
        (insert_lfence_before): Replace -mlfence-before-ret=not to
        -mlfence-before-ret=shl.
        (md_parse_option): Adjust -mlfence-after-load=[yes|no] to
        -mlfence-after-load=[none|general|all]. Replace
        -mlfence-before-ret=not to -mlfence-before-ret=shl. Enable
        -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * gas/testsuite/gas/i386/lfence-load-b.d: New.
        * gas/testsuite/gas/i386/lfence-load-b.e: New.
        * gas/testsuite/gas/i386/lfence-load.d: Modified.
        * gas/testsuite/gas/i386/lfence-load.e: New.
        * gas/testsuite/gas/i386/lfence-load.s: Modified.
        * gas/testsuite/gas/i386/lfence-ret-a.d: Modified.
        * gas/testsuite/gas/i386/lfence-ret-b.d: Modified.
        * gas/testsuite/gas/i386/lfence-ret-c.d: New.
        * gas/testsuite/gas/i386/lfence-ret-d.d: New.
        * gas/testsuite/gas/i386/lfence-ret.s: Modified
        * gas/testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * gas/testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * gas/testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * gas/testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * gas/testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * gas/testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * gas/testsuite/gas/i386/x86-64-lfence-ret-d.d: New.


-- 
BR,
Hongtao
From e640c20ec775a2601d6565eb42a38fa79e036190 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>
Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. No load for INVPCID, implicit load for POP/POPF/POPA/XLATB.
  2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
  ret's.
  3. Adjust -mlfence-after-load=[yes|no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none|all]
  equal the original -mlfence-after-load=[no|yes].
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.

gas/Changelog:
	* config/tc-i386.c (lfence_after_load_kind): New.
	(lfence_before_ret_shl): Change from lfence_before_ret_not.
	(load_insn_p): No load for INVPCID, implicit load for
	POP/POPA/POPF/XLATB.
	(insert_lfence_after): Insert lfence under
	-mlfence-after-load=[general|all], issue a warning when
	encountering REP CMPS/SCAS.
	(insert_lfence_before): Replace -mlfence-before-ret=not to
	-mlfence-before-ret=shl.
	(md_parse_option): Adjust -mlfence-after-load=[yes|no] to
	-mlfence-after-load=[none|general|all]. Replace
	-mlfence-before-ret=not to -mlfence-before-ret=shl. Enable
	-mlfence-before-ret=shl when
	-mlfence-before-indirect-branch=all.
	(md_show_usage): Ditto.
	* doc/c-i386.texi: Ditto.
	* testsuite/gas/i386/i386.exp: Add new testcases.
	* gas/testsuite/gas/i386/lfence-load-b.d: New.
	* gas/testsuite/gas/i386/lfence-load-b.e: New.
	* gas/testsuite/gas/i386/lfence-load.d: Modified.
	* gas/testsuite/gas/i386/lfence-load.e: New.
	* gas/testsuite/gas/i386/lfence-load.s: Modified.
	* gas/testsuite/gas/i386/lfence-ret-a.d: Modified.
	* gas/testsuite/gas/i386/lfence-ret-b.d: Modified.
	* gas/testsuite/gas/i386/lfence-ret-c.d: New.
	* gas/testsuite/gas/i386/lfence-ret-d.d: New.
	* gas/testsuite/gas/i386/lfence-ret.s: Modified
	* gas/testsuite/gas/i386/x86-64-lfence-load-b.d: New.
	* gas/testsuite/gas/i386/x86-64-lfence-load.d: Modified.
	* gas/testsuite/gas/i386/x86-64-lfence-load.s: Modified.
	* gas/testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
	* gas/testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
	* gas/testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
	* gas/testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
---
 gas/config/tc-i386.c                          | 127 +++++++++++------
 gas/doc/c-i386.texi                           |  29 ++--
 gas/testsuite/gas/i386/i386.exp               |   6 +
 gas/testsuite/gas/i386/lfence-load-b.d        | 129 ++++++++++++++++++
 gas/testsuite/gas/i386/lfence-load-b.e        |   3 +
 gas/testsuite/gas/i386/lfence-load.d          |  22 ++-
 gas/testsuite/gas/i386/lfence-load.e          |   3 +
 gas/testsuite/gas/i386/lfence-load.s          |  12 ++
 gas/testsuite/gas/i386/lfence-ret-a.d         |   6 +
 gas/testsuite/gas/i386/lfence-ret-b.d         |   8 ++
 gas/testsuite/gas/i386/lfence-ret-c.d         |  23 ++++
 gas/testsuite/gas/i386/lfence-ret-d.d         |  24 ++++
 gas/testsuite/gas/i386/lfence-ret.s           |   2 +
 gas/testsuite/gas/i386/x86-64-lfence-load-b.d | 129 ++++++++++++++++++
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |  20 ++-
 gas/testsuite/gas/i386/x86-64-lfence-load.s   |  11 ++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d  |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d  |   8 ++
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d  |  23 ++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d  |  24 ++++
 20 files changed, 561 insertions(+), 54 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.e
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..ae002132ee 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,8 +629,14 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;
 
-/* 1 if lfence should be inserted after every load.  */
-static int lfence_after_load = 0;
+/* Non-zero if lfence should be inserted after load.  */
+static enum lfence_after_load_kind
+  {
+   lfence_load_none = 0,
+   lfence_load_general,
+   lfence_load_all
+  }
+lfence_after_load;
 
 /* Non-zero if lfence should be inserted before indirect branch.  */
 static enum lfence_before_indirect_branch_kind
@@ -647,7 +653,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;
 
@@ -4350,21 +4357,28 @@ load_insn_p (void)
 
   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Note: invlpg, invpcid, clflush, clflushopt, prefetchh, prefetchw
+	 could be excluded by the later pattern.  */
+      /* lea, invpcid.  */
+      if (i.tm.base_opcode == 0x8d
+	  || i.tm.base_opcode == 0xf3882)
 	return 0;
 
-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-	  || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+	  || i.tm.base_opcode == 0x9d
+	  || i.tm.base_opcode == 0x61)
 	return 1;
 
       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
 	return 1;
 
-      /* outs */
-      if (base_opcode == 0x6f)
+      /* NB: For AMD-specific insns with implicit memory operands,
+	 they're intentionally not covered.
+	 outs, xlatb.  */
+      if (base_opcode == 0x6f
+	  || i.tm.base_opcode == 0xD7)
 	return 1;
     }
 
@@ -4506,6 +4520,22 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* Insert lfence after rep cmps/scas only under
+	 -mlfence-after-load=all.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+	   || (i.tm.base_opcode | 0x1) == 0xaf)
+	  && i.prefix[REP_PREFIX])
+	{
+	  if (lfence_after_load == lfence_load_general)
+	    {
+	      as_warn (_("`%s` skips -mlfence-after-load=general"),
+		       i.tm.name);
+	      return;
+	    }
+	  else
+	    as_warn (_("`%s` changes flags which would affect control flow behavior"),
+		     i.tm.name);
+	}
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4536,8 +4566,8 @@ insert_lfence_before (void)
 
       if (i.reg_operands == 1)
 	{
-	  /* Indirect branch via register.  Don't insert lfence with
-	     -mlfence-after-load=yes.  */
+	  /* Indirect branch via register. Insert lfence when
+	     -mlfence-after-load=none.  */
 	  if (lfence_after_load
 	      || lfence_before_indirect_branch == lfence_branch_memory)
 	    return;
@@ -4568,7 +4598,7 @@ insert_lfence_before (void)
       return;
     }
 
-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
 	  || i.tm.base_opcode == 0xc3
@@ -4583,33 +4613,47 @@ insert_lfence_before (void)
 			 last_insn.name, i.tm.name);
 	  return;
 	}
-      if (lfence_before_ret == lfence_before_ret_or)
-	{
-	  /* orl: 0x830c2400.  */
-	  p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-	  if (flag_code == CODE_64BIT)
-	    *p++ = 0x48;
-	  *p++ = 0x83;
-	  *p++ = 0xc;
-	  *p++ = 0x24;
-	  *p++ = 0x0;
-	}
-      else
+
+      char prefix = i.prefix[DATA_PREFIX] ? 0x66
+	: flag_code == CODE_64BIT ? 0x48 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
 	{
-	  p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
 	  /* notl: 0xf71424.  */
-	  if (flag_code == CODE_64BIT)
-	    *p++ = 0x48;
+	  p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+	  if (prefix)
+	    *p++ = prefix;
 	  *p++ = 0xf7;
 	  *p++ = 0x14;
 	  *p++ = 0x24;
-	  /* notl: 0xf71424.  */
-	  if (flag_code == CODE_64BIT)
-	    *p++ = 0x48;
+	  if (prefix)
+	    *p++ = prefix;
 	  *p++ = 0xf7;
 	  *p++ = 0x14;
 	  *p++ = 0x24;
 	}
+      else
+	{
+	  p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+	  if (prefix)
+	    *p++ = prefix;
+	  if (lfence_before_ret == lfence_before_ret_or)
+	    {
+	      /* orl: 0x830c2400.  */
+	      *p++ = 0x83;
+	      *p++ = 0x0c;
+	    }
+	  else
+	    {
+	      /* shl: 0xc1242400.  */
+	      *p++ = 0xc1;
+	      *p++ = 0x24;
+	    }
+
+	  *p++ = 0x24;
+	  *p++ = 0x0;
+	}
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12985,17 +13029,22 @@ md_parse_option (int c, const char *arg)
       break;
 
     case OPTION_MLFENCE_AFTER_LOAD:
-      if (strcasecmp (arg, "yes") == 0)
-	lfence_after_load = 1;
-      else if (strcasecmp (arg, "no") == 0)
-	lfence_after_load = 0;
+      if (strcasecmp (arg, "general") == 0)
+	lfence_after_load = lfence_load_general;
+      else if (strcasecmp (arg, "all") == 0)
+	lfence_after_load = lfence_load_all;
+      else if (strcasecmp (arg, "none") == 0)
+	lfence_after_load = lfence_load_none;
       else
         as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
       break;
 
     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-	lfence_before_indirect_branch = lfence_branch_all;
+	{
+	  lfence_before_indirect_branch = lfence_branch_all;
+	  lfence_before_ret = lfence_before_ret_shl;
+	}
       else if (strcasecmp (arg, "memory") == 0)
 	lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)
 	lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
 	lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0)
+	lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
 	lfence_before_ret = lfence_before_ret_none;
       else
@@ -13376,13 +13427,13 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
-  -mlfence-after-load=[no|yes] (default: no)\n\
+  -mlfence-after-load=[none|general|all] (default: none)\n\
                           generate lfence after load\n"));
   fprintf (stream, _("\
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..d595d526bb 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -470,12 +470,15 @@ The default doesn't align branches.
 
 @cindex @samp{-mlfence-after-load=} option, i386
 @cindex @samp{-mlfence-after-load=} option, x86-64
-@item -mlfence-after-load=@var{no}
-@itemx -mlfence-after-load=@var{yes}
+@item -mlfence-after-load=@var{none}
+@item -mlfence-after-load=@var{general}
+@itemx -mlfence-after-load=@var{all}
 These options control whether the assembler should generate lfence
-after load instructions.  @option{-mlfence-after-load=@var{yes}} will
-generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
-lfence, which is the default.
+after load instructions.  @option{-mlfence-after-load=@var{all}} will
+generate lfence for all load instructions,
+@option{-mlfence-after-load=@var{general}} will generate lfence for all
+load instructions except rep cmps/scas, @option{-mlfence-after-load=@var{none}}
+will not generate lfence, which is the default.
 
 @cindex @samp{-mlfence-before-indirect-branch=} option, i386
 @cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@@ -488,28 +491,30 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
 warning before indirect near branch via memory.
 @option{-mlfence-before-indirect-branch=@var{none}} will not generate
-lfence nor issue warning, which is the default.  Note that lfence won't
-be generated before indirect near branch via register with
-@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+lfence nor issue warning, which is the default.  Note that lfence will
+be generated before indirect near branch via register only with
+@option{-mlfence-after-load=@var{none}}, since otherwise lfence is generated
 after loading branch target register.
 
 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@item -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.
 
 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..a2bdb569b7 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -530,11 +530,14 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-8"
     run_dump_test "align-branch-9"
     run_dump_test "lfence-load"
+    run_dump_test "lfence-load-b"
     run_dump_test "lfence-indbr-a"
     run_dump_test "lfence-indbr-b"
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"
 
     # These tests require support for 8 and 16 bit relocs,
@@ -1117,11 +1120,14 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-8"
     run_dump_test "x86-64-align-branch-9"
     run_dump_test "x86-64-lfence-load"
+    run_dump_test "x86-64-lfence-load-b"
     run_dump_test "x86-64-lfence-indbr-a"
     run_dump_test "x86-64-lfence-indbr-b"
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-byte"
 
     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load-b.d b/gas/testsuite/gas/i386/lfence-load-b.d
new file mode 100644
index 0000000000..db78fa762b
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.d
@@ -0,0 +1,129 @@
+#source: lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: -mlfence-after-load=general
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c5 f8 ae 55 00       	vldmxcsr 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 01 55 00          	lgdtl  0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f c7 75 00          	vmptrld 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 0f c7 75 00       	vmclear 0x0\(%ebp\)
+ +[a-f0-9]+:	66 0f 38 82 55 00    	invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 01 7d 00          	invlpg 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae 7d 00          	clflush 0x0\(%ebp\)
+ +[a-f0-9]+:	66 0f ae 7d 00       	clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 4d 00          	prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 55 00          	prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 5d 00          	prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 0d 4d 00          	prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+:	1f                   	pop    %ds
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9d                   	popf   
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	61                   	popa   
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d7                   	xlat   %ds:\(%ebx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d9 55 00             	fsts   0x0\(%ebp\)
+ +[a-f0-9]+:	d9 45 00             	flds   0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	db 55 00             	fistl  0x0\(%ebp\)
+ +[a-f0-9]+:	df 55 00             	fists  0x0\(%ebp\)
+ +[a-f0-9]+:	db 45 00             	fildl  0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 45 00             	filds  0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%ebp\)
+ +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 45 00             	filds  0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%ebp\)
+ +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%ebp\)
+ +[a-f0-9]+:	d8 45 00             	fadds  0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d8 04 24             	fadds  \(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d8 c3                	fadd   %st\(3\),%st
+ +[a-f0-9]+:	d8 01                	fadds  \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 01                	filds  \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 11                	fists  \(%ecx\)
+ +[a-f0-9]+:	0f ae 29             	xrstor \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 18 01             	prefetchnta \(%ecx\)
+ +[a-f0-9]+:	0f c7 09             	cmpxchg8b \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	41                   	inc    %ecx
+ +[a-f0-9]+:	0f 01 10             	lgdtl  \(%eax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 0f 66 02 b0       	pfcmpeq 0x2\(%esi\),%mm4
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	8f 00                	popl   \(%eax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	58                   	pop    %eax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 d1 11             	rclw   \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	ff 01                	incl   \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 11                	notl   \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 31                	divl   \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 21                	mull   \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 39                	idivl  \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 29                	imull  \(%ecx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	8d 04 40             	lea    \(%eax,%eax,2\),%eax
+ +[a-f0-9]+:	c9                   	leave  
+ +[a-f0-9]+:	6e                   	outsb  %ds:\(%esi\),\(%dx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	ac                   	lods   %ds:\(%esi\),%al
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f3 a5                	rep movsl %ds:\(%esi\),%es:\(%edi\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f3 af                	repz scas %es:\(%edi\),%eax
+ +[a-f0-9]+:	f3 a7                	repz cmpsl %es:\(%edi\),%ds:\(%esi\)
+ +[a-f0-9]+:	f3 ad                	rep lods %ds:\(%esi\),%eax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	83 00 01             	addl   \$0x1,\(%eax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f ba 20 01          	btl    \$0x1,\(%eax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f c1 03             	xadd   %eax,\(%ebx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f c1 c3             	xadd   %eax,%ebx
+ +[a-f0-9]+:	87 03                	xchg   %eax,\(%ebx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	93                   	xchg   %eax,%ebx
+ +[a-f0-9]+:	39 45 40             	cmp    %eax,0x40\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	3b 45 40             	cmp    0x40\(%ebp\),%eax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	01 45 40             	add    %eax,0x40\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	03 00                	add    \(%eax\),%eax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	85 45 40             	test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	85 45 40             	test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-load-b.e b/gas/testsuite/gas/i386/lfence-load-b.e
new file mode 100644
index 0000000000..80626ccd82
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:5[78]: Warning: `scas` skips -mlfence-after-general=general
+.*:5[89]: Warning: `cmps` skips -mlfence-after-general=general
\ No newline at end of file
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..c13a3ac8cf 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: -mlfence-after-load=all
 
 .*: +file format .*
 
@@ -15,6 +16,23 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f c7 75 00          	vmptrld 0x0\(%ebp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	66 0f c7 75 00       	vmclear 0x0\(%ebp\)
+ +[a-f0-9]+:	66 0f 38 82 55 00    	invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 01 7d 00          	invlpg 0x0\(%ebp\)
+ +[a-f0-9]+:	0f ae 7d 00          	clflush 0x0\(%ebp\)
+ +[a-f0-9]+:	66 0f ae 7d 00       	clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 4d 00          	prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 55 00          	prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 18 5d 00          	prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+:	0f 0d 4d 00          	prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+:	1f                   	pop    %ds
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9d                   	popf   
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	61                   	popa   
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d7                   	xlat   %ds:\(%ebx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	d9 55 00             	fsts   0x0\(%ebp\)
  +[a-f0-9]+:	d9 45 00             	flds   0x0\(%ebp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..bf8343ed96
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:58: Warning: `scas` changes flags which would affect control flow behavior
+.*:59: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..5d76e19f4b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,18 @@ _start:
 	lgdt (%ebp)
 	vmptrld (%ebp)
 	vmclear (%ebp)
+	invpcid (%ebp), %edx
+	invlpg (%ebp)
+	clflush (%ebp)
+	clflushopt (%ebp)
+	prefetcht0 (%ebp)
+	prefetcht1 (%ebp)
+	prefetcht2 (%ebp)
+	prefetchw (%ebp)
+	pop %ds
+	popf
+	popa
+	xlatb (%ebx)
 	fsts (%ebp)
 	flds (%ebp)
 	fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..613d1d50a2 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:
 
 0+ <_start>:
+ +[a-f0-9]+:	66 83 0c 24 00       	orw    \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 83 0c 24 00       	orw    \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
  +[a-f0-9]+:	83 0c 24 00          	orl    \$0x0,\(%esp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	c3                   	ret    
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..e6dd4f4bf6 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:
 
 0+ <_start>:
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%esp\)
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%esp\)
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
  +[a-f0-9]+:	f7 14 24             	notl   \(%esp\)
  +[a-f0-9]+:	f7 14 24             	notl   \(%esp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..0227d820ec
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
+ +[a-f0-9]+:	c1 24 24 00          	shll   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c3                   	ret    
+ +[a-f0-9]+:	c1 24 24 00          	shll   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c2 1e 00             	ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..9078216e53
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
+ +[a-f0-9]+:	c1 24 24 00          	shll   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c3                   	ret    
+ +[a-f0-9]+:	c1 24 24 00          	shll   \$0x0,\(%esp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c2 1e 00             	ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..5de4f08447 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,6 @@
 	.text
 _start:
+	retw
+	retw	$20
 	ret
 	ret	$30
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load-b.d b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
new file mode 100644
index 0000000000..599c5f06f4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
@@ -0,0 +1,129 @@
+#source: x86-64-lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: x86-64 -mlfence-after-load=general
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c5 f8 ae 55 00       	vldmxcsr 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 01 55 00          	lgdt   0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f c7 75 00          	vmptrld 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 0f c7 75 00       	vmclear 0x0\(%rbp\)
+ +[a-f0-9]+:	66 0f 38 82 55 00    	invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	67 0f 01 38          	invlpg \(%eax\)
+ +[a-f0-9]+:	0f ae 7d 00          	clflush 0x0\(%rbp\)
+ +[a-f0-9]+:	66 0f ae 7d 00       	clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 4d 00          	prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 55 00          	prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 5d 00          	prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 0d 4d 00          	prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+:	0f a1                	popq   %fs
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9d                   	popfq  
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d7                   	xlat   %ds:\(%rbx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d9 55 00             	fsts   0x0\(%rbp\)
+ +[a-f0-9]+:	d9 45 00             	flds   0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	db 55 00             	fistl  0x0\(%rbp\)
+ +[a-f0-9]+:	df 55 00             	fists  0x0\(%rbp\)
+ +[a-f0-9]+:	db 45 00             	fildl  0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9b dd 75 00          	fsave  0x0\(%rbp\)
+ +[a-f0-9]+:	dd 65 00             	frstor 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 45 00             	filds  0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 4d 00             	fisttps 0x0\(%rbp\)
+ +[a-f0-9]+:	d9 65 00             	fldenv 0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9b d9 75 00          	fstenv 0x0\(%rbp\)
+ +[a-f0-9]+:	d8 45 00             	fadds  0x0\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d8 04 24             	fadds  \(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d8 c3                	fadd   %st\(3\),%st
+ +[a-f0-9]+:	d8 01                	fadds  \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 01                	filds  \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	df 11                	fists  \(%rcx\)
+ +[a-f0-9]+:	0f ae 29             	xrstor \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 18 01             	prefetchnta \(%rcx\)
+ +[a-f0-9]+:	0f c7 09             	cmpxchg8b \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 0f c7 09          	cmpxchg16b \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	ff c1                	inc    %ecx
+ +[a-f0-9]+:	0f 01 10             	lgdt   \(%rax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	0f 0f 66 02 b0       	pfcmpeq 0x2\(%rsi\),%mm4
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	8f 00                	popq   \(%rax\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	58                   	pop    %rax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 d1 11             	rclw   \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 01 01 00 00 00    	testl  \$0x1,\(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	ff 01                	incl   \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 11                	notl   \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 31                	divl   \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 21                	mull   \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 39                	idivl  \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f7 29                	imull  \(%rcx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 8d 04 40          	lea    \(%rax,%rax,2\),%rax
+ +[a-f0-9]+:	c9                   	leaveq 
+ +[a-f0-9]+:	6e                   	outsb  %ds:\(%rsi\),\(%dx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	ac                   	lods   %ds:\(%rsi\),%al
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f3 a5                	rep movsl %ds:\(%rsi\),%es:\(%rdi\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	f3 af                	repz scas %es:\(%rdi\),%eax
+ +[a-f0-9]+:	f3 a7                	repz cmpsl %es:\(%rdi\),%ds:\(%rsi\)
+ +[a-f0-9]+:	f3 ad                	rep lods %ds:\(%rsi\),%eax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	41 83 03 01          	addl   \$0x1,\(%r11\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	41 0f ba 23 01       	btl    \$0x1,\(%r11\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 0f c1 03          	xadd   %rax,\(%rbx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 0f c1 c3          	xadd   %rax,%rbx
+ +[a-f0-9]+:	48 87 03             	xchg   %rax,\(%rbx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 93                	xchg   %rax,%rbx
+ +[a-f0-9]+:	48 39 45 40          	cmp    %rax,0x40\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 3b 45 40          	cmp    0x40\(%rbp\),%rax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 01 45 40          	add    %rax,0x40\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 03 00             	add    \(%rax\),%rax
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 85 45 40          	test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	48 85 45 40          	test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..ad623595ad 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: x86-64 -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: x86-64 -mlfence-after-load=all
 
 .*: +file format .*
 
@@ -15,6 +16,21 @@ Disassembly of section .text:
  +[a-f0-9]+:	0f c7 75 00          	vmptrld 0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	66 0f c7 75 00       	vmclear 0x0\(%rbp\)
+ +[a-f0-9]+:	66 0f 38 82 55 00    	invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	67 0f 01 38          	invlpg \(%eax\)
+ +[a-f0-9]+:	0f ae 7d 00          	clflush 0x0\(%rbp\)
+ +[a-f0-9]+:	66 0f ae 7d 00       	clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 4d 00          	prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 55 00          	prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 18 5d 00          	prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+:	0f 0d 4d 00          	prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+:	0f a1                	popq   %fs
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	9d                   	popfq  
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	d7                   	xlat   %ds:\(%rbx\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	d9 55 00             	fsts   0x0\(%rbp\)
  +[a-f0-9]+:	d9 45 00             	flds   0x0\(%rbp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..d88d213301 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,17 @@ _start:
 	lgdt (%rbp)
 	vmptrld (%rbp)
 	vmclear (%rbp)
+	invpcid (%rbp), %rdx
+	invlpg (%eax)
+	clflush (%rbp)
+	clflushopt (%rbp)
+	prefetcht0 (%rbp)
+	prefetcht1 (%rbp)
+	prefetcht2 (%rbp)
+	prefetchw (%rbp)
+	pop %fs
+	popf
+	xlatb (%rbx)
 	fsts (%rbp)
 	flds (%rbp)
 	fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..43343a9a44 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:
 
 0+ <_start>:
+ +[a-f0-9]+:	66 83 0c 24 00       	orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 83 0c 24 00       	orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
  +[a-f0-9]+:	48 83 0c 24 00       	orq    \$0x0,\(%rsp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
  +[a-f0-9]+:	c3                   	retq   
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..6c34affdc0 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:
 
 0+ <_start>:
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%rsp\)
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%rsp\)
+ +[a-f0-9]+:	66 f7 14 24          	notw   \(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
  +[a-f0-9]+:	48 f7 14 24          	notq   \(%rsp\)
  +[a-f0-9]+:	48 f7 14 24          	notq   \(%rsp\)
  +[a-f0-9]+:	0f ae e8             	lfence 
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..10c6189cf6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
+ +[a-f0-9]+:	48 c1 24 24 00       	shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c3                   	retq   
+ +[a-f0-9]+:	48 c1 24 24 00       	shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c2 1e 00             	retq   \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..6c39b5d747
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c3                	retw   
+ +[a-f0-9]+:	66 c1 24 24 00       	shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	66 c2 14 00          	retw   \$0x14
+ +[a-f0-9]+:	48 c1 24 24 00       	shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c3                   	retq   
+ +[a-f0-9]+:	48 c1 24 24 00       	shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+:	0f ae e8             	lfence 
+ +[a-f0-9]+:	c2 1e 00             	retq   \$0x1e
+#pass
Jan Beulich April 16, 2020, 8:33 a.m. | #8
On 16.04.2020 07:34, Hongtao Liu wrote:
> I tried to re-arrange to use a common pattern (memory operand is
> destination) and only exclude those which don't also read this
> operand. But it turns out there are still a lot of such instructions,
> including all mov instructions, store instructions for i387 and CET,
> extract instructions, vgather instructions, vscatter instructions,
> convert instructions and so on, so I didn't re-arrange them.
> Other requests are done by the updated patch, also plus handling REP
> CMPS/SCAS specially since they would set EFLAGS which affects control
> flow behavior.
> 
>   1. No load for INVPCID, Implicit load for POPS/POPF/POPA/XLATB

Why INVPCID? Whether it accesses its memory operand depends on
the value in the register operand. And what's POPS?

>   2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
>   ret's.
>   3. Adjust -mlfence-after-load=[yes/no] to
>   -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none/all]
>   equals the original -mlfence-after-load=[no/yes],

While there wasn't any official release with the prior option forms
yet, I'm not sure it is a good idea to disallow the old forms
altogether now; they may need deprecating but still permitting
instead.
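
One way to keep the old spellings working while steering users to the new ones is to accept them as aliases and report that a deprecated form was used. The following is only a minimal sketch of that idea, not code from the patch: the function name `parse_lfence_after_load` and the `deprecated` out-parameter are assumptions made for illustration, and the enum names merely mirror the states discussed in this thread.

```c
#include <strings.h>

/* Hypothetical names mirroring the three states discussed above.  */
enum lfence_after_load_kind
{
  lfence_load_none = 0,
  lfence_load_general,
  lfence_load_all
};

/* Parse the argument of -mlfence-after-load=.  Returns 0 on success,
   -1 for an unrecognized value.  *DEPRECATED is set to 1 when a legacy
   spelling ("yes"/"no") was accepted, so the caller can warn.  */
int
parse_lfence_after_load (const char *arg,
			 enum lfence_after_load_kind *kind,
			 int *deprecated)
{
  *deprecated = 0;
  if (strcasecmp (arg, "all") == 0)
    *kind = lfence_load_all;
  else if (strcasecmp (arg, "general") == 0)
    *kind = lfence_load_general;
  else if (strcasecmp (arg, "none") == 0)
    *kind = lfence_load_none;
  else if (strcasecmp (arg, "yes") == 0)
    {
      /* Legacy alias for "all".  */
      *kind = lfence_load_all;
      *deprecated = 1;
    }
  else if (strcasecmp (arg, "no") == 0)
    {
      /* Legacy alias for "none".  */
      *kind = lfence_load_none;
      *deprecated = 1;
    }
  else
    return -1;
  return 0;
}
```

A caller could emit a deprecation warning whenever `deprecated` comes back non-zero and otherwise treat the result exactly like the new spelling.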

>   -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
>   since they would affect control flow behavior.
>   -mlfence-after-load=all will issue a warning when adding lfence
>   after REP CMPS/SCAS.

I also think the various independent behavioral changes here would
better be split into separate patches (e.g. at least one patch per
numbered item in your enumeration above).

>   4. Adjust testcases and documents.
> 
> gas/Changelog:
>         * config/tc-i386.c (lfence_after_load_kine): New.
>         (lfence_before_ret_shl): Change from lfence_before_ret_not.
>         (load_insn_p): No load for INVPCID, implicit load for
>         POPS/POPA/POPF/XLATB.
>         (insert_after_load): Insert lfence under
>         -mlfence-after-load=[general|all], issue a warning when encountering
>         REP CMPS/SCAS.
>         (insert_before_before): Replace -mlfence-before-ret=not with
>         -mlfence-before-ret=shl.
>         (md_parse_option): Adjust -mlfence-after-load=[yes|no] to
>         -mlfence-after-load=[none|general|all], Replace
>         -mlfence-before-ret=not with -mlfence-before-ret=shl. Enable
>         -mlfence-before-ret=shl when
>         -mlfence-before-indirect-branch=all.
>         (md_show_usage): Ditto.
>         * doc/c-i386.texi: Ditto.
>         * testsuite/gas/i386/i386.exp: Add new testcases.
>         * gas/testsuite/gas/i386/lfence-load-b.d: New.
>         * gas/testsuite/gas/i386/lfence-load-b.e: New.
>         * gas/testsuite/gas/i386/lfence-load.d: Modified.
>         * gas/testsuite/gas/i386/lfence-load.e: New.
>         * gas/testsuite/gas/i386/lfence-load.s: Modified.
>         * gas/testsuite/gas/i386/lfence-ret-a.d: Modified.
>         * gas/testsuite/gas/i386/lfence-ret-b.d: Modified.
>         * gas/testsuite/gas/i386/lfence-ret-c.d: New.
>         * gas/testsuite/gas/i386/lfence-ret-d.d: New.
>         * gas/testsuite/gas/i386/lfence-ret.s: Modified
>         * gas/testsuite/gas/i386/x86-64-lfence-load-b.d: New.
>         * gas/testsuite/gas/i386/x86-64-lfence-load.d: Modified.
>         * gas/testsuite/gas/i386/x86-64-lfence-load.s: Modified.
>         * gas/testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
>         * gas/testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
>         * gas/testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
>         * gas/testsuite/gas/i386/x86-64-lfence-ret-d.d: New.

There's a stray leading gas/ on all but the first of the testsuite
entries above.

Also could you please send patches inline, unless they're too
big to be permitted by list restrictions? Commenting on an
attachment is quite a bit more cumbersome. Anyway, I'll try to.

>-/* 1 if lfence should be inserted after every load.  */
>-static int lfence_after_load = 0;
>+/* Non-zero if lfence shoulde be inserted after load.  */

Please try to avoid breaking correct spelling ("should"). I
also think the comment should briefly explain the difference
between lfence_load_general and lfence_load_all, even if
this may seem redundant with the command line option doc.

>@@ -4350,21 +4357,28 @@ load_insn_p (void)
> 
>   if (!any_vex_p)
>     {
>-      /* lea  */
>-      if (i.tm.base_opcode == 0x8d)
>+      /* Note: invlpg, invpcid, clflush, clflushopt, prefetchh, prefetchw
>+	 could be excluded by the later pattern.  */
>+      /* lea, invpcid.  */
>+      if (i.tm.base_opcode == 0x8d
>+	  || i.tm.base_opcode == 0xf3882)

The first comment mentions INVPCID, but the second does, too,
which is not logical. Also what about CLDEMOTE or CLWB, just
to name a few examples not listed? Instead of relying on
later patterns, could you perhaps bail for all AnySize insns
here?

>-      /* pop  */
>-      if ((i.tm.base_opcode & ~7) == 0x58
>-	  || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
>+      /* pop, popf, popa.   */
>+      if (strcmp (i.tm.name, "pop") == 0
>+	  || i.tm.base_opcode == 0x9d
>+	  || i.tm.base_opcode == 0x61)

Personally I'd recommend against string matching, and even
more so against a mixture of it and opcode matching. But I'm
not the maintainer of this code.
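
For comparison, an opcode-only check covering the same instructions could look roughly like the sketch below. It is an illustration under stated assumptions, not the patch's code: `struct insn_sketch` is a hypothetical stand-in for the real template, and it assumes two-byte opcodes are stored with their 0x0f escape folded in (as the quoted `0xf3882` suggests). The opcode values themselves are the standard encodings and match the bytes in the expected test output above (`1f`, `0f a1`, `9d`, `61`, `58`, `8f 00`).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the template fields that
   load_insn_p consults; this is not the real gas insn_template.  */
struct insn_sketch
{
  unsigned int base_opcode;	 /* e.g. 0x58, 0x8f, 0x0fa1.  */
  unsigned int extension_opcode; /* ModRM.reg for group opcodes.  */
};

/* Return true for pop-style instructions that load from the stack,
   identified purely by opcode.  */
bool
is_pop_like (const struct insn_sketch *t)
{
  unsigned int op = t->base_opcode;

  /* pop reg: 0x58..0x5f.  */
  if ((op & ~7u) == 0x58)
    return true;
  /* pop r/m: 0x8f /0.  */
  if (op == 0x8f && t->extension_opcode == 0)
    return true;
  /* pop %es, pop %ss, pop %ds.  */
  if (op == 0x07 || op == 0x17 || op == 0x1f)
    return true;
  /* pop %fs, pop %gs.  */
  if (op == 0x0fa1 || op == 0x0fa9)
    return true;
  /* popf, popa.  */
  return op == 0x9d || op == 0x61;
}
```

The trade-off is the usual one: the string comparison is shorter, while the opcode list is explicit about exactly which encodings are treated as loads.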

>-      /* outs */
>-      if (base_opcode == 0x6f)
>+      /* NB: For AMD-specific insns with implicit memory operands,
>+	 they're intentionally not covered.
>+	 outs, xlatb.  */
>+      if (base_opcode == 0x6f
>+	  || i.tm.base_opcode == 0xD7)
> 	return 1;

I'd like to request consistency in the choice of case in numeric
(hex) constants. I'd also think the AMD part of the comment
would better go after this if()+return.

While RET/LRET get handled specially anyway, what about e.g.
IRET which also loads data from memory?

>@@ -4506,6 +4520,22 @@ insert_lfence_after (void)
> {
>   if (lfence_after_load && load_insn_p ())
>     {
>+      /* Insert lfence after rep cmps/scas only under
>+	 -mlfence-after-load=all.  */
>+      if (((i.tm.base_opcode | 0x1) == 0xa7
>+	   || (i.tm.base_opcode | 0x1) == 0xaf)
>+	  && i.prefix[REP_PREFIX])

I'm afraid I don't understand why the REP forms need treating
differently from the non-REP ones of the same insns.
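
To make the point concrete, the quoted condition can be split into its opcode part and its prefix part. This is a small sketch with hypothetical helper names (`is_cmps_or_scas`, `patch_condition`), not code from the patch. CMPS and SCAS write EFLAGS whether or not a REP prefix is present, yet only the prefixed forms satisfy the condition as quoted.

```c
#include <stdbool.h>

/* True for cmps (0xa6/0xa7) and scas (0xae/0xaf).  The low opcode bit
   only selects the byte form versus the word/dword/qword form.  */
bool
is_cmps_or_scas (unsigned int base_opcode)
{
  return (base_opcode | 0x1) == 0xa7 || (base_opcode | 0x1) == 0xaf;
}

/* The condition as quoted from the patch: it additionally requires a
   REP prefix, so plain cmps/scas fall through to the normal path.  */
bool
patch_condition (unsigned int base_opcode, bool has_rep_prefix)
{
  return is_cmps_or_scas (base_opcode) && has_rep_prefix;
}
```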

>+	{
>+	  if (lfence_after_load == lfence_load_general)
>+	    {
>+	      as_warn (_("`%s` skips -mlfence-after-general=general"),

Mis-spelled option name?

>@@ -4583,33 +4613,47 @@ insert_lfence_before (void)
> 			 last_insn.name, i.tm.name);
> 	  return;
> 	}
>-      if (lfence_before_ret == lfence_before_ret_or)
>-	{
>-	  /* orl: 0x830c2400.  */
>-	  p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
>-	  if (flag_code == CODE_64BIT)
>-	    *p++ = 0x48;
>-	  *p++ = 0x83;
>-	  *p++ = 0xc;
>-	  *p++ = 0x24;
>-	  *p++ = 0x0;
>-	}
>-      else
>+
>+      char prefix = i.prefix[DATA_PREFIX] ? 0x66
>+	: flag_code == CODE_64BIT ? 0x48 : 0x0;

Is this correct when the RET _also_ has an explicitly specified
REX.W prefix? Also indentation looks somewhat odd on the last
line of this block.

>+
>+      if (lfence_before_ret == lfence_before_ret_not)
> 	{
>-	  p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> 	  /* notl: 0xf71424.  */

Comments like this one are no longer precise: The l suffix is
generally wrong for 64-bit code, and would also be wrong if
there was an operand size override on the RET.
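
The expected test output above shows why a fixed `l` suffix in such comments is misleading: the opcode bytes stay the same and only the prefix byte selects the w, l or q form. As a rough sketch (the helper name `emit_shl_lfence` and its buffer interface are assumptions; the real code writes through `frag_more`), the shl variant followed by lfence could be emitted like this, with the choice of prefix deliberately left to the caller since that choice is the open question here:

```c
#include <stddef.h>

/* Write "shl $0x0,(%esp)" (or the %rsp form) followed by lfence into
   BUF, which must have room for 8 bytes.  PREFIX is 0x66 for the
   16-bit form, 0x48 for the 64-bit form, or 0 for no prefix.
   Returns the number of bytes written.  */
size_t
emit_shl_lfence (unsigned char *buf, unsigned char prefix)
{
  unsigned char *p = buf;

  if (prefix)
    *p++ = prefix;
  /* shl $0x0, (stack pointer): c1 /4 with a SIB byte and imm8 0.  */
  *p++ = 0xc1;
  *p++ = 0x24;
  *p++ = 0x24;
  *p++ = 0x00;
  /* lfence.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - buf);
}
```

With prefix 0x66 this yields `66 c1 24 24 00` plus `0f ae e8`, which is the byte sequence the lfence-ret-d.d test expects before `retw`.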

>     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
>       if (strcasecmp (arg, "all") == 0)
>-	lfence_before_indirect_branch = lfence_branch_all;
>+	{
>+	  lfence_before_indirect_branch = lfence_branch_all;
>+	  lfence_before_ret = lfence_before_ret_shl;
>+	}

I don't think this should override an earlier explicit
-mlfence-before-ret= (i.e. in particular the order the two
options would be specified in should imo not matter).

>@@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)

> 	lfence_before_ret = lfence_before_ret_or;

>       else if (strcasecmp (arg, "not") == 0)

> 	lfence_before_ret = lfence_before_ret_not;

>+      else if (strcasecmp (arg, "shl") == 0)

>+	lfence_before_ret = lfence_before_ret_shl;

>       else if (strcasecmp (arg, "none") == 0)

> 	lfence_before_ret = lfence_before_ret_none;

>       else


With the SHL variant being truly benign (except for the
performance impact of course), would it make sense to also
allow for a simple "=yes" form now?

Jan
Fangrui Song via Binutils April 20, 2020, 7:20 a.m. | #9
On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 16.04.2020 07:34, Hongtao Liu wrote:

> > I tried to re-arrange this to use a common pattern (memory operand is

> > destination) and only exclude those which don't also read this

> > operand. But it turns out there are still a lot of such instructions,

> > including all mov instructions, store instructions for i387 and CET,

> > extract instructions, vgather instructions, vscatter instructions,

> > convert instructions and so on, so I didn't re-arrange them.

> > The other requests are done in the updated patch, which also handles REP

> > CMPS/SCAS specially since they set EFLAGS, which affects control

> > flow behavior.

> >

> >   1. No load for INVPCID, implicit load for POPS/POPF/POPA/XLATB

>

> Why INVPCID? Whether it accesses its memory operand depends on

> the value in the register operand. And what's POPS?

>


Changed for INVPCID. POPS means POP for segment registers; I'll change
it to avoid misunderstanding.

> >   2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to

> >   ret's.

> >   3. Adjust -mlfence-after-load=[yes/no] to

> >   -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none/all]

> >   equal original -mlfence-after-load=[no/yes],

>

> While there wasn't any official release with the prior option forms

> yet, I'm not sure it is a good idea to disallow the old forms

> altogether now; they may need deprecating but still permitting

> instead.

>


I'd prefer to change it before the next release.

> >   -mlfence-after-load=general won't add lfence after REP CMPS/SCAS

> >   since they would affect control flow behavior.

> >   -mlfence-after-load=all will issue an warning when adding lfence

> >   after REP CMPS/SCAS.

>

> I also think the various independent behavioral changes here would

> better be split into separate patches (e.g. at least one patch per

> numbered item in your enumeration above).

>


REP CMPS/SCAS are special cases of -mlfence-after-load, so they may be
better kept in the same thread.

> >   4. Adjust testcases and documents.

> >

> > gas/Changelog:

> >         * config/tc-i386.c (lfence_after_load_kine): New.

> >         (lfence_before_ret_shl): Change from lfence_before_ret_not.

> >         (load_insn_p): No load for INVPCID, implict load for

> >         POPS/POPA/POPF/XLATB.

> >         (insert_after_load): Insert lfence under

> >         -mlfence-after-load=[general|all],issue an warning when encounter

> >         REP CMPS/SCAS.

> >         (insert_before_before): Replace -mlfence-before-ret=not to

> >         -mlfence-before-ret=shl.

> >         (md_parse_option): Adjust -mlfence-after-load=[yes|no] to

> >         -mlfence-after-load=[none|general|all], Replace

> >         -mlfence-before-ret=not to -mlfence-before-ret=shl. Enable

> >         -mlfence-before-ret=shl when

> >         -mlfence-beofre-indirect-branch=all.

> >         (md_show_usage): Ditto.

> >         * doc/c-i386.texi: Ditto.

> >         * testsuite/gas/i386/i386.exp: Add new testcases.

> >         * gas/testsuite/gas/i386/lfence-load-b.d: New.

> >         * gas/testsuite/gas/i386/lfence-load-b.e: New.

> >         * gas/testsuite/gas/i386/lfence-load.d: Modified.

> >         * gas/testsuite/gas/i386/lfence-load.e: New.

> >         * gas/testsuite/gas/i386/lfence-load.s: Modified.

> >         * gas/testsuite/gas/i386/lfence-ret-a.d: Modified.

> >         * gas/testsuite/gas/i386/lfence-ret-b.d: Modified.

> >         * gas/testsuite/gas/i386/lfence-ret-c.d: New.

> >         * gas/testsuite/gas/i386/lfence-ret-d.d: New.

> >         * gas/testsuite/gas/i386/lfence-ret.s: Modified

> >         * gas/testsuite/gas/i386/x86-64-lfence-load-b.d: New.

> >         * gas/testsuite/gas/i386/x86-64-lfence-load.d: Modified.

> >         * gas/testsuite/gas/i386/x86-64-lfence-load.s: Modified.

> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.

> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.

> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-c.d: New.

> >         * gas/testsuite/gas/i386/x86-64-lfence-ret-d.d: New.

>

> There's a stray leading gas/ on the last so many lines above.

>


Changed.


> Also could you please send patches inline, unless they're too

> big to be permitted by list restrictions? Commenting on an

> attachment is quite a bit more cumbersome. Anyway, I'll try to.

>

> >-/* 1 if lfence should be inserted after every load.  */

> >-static int lfence_after_load = 0;

> >+/* Non-zero if lfence shoulde be inserted after load.  */

>

> Please try to avoid breaking correct spelling ("should"). I

> also think the comment should briefly explain the difference

> between lfence_load_general and lfence_load_all, even if

> this may seem redundant with the command line option doc.

>


Changed.

> >@@ -4350,21 +4357,28 @@ load_insn_p (void)

> >

> >   if (!any_vex_p)

> >     {

> >-      /* lea  */

> >-      if (i.tm.base_opcode == 0x8d)

> >+      /* Note: invlpg, invpcid, clflush, clflushopt, prefetchh, prefetchw

> >+       could be excluded by the later pattern.  */

> >+      /* lea, invpcid.  */

> >+      if (i.tm.base_opcode == 0x8d

> >+        || i.tm.base_opcode == 0xf3882)

>

> The first comment mentions INVPCID, but the second does, too,

> which is not logical.


Changed.

> Also what about CLDEMOTE or CLWB, just

> to name a few examples not listed? Instead of relying on

> later patterns, could you perhaps bail for all AnySize insns

> here?

>


Changed.

> >-      /* pop  */

> >-      if ((i.tm.base_opcode & ~7) == 0x58

> >-        || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))

> >+      /* pop, popf, popa.   */

> >+      if (strcmp (i.tm.name, "pop") == 0

> >+        || i.tm.base_opcode == 0x9d

> >+        || i.tm.base_opcode == 0x61)

>

> Personally I'd recommend against string matching, and even

> more so against a mixture of it and opcode matching. But I'm

> not the maintainer of this code.

>

> >-      /* outs */

> >-      if (base_opcode == 0x6f)

> >+      /* NB: For AMD-specific insns with implicit memory operands,

> >+       they're intentionally not covered.

> >+       outs, xlatb.  */

> >+      if (base_opcode == 0x6f

> >+        || i.tm.base_opcode == 0xD7)

> >       return 1;

>

> I'd like to request consistency in choice of case in numeric

> (hex) constant. I'd also think the AMD part of the comment

> would better go after this if()+return.

>


Changed.

> While RET/LRET get handled specially anyway, what about e.g.

> IRET which also loads data from memory?

>


Adding IRET.

> >@@ -4506,6 +4520,22 @@ insert_lfence_after (void)

> > {

> >   if (lfence_after_load && load_insn_p ())

> >     {

> >+      /* Insert lfence after rep cmps/scas only under

> >+       -mlfence-after-load=all.  */

> >+      if (((i.tm.base_opcode | 0x1) == 0xa7

> >+         || (i.tm.base_opcode | 0x1) == 0xaf)

> >+        && i.prefix[REP_PREFIX])

>

> I'm afraid I don't understand why the REP forms need treating

> differently from the non-REP ones of the same insns.

>


Not all REP forms, just REP CMPS/SCAS, which change EFLAGS.
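
To make the opcode tests easier to follow, here is a small standalone
sketch of the two masks used in the patch. The helper names are
hypothetical; only the mask expressions are taken from the diff. Bit 0
of these one-byte opcodes selects the byte vs. word/dword form, so
OR-ing it in folds both forms into one comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* One-byte string opcodes: movs a4/a5, cmps a6/a7, stos aa/ab,
   lods ac/ad, scas ae/af.  */

/* movs, cmps, lods, scas: the string insns that read memory.
   OR-ing in 0xb leaves only bit 2 significant within 0xa0..0xaf,
   so a4..a7 and ac..af match while stos (aa/ab) does not.  */
static bool
is_string_load (unsigned int opcode)
{
  assert (opcode <= 0xff);
  return (opcode | 0xb) == 0xaf;
}

/* cmps, scas: the string insns whose REPZ/REPNZ forms stop on ZF.  */
static bool
is_cmps_or_scas (unsigned int opcode)
{
  assert (opcode <= 0xff);
  return (opcode | 0x1) == 0xa7 || (opcode | 0x1) == 0xaf;
}
```

With these, "rep movs" and "rep lods" are loads but not in the special
class, which matches the lfence-load-b.d expectations further down.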

> >+      {

> >+        if (lfence_after_load == lfence_load_general)

> >+          {

> >+            as_warn (_("`%s` skips -mlfence-after-general=general"),

>

> Mis-spelled option name?

>


Sorry, changed.

> >@@ -4583,33 +4613,47 @@ insert_lfence_before (void)

> >                        last_insn.name, i.tm.name);

> >         return;

> >       }

> >-      if (lfence_before_ret == lfence_before_ret_or)

> >-      {

> >-        /* orl: 0x830c2400.  */

> >-        p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> >-        if (flag_code == CODE_64BIT)

> >-          *p++ = 0x48;

> >-        *p++ = 0x83;

> >-        *p++ = 0xc;

> >-        *p++ = 0x24;

> >-        *p++ = 0x0;

> >-      }

> >-      else

> >+

> >+      char prefix = i.prefix[DATA_PREFIX] ? 0x66

> >+      : flag_code == CODE_64BIT ? 0x48 : 0x0;

>

> Is this correct when the RET _also_ has an explicitly specified

> REX.W prefix? Also indentation looks somewhat odd on the last

> line of this block.

>


I think so.

> >+

> >+      if (lfence_before_ret == lfence_before_ret_not)

> >       {

> >-        p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);

> >         /* notl: 0xf71424.  */

>

> Comments like this one are no longer precise: The l suffix is

> generally wrong for 64-bit code, and would also be wrong if

> there was an operand size override on the RET.

>


Yes, added comments for the prefix rewrite.
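
As an illustration of the byte sequences involved, a standalone sketch
that fills a plain buffer instead of calling frag_more. The function
emit_before_ret and the enum are hypothetical; the opcode bytes are the
ones in the patch, and the prefix is assumed to be chosen by the caller
(0x66, 0x48 for REX.W, or 0 for none).

```c
#include <assert.h>
#include <stddef.h>

enum ret_fence_kind { RET_FENCE_OR, RET_FENCE_NOT, RET_FENCE_SHL };

/* Hypothetical helper: write the bytes placed in front of a RET into BUF
   (at least 11 bytes) and return how many were written.  */
static size_t
emit_before_ret (enum ret_fence_kind kind, unsigned char prefix,
                 unsigned char *buf)
{
  unsigned char *p = buf;
  int i;

  assert (prefix == 0 || prefix == 0x66 || prefix == 0x48);

  if (kind == RET_FENCE_NOT)
    {
      /* not on the top of stack: f7 14 24, emitted twice so the
         return address ends up unchanged.  */
      for (i = 0; i < 2; i++)
        {
          if (prefix)
            *p++ = prefix;
          *p++ = 0xf7;
          *p++ = 0x14;
          *p++ = 0x24;
        }
    }
  else
    {
      if (prefix)
        *p++ = prefix;
      if (kind == RET_FENCE_OR)
        {
          /* or $0 on the top of stack: 83 0c 24 00.  */
          *p++ = 0x83;
          *p++ = 0x0c;
        }
      else
        {
          /* shl $0 on the top of stack: c1 24 24 00.  */
          *p++ = 0xc1;
          *p++ = 0x24;
        }
      *p++ = 0x24;
      *p++ = 0x00;
    }

  /* lfence: 0f ae e8.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - buf);
}
```

The sizes agree with the frag_more arguments in the patch: 6 + 3 bytes
plus two prefix bytes for not, 4 + 3 bytes plus one prefix byte for
or/shl.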

> >     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:

> >       if (strcasecmp (arg, "all") == 0)

> >-      lfence_before_indirect_branch = lfence_branch_all;

> >+      {

> >+        lfence_before_indirect_branch = lfence_branch_all;

> >+        lfence_before_ret = lfence_before_ret_shl;

> >+      }

>

> I don't think this should override an earlier explicit

> -mlfence-before-ret= (i.e. in particular the order the two

> options would be specified in should imo not matter).

>


Changed.
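
Note that the updated hunk tests lfence_before_ret ==
lfence_before_ret_none at parse time, which cannot tell an explicit
-mlfence-before-ret=none from the default. One possible way to make the
result fully independent of option order is to record whether the option
was given and resolve the implied default after parsing. A minimal
sketch, with hypothetical names (struct lfence_opts, parse_opt,
resolve_defaults) that are not in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum ret_kind { RET_NONE, RET_NOT, RET_OR, RET_SHL };

struct lfence_opts
{
  bool branch_all;      /* -mlfence-before-indirect-branch=all seen.  */
  bool ret_explicit;    /* Any -mlfence-before-ret= seen.  */
  enum ret_kind ret;
};

/* Record one option; returns false for an unknown name or value.  */
static bool
parse_opt (struct lfence_opts *o, const char *name, const char *arg)
{
  assert (o != NULL && name != NULL && arg != NULL);

  if (strcmp (name, "lfence-before-indirect-branch") == 0)
    {
      if (strcmp (arg, "all") == 0)
        o->branch_all = true;
      else if (strcmp (arg, "none") == 0
               || strcmp (arg, "register") == 0
               || strcmp (arg, "memory") == 0)
        o->branch_all = false;
      else
        return false;
      return true;
    }

  if (strcmp (name, "lfence-before-ret") == 0)
    {
      if (strcmp (arg, "none") == 0)
        o->ret = RET_NONE;
      else if (strcmp (arg, "not") == 0)
        o->ret = RET_NOT;
      else if (strcmp (arg, "or") == 0)
        o->ret = RET_OR;
      else if (strcmp (arg, "shl") == 0)
        o->ret = RET_SHL;
      else
        return false;
      o->ret_explicit = true;
      return true;
    }

  return false;
}

/* Run once after all options are parsed: "all" implies shl before ret
   only when no -mlfence-before-ret= was given at all.  */
static void
resolve_defaults (struct lfence_opts *o)
{
  if (o->branch_all && !o->ret_explicit)
    o->ret = RET_SHL;
}
```

With this shape, "-mlfence-before-ret=none
-mlfence-before-indirect-branch=all" and the reverse order both keep
none, while "all" on its own still yields shl.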

> >@@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)

> >       lfence_before_ret = lfence_before_ret_or;

> >       else if (strcasecmp (arg, "not") == 0)

> >       lfence_before_ret = lfence_before_ret_not;

> >+      else if (strcasecmp (arg, "shl") == 0)

> >+      lfence_before_ret = lfence_before_ret_shl;

> >       else if (strcasecmp (arg, "none") == 0)

> >       lfence_before_ret = lfence_before_ret_none;

> >       else

>

> With the SHL variant being truly benign (except for the

> performance impact of course), would it make sense to also

> allow for a simple "=yes" form now?


Do you mean adding -mlfence-before-ret=yes as an alias for
-mlfence-before-ret=shl?
>

> Jan


Updated patch:

From 9038b3e2689019bb41351c1a6f426e3d0926c651 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>

Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB, no load for AnySize insns.
  2. Add -mlfence-before-ret=shl, adjust operand size of or/not/shl to
  ret's.
  3. Adjust -mlfence-after-load=[yes|no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none|all]
  are equivalent to the original -mlfence-after-load=[no|yes].
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.

gas/Changelog:
        * config/tc-i386.c (lfence_after_load_kind): New.
        (lfence_before_ret_shl): Change from lfence_before_ret_not.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB, no load
        for AnySize insns.
        (insert_lfence_after): Insert lfence under
        -mlfence-after-load=[general|all], issue a warning when encountering
        REP CMPS/SCAS.
        (insert_lfence_before): Replace -mlfence-before-ret=not with
        -mlfence-before-ret=shl.
        (md_parse_option): Adjust -mlfence-after-load=[yes|no] to
        -mlfence-after-load=[none|general|all].  Replace
        -mlfence-before-ret=not with -mlfence-before-ret=shl.  Enable
        -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
---
 gas/config/tc-i386.c                          | 138 +++++++++++++-----
 gas/doc/c-i386.texi                           |  30 ++--
 gas/testsuite/gas/i386/i386.exp               |   6 +
 gas/testsuite/gas/i386/lfence-load-b.d        | 137 +++++++++++++++++
 gas/testsuite/gas/i386/lfence-load-b.e        |   3 +
 gas/testsuite/gas/i386/lfence-load.d          |  30 +++-
 gas/testsuite/gas/i386/lfence-load.e          |   3 +
 gas/testsuite/gas/i386/lfence-load.s          |  20 +++
 gas/testsuite/gas/i386/lfence-ret-a.d         |   6 +
 gas/testsuite/gas/i386/lfence-ret-b.d         |   8 +
 gas/testsuite/gas/i386/lfence-ret-c.d         |  23 +++
 gas/testsuite/gas/i386/lfence-ret-d.d         |  24 +++
 gas/testsuite/gas/i386/lfence-ret.s           |   2 +
 gas/testsuite/gas/i386/x86-64-lfence-load-b.d | 137 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |  28 +++-
 gas/testsuite/gas/i386/x86-64-lfence-load.s   |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d  |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d  |   8 +
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d  |  23 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d  |  24 +++
 20 files changed, 619 insertions(+), 56 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.e
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..5243569362 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,8 +629,17 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;

-/* 1 if lfence should be inserted after every load.  */
-static int lfence_after_load = 0;
+/* Non-zero if lfence should be inserted after load.
+   lfence_load_all will generate lfence for all load instructions,
+   lfence_load_general will generate lfence for all
+   load instructions except REP CMPS/SCAS.  */
+static enum lfence_after_load_kind
+  {
+   lfence_load_none = 0,
+   lfence_load_general,
+   lfence_load_all
+  }
+lfence_after_load;

 /* Non-zero if lfence should be inserted before indirect branch.  */
 static enum lfence_before_indirect_branch_kind
@@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4360,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+         prefetcht1, prefetcht2, prefetchw, bndmk, bndcl, bndcu, bndcn,
+         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
         return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+          || i.tm.base_opcode == 0x9d
+          || i.tm.base_opcode == 0x61)
         return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
         return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+          || i.tm.base_opcode == 0xd7)
         return 1;
+      /* NB: For AMD-specific insns with implicit memory operands,
+         they're intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4522,22 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* Insert lfence after rep cmps/scas only under
+         -mlfence-after-load=all.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+           || (i.tm.base_opcode | 0x1) == 0xaf)
+          && i.prefix[REP_PREFIX])
+        {
+          if (lfence_after_load == lfence_load_general)
+            {
+              as_warn (_("`%s` skips -mlfence-after-load=general"),
+                       i.tm.name);
+              return;
+            }
+          else
+            as_warn (_("`%s` changes flags which would affect control flow behavior"),
+                     i.tm.name);
+        }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4536,8 +4568,8 @@ insert_lfence_before (void)

       if (i.reg_operands == 1)
         {
-          /* Indirect branch via register.  Don't insert lfence with
-             -mlfence-after-load=yes.  */
+          /* Indirect branch via register. Insert lfence when
+             -mlfence-after-load=none.  */
           if (lfence_after_load
               || lfence_before_indirect_branch == lfence_branch_memory)
             return;
@@ -4568,12 +4600,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
           || i.tm.base_opcode == 0xc3
           || i.tm.base_opcode == 0xca
-          || i.tm.base_opcode == 0xcb))
+          || i.tm.base_opcode == 0xcb
+          || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
           && last_insn.seg == now_seg)
@@ -4583,33 +4616,50 @@ insert_lfence_before (void)
                          last_insn.name, i.tm.name);
           return;
         }
-      if (lfence_before_ret == lfence_before_ret_or)
-        {
-          /* orl: 0x830c2400.  */
-          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
-          *p++ = 0x83;
-          *p++ = 0xc;
-          *p++ = 0x24;
-          *p++ = 0x0;
-        }
-      else
+
+      char prefix = i.prefix[DATA_PREFIX]
+        ? 0x66 : flag_code == CODE_64BIT ? 0x48 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
         {
-          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          /* not: 0xf71424, may add prefix
+             for operand size override or 64-bit code.  */
+          p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
         }
+      else
+        {
+          p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+          if (prefix)
+            *p++ = prefix;
+          if (lfence_before_ret == lfence_before_ret_or)
+            {
+              /* or: 0x830c2400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0x83;
+              *p++ = 0x0c;
+            }
+          else
+            {
+              /* shl: 0xc1242400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0xc1;
+              *p++ = 0x24;
+            }
+
+          *p++ = 0x24;
+          *p++ = 0x0;
+        }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12985,17 +13035,23 @@ md_parse_option (int c, const char *arg)
       break;

     case OPTION_MLFENCE_AFTER_LOAD:
-      if (strcasecmp (arg, "yes") == 0)
-        lfence_after_load = 1;
-      else if (strcasecmp (arg, "no") == 0)
-        lfence_after_load = 0;
+      if (strcasecmp (arg, "general") == 0)
+        lfence_after_load = lfence_load_general;
+      else if (strcasecmp (arg, "all") == 0)
+        lfence_after_load = lfence_load_all;
+      else if (strcasecmp (arg, "none") == 0)
+        lfence_after_load = lfence_load_none;
       else
         as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
       break;

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-        lfence_before_indirect_branch = lfence_branch_all;
+        {
+          lfence_before_indirect_branch = lfence_branch_all;
+          if (lfence_before_ret == lfence_before_ret_none)
+            lfence_before_ret = lfence_before_ret_shl;
+        }
       else if (strcasecmp (arg, "memory") == 0)
         lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13068,8 @@ md_parse_option (int c, const char *arg)
         lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
         lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0)
+        lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
         lfence_before_ret = lfence_before_ret_none;
       else
@@ -13376,13 +13434,13 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
-  -mlfence-after-load=[no|yes] (default: no)\n\
+  -mlfence-after-load=[none|general|all] (default: none)\n\
                           generate lfence after load\n"));
   fprintf (stream, _("\
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..b8192ff3ea 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -470,12 +470,15 @@ The default doesn't align branches.

 @cindex @samp{-mlfence-after-load=} option, i386
 @cindex @samp{-mlfence-after-load=} option, x86-64
-@item -mlfence-after-load=@var{no}
-@itemx -mlfence-after-load=@var{yes}
+@item -mlfence-after-load=@var{none}
+@item -mlfence-after-load=@var{general}
+@itemx -mlfence-after-load=@var{all}
 These options control whether the assembler should generate lfence
-after load instructions.  @option{-mlfence-after-load=@var{yes}} will
-generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
-lfence, which is the default.
+after load instructions.  @option{-mlfence-after-load=@var{all}} will
+generate lfence for all load instructions,
+@option{-mlfence-after-load=@var{general}} will generate lfence for all
+load instructions except rep cmps/scas, @option{-mlfence-after-load=@var{none}}
+will not generate lfence, which is the default.

 @cindex @samp{-mlfence-before-indirect-branch=} option, i386
 @cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@@ -488,28 +491,31 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
 warning before indirect near branch via memory.
 @option{-mlfence-before-indirect-branch=@var{none}} will not generate
-lfence nor issue warning, which is the default.  Note that lfence won't
-be generated before indirect near branch via register with
-@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+lfence nor issue warning, which is the default.  Note that lfence will
+be generated before indirect near branch via register only with
+@option{-mlfence-after-load=@var{none}} since otherwise lfence is generated
 after loading branch target register.

 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@item -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..a2bdb569b7 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -530,11 +530,14 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-8"
     run_dump_test "align-branch-9"
     run_dump_test "lfence-load"
+    run_dump_test "lfence-load-b"
     run_dump_test "lfence-indbr-a"
     run_dump_test "lfence-indbr-b"
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1117,11 +1120,14 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-8"
     run_dump_test "x86-64-align-branch-9"
     run_dump_test "x86-64-lfence-load"
+    run_dump_test "x86-64-lfence-load-b"
     run_dump_test "x86-64-lfence-indbr-a"
     run_dump_test "x86-64-lfence-indbr-b"
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load-b.d b/gas/testsuite/gas/i386/lfence-load-b.d
new file mode 100644
index 0000000000..b4f7bc0f19
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdtl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%ebp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%ebp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%ebp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%ebp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%ebp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%ecx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%ecx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41                    inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdtl  \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%esi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popl   \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8d 04 40              lea    \(%eax,%eax,2\),%eax
+ +[a-f0-9]+: c9                    leave
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%esi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%esi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%esi\),%es:\(%edi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%edi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%edi\),%ds:\(%esi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%esi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 83 00 01              addl   \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f ba 20 01          btl    \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 03              xadd   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 c3              xadd   %eax,%ebx
+ +[a-f0-9]+: 87 03                xchg   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 93                    xchg   %eax,%ebx
+ +[a-f0-9]+: 39 45 40              cmp    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 3b 45 40              cmp    0x40\(%ebp\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 01 45 40              add    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 03 00                add    \(%eax\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-load-b.e
b/gas/testsuite/gas/i386/lfence-load-b.e
new file mode 100644
index 0000000000..c394e02296
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` skips -mlfence-after-load=general
+.*:??: Warning: `cmps` skips -mlfence-after-load=general
\ No newline at end of file
diff --git a/gas/testsuite/gas/i386/lfence-load.d
b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..273e302f38 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e
b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s
b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d
b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..613d1d50a2 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d
b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..e6dd4f4bf6 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d
b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..58f7e0a706
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d
b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..9078216e53
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s
b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..5de4f08447 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,6 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
new file mode 100644
index 0000000000..b1fd3cad42
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: x86-64-lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: x86-64 lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdt   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%rbp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%rbp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%rbp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%rbp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%rbp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%rcx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%rcx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c7 09          cmpxchg16b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff c1                inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdt   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%rsi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popq   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 8d 04 40          lea    \(%rax,%rax,2\),%rax
+ +[a-f0-9]+: c9                    leaveq
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%rsi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%rsi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%rsi\),%es:\(%rdi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%rdi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%rdi\),%ds:\(%rsi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%rsi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 83 03 01          addl   \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 0f ba 23 01        btl    \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 03          xadd   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 c3          xadd   %rax,%rbx
+ +[a-f0-9]+: 48 87 03              xchg   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 93                xchg   %rax,%rbx
+ +[a-f0-9]+: 48 39 45 40          cmp    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 3b 45 40          cmp    0x40\(%rbp\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 01 45 40          add    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 03 00              add    \(%rax\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d
b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..f21aba85d5 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: x86-64 -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: x86-64 -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s
b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..43343a9a44 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..6c34affdc0 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..435d342a28
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..6c39b5d747
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+#pass
-- 
2.18.1


--
BR,
Hongtao
Jan Beulich April 20, 2020, 7:34 a.m. | #10
On 20.04.2020 09:20, Hongtao Liu wrote:
> On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 16.04.2020 07:34, Hongtao Liu wrote:
>>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)
>>> {
>>>   if (lfence_after_load && load_insn_p ())
>>>     {
>>> +      /* Insert lfence after rep cmps/scas only under
>>> +       -mlfence-after-load=all.  */
>>> +      if (((i.tm.base_opcode | 0x1) == 0xa7
>>> +         || (i.tm.base_opcode | 0x1) == 0xaf)
>>> +        && i.prefix[REP_PREFIX])
>>
>> I'm afraid I don't understand why the REP forms need treating
>> differently from the non-REP ones of the same insns.
>>
>
> Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.


Well, of course just the two. But this doesn't answer my question
as to why there is such a special case.

>>> @@ -4583,33 +4613,47 @@ insert_lfence_before (void)
>>>                        last_insn.name, i.tm.name);
>>>         return;
>>>       }
>>> -      if (lfence_before_ret == lfence_before_ret_or)
>>> -      {
>>> -        /* orl: 0x830c2400.  */
>>> -        p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
>>> -        if (flag_code == CODE_64BIT)
>>> -          *p++ = 0x48;
>>> -        *p++ = 0x83;
>>> -        *p++ = 0xc;
>>> -        *p++ = 0x24;
>>> -        *p++ = 0x0;
>>> -      }
>>> -      else
>>> +
>>> +      char prefix = i.prefix[DATA_PREFIX] ? 0x66
>>> +      : flag_code == CODE_64BIT ? 0x48 : 0x0;
>>
>> Is this correct when the RET _also_ has an explicitly specified
>> REX.W prefix? Also indentation looks somewhat odd on the last
>> line of this block.
>
> I think yes.


Please explain yourself. 66 48 C3 accesses 8 bytes of memory
after all, whereas you'd generate a 2-byte access ahead of the
LFENCE afaics.

>>> @@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)
>>>       lfence_before_ret = lfence_before_ret_or;
>>>       else if (strcasecmp (arg, "not") == 0)
>>>       lfence_before_ret = lfence_before_ret_not;
>>> +      else if (strcasecmp (arg, "shl") == 0)
>>> +      lfence_before_ret = lfence_before_ret_shl;
>>>       else if (strcasecmp (arg, "none") == 0)
>>>       lfence_before_ret = lfence_before_ret_none;
>>>       else
>>
>> With the SHL variant being truly benign (except for the
>> performance impact of course), would it make sense to also
>> allow for a simple "=yes" form now?
>
> Do you means add -mlfence-before-ret=yes which indicates
> -mlfence-before-ret=shl?


Yes.

> Update my patch:


I've not looked at this just yet - the issues above should be
resolved first (one way or another).

Jan
Fangrui Song via Binutils April 21, 2020, 2:24 a.m. | #11
On Mon, Apr 20, 2020 at 3:34 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.04.2020 09:20, Hongtao Liu wrote:
> > On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 16.04.2020 07:34, Hongtao Liu wrote:
> >>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)
> >>> {
> >>>   if (lfence_after_load && load_insn_p ())
> >>>     {
> >>> +      /* Insert lfence after rep cmps/scas only under
> >>> +       -mlfence-after-load=all.  */
> >>> +      if (((i.tm.base_opcode | 0x1) == 0xa7
> >>> +         || (i.tm.base_opcode | 0x1) == 0xaf)
> >>> +        && i.prefix[REP_PREFIX])
> >>
> >> I'm afraid I don't understand why the REP forms need treating
> >> differently from the non-REP ones of the same insns.
> >>
> >
> > Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.
>
> Well, of course just the two. But this doesn't answer my question
> as to why there is such a special case.
>


There are also two REP string instructions that require special
treatment. Specifically, the compare string (CMPS) and scan string
(SCAS) instructions set EFLAGS in a manner that depends on the data
being compared/scanned. When used with a REP prefix, the number of
iterations may therefore vary depending on this data. If the data is a
program secret chosen by the adversary using an LVI method, then this
data-dependent behavior may leak some aspect of the secret. The
solution is to unfold any REP CMPS and REP SCAS operations into a loop
and insert an LFENCE after the CMPS/SCAS instruction. For example,
REPNZ SCAS can be unfolded to:

.RepLoop:
  JRCXZ .ExitRepLoop
  DEC rcx  # or ecx if the REPNZ SCAS uses a 32-bit address size
  SCAS
  LFENCE
  JNZ .RepLoop
.ExitRepLoop:
  ...
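
To make the data dependence concrete, here is a small C model of the unfolded loop above. It is only a sketch under assumptions: `repnz_scasb_model` is a made-up name, only the byte form is modeled, and the spot where the LFENCE would sit is marked with a comment rather than executed.

```c
#include <stddef.h>

/* Model of the unfolded REPNZ SCASB loop.  COUNT plays the role of
   rcx and AL is the byte being searched for.  Returns how many SCAS
   steps ran before the loop exited.  */
static size_t
repnz_scasb_model (const unsigned char *buf, size_t count, unsigned char al)
{
  size_t steps = 0;

  while (count != 0)                 /* JRCXZ .ExitRepLoop */
    {
      count--;                       /* DEC rcx */
      int zf = (buf[steps] == al);   /* SCAS sets ZF from the compare */
      steps++;
      /* LFENCE would go here, before the flags are consumed.  */
      if (zf)                        /* JNZ .RepLoop falls through on a match */
        break;
    }
  return steps;
}
```

The number of steps returned depends on the contents of `buf`, which is the data-dependent behavior the paragraph above describes.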

The request I got is to add options to either handle or skip REP
CMPS/SCAS, and to issue a warning as well.
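
As a rough illustration of that behavior, the decision can be written as a small standalone C function. This is a sketch, not the code in tc-i386.c: the enumerator names, `struct lfence_action` and `lfence_after_load_action` are hypothetical, and only the opcode test is copied from the hunk quoted above.

```c
#include <stdbool.h>

/* Hypothetical names; the patch only tells us the enum is called
   lfence_after_load_kind.  */
enum lfence_after_load_kind
{
  load_kind_none,
  load_kind_general,
  load_kind_all
};

struct lfence_action
{
  bool insert;   /* emit lfence after the insn */
  bool warn;     /* issue an assembler warning */
};

/* Same test as the quoted hunk: CMPS is 0xa6/0xa7, SCAS is 0xae/0xaf.  */
static bool
is_cmps_or_scas (unsigned int base_opcode)
{
  return (base_opcode | 0x1) == 0xa7 || (base_opcode | 0x1) == 0xaf;
}

static struct lfence_action
lfence_after_load_action (enum lfence_after_load_kind kind, bool is_load,
                          unsigned int base_opcode, bool has_rep)
{
  struct lfence_action act = { false, false };

  if (kind == load_kind_none || !is_load)
    return act;

  if (has_rep && is_cmps_or_scas (base_opcode))
    {
      /* =general skips the lfence, =all inserts it; both warn.  */
      act.warn = true;
      act.insert = (kind == load_kind_all);
      return act;
    }

  act.insert = true;
  return act;
}
```

This matches the expected test output in the series: lfence-load-b.d shows no lfence after `repz scas` and `repz cmpsl` under =general, and both .e files expect a warning for `scas` and `cmps`.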

> >>> @@ -4583,33 +4613,47 @@ insert_lfence_before (void)
> >>>                        last_insn.name, i.tm.name);
> >>>         return;
> >>>       }
> >>> -      if (lfence_before_ret == lfence_before_ret_or)
> >>> -      {
> >>> -        /* orl: 0x830c2400.  */
> >>> -        p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> >>> -        if (flag_code == CODE_64BIT)
> >>> -          *p++ = 0x48;
> >>> -        *p++ = 0x83;
> >>> -        *p++ = 0xc;
> >>> -        *p++ = 0x24;
> >>> -        *p++ = 0x0;
> >>> -      }
> >>> -      else
> >>> +
> >>> +      char prefix = i.prefix[DATA_PREFIX] ? 0x66
> >>> +      : flag_code == CODE_64BIT ? 0x48 : 0x0;
> >>
> >> Is this correct when the RET _also_ has an explicitly specified
> >> REX.W prefix? Also indentation looks somewhat odd on the last
> >> line of this block.
> >
> > I think yes.
>
> Please explain yourself. 66 48 C3 accesses 8 bytes of memory
> after all, whereas you'd generate a 2-byte access ahead of the
> LFENCE afaics.
>


Changed.
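
For reference, the rule being asked for is that the inserted or/not/shl should touch as many bytes as the RET does. A minimal sketch of that selection follows. It assumes 32-bit or 64-bit code only (16-bit code is not modeled), `ret_operand_bytes` and `rmw_prefix_byte` are hypothetical helper names rather than functions from tc-i386.c, and the sizes follow the test expectations in this series together with the REX.W remark quoted above.

```c
#include <stdbool.h>

/* Bytes the near RET pops for its return address.  In 64-bit code an
   explicit REX.W takes precedence over a 0x66 data-size prefix.  */
static unsigned int
ret_operand_bytes (bool code64, bool data_prefix, bool rex_w)
{
  if (code64 && rex_w)
    return 8;
  if (data_prefix)
    return 2;
  return code64 ? 8 : 4;
}

/* Prefix byte for the or/not/shl on (%esp)/(%rsp) so that it has the
   same operand size; 0 means no prefix is needed.  */
static unsigned char
rmw_prefix_byte (unsigned int bytes)
{
  if (bytes == 8)
    return 0x48;   /* REX.W */
  if (bytes == 2)
    return 0x66;   /* data-size prefix */
  return 0;
}
```

With this rule, `66 48 c3` in 64-bit code gives an 8-byte access and a 0x48 prefix, rather than the 2-byte access the earlier version would have generated.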

> >>> @@ -13012,6 +13061,8 @@ md_parse_option (int c, const char *arg)
> >>>       lfence_before_ret = lfence_before_ret_or;
> >>>       else if (strcasecmp (arg, "not") == 0)
> >>>       lfence_before_ret = lfence_before_ret_not;
> >>> +      else if (strcasecmp (arg, "shl") == 0)
> >>> +      lfence_before_ret = lfence_before_ret_shl;
> >>>       else if (strcasecmp (arg, "none") == 0)
> >>>       lfence_before_ret = lfence_before_ret_none;
> >>>       else
> >>
> >> With the SHL variant being truly benign (except for the
> >> performance impact of course), would it make sense to also
> >> allow for a simple "=yes" form now?
> >
> > Do you means add -mlfence-before-ret=yes which indicates
> > -mlfence-before-ret=shl?
>
> Yes.
>


Changed.
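
The intended mapping, with "yes" acting as an alias for "shl", can be sketched as a standalone parser. This is an illustration only: `parse_lfence_before_ret` is a hypothetical function, it returns -1 for an unknown value where the real option handler reports the error its own way, and only the enumerator names are taken from the quoted hunk.

```c
#include <strings.h>   /* strcasecmp (POSIX) */

enum lfence_before_ret_kind
{
  lfence_before_ret_none,
  lfence_before_ret_not,
  lfence_before_ret_or,
  lfence_before_ret_shl
};

/* Store the kind selected by ARG in *KIND and return 0, or return -1
   if ARG is not a recognized value.  Matching ignores case.  */
static int
parse_lfence_before_ret (const char *arg, enum lfence_before_ret_kind *kind)
{
  if (strcasecmp (arg, "or") == 0)
    *kind = lfence_before_ret_or;
  else if (strcasecmp (arg, "not") == 0)
    *kind = lfence_before_ret_not;
  else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)
    *kind = lfence_before_ret_shl;   /* "yes" is shorthand for "shl" */
  else if (strcasecmp (arg, "none") == 0)
    *kind = lfence_before_ret_none;
  else
    return -1;
  return 0;
}
```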

> > Update my patch:
>
> I've not looked at this just yet - the issues above should be
> resolved first (one way or another).
>
> Jan



From ac48b1dea2c679a01fc849ef9b5d23bd92c04b50 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>

Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB and Anysize insns.
  2. Add -mlfence-before-ret=shl/yes, adjust operand size of or/not/shl to
  ret's.
  3. Adjust -mlfence-after-load=[yes/no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none/all]
  equals the original -mlfence-after-load=[no/yes],
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.

gas/ChangeLog:
        * config/tc-i386.c (lfence_after_load): Deleted.
        (lfence_after_load_kind): New.
        (lfence_before_ret_shl): New member.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB and Anysize insns.
        (insert_lfence_after): Handle specially for REP CMPS/SCAS.
        (insert_lfence_before): Handle iret. Handle
        -mlfence-before-ret=shl. Adjust operand size of or/not/shl to ret's.
        (md_parse_option): Change -mlfence-after-load=[yes|no] to
        -mlfence-after-load=[none|general|all]. Change
        -mlfence-before-ret=[none|not|or] to
        -mlfence-before-ret=[none/not/or/shl/yes].
        Enable -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all and no explicit
        -mlfence-before-ret option.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-e.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret.s: New.
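
The REP CMPS/SCAS handling in insert_lfence_after can be summarised by
the standalone sketch below.  The names load_fence_mode, fence_action and
decide_lfence_after are hypothetical and exist only for this illustration;
the opcode test mirrors the one in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of -mlfence-after-load=[none|general|all].  */
enum load_fence_mode
{
  LOAD_FENCE_NONE = 0,
  LOAD_FENCE_GENERAL,
  LOAD_FENCE_ALL
};

/* Hypothetical result type: what happens after one instruction.  */
enum fence_action
{
  FENCE_NOT_EMITTED,
  FENCE_SKIPPED_WITH_WARNING,
  FENCE_EMITTED,
  FENCE_EMITTED_WITH_WARNING
};

/* cmps is 0xa6/0xa7 and scas is 0xae/0xaf; setting the low bit folds
   the byte and word/dword forms together.  */
static bool
is_cmps_or_scas (uint32_t base_opcode)
{
  return (base_opcode | 0x1) == 0xa7 || (base_opcode | 0x1) == 0xaf;
}

enum fence_action
decide_lfence_after (enum load_fence_mode mode, bool is_load,
                     uint32_t base_opcode, bool has_rep_prefix)
{
  assert (mode <= LOAD_FENCE_ALL);

  if (mode == LOAD_FENCE_NONE || !is_load)
    return FENCE_NOT_EMITTED;

  if (has_rep_prefix && is_cmps_or_scas (base_opcode))
    /* "general" warns and skips the lfence; "all" warns and emits it.  */
    return (mode == LOAD_FENCE_GENERAL
            ? FENCE_SKIPPED_WITH_WARNING
            : FENCE_EMITTED_WITH_WARNING);

  return FENCE_EMITTED;
}
```

This matches the expectations in lfence-load-b.d, where rep movsl and
rep lods are followed by lfence while repz scas and repz cmpsl are not.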
---
 gas/config/tc-i386.c                          | 147 +++++++++++++-----
 gas/doc/c-i386.texi                           |  31 ++--
 gas/testsuite/gas/i386/i386.exp               |   7 +
 gas/testsuite/gas/i386/lfence-load-b.d        | 137 ++++++++++++++++
 gas/testsuite/gas/i386/lfence-load-b.e        |   3 +
 gas/testsuite/gas/i386/lfence-load.d          |  30 +++-
 gas/testsuite/gas/i386/lfence-load.e          |   3 +
 gas/testsuite/gas/i386/lfence-load.s          |  20 +++
 gas/testsuite/gas/i386/lfence-ret-a.d         |   6 +
 gas/testsuite/gas/i386/lfence-ret-b.d         |   8 +
 gas/testsuite/gas/i386/lfence-ret-c.d         |  23 +++
 gas/testsuite/gas/i386/lfence-ret-d.d         |  24 +++
 gas/testsuite/gas/i386/lfence-ret.s           |   2 +
 gas/testsuite/gas/i386/x86-64-lfence-load-b.d | 137 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |  28 +++-
 gas/testsuite/gas/i386/x86-64-lfence-load.s   |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d  |  14 +-
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d  |  18 ++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d  |  29 ++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d  |  31 ++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d  |  31 ++++
 gas/testsuite/gas/i386/x86-64-lfence-ret.s    |   8 +
 22 files changed, 698 insertions(+), 58 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.e
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..199b818816 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,8 +629,17 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;

-/* 1 if lfence should be inserted after every load.  */
-static int lfence_after_load = 0;
+/* Non-zero if lfence should be inserted after load.
+   lfence_load_all will generate lfence for all load instructions,
+   lfence_load_general will generate lfence for all
+   load instructions except REP CMPS/SCAS.  */
+static enum lfence_after_load_kind
+  {
+   lfence_load_none = 0,
+   lfence_load_general,
+   lfence_load_all
+  }
+lfence_after_load;

 /* Non-zero if lfence should be inserted before indirect branch.  */
 static enum lfence_before_indirect_branch_kind
@@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4360,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+ prefetcht1, prefetcht2, prefetchtw, bndmk, bndcl, bndcu, bndcn,
+ bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
  return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-   || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+   || i.tm.base_opcode == 0x9d
+   || i.tm.base_opcode == 0x61)
  return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
  return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+   || i.tm.base_opcode == 0xd7)
  return 1;
+      /* NB: For AMD-specific insns with implicit memory operands,
+ they're intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4522,31 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* Insert lfence after rep cmps/scas only under
+ -mlfence-after-load=all.  */
+      /* There are also two REP string instructions that require
+ special treatment. Specifically, the compare string (CMPS)
+ and scan string (SCAS) instructions set EFLAGS in a manner
+ that depends on the data being compared/scanned. When used
+ with a REP prefix, the number of iterations may therefore
+ vary depending on this data. If the data is a program secret
+ chosen by the adversary using an LVI method,
+ then this data-dependent behavior may leak some aspect
+ of the secret.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+    || (i.tm.base_opcode | 0x1) == 0xaf)
+   && i.prefix[REP_PREFIX])
+ {
+   if (lfence_after_load == lfence_load_general)
+     {
+       as_warn (_("`%s` skips -mlfence-after-load=general"),
+        i.tm.name);
+       return;
+     }
+   else
+     as_warn (_("`%s` changes flags which would affect control flow behavior"),
+      i.tm.name);
+ }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4536,8 +4577,8 @@ insert_lfence_before (void)

       if (i.reg_operands == 1)
  {
-   /* Indirect branch via register.  Don't insert lfence with
-      -mlfence-after-load=yes.  */
+   /* Indirect branch via register. Insert lfence when
+      -mlfence-after-load=none.  */
    if (lfence_after_load
        || lfence_before_indirect_branch == lfence_branch_memory)
      return;
@@ -4568,12 +4609,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
    || i.tm.base_opcode == 0xc3
    || i.tm.base_opcode == 0xca
-   || i.tm.base_opcode == 0xcb))
+   || i.tm.base_opcode == 0xcb
+   || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
    && last_insn.seg == now_seg)
@@ -4583,33 +4625,50 @@ insert_lfence_before (void)
  last_insn.name, i.tm.name);
    return;
  }
-      if (lfence_before_ret == lfence_before_ret_or)
- {
-   /* orl: 0x830c2400.  */
-   p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-   if (flag_code == CODE_64BIT)
-     *p++ = 0x48;
-   *p++ = 0x83;
-   *p++ = 0xc;
-   *p++ = 0x24;
-   *p++ = 0x0;
- }
-      else
+
+      char prefix = i.prefix[DATA_PREFIX] && !(i.prefix[REX_PREFIX] & REX_W)
+ ? 0x66 : flag_code == CODE_64BIT ? 0x48 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
  {
-   p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-   /* notl: 0xf71424.  */
-   if (flag_code == CODE_64BIT)
-     *p++ = 0x48;
+   /* notl: 0xf71424, may add prefix
+      for operand size override or 64-bit code.  */
+   p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+   if (prefix)
+     *p++ = prefix;
    *p++ = 0xf7;
    *p++ = 0x14;
    *p++ = 0x24;
-   /* notl: 0xf71424.  */
-   if (flag_code == CODE_64BIT)
-     *p++ = 0x48;
+   if (prefix)
+     *p++ = prefix;
    *p++ = 0xf7;
    *p++ = 0x14;
    *p++ = 0x24;
  }
+      else
+ {
+   p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+   if (prefix)
+     *p++ = prefix;
+   if (lfence_before_ret == lfence_before_ret_or)
+     {
+       /* orl: 0x830c2400, may add prefix
+ for operand size override or 64-bit code.  */
+       *p++ = 0x83;
+       *p++ = 0x0c;
+     }
+   else
+     {
+       /* shl: 0xc1242400, may add prefix
+ for operand size override or 64-bit code.  */
+       *p++ = 0xc1;
+       *p++ = 0x24;
+     }
+
+   *p++ = 0x24;
+   *p++ = 0x0;
+ }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12985,17 +13044,23 @@ md_parse_option (int c, const char *arg)
       break;

     case OPTION_MLFENCE_AFTER_LOAD:
-      if (strcasecmp (arg, "yes") == 0)
- lfence_after_load = 1;
-      else if (strcasecmp (arg, "no") == 0)
- lfence_after_load = 0;
+      if (strcasecmp (arg, "general") == 0)
+ lfence_after_load = lfence_load_general;
+      else if (strcasecmp (arg, "all") == 0)
+ lfence_after_load = lfence_load_all;
+      else if (strcasecmp (arg, "none") == 0)
+ lfence_after_load = lfence_load_none;
       else
         as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
       break;

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
- lfence_before_indirect_branch = lfence_branch_all;
+ {
+   lfence_before_indirect_branch = lfence_branch_all;
+   if (lfence_before_ret == lfence_before_ret_none)
+     lfence_before_ret = lfence_before_ret_shl;
+ }
       else if (strcasecmp (arg, "memory") == 0)
  lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13077,8 @@ md_parse_option (int c, const char *arg)
  lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
  lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)
+ lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
  lfence_before_ret = lfence_before_ret_none;
       else
@@ -13376,13 +13443,13 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
-  -mlfence-after-load=[no|yes] (default: no)\n\
+  -mlfence-after-load=[none|general|all] (default: none)\n\
                           generate lfence after load\n"));
   fprintf (stream, _("\
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl|yes] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..19a4bf874e 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -470,12 +470,15 @@ The default doesn't align branches.

 @cindex @samp{-mlfence-after-load=} option, i386
 @cindex @samp{-mlfence-after-load=} option, x86-64
-@item -mlfence-after-load=@var{no}
-@itemx -mlfence-after-load=@var{yes}
+@item -mlfence-after-load=@var{none}
+@itemx -mlfence-after-load=@var{general}
+@itemx -mlfence-after-load=@var{all}
 These options control whether the assembler should generate lfence
-after load instructions.  @option{-mlfence-after-load=@var{yes}} will
-generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
-lfence, which is the default.
+after load instructions.  @option{-mlfence-after-load=@var{all}} will
+generate lfence for all load instructions.
+@option{-mlfence-after-load=@var{general}} will generate lfence for all
+load instructions except rep cmps/scas.  @option{-mlfence-after-load=@var{none}}
+will not generate lfence, which is the default.

 @cindex @samp{-mlfence-before-indirect-branch=} option, i386
 @cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@@ -488,28 +491,32 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
 warning before indirect near branch via memory.
 @option{-mlfence-before-indirect-branch=@var{none}} will not generate
-lfence nor issue warning, which is the default.  Note that lfence won't
-be generated before indirect near branch via register with
-@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+lfence nor issue warning, which is the default.  Note that lfence will
+be generated before indirect near branch via register only with
+@option{-mlfence-after-load=@var{none}}; otherwise lfence is generated
 after loading branch target register.

 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@itemx -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
+@itemx -mlfence-before-ret=@var{yes}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl/yes}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..bb3897b9ad 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -530,11 +530,14 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-8"
     run_dump_test "align-branch-9"
     run_dump_test "lfence-load"
+    run_dump_test "lfence-load-b"
     run_dump_test "lfence-indbr-a"
     run_dump_test "lfence-indbr-b"
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1117,11 +1120,15 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-8"
     run_dump_test "x86-64-align-branch-9"
     run_dump_test "x86-64-lfence-load"
+    run_dump_test "x86-64-lfence-load-b"
     run_dump_test "x86-64-lfence-indbr-a"
     run_dump_test "x86-64-lfence-indbr-b"
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
+    run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load-b.d b/gas/testsuite/gas/i386/lfence-load-b.d
new file mode 100644
index 0000000000..b4f7bc0f19
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdtl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%ebp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%ebp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%ebp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%ebp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%ebp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%ecx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%ecx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41                    inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdtl  \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%esi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popl   \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8d 04 40              lea    \(%eax,%eax,2\),%eax
+ +[a-f0-9]+: c9                    leave
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%esi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%esi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%esi\),%es:\(%edi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%edi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%edi\),%ds:\(%esi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%esi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 83 00 01              addl   \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f ba 20 01          btl    \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 03              xadd   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 c3              xadd   %eax,%ebx
+ +[a-f0-9]+: 87 03                xchg   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 93                    xchg   %eax,%ebx
+ +[a-f0-9]+: 39 45 40              cmp    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 3b 45 40              cmp    0x40\(%ebp\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 01 45 40              add    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 03 00                add    \(%eax\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-load-b.e b/gas/testsuite/gas/i386/lfence-load-b.e
new file mode 100644
index 0000000000..c394e02296
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` skips -mlfence-after-load=general
+.*:??: Warning: `cmps` skips -mlfence-after-load=general
\ No newline at end of file
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..273e302f38 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..613d1d50a2 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,6 +9,12 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..e6dd4f4bf6 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..58f7e0a706
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,23 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..9078216e53
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,24 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..5de4f08447 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,6 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load-b.d b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
new file mode 100644
index 0000000000..b1fd3cad42
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: x86-64-lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: x86-64 lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdt   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%rbp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%rbp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%rbp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%rbp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%rbp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%rcx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%rcx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c7 09          cmpxchg16b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff c1                inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdt   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%rsi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popq   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 8d 04 40          lea    \(%rax,%rax,2\),%rax
+ +[a-f0-9]+: c9                    leaveq
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%rsi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%rsi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%rsi\),%es:\(%rdi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%rdi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%rdi\),%ds:\(%rsi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%rsi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 83 03 01          addl   \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 0f ba 23 01        btl    \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 03          xadd   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 c3          xadd   %rax,%rbx
+ +[a-f0-9]+: 48 87 03              xchg   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 93                xchg   %rax,%rbx
+ +[a-f0-9]+: 48 39 45 40          cmp    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 3b 45 40          cmp    0x40\(%rbp\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 01 45 40          add    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 03 00              add    \(%rax\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d
b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..f21aba85d5 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: x86-64 -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: x86-64 -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s
b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..758bf88da2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -1,4 +1,4 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=or
 #objdump: -dw
 #name: x86-64 -mlfence-before-ret=or
@@ -9,10 +9,22 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..7a06080ef5 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -1,4 +1,4 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=not
 #objdump: -dw
 #name: x86-64 -mlfence-before-ret=not
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +25,12 @@ Disassembly of section .text:
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..0e99d8bed8
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,29 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..b1e6d8f90e
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,31 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
new file mode 100644
index 0000000000..4ffb19207d
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
@@ -0,0 +1,31 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=yes
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.s
b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
new file mode 100644
index 0000000000..dd0961a49c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
@@ -0,0 +1,8 @@
+ .text
+_start:
+ retw
+ retw $20
+ ret
+ ret $30
+ data16 rex.w ret
+ data16 rex.w ret $40
-- 
2.18.1


-- 
BR,
Hongtao
Jan Beulich April 21, 2020, 6:30 a.m. | #12
On 21.04.2020 04:24, Hongtao Liu wrote:
> On Mon, Apr 20, 2020 at 3:34 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.04.2020 09:20, Hongtao Liu wrote:
>>> On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 16.04.2020 07:34, Hongtao Liu wrote:
>>>>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)
>>>>> {
>>>>>   if (lfence_after_load && load_insn_p ())
>>>>>     {
>>>>> +      /* Insert lfence after rep cmps/scas only under
>>>>> +       -mlfence-after-load=all.  */
>>>>> +      if (((i.tm.base_opcode | 0x1) == 0xa7
>>>>> +         || (i.tm.base_opcode | 0x1) == 0xaf)
>>>>> +        && i.prefix[REP_PREFIX])
>>>>
>>>> I'm afraid I don't understand why the REP forms need treating
>>>> differently from the non-REP ones of the same insns.
>>>>
>>>
>>> Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.
>>
>> Well, of course just the two. But this doesn't answer my question
>> as to why there is such a special case.
>>
>
> There are also two REP string instructions that require special
> treatment. Specifically, the compare string (CMPS) and scan string
> (SCAS) instructions set EFLAGS in a manner that depends on the data
> being compared/scanned. When used with a REP prefix, the number of
> iterations may therefore vary depending on this data. If the data is a
> program secret chosen by the adversary using an LVI method, then this
> data-dependent behavior may leak some aspect of the secret. The
> solution is to unfold any REP CMPS and REP SCAS operations into a loop
> and insert an LFENCE after the CMPS/SCAS instruction. For example,
> REPNZ SCAS can be unfolded to:
>
> .RepLoop:
>   JRCXZ .ExitRepLoop
>   DEC rcx  # or ecx if the REPNZ SCAS uses a 32-bit address size
>   SCAS
>   LFENCE
>   JNZ .RepLoop
> .ExitRepLoop:
>   ...
>
> The request i get is to add options to handle or not handle REP
> CMPS/SCAS also plus issue a warning.


But you don't handle them as per what you've written above, afaics.
Am I overlooking anything?
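
For readers following the thread, the unfolded REPNZ SCAS loop quoted above can be modelled in C roughly as below. This is only an illustrative sketch and not part of the patch: scan_unfolded and SPECULATION_BARRIER are hypothetical names, a byte-sized scan is assumed, and the barrier is a no-op placeholder standing in for LFENCE so that the sketch builds on any target.

```c
#include <stddef.h>

/* Placeholder for LFENCE.  Assumption: an x86 build would emit the real
   instruction here; the no-op keeps this sketch portable.  */
#define SPECULATION_BARRIER() ((void) 0)

/* Model of a byte-sized REPNZ SCAS unfolded as in the quoted sequence.
   Returns the remaining count (the value left in rcx) and sets *found
   when a byte equal to AL was seen.  */
static size_t
scan_unfolded (const unsigned char *p, size_t count, unsigned char al,
               int *found)
{
  *found = 0;
  while (count != 0)            /* JRCXZ .ExitRepLoop */
    {
      int equal;

      count--;                  /* DEC rcx */
      equal = (*p++ == al);     /* SCAS: compare, then advance */
      SPECULATION_BARRIER ();   /* LFENCE after each load */
      if (equal)                /* JNZ .RepLoop is not taken */
        {
          *found = 1;
          break;
        }
    }
  return count;
}
```

The point of the unfolding is that the barrier sits between every load and the data-dependent branch, which a single lfence placed after the whole REP instruction cannot do.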

> @@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
>    {
>      lfence_before_ret_none = 0,
>      lfence_before_ret_not,
> -    lfence_before_ret_or
> +    lfence_before_ret_or,
> +    lfence_before_ret_shl
>    }
>  lfence_before_ret;
>
> @@ -4350,22 +4360,28 @@ load_insn_p (void)
>
>    if (!any_vex_p)
>      {
> -      /* lea  */
> -      if (i.tm.base_opcode == 0x8d)
> +      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
> + prefetcht1, prefetcht2, prefetchtw, bndmk, bndcl, bndcu, bndcn,
> + bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */


Bad indentation (also elsewhere, so this may be an issue with your
mail client)?

> @@ -4536,8 +4577,8 @@ insert_lfence_before (void)
>
>        if (i.reg_operands == 1)
>   {
> -   /* Indirect branch via register.  Don't insert lfence with
> -      -mlfence-after-load=yes.  */
> +   /* Indirect branch via register. Insert lfence when
> +      -mlfence-after-load=none.  */
>     if (lfence_after_load
>         || lfence_before_indirect_branch == lfence_branch_memory)
>       return;


The changed comment is awkward to read - the reader will almost
certainly wonder why "none" implies an action. I think you either
want to explain this further, or revert back to the original form
by simply making it "... with -mlfence-after-load={all,general}."
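
To make the condition behind that comment concrete, here is a minimal stand-alone sketch of the decision for an indirect near branch via register. It is not the code from the patch: fence_before_register_branch is a hypothetical helper, the enumerator names other than lfence_branch_memory are assumptions, and the explicit check for the "none" setting is assumed to happen somewhere before the quoted early return.

```c
#include <stdbool.h>

/* Only lfence_branch_memory appears in the quoted hunk; the other
   enumerator names are assumptions made for this sketch.  */
enum lfence_before_indirect_branch_kind
{
  lfence_branch_none = 0,
  lfence_branch_register,
  lfence_branch_memory,
  lfence_branch_all
};

/* Returns true when an lfence would be emitted before an indirect near
   branch via register.  AFTER_LOAD_ENABLED is non-zero for
   -mlfence-after-load={all,general}.  */
static bool
fence_before_register_branch (int after_load_enabled,
                              enum lfence_before_indirect_branch_kind kind)
{
  /* Assumption: the feature is off entirely for the "none" setting.  */
  if (kind == lfence_branch_none)
    return false;

  /* Mirrors the quoted early return: with lfence-after-load enabled the
     load of the branch target register is already followed by an lfence,
     and the "memory" setting does not cover register branches.  */
  if (after_load_enabled || kind == lfence_branch_memory)
    return false;

  return true;
}
```

Written this way, the fence is the default action and an enabled -mlfence-after-load is the exception, which is the reading the original comment wording conveys.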

> @@ -4568,12 +4609,13 @@ insert_lfence_before (void)
>        return;
>      }
>
> -  /* Output or/not and lfence before ret.  */
> +  /* Output or/not/shl and lfence before ret/lret/iret.  */
>    if (lfence_before_ret != lfence_before_ret_none
>        && (i.tm.base_opcode == 0xc2
>     || i.tm.base_opcode == 0xc3
>     || i.tm.base_opcode == 0xca
> -   || i.tm.base_opcode == 0xcb))
> +   || i.tm.base_opcode == 0xcb
> +   || i.tm.base_opcode == 0xcf))
>      {
>        if (last_insn.kind != last_insn_other
>     && last_insn.seg == now_seg)
> @@ -4583,33 +4625,50 @@ insert_lfence_before (void)
>   last_insn.name, i.tm.name);
>     return;
>   }
> -      if (lfence_before_ret == lfence_before_ret_or)
> - {
> -   /* orl: 0x830c2400.  */
> -   p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> -   if (flag_code == CODE_64BIT)
> -     *p++ = 0x48;
> -   *p++ = 0x83;
> -   *p++ = 0xc;
> -   *p++ = 0x24;
> -   *p++ = 0x0;
> - }
> -      else
> +
> +      char prefix = i.prefix[DATA_PREFIX] && !(i.prefix[REX_PREFIX] & REX_W)
> + ? 0x66 : flag_code == CODE_64BIT ? 0x48 : 0x0;


While this now looks better, it's tailored to near RET. Far RET
as well as IRET default to 32-bit operand size in 64-bit mode.
I can't tell how relevant it is to match effective operand size
of the guarded and guarding insns.
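
For reference, the quoted prefix expression can be restated as a small stand-alone function. This is a sketch for illustration only: guard_prefix and its three boolean parameters are hypothetical stand-ins for i.prefix[DATA_PREFIX], the REX.W bit and flag_code == CODE_64BIT.

```c
/* Stand-alone restatement of the quoted prefix selection.  Returns the
   prefix byte placed before the guarding or/not/shl, or 0 for none.  */
static unsigned char
guard_prefix (int has_data_prefix, int has_rex_w, int is_64bit)
{
  /* An operand-size prefix wins unless REX.W overrides it.  */
  if (has_data_prefix && !has_rex_w)
    return 0x66;

  /* Otherwise 64-bit code gets REX.W and other modes get no prefix.  */
  return is_64bit ? 0x48 : 0x00;
}
```

For near RET this reproduces the operand sizes in the x86-64 tests (orw before retw, orq before retq and before data16 rex.W retq). It always picks REX.W for unprefixed insns in 64-bit code, which is the near-RET default and is why far RET and IRET are not matched.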

> +      if (lfence_before_ret == lfence_before_ret_not)
>   {
> -   p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> -   /* notl: 0xf71424.  */
> -   if (flag_code == CODE_64BIT)
> -     *p++ = 0x48;
> +   /* notl: 0xf71424, may add prefix
> +      for operand size overwrite or 64-bit code.  */


Despite the comment extension you still say "notl". Please switch
to either just "not" or something like "not{w,l,q}". Also
s/overwrite/override/. Note how you ...

> +   p = frag_more ((prefix ? 2 : 0) + 6 + 3);
> +   if (prefix)
> +     *p++ = prefix;
>     *p++ = 0xf7;
>     *p++ = 0x14;
>     *p++ = 0x24;
> -   /* notl: 0xf71424.  */
> -   if (flag_code == CODE_64BIT)
> -     *p++ = 0x48;
> +   if (prefix)
> +     *p++ = prefix;
>     *p++ = 0xf7;
>     *p++ = 0x14;
>     *p++ = 0x24;
>   }
> +      else
> + {
> +   p = frag_more ((prefix ? 1 : 0) + 4 + 3);
> +   if (prefix)
> +     *p++ = prefix;
> +   if (lfence_before_ret == lfence_before_ret_or)
> +     {
> +       /* orl: 0x830c2400, may add prefix
> + for operand size overwrite or 64-bit code.  */


... also have the same (bogus) suffix here, but ...

> +       *p++ = 0x83;
> +       *p++ = 0x0c;
> +     }
> +   else
> +     {
> +       /* shl: 0xc1242400, may add prefix
> + for operand size overwrite or 64-bit code.  */


... not here.

Jan
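
As a cross-check of the encodings discussed in this review, the byte sequences for the three guard kinds can be sketched outside gas as below. This is an illustrative sketch, not the patch: emit_ret_guard and enum guard_kind are hypothetical names, and a caller-supplied buffer of at least 11 bytes is assumed in place of frag_more. The opcode bytes are the ones visible in the quoted hunks and in the expected test output.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; the patch uses
   lfence_before_ret_{not,or,shl}.  */
enum guard_kind { guard_not, guard_or, guard_shl };

/* Writes the guarding insn(s) followed by lfence into OUT and returns
   the number of bytes written.  PREFIX is 0x66, 0x48 or 0 (none).  */
static size_t
emit_ret_guard (enum guard_kind kind, unsigned char prefix,
                unsigned char *out)
{
  unsigned char *p = out;
  int repeat = kind == guard_not ? 2 : 1;  /* not is applied twice */

  while (repeat-- > 0)
    {
      if (prefix)
        *p++ = prefix;
      switch (kind)
        {
        case guard_not:  /* not (%rsp): f7 14 24 */
          *p++ = 0xf7; *p++ = 0x14; *p++ = 0x24;
          break;
        case guard_or:   /* or $0x0,(%rsp): 83 0c 24 00 */
          *p++ = 0x83; *p++ = 0x0c; *p++ = 0x24; *p++ = 0x00;
          break;
        case guard_shl:  /* shl $0x0,(%rsp): c1 24 24 00 */
          *p++ = 0xc1; *p++ = 0x24; *p++ = 0x24; *p++ = 0x00;
          break;
        }
    }

  /* lfence: 0f ae e8 */
  *p++ = 0x0f; *p++ = 0xae; *p++ = 0xe8;
  return (size_t) (p - out);
}
```

With prefix 0x48 and guard_or this yields 48 83 0c 24 00 0f ae e8, matching the orq/lfence pair expected before retq in x86-64-lfence-ret-a.d.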
Fangrui Song via Binutils April 22, 2020, 3:33 a.m. | #13
On Tue, Apr 21, 2020 at 2:30 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 21.04.2020 04:24, Hongtao Liu wrote:
> > On Mon, Apr 20, 2020 at 3:34 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 20.04.2020 09:20, Hongtao Liu wrote:
> >>> On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>>> On 16.04.2020 07:34, Hongtao Liu wrote:
> >>>>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)
> >>>>> {
> >>>>>   if (lfence_after_load && load_insn_p ())
> >>>>>     {
> >>>>> +      /* Insert lfence after rep cmps/scas only under
> >>>>> +       -mlfence-after-load=all.  */
> >>>>> +      if (((i.tm.base_opcode | 0x1) == 0xa7
> >>>>> +         || (i.tm.base_opcode | 0x1) == 0xaf)
> >>>>> +        && i.prefix[REP_PREFIX])
> >>>>
> >>>> I'm afraid I don't understand why the REP forms need treating
> >>>> differently from the non-REP ones of the same insns.
> >>>>
> >>>
> >>> Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.
> >>
> >> Well, of course just the two. But this doesn't answer my question
> >> as to why there is such a special case.
> >>
> >
> > There are also two REP string instructions that require special
> > treatment. Specifically, the compare string (CMPS) and scan string
> > (SCAS) instructions set EFLAGS in a manner that depends on the data
> > being compared/scanned. When used with a REP prefix, the number of
> > iterations may therefore vary depending on this data. If the data is a
> > program secret chosen by the adversary using an LVI method, then this
> > data-dependent behavior may leak some aspect of the secret. The
> > solution is to unfold any REP CMPS and REP SCAS operations into a loop
> > and insert an LFENCE after the CMPS/SCAS instruction. For example,
> > REPNZ SCAS can be unfolded to:
> >
> > .RepLoop:
> >   JRCXZ .ExitRepLoop
> >   DEC rcx  # or ecx if the REPNZ SCAS uses a 32-bit address size
> >   SCAS
> >   LFENCE
> >   JNZ .RepLoop
> > .ExitRepLoop:
> >   ...
> >
> > The request i get is to add options to handle or not handle REP
> > CMPS/SCAS also plus issue a warning.
>
> But you don't handle them as per what you've written above, afaics.
> Am I overlooking anything?
>


Well, that solution is not meant for gas; I included it here only to
make it easier to understand why REP CMPS/SCAS need to be handled
specially.
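
What gas does instead can be summarized by the following stand-alone sketch, based on the quoted opcode test and on the option semantics in the commit message of the updated patch. rep_cmps_scas_p and lfence_after_this_load are hypothetical helpers written for illustration; the enum mirrors lfence_after_load_kind from the patch.

```c
#include <stdbool.h>

enum lfence_after_load_kind
{
  lfence_load_none = 0,
  lfence_load_general,
  lfence_load_all
};

/* True for cmps (0xa6/0xa7) or scas (0xae/0xaf) carrying a REP prefix,
   using the same "| 0x1" trick as the quoted hunk to fold the byte and
   word/dword/qword forms together.  */
static bool
rep_cmps_scas_p (unsigned int base_opcode, bool has_rep_prefix)
{
  return (((base_opcode | 0x1) == 0xa7 || (base_opcode | 0x1) == 0xaf)
          && has_rep_prefix);
}

/* For an insn already known to be a load: returns true when an lfence
   is emitted after it, and sets *WARN when a warning accompanies it.  */
static bool
lfence_after_this_load (enum lfence_after_load_kind kind,
                        unsigned int base_opcode, bool has_rep_prefix,
                        bool *warn)
{
  *warn = false;
  if (kind == lfence_load_none)
    return false;

  if (rep_cmps_scas_p (base_opcode, has_rep_prefix))
    {
      /* "general": no lfence after REP CMPS/SCAS.  */
      if (kind == lfence_load_general)
        return false;
      /* "all": lfence is still emitted, together with a warning.  */
      *warn = true;
    }
  return true;
}
```

So the assembler never unfolds the REP loop; it either skips the fence ("general") or places a single fence after the whole instruction and warns ("all").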

> > @@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
> >    {
> >      lfence_before_ret_none = 0,
> >      lfence_before_ret_not,
> > -    lfence_before_ret_or
> > +    lfence_before_ret_or,
> > +    lfence_before_ret_shl
> >    }
> >  lfence_before_ret;
> >
> > @@ -4350,22 +4360,28 @@ load_insn_p (void)
> >
> >    if (!any_vex_p)
> >      {
> > -      /* lea  */
> > -      if (i.tm.base_opcode == 0x8d)
> > +      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
> > + prefetcht1, prefetcht2, prefetchtw, bndmk, bndcl, bndcu, bndcn,
> > + bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
>
> Bad indentation (also elsewhere, so this may be an issue with your
> mail client)?
>


Yes, tabs are dropped when I paste into Gmail (plain text mode).
I need to manually replace each tab with 8 spaces.

> > @@ -4536,8 +4577,8 @@ insert_lfence_before (void)
> >
> >        if (i.reg_operands == 1)
> >   {
> > -   /* Indirect branch via register.  Don't insert lfence with
> > -      -mlfence-after-load=yes.  */
> > +   /* Indirect branch via register. Insert lfence when
> > +      -mlfence-after-load=none.  */
> >     if (lfence_after_load
> >         || lfence_before_indirect_branch == lfence_branch_memory)
> >       return;
>
> The changed comment is awkward to read - the reader will almost
> certainly wonder why "none" implies an action. I think you either
> want to explain this further, or revert back to the original form
> by simply making it "... with -mlfence-after-load={all,general}."
>

Changed.

> > @@ -4568,12 +4609,13 @@ insert_lfence_before (void)
> >        return;
> >      }
> >
> > -  /* Output or/not and lfence before ret.  */
> > +  /* Output or/not/shl and lfence before ret/lret/iret.  */
> >    if (lfence_before_ret != lfence_before_ret_none
> >        && (i.tm.base_opcode == 0xc2
> >     || i.tm.base_opcode == 0xc3
> >     || i.tm.base_opcode == 0xca
> > -   || i.tm.base_opcode == 0xcb))
> > +   || i.tm.base_opcode == 0xcb
> > +   || i.tm.base_opcode == 0xcf))
> >      {
> >        if (last_insn.kind != last_insn_other
> >     && last_insn.seg == now_seg)
> > @@ -4583,33 +4625,50 @@ insert_lfence_before (void)
> >   last_insn.name, i.tm.name);
> >     return;
> >   }
> > -      if (lfence_before_ret == lfence_before_ret_or)
> > - {
> > -   /* orl: 0x830c2400.  */
> > -   p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
> > -   if (flag_code == CODE_64BIT)
> > -     *p++ = 0x48;
> > -   *p++ = 0x83;
> > -   *p++ = 0xc;
> > -   *p++ = 0x24;
> > -   *p++ = 0x0;
> > - }
> > -      else
> > +
> > +      char prefix = i.prefix[DATA_PREFIX] && !(i.prefix[REX_PREFIX] & REX_W)
> > + ? 0x66 : flag_code == CODE_64BIT ? 0x48 : 0x0;
>
> While this now looks better, it's tailored to near RET. Far RET
> as well as IRET default to 32-bit operand size in 64-bit mode.
> I can't tell how relevant it is to match effective operand size
> of the guarded and guarding insns.
>

Changed.

Updated patch:

From 26d23fc18e090799872057bec21831c02e8b5d03 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>

Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB, no load for Anysize insns.
  2. Add -mlfence-before-ret=shl/yes, adjust operand size of
  or/not/shl according to ret's.
  3. Adjust -mlfence-after-load=[yes|no] to
  -mlfence-after-load=[none|general|all]. -mlfence-after-load=[none|all]
  equals the original -mlfence-after-load=[no|yes];
  -mlfence-after-load=general won't add lfence after REP CMPS/SCAS
  since they would affect control flow behavior.
  -mlfence-after-load=all will issue a warning when adding lfence
  after REP CMPS/SCAS.
  4. Adjust testcases and documents.
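
The new three-way -mlfence-after-load argument from item 3 can be illustrated with a minimal parsing sketch. This is not the md_parse_option code from the patch: parse_lfence_after_load is a hypothetical helper, and rejecting the old "yes"/"no" spellings is an assumption of this sketch. The enum matches lfence_after_load_kind in the diff below.

```c
#include <string.h>

enum lfence_after_load_kind
{
  lfence_load_none = 0,
  lfence_load_general,
  lfence_load_all
};

/* Maps the option argument to a kind.  Returns 0 on success and -1 for
   an unrecognized argument, leaving *KIND untouched in that case.  */
static int
parse_lfence_after_load (const char *arg, enum lfence_after_load_kind *kind)
{
  if (strcmp (arg, "none") == 0)
    *kind = lfence_load_none;
  else if (strcmp (arg, "general") == 0)
    *kind = lfence_load_general;
  else if (strcmp (arg, "all") == 0)
    *kind = lfence_load_all;
  else
    return -1;  /* Assumption: "yes"/"no" are no longer accepted.  */
  return 0;
}
```

Because lfence_load_none is zero, existing tests of the form "if (lfence_after_load && ...)" keep working for both "general" and "all".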

gas/ChangeLog:
        * config/tc-i386.c (lfence_after_load): Deleted.
        (lfence_after_load_kind): New.
        (lfence_before_ret_shl): New member.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB, no load
        for Anysize insns.
        (insert_lfence_after): Handle specially for REP CMPS/SCAS.
        (insert_lfence_before): Handle iret, handle
        -mlfence-before-ret=shl, adjust operand size of or/not/shl to ret's.
        (md_parse_option): Change -mlfence-after-load=[yes|no] to
        -mlfence-after-load=[none|general|all], change
        -mlfence-before-ret=[none|not|or] to
        -mlfence-before-ret=[none|not|or|shl|yes].
        Enable -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all and no explicit
        -mlfence-before-ret option.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-e.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret.e: New.
        * testsuite/gas/i386/x86-64-lfence-ret.s: New.
---
 gas/config/tc-i386.c                          | 154 +++++++++++++-----
 gas/doc/c-i386.texi                           |  31 ++--
 gas/testsuite/gas/i386/i386.exp               |   7 +
 gas/testsuite/gas/i386/lfence-load-b.d        | 137 ++++++++++++++++
 gas/testsuite/gas/i386/lfence-load-b.e        |   3 +
 gas/testsuite/gas/i386/lfence-load.d          |  30 +++-
 gas/testsuite/gas/i386/lfence-load.e          |   3 +
 gas/testsuite/gas/i386/lfence-load.s          |  20 +++
 gas/testsuite/gas/i386/lfence-ret-a.d         |  18 ++
 gas/testsuite/gas/i386/lfence-ret-b.d         |  24 +++
 gas/testsuite/gas/i386/lfence-ret-c.d         |  35 ++++
 gas/testsuite/gas/i386/lfence-ret-d.d         |  36 ++++
 gas/testsuite/gas/i386/lfence-ret.s           |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-load-b.d | 137 ++++++++++++++++
 gas/testsuite/gas/i386/x86-64-lfence-load.d   |  28 +++-
 gas/testsuite/gas/i386/x86-64-lfence-load.s   |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d  |  33 +++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d  |  43 ++++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d  |  48 ++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d  |  49 ++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d  |  49 ++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret.e    |   3 +
 gas/testsuite/gas/i386/x86-64-lfence-ret.s    |  14 ++
 23 files changed, 870 insertions(+), 57 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/lfence-load-b.e
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-load-b.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.e
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..7454f2987f 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,8 +629,17 @@ static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;

-/* 1 if lfence should be inserted after every load.  */
-static int lfence_after_load = 0;
+/* Non-zero if lfence should be inserted after load.
+   lfence_load_all will generate lfence for all load instructions,
+   lfence_load_general will generate lfence for all
+   load instruction except REP CMPS/SCAS.  */
+static enum lfence_after_load_kind
+  {
+   lfence_load_none = 0,
+   lfence_load_general,
+   lfence_load_all
+  }
+lfence_after_load;

 /* Non-zero if lfence should be inserted before indirect branch.  */
 static enum lfence_before_indirect_branch_kind
@@ -647,7 +656,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4360,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+         prefetcht1, prefetcht2, prefetchtw, bndmk, bndcl, bndcu, bndcn,
+         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
         return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.  */
+      if (strcmp (i.tm.name, "pop") == 0
+          || i.tm.base_opcode == 0x9d
+          || i.tm.base_opcode == 0x61)
         return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
         return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+          || i.tm.base_opcode == 0xd7)
         return 1;
+      /* NB: AMD-specific insns with implicit memory operands are
+         intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4522,31 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* Insert lfence after rep cmps/scas only under
+         -mlfence-after-load=all.  */
+      /* There are also two REP string instructions that require
+         special treatment. Specifically, the compare string (CMPS)
+         and scan string (SCAS) instructions set EFLAGS in a manner
+         that depends on the data being compared/scanned. When used
+         with a REP prefix, the number of iterations may therefore
+         vary depending on this data. If the data is a program secret
+         chosen by the adversary using an LVI method,
+         then this data-dependent behavior may leak some aspect
+         of the secret.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+           || (i.tm.base_opcode | 0x1) == 0xaf)
+          && i.prefix[REP_PREFIX])
+        {
+          if (lfence_after_load == lfence_load_general)
+            {
+              as_warn (_("`%s` skips -mlfence-after-load=general"),
+                       i.tm.name);
+              return;
+            }
+          else
+            as_warn (_("`%s` changes flags which would affect control flow behavior"),
+                     i.tm.name);
+        }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4536,8 +4577,8 @@ insert_lfence_before (void)

       if (i.reg_operands == 1)
         {
-          /* Indirect branch via register.  Don't insert lfence with
-             -mlfence-after-load=yes.  */
+          /* Indirect branch via register.  Don't insert lfence with
+             -mlfence-after-load={general,all}.  */
           if (lfence_after_load
               || lfence_before_indirect_branch == lfence_branch_memory)
             return;
@@ -4568,12 +4609,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
           || i.tm.base_opcode == 0xc3
           || i.tm.base_opcode == 0xca
-          || i.tm.base_opcode == 0xcb))
+          || i.tm.base_opcode == 0xcb
+          || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
           && last_insn.seg == now_seg)
@@ -4583,33 +4625,59 @@ insert_lfence_before (void)
                          last_insn.name, i.tm.name);
           return;
         }
-      if (lfence_before_ret == lfence_before_ret_or)
-        {
-          /* orl: 0x830c2400.  */
-          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
-          *p++ = 0x83;
-          *p++ = 0xc;
-          *p++ = 0x24;
-          *p++ = 0x0;
-        }
+
+      bfd_boolean lret = (i.tm.base_opcode | 0x1) == 0xcb;
+      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
+      char prefix = 0x0;
+      /* Default operand size for far return is 32 bits,
+         64 bits for near return.  */
+      if (has_rexw)
+        prefix = 0x48;
       else
+        prefix = i.prefix[DATA_PREFIX]
+                 ? 0x66
+                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
         {
-          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          /* not: 0xf71424, may add prefix
+             for operand size override or 64-bit code.  */
+          p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
         }
+      else
+        {
+          p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+          if (prefix)
+            *p++ = prefix;
+          if (lfence_before_ret == lfence_before_ret_or)
+            {
+              /* or: 0x830c2400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0x83;
+              *p++ = 0x0c;
+            }
+          else
+            {
+              /* shl: 0xc1242400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0xc1;
+              *p++ = 0x24;
+            }
+
+          *p++ = 0x24;
+          *p++ = 0x0;
+        }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12985,17 +13053,23 @@ md_parse_option (int c, const char *arg)
       break;

     case OPTION_MLFENCE_AFTER_LOAD:
-      if (strcasecmp (arg, "yes") == 0)
-        lfence_after_load = 1;
-      else if (strcasecmp (arg, "no") == 0)
-        lfence_after_load = 0;
+      if (strcasecmp (arg, "general") == 0)
+        lfence_after_load = lfence_load_general;
+      else if (strcasecmp (arg, "all") == 0)
+        lfence_after_load = lfence_load_all;
+      else if (strcasecmp (arg, "none") == 0)
+        lfence_after_load = lfence_load_none;
       else
         as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
       break;

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-        lfence_before_indirect_branch = lfence_branch_all;
+        {
+          lfence_before_indirect_branch = lfence_branch_all;
+          if (lfence_before_ret == lfence_before_ret_none)
+            lfence_before_ret = lfence_before_ret_shl;
+        }
       else if (strcasecmp (arg, "memory") == 0)
         lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13086,8 @@ md_parse_option (int c, const char *arg)
         lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
         lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)
+        lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
         lfence_before_ret = lfence_before_ret_none;
       else
@@ -13376,13 +13452,13 @@ md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
-  -mlfence-after-load=[no|yes] (default: no)\n\
+  -mlfence-after-load=[none|general|all] (default: none)\n\
                           generate lfence after load\n"));
   fprintf (stream, _("\
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl|yes] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
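
The string-instruction tests in load_insn_p and insert_lfence_after rely on
opcode bit masks that are easy to misread.  The sketch below spells them out
in isolation.  It is an illustration only: load_fence_mode, is_string_load,
is_cmps_or_scas and fence_after_string_insn are hypothetical names, and the
helper covers only the one-byte string opcodes, not the rest of load_insn_p.

```c
#include <assert.h>

/* Hypothetical mirror of the patch's lfence_after_load_kind.  */
enum load_fence_mode { LOAD_FENCE_NONE, LOAD_FENCE_GENERAL, LOAD_FENCE_ALL };

/* (op | 0xb) == 0xaf holds for 0xa4-0xa7 and 0xac-0xaf, that is
   movs, cmps, lods and scas.  stos (0xaa/0xab) does not match.  */
static int
is_string_load (unsigned int op)
{
  assert (op <= 0xff);
  return (op | 0xb) == 0xaf;
}

/* (op | 0x1) folds the byte and word/dword forms together:
   0xa6/0xa7 are cmps, 0xae/0xaf are scas.  */
static int
is_cmps_or_scas (unsigned int op)
{
  assert (op <= 0xff);
  return (op | 0x1) == 0xa7 || (op | 0x1) == 0xaf;
}

/* Return 1 if an lfence would follow the string insn OP under MODE.
   REP CMPS/SCAS are skipped in "general" mode only.  */
static int
fence_after_string_insn (enum load_fence_mode mode, unsigned int op,
			 int has_rep)
{
  if (mode == LOAD_FENCE_NONE || !is_string_load (op))
    return 0;
  if (has_rep && is_cmps_or_scas (op) && mode == LOAD_FENCE_GENERAL)
    return 0;
  return 1;
}
```

This agrees with lfence-load-b.d, where "rep movsl" and "rep lods" are
followed by lfence under -mlfence-after-load=general while "repz scas" and
"repz cmpsl" are not.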
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..19a4bf874e 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -470,12 +470,15 @@ The default doesn't align branches.

 @cindex @samp{-mlfence-after-load=} option, i386
 @cindex @samp{-mlfence-after-load=} option, x86-64
-@item -mlfence-after-load=@var{no}
-@itemx -mlfence-after-load=@var{yes}
+@item -mlfence-after-load=@var{none}
+@itemx -mlfence-after-load=@var{general}
+@itemx -mlfence-after-load=@var{all}
 These options control whether the assembler should generate lfence
-after load instructions.  @option{-mlfence-after-load=@var{yes}} will
-generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
-lfence, which is the default.
+after load instructions.  @option{-mlfence-after-load=@var{all}} will
+generate lfence for all load instructions.
+@option{-mlfence-after-load=@var{general}} will generate lfence for all
+load instructions except rep cmps/scas.  @option{-mlfence-after-load=@var{none}}
+will not generate lfence, which is the default.

 @cindex @samp{-mlfence-before-indirect-branch=} option, i386
 @cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
@@ -488,28 +491,32 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
 warning before indirect near branch via memory.
 @option{-mlfence-before-indirect-branch=@var{none}} will not generate
-lfence nor issue warning, which is the default.  Note that lfence won't
-be generated before indirect near branch via register with
-@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+lfence nor issue warning, which is the default.  Note that lfence will be
+generated before indirect near branch via register only with
+@option{-mlfence-after-load=@var{none}}, since otherwise lfence is generated
 after loading branch target register.

 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@item -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
+@item -mlfence-before-ret=@var{yes}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl}} or @option{-mlfence-before-ret=@var{yes}}
+will generate shl instruction with lfence.  @option{-mlfence-before-ret=@var{not}}
+will generate not instruction with lfence.  @option{-mlfence-before-ret=@var{none}}
+will not generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..bb3897b9ad 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -530,11 +530,14 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "align-branch-8"
     run_dump_test "align-branch-9"
     run_dump_test "lfence-load"
+    run_dump_test "lfence-load-b"
     run_dump_test "lfence-indbr-a"
     run_dump_test "lfence-indbr-b"
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1117,11 +1120,15 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-align-branch-8"
     run_dump_test "x86-64-align-branch-9"
     run_dump_test "x86-64-lfence-load"
+    run_dump_test "x86-64-lfence-load-b"
     run_dump_test "x86-64-lfence-indbr-a"
     run_dump_test "x86-64-lfence-indbr-b"
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
+    run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load-b.d b/gas/testsuite/gas/i386/lfence-load-b.d
new file mode 100644
index 0000000000..b4f7bc0f19
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdtl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%ebp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%ebp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%ebp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%ebp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%ebp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%ecx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%ecx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41                    inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdtl  \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%esi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popl   \(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%ecx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8d 04 40              lea    \(%eax,%eax,2\),%eax
+ +[a-f0-9]+: c9                    leave
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%esi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%esi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%esi\),%es:\(%edi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%edi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%edi\),%ds:\(%esi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%esi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 83 00 01              addl   \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f ba 20 01          btl    \$0x1,\(%eax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 03              xadd   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c1 c3              xadd   %eax,%ebx
+ +[a-f0-9]+: 87 03                xchg   %eax,\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 93                    xchg   %eax,%ebx
+ +[a-f0-9]+: 39 45 40              cmp    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 3b 45 40              cmp    0x40\(%ebp\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 01 45 40              add    %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 03 00                add    \(%eax\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 85 45 40              test   %eax,0x40\(%ebp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-load-b.e b/gas/testsuite/gas/i386/lfence-load-b.e
new file mode 100644
index 0000000000..c394e02296
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load-b.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` skips -mlfence-after-load=general
+.*:??: Warning: `cmps` skips -mlfence-after-load=general
\ No newline at end of file
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..273e302f38 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..aa35857664 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,10 +9,28 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..77001c425e 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +25,20 @@ Disassembly of section .text:
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..fceb0eb182
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,35 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..03f8f88fd7
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,36 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..f27fa5839e 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,10 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
+ lretw
+ lretw $40
+ lret
+ lret $40
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load-b.d b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
new file mode 100644
index 0000000000..b1fd3cad42
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load-b.d
@@ -0,0 +1,137 @@
+#source: x86-64-lfence-load.s
+#as: -mlfence-after-load=general
+#objdump: -dw
+#warning_output: lfence-load-b.e
+#name: x86-64 lfence-load-b
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: c5 f8 ae 55 00        vldmxcsr 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 55 00          lgdt   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
+ +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: db 55 00              fistl  0x0\(%rbp\)
+ +[a-f0-9]+: df 55 00              fists  0x0\(%rbp\)
+ +[a-f0-9]+: db 45 00              fildl  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b dd 75 00          fsave  0x0\(%rbp\)
+ +[a-f0-9]+: dd 65 00              frstor 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 45 00              filds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 4d 00              fisttps 0x0\(%rbp\)
+ +[a-f0-9]+: d9 65 00              fldenv 0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9b d9 75 00          fstenv 0x0\(%rbp\)
+ +[a-f0-9]+: d8 45 00              fadds  0x0\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 04 24              fadds  \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d8 c3                fadd   %st\(3\),%st
+ +[a-f0-9]+: d8 01                fadds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 01                filds  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: df 11                fists  \(%rcx\)
+ +[a-f0-9]+: 0f ae 29              xrstor \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 18 01              prefetchnta \(%rcx\)
+ +[a-f0-9]+: 0f c7 09              cmpxchg8b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c7 09          cmpxchg16b \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff c1                inc    %ecx
+ +[a-f0-9]+: 0f 01 10              lgdt   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 0f 66 02 b0        pfcmpeq 0x2\(%rsi\),%mm4
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 8f 00                popq   \(%rax\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 58                    pop    %rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 d1 11              rclw   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 01 01 00 00 00    testl  \$0x1,\(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ff 01                incl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 11                notl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 31                divl   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 21                mull   \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 39                idivl  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f7 29                imull  \(%rcx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 8d 04 40          lea    \(%rax,%rax,2\),%rax
+ +[a-f0-9]+: c9                    leaveq
+ +[a-f0-9]+: 6e                    outsb  %ds:\(%rsi\),\(%dx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ac                    lods   %ds:\(%rsi\),%al
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 a5                rep movsl %ds:\(%rsi\),%es:\(%rdi\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: f3 af                repz scas %es:\(%rdi\),%eax
+ +[a-f0-9]+: f3 a7                repz cmpsl %es:\(%rdi\),%ds:\(%rsi\)
+ +[a-f0-9]+: f3 ad                rep lods %ds:\(%rsi\),%eax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 83 03 01          addl   \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 41 0f ba 23 01        btl    \$0x1,\(%r11\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 03          xadd   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 0f c1 c3          xadd   %rax,%rbx
+ +[a-f0-9]+: 48 87 03              xchg   %rax,\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 93                xchg   %rax,%rbx
+ +[a-f0-9]+: 48 39 45 40          cmp    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 3b 45 40          cmp    0x40\(%rbp\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 01 45 40          add    %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 03 00              add    \(%rax\),%rax
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 85 45 40          test   %rax,0x40\(%rbp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..f21aba85d5 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,6 +1,7 @@
-#as: -mlfence-after-load=yes
+#as: -mlfence-after-load=all
 #objdump: -dw
-#name: x86-64 -mlfence-after-load=yes
+#warning_output: lfence-load.e
+#name: x86-64 -mlfence-after-load=all

 .*: +file format .*

@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..d8e6fa059d 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -1,5 +1,6 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=or
+#warning_output: x86-64-lfence-ret.e
 #objdump: -dw
 #name: x86-64 -mlfence-before-ret=or

@@ -9,10 +10,40 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..e9bb64fe94 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -1,5 +1,6 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=not
+#warning_output: x86-64-lfence-ret.e
 #objdump: -dw
 #name: x86-64 -mlfence-before-ret=not

@@ -9,6 +10,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +26,36 @@ Disassembly of section .text:
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..d5027d385f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,48 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..533445fee6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
new file mode 100644
index 0000000000..646b352a62
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw
+#name: x86-64 -mlfence-before-ret=yes
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.e b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
new file mode 100644
index 0000000000..13730e50e6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.s b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
new file mode 100644
index 0000000000..986239c222
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
@@ -0,0 +1,14 @@
+ .text
+_start:
+ retw
+ retw $20
+ ret
+ ret $30
+ data16 rex.w ret
+ data16 rex.w ret $40
+ lretw
+ lretw $40
+ lret
+ lret $40
+ lretq
+ lretq $40
-- 
2.18.1
> > +      if (lfence_before_ret == lfence_before_ret_not)

> >   {

> > -   p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);

> > -   /* notl: 0xf71424.  */

> > -   if (flag_code == CODE_64BIT)

> > -     *p++ = 0x48;

> > +   /* notl: 0xf71424, may add prefix

> > +      for operand size overwrite or 64-bit code.  */

>

> Despite the comment extension you still say "notl". Please switch

> to either just "not" or something like "not{w,l,q}". Also

> s/overwrite/override/. Note how you ...

>

> > +   p = frag_more ((prefix ? 2 : 0) + 6 + 3);

> > +   if (prefix)

> > +     *p++ = prefix;

> >     *p++ = 0xf7;

> >     *p++ = 0x14;

> >     *p++ = 0x24;

> > -   /* notl: 0xf71424.  */

> > -   if (flag_code == CODE_64BIT)

> > -     *p++ = 0x48;

> > +   if (prefix)

> > +     *p++ = prefix;

> >     *p++ = 0xf7;

> >     *p++ = 0x14;

> >     *p++ = 0x24;

> >   }

> > +      else

> > + {

> > +   p = frag_more ((prefix ? 1 : 0) + 4 + 3);

> > +   if (prefix)

> > +     *p++ = prefix;

> > +   if (lfence_before_ret == lfence_before_ret_or)

> > +     {

> > +       /* orl: 0x830c2400, may add prefix

> > + for operand size overwrite or 64-bit code.  */

>

> ... also have the same (bogus) suffix here, but ...

>

> > +       *p++ = 0x83;

> > +       *p++ = 0x0c;

> > +     }

> > +   else

> > +     {

> > +       /* shl: 0xc1242400, may add prefix

> > + for operand size overwrite or 64-bit code.  */

>

> ... not here.

>

> Jan




--
BR,
Hongtao
Jan Beulich April 22, 2020, 8:47 a.m. | #14
On 22.04.2020 05:33, Hongtao Liu wrote:
> On Tue, Apr 21, 2020 at 2:30 PM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 21.04.2020 04:24, Hongtao Liu wrote:

>>> On Mon, Apr 20, 2020 at 3:34 PM Jan Beulich <jbeulich@suse.com> wrote:

>>>>

>>>> On 20.04.2020 09:20, Hongtao Liu wrote:

>>>>> On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>> On 16.04.2020 07:34, Hongtao Liu wrote:

>>>>>>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)

>>>>>>> {

>>>>>>>   if (lfence_after_load && load_insn_p ())

>>>>>>>     {

>>>>>>> +      /* Insert lfence after rep cmps/scas only under

>>>>>>> +       -mlfence-after-load=all.  */

>>>>>>> +      if (((i.tm.base_opcode | 0x1) == 0xa7

>>>>>>> +         || (i.tm.base_opcode | 0x1) == 0xaf)

>>>>>>> +        && i.prefix[REP_PREFIX])

>>>>>>

>>>>>> I'm afraid I don't understand why the REP forms need treating

>>>>>> differently from the non-REP ones of the same insns.

>>>>>>

>>>>>

>>>>> Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.

>>>>

>>>> Well, of course just the two. But this doesn't answer my question

>>>> as to why there is such a special case.

>>>>

>>>

>>> There are also two REP string instructions that require special

>>> treatment. Specifically, the compare string (CMPS) and scan string

>>> (SCAS) instructions set EFLAGS in a manner that depends on the data

>>> being compared/scanned. When used with a REP prefix, the number of

>>> iterations may therefore vary depending on this data. If the data is a

>>> program secret chosen by the adversary using an LVI method, then this

>>> data-dependent behavior may leak some aspect of the secret. The

>>> solution is to unfold any REP CMPS and REP SCAS operations into a loop

>>> and insert an LFENCE after the CMPS/SCAS instruction. For example,

>>> REPNZ SCAS can be unfolded to:

>>>

>>> .RepLoop:

>>>   JRCXZ .ExitRepLoop

>>>   DEC rcx  # or ecx if the REPNZ SCAS uses a 32-bit address size

>>>   SCAS

>>>   LFENCE

>>>   JNZ .RepLoop

>>> .ExitRepLoop:

>>>   ...

>>>

>>> The request I got is to add options to handle or not handle REP

>>> CMPS/SCAS, plus issue a warning.

>>

>> But you don't handle them as per what you've written above, afaics.

>> Am I overlooking anything?

> 

> Well, that solution is not meant for gas; I put it here for

> convenience, to explain why we need to handle REP CMPS/SCAS

> specially.


And how is it better then to issue a warning and leave the code
alone over still at least inserting an LFENCE after the insn? I.e.
I'm not sure I see the value of the separate "general" and "all"
sub-options then.

As to it not being meant for gas - why is that?
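
For readers following the REP SCAS discussion, here is a minimal C model of the unfolded loop quoted above. It is only a sketch: `model_repne_scasb` and `struct scas_result` are hypothetical names, not part of gas, and the LFENCE is marked by a comment because C cannot express it portably. It shows that the iteration count depends on the scanned data, which is the leak the quoted text describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Outcome of one modelled REPNE SCASB: how many bytes were scanned
   and the final ZF value.  Hypothetical type, for illustration only.  */
struct scas_result
{
  size_t iterations;
  int zf;
};

/* Model of the unfolded loop from the quoted text.  COUNT plays the
   role of rcx and AL the role of %al.  */
struct scas_result
model_repne_scasb (const uint8_t *buf, size_t count, uint8_t al)
{
  struct scas_result r = { 0, 0 };

  while (count != 0)		/* JRCXZ .ExitRepLoop */
    {
      count--;			/* DEC rcx */
      r.zf = (*buf++ == al);	/* SCAS: ZF depends on the loaded byte */
      r.iterations++;
      /* The unfolded sequence places LFENCE here, before the branch.  */
      if (r.zf)			/* JNZ .RepLoop is not taken on a match */
	break;
    }
  return r;
}
```

With the buffer {1, 2, 3, 4}, scanning for 3 stops after three iterations, while scanning for a value that is absent runs all four, so an observer of the loop length learns something about the data.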

> @@ -4568,12 +4609,13 @@ insert_lfence_before (void)

>        return;

>      }

> 

> -  /* Output or/not and lfence before ret.  */

> +  /* Output or/not/shl and lfence before ret/lret/iret.  */

>    if (lfence_before_ret != lfence_before_ret_none

>        && (i.tm.base_opcode == 0xc2

>            || i.tm.base_opcode == 0xc3

>            || i.tm.base_opcode == 0xca

> -          || i.tm.base_opcode == 0xcb))

> +          || i.tm.base_opcode == 0xcb

> +          || i.tm.base_opcode == 0xcf))

>      {

>        if (last_insn.kind != last_insn_other

>            && last_insn.seg == now_seg)

> @@ -4583,33 +4625,59 @@ insert_lfence_before (void)

>                           last_insn.name, i.tm.name);

>            return;

>          }

> -      if (lfence_before_ret == lfence_before_ret_or)

> -        {

> -          /* orl: 0x830c2400.  */

> -          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> -          if (flag_code == CODE_64BIT)

> -            *p++ = 0x48;

> -          *p++ = 0x83;

> -          *p++ = 0xc;

> -          *p++ = 0x24;

> -          *p++ = 0x0;

> -        }

> +

> +      bfd_boolean lret = (i.tm.base_opcode | 0x1) == 0xcb;


"(i.tm.base_opcode | 0x5) == 0xcf" or "(i.tm.base_opcode & 8)"
to also cover IRET.
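
A small sketch of the two suggested tests next to the one in the patch, using the five opcodes from the condition above (0xc2, 0xc3, 0xca, 0xcb, 0xcf). The function names are hypothetical and only illustrate the bit arithmetic; the `& 8` form is valid only because the opcode has already been restricted to that set.

```c
#include <stdbool.h>

/* Check as written in the patch: matches 0xca and 0xcb only.  */
bool
lret_check_in_patch (unsigned int base_opcode)
{
  return (base_opcode | 0x1) == 0xcb;
}

/* First suggestion: 0xca, 0xcb and 0xcf all become 0xcf once bits 0
   and 2 are set, while 0xc2 and 0xc3 become 0xc7.  */
bool
far_ret_or_iret (unsigned int base_opcode)
{
  return (base_opcode | 0x5) == 0xcf;
}

/* Second suggestion: within the five opcodes above, bit 3 is set
   exactly for 0xca, 0xcb and 0xcf.  */
bool
far_ret_or_iret_bit3 (unsigned int base_opcode)
{
  return (base_opcode & 0x8) != 0;
}
```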

> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;

> +      char prefix = 0x0;

> +      /* Default operand size for far return is 32 bits,

> +         64 bits for near return.  */

> +      if (has_rexw)

> +        prefix = 0x48;

>        else

> +        prefix = i.prefix[DATA_PREFIX]

> +                 ? 0x66

> +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;


Aiui the workaround is specifically for Intel CPUs. Intel CPUs
ignore operand size overrides on near RET. (Sorry, I should
have pointed out this fact earlier already.)
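
To make the case under discussion concrete, the prefix selection from the quoted hunk can be modelled as a standalone function. This is a sketch of the logic as posted, not of any later revision; `select_prefix` and its boolean parameters are hypothetical stand-ins for `i.prefix[REX_PREFIX] & REX_W`, `i.prefix[DATA_PREFIX]`, the `lret` flag and `flag_code == CODE_64BIT`.

```c
#include <stdbool.h>

/* Prefix byte for the or/not/shl emitted before the return, following
   the quoted hunk: REX.W wins, then the 0x66 operand size prefix, then
   the default (64-bit for a near return in 64-bit code, none
   otherwise).  */
unsigned char
select_prefix (bool has_rexw, bool has_data16, bool far_ret, bool code64)
{
  if (has_rexw)
    return 0x48;
  if (has_data16)
    return 0x66;
  return (!far_ret && code64) ? 0x48 : 0x00;
}
```

The remark above concerns the near `retw` case in 64-bit code, i.e. `select_prefix (false, true, false, true)`, where this logic picks 0x66.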

> +      if (lfence_before_ret == lfence_before_ret_not)

>          {

> -          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);

> -          /* notl: 0xf71424.  */

> -          if (flag_code == CODE_64BIT)

> -            *p++ = 0x48;

> +          /* not: 0xf71424, may add prefix

> +             for operand size overwrite or 64-bit code.  */


As said before - "override", not "overwrite" (there are several
instances to change).
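
For reference, a self-contained sketch of the byte sequences the hunk emits, writing into a plain buffer instead of a gas frag. `emit_before_ret` and `enum fence_kind` are hypothetical names; the opcode bytes are the ones in the quoted code and in the expected test output (or: 83 0c 24 00, not: f7 14 24, shl: c1 24 24 00, lfence: 0f ae e8).

```c
#include <stddef.h>

enum fence_kind { FENCE_OR, FENCE_NOT, FENCE_SHL };

/* Write the or/not/shl on (%rsp) followed by lfence into P and return
   the number of bytes written.  PREFIX is 0 for no prefix, otherwise
   0x66 or 0x48.  P must have room for at least 11 bytes.  */
size_t
emit_before_ret (unsigned char *p, enum fence_kind kind,
		 unsigned char prefix)
{
  unsigned char *start = p;

  if (kind == FENCE_NOT)
    {
      /* not is emitted twice, so the return address ends up unchanged.  */
      for (int n = 0; n < 2; n++)
	{
	  if (prefix)
	    *p++ = prefix;
	  *p++ = 0xf7;
	  *p++ = 0x14;
	  *p++ = 0x24;
	}
    }
  else
    {
      if (prefix)
	*p++ = prefix;
      if (kind == FENCE_OR)
	{
	  *p++ = 0x83;
	  *p++ = 0x0c;
	}
      else
	{
	  *p++ = 0xc1;
	  *p++ = 0x24;
	}
      *p++ = 0x24;	/* SIB byte: (%rsp) */
      *p++ = 0x00;	/* immediate 0 */
    }

  /* lfence */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - start);
}
```

The returned lengths match the `frag_more` arguments in the hunk: `(prefix ? 2 : 0) + 6 + 3` for not and `(prefix ? 1 : 0) + 4 + 3` for or and shl.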

Jan
Fangrui Song via Binutils April 23, 2020, 2:53 a.m. | #15
On Wed, Apr 22, 2020 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 22.04.2020 05:33, Hongtao Liu wrote:

> > On Tue, Apr 21, 2020 at 2:30 PM Jan Beulich <jbeulich@suse.com> wrote:

> >>

> >> On 21.04.2020 04:24, Hongtao Liu wrote:

> >>> On Mon, Apr 20, 2020 at 3:34 PM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>

> >>>> On 20.04.2020 09:20, Hongtao Liu wrote:

> >>>>> On Thu, Apr 16, 2020 at 4:33 PM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>>> On 16.04.2020 07:34, Hongtao Liu wrote:

> >>>>>>> @@ -4506,6 +4520,22 @@ insert_lfence_after (void)

> >>>>>>> {

> >>>>>>>   if (lfence_after_load && load_insn_p ())

> >>>>>>>     {

> >>>>>>> +      /* Insert lfence after rep cmps/scas only under

> >>>>>>> +       -mlfence-after-load=all.  */

> >>>>>>> +      if (((i.tm.base_opcode | 0x1) == 0xa7

> >>>>>>> +         || (i.tm.base_opcode | 0x1) == 0xaf)

> >>>>>>> +        && i.prefix[REP_PREFIX])

> >>>>>>

> >>>>>> I'm afraid I don't understand why the REP forms need treating

> >>>>>> differently from the non-REP ones of the same insns.

> >>>>>>

> >>>>>

> >>>>> Not all REP forms, just REP CMPS/SCAS which would change EFLAGS.

> >>>>

> >>>> Well, of course just the two. But this doesn't answer my question

> >>>> as to why there is such a special case.

> >>>>

> >>>

> >>> There are also two REP string instructions that require special

> >>> treatment. Specifically, the compare string (CMPS) and scan string

> >>> (SCAS) instructions set EFLAGS in a manner that depends on the data

> >>> being compared/scanned. When used with a REP prefix, the number of

> >>> iterations may therefore vary depending on this data. If the data is a

> >>> program secret chosen by the adversary using an LVI method, then this

> >>> data-dependent behavior may leak some aspect of the secret. The

> >>> solution is to unfold any REP CMPS and REP SCAS operations into a loop

> >>> and insert an LFENCE after the CMPS/SCAS instruction. For example,

> >>> REPNZ SCAS can be unfolded to:

> >>>

> >>> .RepLoop:

> >>>   JRCXZ .ExitRepLoop

> >>>   DEC rcx  # or ecx if the REPNZ SCAS uses a 32-bit address size

> >>>   SCAS

> >>>   LFENCE

> >>>   JNZ .RepLoop

> >>> .ExitRepLoop:

> >>>   ...

> >>>

> >>> The request I got is to add options to handle or not handle REP

> >>> CMPS/SCAS, plus issue a warning.

> >>

> >> But you don't handle them as per what you've written above, afaics.

> >> Am I overlooking anything?

> >

> > Well, that solution is not meant for gas; I put it here for

> > convenience, to explain why we need to handle REP CMPS/SCAS

> > specially.

>

> And how is it better then to issue a warning and leave the code

> alone over still at least inserting an LFENCE after the insn? I.e.

> I'm not sure I see the value of the separate "general" and "all"

> sub-options then.


You're right, I'll revert the sub-options for lfence_after_load and only
issue a warning for REP CMPS/SCAS.

>

> As to it not being meant for gas - why is that?

>


I'll handle REP CMPS/SCAS in a separate patch.

> > @@ -4568,12 +4609,13 @@ insert_lfence_before (void)

> >        return;

> >      }

> >

> > -  /* Output or/not and lfence before ret.  */

> > +  /* Output or/not/shl and lfence before ret/lret/iret.  */

> >    if (lfence_before_ret != lfence_before_ret_none

> >        && (i.tm.base_opcode == 0xc2

> >            || i.tm.base_opcode == 0xc3

> >            || i.tm.base_opcode == 0xca

> > -          || i.tm.base_opcode == 0xcb))

> > +          || i.tm.base_opcode == 0xcb

> > +          || i.tm.base_opcode == 0xcf))

> >      {

> >        if (last_insn.kind != last_insn_other

> >            && last_insn.seg == now_seg)

> > @@ -4583,33 +4625,59 @@ insert_lfence_before (void)

> >                           last_insn.name, i.tm.name);

> >            return;

> >          }

> > -      if (lfence_before_ret == lfence_before_ret_or)

> > -        {

> > -          /* orl: 0x830c2400.  */

> > -          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> > -          if (flag_code == CODE_64BIT)

> > -            *p++ = 0x48;

> > -          *p++ = 0x83;

> > -          *p++ = 0xc;

> > -          *p++ = 0x24;

> > -          *p++ = 0x0;

> > -        }

> > +

> > +      bfd_boolean lret = (i.tm.base_opcode | 0x1) == 0xcb;

>

> "(i.tm.base_opcode | 0x5) == 0xcf" or "(i.tm.base_opcode & 8)"

> to also cover IRET.

>


Changed.

> > +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;

> > +      char prefix = 0x0;

> > +      /* Default operand size for far return is 32 bits,

> > +         64 bits for near return.  */

> > +      if (has_rexw)

> > +        prefix = 0x48;

> >        else

> > +        prefix = i.prefix[DATA_PREFIX]

> > +                 ? 0x66

> > +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;

>

> Aiui the workaround is specifically for Intel CPUs. Intel CPUs

> ignore operand size overrides on near RET. (Sorry, I should

> have pointed out this fact earlier already.)


I don't quite understand your point, could you give a testcase to show that?

>
> > +      if (lfence_before_ret == lfence_before_ret_not)
> >          {
> > -          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
> > -          /* notl: 0xf71424.  */
> > -          if (flag_code == CODE_64BIT)
> > -            *p++ = 0x48;
> > +          /* not: 0xf71424, may add prefix
> > +             for operand size overwrite or 64-bit code.  */
>
> As said before - "override", not "overwrite" (there are several
> instances to change).

Changed.

>
> Jan

-- 
BR,
Hongtao
Jan Beulich April 23, 2020, 6:59 a.m. | #16
On 23.04.2020 04:53, Hongtao Liu wrote:
> On Wed, Apr 22, 2020 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 22.04.2020 05:33, Hongtao Liu wrote:
>>> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
>>> +      char prefix = 0x0;
>>> +      /* Default operand size for far return is 32 bits,
>>> +         64 bits for near return.  */
>>> +      if (has_rexw)
>>> +        prefix = 0x48;
>>>        else
>>> +        prefix = i.prefix[DATA_PREFIX]
>>> +                 ? 0x66
>>> +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;
>>
>> Aiui the workaround is specifically for Intel CPUs. Intel CPUs
>> ignore operand size overrides on near RET. (Sorry, I should
>> have pointed out this fact earlier already.)
>
> I don't quite understand your point, could you give a testcase to show that?

Please see commit aeab2b26dbea. But of course creating a testcase
to try out is pretty easy - just encode RET with a 0x66 prefix
and observe the different behavior on Intel vs AMD systems.

Jan
Hongtao Liu April 23, 2020, 8:53 a.m. | #17
On Thu, Apr 23, 2020 at 2:59 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 23.04.2020 04:53, Hongtao Liu wrote:
> > On Wed, Apr 22, 2020 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
> >> On 22.04.2020 05:33, Hongtao Liu wrote:
> >>> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> >>> +      char prefix = 0x0;
> >>> +      /* Default operand size for far return is 32 bits,
> >>> +         64 bits for near return.  */
> >>> +      if (has_rexw)
> >>> +        prefix = 0x48;
> >>>        else
> >>> +        prefix = i.prefix[DATA_PREFIX]
> >>> +                 ? 0x66
> >>> +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;
> >>
> >> Aiui the workaround is specifically for Intel CPUs. Intel CPUs
> >> ignore operand size overrides on near RET. (Sorry, I should
> >> have pointed out this fact earlier already.)
> >
> > I don't quite understand your point, could you give a testcase to show that?
>
> Please see commit aeab2b26dbea. But of course creating a testcase
> to try out is pretty easy - just encode RET with a 0x66 prefix
> and observe the different behavior on Intel vs AMD systems.

operand size for near ret under Cpu64 for Intel CPUs is always 64 bits, right?

>
> Jan



-- 
BR,
Hongtao
Jan Beulich April 23, 2020, 9:15 a.m. | #18
On 23.04.2020 10:53, Hongtao Liu wrote:
> On Thu, Apr 23, 2020 at 2:59 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 23.04.2020 04:53, Hongtao Liu wrote:
>>> On Wed, Apr 22, 2020 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>> On 22.04.2020 05:33, Hongtao Liu wrote:
>>>>> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
>>>>> +      char prefix = 0x0;
>>>>> +      /* Default operand size for far return is 32 bits,
>>>>> +         64 bits for near return.  */
>>>>> +      if (has_rexw)
>>>>> +        prefix = 0x48;
>>>>>        else
>>>>> +        prefix = i.prefix[DATA_PREFIX]
>>>>> +                 ? 0x66
>>>>> +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;
>>>>
>>>> Aiui the workaround is specifically for Intel CPUs. Intel CPUs
>>>> ignore operand size overrides on near RET. (Sorry, I should
>>>> have pointed out this fact earlier already.)
>>>
>>> I don't quite understand your point, could you give a testcase to show that?
>>
>> Please see commit aeab2b26dbea. But of course creating a testcase
>> to try out is pretty easy - just encode RET with a 0x66 prefix
>> and observe the different behavior on Intel vs AMD systems.
>
> operand size for near ret under Cpu64 for Intel CPUs is always 64 bits, right?

Yes.

Jan
Hongtao Liu April 24, 2020, 5:30 a.m. | #19
On Thu, Apr 23, 2020 at 4:53 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Thu, Apr 23, 2020 at 2:59 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 23.04.2020 04:53, Hongtao Liu wrote:
> > > On Wed, Apr 22, 2020 at 4:47 PM Jan Beulich <jbeulich@suse.com> wrote:
> > >> On 22.04.2020 05:33, Hongtao Liu wrote:
> > >>> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> > >>> +      char prefix = 0x0;
> > >>> +      /* Default operand size for far return is 32 bits,
> > >>> +         64 bits for near return.  */
> > >>> +      if (has_rexw)
> > >>> +        prefix = 0x48;
> > >>>        else
> > >>> +        prefix = i.prefix[DATA_PREFIX]
> > >>> +                 ? 0x66
> > >>> +                 : !lret && flag_code == CODE_64BIT ? 0x48 : 0x0;
> > >>
> > >> Aiui the workaround is specifically for Intel CPUs. Intel CPUs
> > >> ignore operand size overrides on near RET. (Sorry, I should
> > >> have pointed out this fact earlier already.)
> > >
> > > I don't quite understand your point, could you give a testcase to show that?
> >
> > Please see commit aeab2b26dbea. But of course creating a testcase
> > to try out is pretty easy - just encode RET with a 0x66 prefix
> > and observe the different behavior on Intel vs AMD systems.
>
> operand size for near ret under Cpu64 for Intel CPUs is always 64 bits, right?
>
> >
> > Jan
>
>

Changed to:

+      /* lret or iret.  */
+      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
+      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
+      char prefix = 0x0;
+      /* Default operand size for far return is 32 bits,
+         64 bits for near return.  */
+      /* Near ret ignores operand size override under CPU64.  */
+      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
+        prefix = 0x48;
       else
+        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
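
To make the resulting choice easy to check against the new test expectations, here is a rough standalone model of the logic above. It is only a sketch: `pick_ret_prefix` and its boolean parameters are hypothetical stand-ins for `flag_code == CODE_64BIT`, `i.prefix[REX_PREFIX] & REX_W` and `i.prefix[DATA_PREFIX]`, and are not names from tc-i386.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the prefix selection; returns the prefix byte
   to put in front of the or/not/shl, or 0 for no prefix.  */
unsigned char
pick_ret_prefix (unsigned int base_opcode, bool code64,
		 bool has_rexw, bool has_data16)
{
  /* lret or iret.  */
  bool lret = (base_opcode | 0x5) == 0xcf;

  /* Near ret in 64-bit code always uses a 64-bit operand size here,
     so a data16 prefix on it is not propagated.  */
  if ((!lret && code64) || has_rexw)
    return 0x48;
  return has_data16 ? 0x66 : 0x0;
}

/* A few cases mirroring the updated test expectations.  */
void
check_ret_prefix (void)
{
  assert (pick_ret_prefix (0xc3, true, false, true) == 0x48);	/* 66 c3, 64-bit */
  assert (pick_ret_prefix (0xc3, false, false, true) == 0x66);	/* retw, 32-bit */
  assert (pick_ret_prefix (0xcb, true, false, true) == 0x66);	/* lretw, 64-bit */
  assert (pick_ret_prefix (0xcb, true, false, false) == 0x0);	/* lret, 64-bit */
}
```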


>
> --
> BR,
> Hongtao

Updated full patch:

From 7e30c4345b64bac7ad129f9da762af089f5f1cd4 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>

Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB, no load for Anysize insns.
  2. Add -mlfence-before-ret=shl/yes, adjust operand size of
  or/not/shl according to ret's.
  3. Issue warning for REP CMPS/SCAS since they would affect control
  flow behavior.
  4. Adjust testcases and documents.

gas/Changelog:
        * config/tc-i386.c (lfence_before_ret_shl): New member.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB, no load
        for Anysize insns.
        (insert_lfence_after): Issue warning for REP CMPS/SCAS.
        (insert_lfence_before): Handle iret.  Handle
        -mlfence-before-ret=shl.  Adjust operand size of or/not/shl to ret's.
        (md_parse_option): Change -mlfence-before-ret=[none|not|or] to
        -mlfence-before-ret=[none|not|or|shl|yes].
        Enable -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all and no explicit
        -mlfence-before-ret option.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-e.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret.e: New.
        * testsuite/gas/i386/x86-64-lfence-ret.s: New.
---
 gas/config/tc-i386.c                         | 116 ++++++++++++++-----
 gas/doc/c-i386.texi                          |  12 +-
 gas/testsuite/gas/i386/i386.exp              |   5 +
 gas/testsuite/gas/i386/lfence-load.d         |  26 +++++
 gas/testsuite/gas/i386/lfence-load.e         |   3 +
 gas/testsuite/gas/i386/lfence-load.s         |  20 ++++
 gas/testsuite/gas/i386/lfence-ret-a.d        |  18 +++
 gas/testsuite/gas/i386/lfence-ret-b.d        |  24 ++++
 gas/testsuite/gas/i386/lfence-ret-c.d        |  35 ++++++
 gas/testsuite/gas/i386/lfence-ret-d.d        |  36 ++++++
 gas/testsuite/gas/i386/lfence-ret.s          |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-load.d  |  24 ++++
 gas/testsuite/gas/i386/x86-64-lfence-load.s  |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d |  35 +++++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d |  45 ++++++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d |  48 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d |  49 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d |  49 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret.e   |   3 +
 gas/testsuite/gas/i386/x86-64-lfence-ret.s   |  14 +++
 20 files changed, 549 insertions(+), 38 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.e
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.s
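
Note for reviewers (not part of the commit): a small standalone sketch of the byte sequences the patch emits before a return, handy for checking the `frag_more` sizes by hand. The enum, the function name and the caller-supplied buffer are hypothetical; only the opcode bytes and the size arithmetic are taken from the patch below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lfence_before_ret_{or,not,shl}.  */
enum ret_fence_kind { FENCE_OR, FENCE_NOT, FENCE_SHL };

/* Writes the or/not/shl + lfence sequence into BUF, which must hold
   at least 11 bytes, and returns the number of bytes written.
   PREFIX is 0x48, 0x66, or 0 for no prefix.  */
size_t
emit_before_ret (unsigned char *buf, enum ret_fence_kind kind,
		 unsigned char prefix)
{
  unsigned char *p = buf;

  if (kind == FENCE_NOT)
    {
      /* not (%sp) twice: f7 14 24, each with its own prefix.  */
      for (int k = 0; k < 2; k++)
	{
	  if (prefix)
	    *p++ = prefix;
	  *p++ = 0xf7;
	  *p++ = 0x14;
	  *p++ = 0x24;
	}
    }
  else
    {
      /* or $0,(%sp): 83 0c 24 00; shl $0,(%sp): c1 24 24 00.  */
      if (prefix)
	*p++ = prefix;
      *p++ = kind == FENCE_OR ? 0x83 : 0xc1;
      *p++ = kind == FENCE_OR ? 0x0c : 0x24;
      *p++ = 0x24;
      *p++ = 0x0;
    }

  /* lfence: 0f ae e8.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;

  /* Same totals as (prefix ? 2 : 0) + 6 + 3 and (prefix ? 1 : 0) + 4 + 3.  */
  assert ((size_t) (p - buf) <= 11);
  return (size_t) (p - buf);
}
```

With a 0x66 prefix and the shl variant this yields 66 c1 24 24 00 0f ae e8, which is the `shlw` + `lfence` pair expected in lfence-ret-d.d.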

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..59646392d2 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -647,7 +647,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4351,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+         prefetcht1, prefetcht2, prefetchtw, bndmk, bndcl, bndcu, bndcn,
+         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
         return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+          || i.tm.base_opcode == 0x9d
+          || i.tm.base_opcode == 0x61)
         return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
         return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+          || i.tm.base_opcode == 0xd7)
         return 1;
+      /* NB: For AMD-specific insns with implicit memory operands,
+         they're intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4513,22 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* There are also two REP string instructions that require
+         special treatment. Specifically, the compare string (CMPS)
+         and scan string (SCAS) instructions set EFLAGS in a manner
+         that depends on the data being compared/scanned. When used
+         with a REP prefix, the number of iterations may therefore
+         vary depending on this data. If the data is a program secret
+         chosen by the adversary using an LVI method,
+         then this data-dependent behavior may leak some aspect
+         of the secret.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+           || (i.tm.base_opcode | 0x1) == 0xaf)
+          && i.prefix[REP_PREFIX])
+        {
+            as_warn (_("`%s` changes flags which would affect control flow behavior"),
+                     i.tm.name);
+        }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4568,12 +4591,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
           || i.tm.base_opcode == 0xc3
           || i.tm.base_opcode == 0xca
-          || i.tm.base_opcode == 0xcb))
+          || i.tm.base_opcode == 0xcb
+          || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
           && last_insn.seg == now_seg)
@@ -4583,33 +4607,59 @@ insert_lfence_before (void)
                          last_insn.name, i.tm.name);
           return;
         }
-      if (lfence_before_ret == lfence_before_ret_or)
-        {
-          /* orl: 0x830c2400.  */
-          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
-          *p++ = 0x83;
-          *p++ = 0xc;
-          *p++ = 0x24;
-          *p++ = 0x0;
-        }
+
+      /* lret or iret.  */
+      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
+      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
+      char prefix = 0x0;
+      /* Default operand size for far return is 32 bits,
+         64 bits for near return.  */
+      /* Near ret ignores operand size override under CPU64.  */
+      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
+        prefix = 0x48;
       else
+        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
+
+      if (lfence_before_ret == lfence_before_ret_not)
         {
-          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          /* not: 0xf71424, may add prefix
+             for operand size override or 64-bit code.  */
+          p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
         }
+      else
+        {
+          p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+          if (prefix)
+            *p++ = prefix;
+          if (lfence_before_ret == lfence_before_ret_or)
+            {
+              /* or: 0x830c2400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0x83;
+              *p++ = 0x0c;
+            }
+          else
+            {
+              /* shl: 0xc1242400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0xc1;
+              *p++ = 0x24;
+            }
+
+          *p++ = 0x24;
+          *p++ = 0x0;
+        }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12995,7 +13045,11 @@ md_parse_option (int c, const char *arg)

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-        lfence_before_indirect_branch = lfence_branch_all;
+        {
+          lfence_before_indirect_branch = lfence_branch_all;
+          if (lfence_before_ret == lfence_before_ret_none)
+            lfence_before_ret = lfence_before_ret_shl;
+        }
       else if (strcasecmp (arg, "memory") == 0)
         lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13066,8 @@ md_parse_option (int c, const char *arg)
         lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
         lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)
+        lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
         lfence_before_ret = lfence_before_ret_none;
       else
@@ -13382,7 +13438,7 @@ md_show_usage (FILE *stream)
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl|yes] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..4acece4394 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -488,6 +488,8 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
@@ -501,15 +503,17 @@ after loading branch target register.
 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@item -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
+@item -mlfence-before-ret=@var{yes}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl/yes}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..3bacb80178 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -535,6 +535,8 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1122,6 +1124,9 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
+    run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..0d355df556 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,5 +1,6 @@
 #as: -mlfence-after-load=yes
 #objdump: -dw
+#warning_output: lfence-load.e
 #name: -mlfence-after-load=yes

 .*: +file format .*
@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..aa35857664 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,10 +9,28 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..77001c425e 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +25,20 @@ Disassembly of section .text:
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..fceb0eb182
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,35 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..03f8f88fd7
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,36 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..f27fa5839e 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,10 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
+ lretw
+ lretw $40
+ lret
+ lret $40
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..5cd764391d 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,5 +1,6 @@
 #as: -mlfence-after-load=yes
 #objdump: -dw
+#warning_output: lfence-load.e
 #name: x86-64 -mlfence-after-load=yes

 .*: +file format .*
@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..345217b17c 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -1,6 +1,7 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=or
-#objdump: -dw
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
 #name: x86-64 -mlfence-before-ret=or

 .*: +file format .*
@@ -9,10 +10,40 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..3947660fea 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -1,6 +1,7 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=not
-#objdump: -dw
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
 #name: x86-64 -mlfence-before-ret=not

 .*: +file format .*
@@ -9,6 +10,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +26,36 @@ Disassembly of section .text:
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..cd89a95bc4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,48 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..593b889435
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
new file mode 100644
index 0000000000..b4d229654c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=yes
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+#name: x86-64 -mlfence-before-ret=yes
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.e b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
new file mode 100644
index 0000000000..13730e50e6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.s b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
new file mode 100644
index 0000000000..986239c222
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
@@ -0,0 +1,14 @@
+ .text
+_start:
+ retw
+ retw $20
+ ret
+ ret $30
+ data16 rex.w ret
+ data16 rex.w ret $40
+ lretw
+ lretw $40
+ lret
+ lret $40
+ lretq
+ lretq $40
-- 
2.18.1

-- 
BR,
Hongtao
Jan Beulich April 24, 2020, 6 a.m. | #20
On 24.04.2020 07:30, Hongtao Liu wrote:
> Change to
>
> +      /* lret or iret.  */
> +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> +      char prefix = 0x0;
> +      /* Default operand size for far return is 32 bits,
> +         64 bits for near return.  */
> +      /* Near ret ignores operand size override under CPU64.  */
> +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
> +        prefix = 0x48;
>        else
> +        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;


One minor remark on this one - I'd suggest to either omit the
initializer for prefix, or make the last two lines

      else if (i.prefix[DATA_PREFIX])
        prefix = 0x66;

as there's no point assigning 0 twice.
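
To make the suggestion concrete, here is a minimal standalone sketch of the prefix selection with the `else if' form. It is only an illustration: the function name ret_fence_prefix and its parameters are hypothetical stand-ins for i.tm.base_opcode, flag_code and i.prefix[] in tc-i386.c, and it assumes the opcode has already been matched as one of 0xc2/0xc3/0xca/0xcb/0xcf.

```c
#include <stdbool.h>

/* Hypothetical model of the prefix choice for the or/not/shl emitted
   before a return.  In gas the inputs come from i.tm.base_opcode,
   flag_code and i.prefix[]; here they are plain parameters.  */
static unsigned char
ret_fence_prefix (unsigned int base_opcode, bool code64,
                  bool has_rexw, bool has_data16)
{
  /* lret (0xca, 0xcb) or iret (0xcf).  */
  bool lret = (base_opcode | 0x5) == 0xcf;
  /* Initialized once; 0 means "no prefix byte".  */
  unsigned char prefix = 0;

  /* Near ret ignores the operand size override in 64-bit code,
     so it always gets REX.W there.  */
  if ((!lret && code64) || has_rexw)
    prefix = 0x48;
  else if (has_data16)
    prefix = 0x66;

  return prefix;
}
```

With these inputs, retw in 64-bit code yields 0x48 (the orq before `data16 retq' in x86-64-lfence-ret-a.d), lretw yields 0x66 and a plain lret yields 0.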

> Update total patch:


Looks okay to me now, thanks.

Jan
Hongtao Liu April 24, 2020, 7:29 a.m. | #21
On Fri, Apr 24, 2020 at 2:01 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 24.04.2020 07:30, Hongtao Liu wrote:
> > Change to
> >
> > +      /* lret or iret.  */
> > +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
> > +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> > +      char prefix = 0x0;
> > +      /* Default operand size for far return is 32 bits,
> > +         64 bits for near return.  */
> > +      /* Near ret ignores operand size override under CPU64.  */
> > +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
> > +        prefix = 0x48;
> >        else
> > +        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
>
> One minor remark on this one - I'd suggest to either omit the
> initializer for prefix, or make the last two lines
>
>       else if (i.prefix[DATA_PREFIX])
>         prefix = 0x66;
>
> as there's no point assigning 0 twice.
>


I'll do this change.

> > Update total patch:
>
> Looks okay to me now, thanks.
>
> Jan


Thanks for your review and patience, I'll wait for H.J.'s approval and
submit my patch.

-- 
BR,
Hongtao
H.J. Lu April 24, 2020, 1 p.m. | #22
On Fri, Apr 24, 2020 at 12:24 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Apr 24, 2020 at 2:01 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 24.04.2020 07:30, Hongtao Liu wrote:
> > > Change to
> > >
> > > +      /* lret or iret.  */
> > > +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
> > > +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> > > +      char prefix = 0x0;
> > > +      /* Default operand size for far return is 32 bits,
> > > +         64 bits for near return.  */
> > > +      /* Near ret ignores operand size override under CPU64.  */
> > > +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
> > > +        prefix = 0x48;
> > >        else
> > > +        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
> >
> > One minor remark on this one - I'd suggest to either omit the
> > initializer for prefix, or make the last two lines
> >
> >       else if (i.prefix[DATA_PREFIX])
> >         prefix = 0x66;
> >
> > as there's no point assigning 0 twice.
> >
>
> I'll do this change.
>
> > > Update total patch:
> >
> > Looks okay to me now, thanks.
> >
> > Jan
>
> Thanks for your review and patience, I'll wait for H.J.'s approval and
> submit my patch.
>


Please post your final patch.

Thanks.

-- 
H.J.
Hongtao Liu April 26, 2020, 2:03 a.m. | #23
On Fri, Apr 24, 2020 at 9:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Apr 24, 2020 at 12:24 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2020 at 2:01 PM Jan Beulich <jbeulich@suse.com> wrote:
> > >
> > > On 24.04.2020 07:30, Hongtao Liu wrote:
> > > > Change to
> > > >
> > > > +      /* lret or iret.  */
> > > > +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
> > > > +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> > > > +      char prefix = 0x0;
> > > > +      /* Default operand size for far return is 32 bits,
> > > > +         64 bits for near return.  */
> > > > +      /* Near ret ignores operand size override under CPU64.  */
> > > > +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
> > > > +        prefix = 0x48;
> > > >        else
> > > > +        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
> > >
> > > One minor remark on this one - I'd suggest to either omit the
> > > initializer for prefix, or make the last two lines
> > >
> > >       else if (i.prefix[DATA_PREFIX])
> > >         prefix = 0x66;
> > >
> > > as there's no point assigning 0 twice.
> > >
> >
> > I'll do this change.
> >
> > > > Update total patch:
> > >
> > > Looks okay to me now, thanks.
> > >
> > > Jan
> >
> > Thanks for your review and patience, I'll wait for H.J.'s approval and
> > submit my patch.
> >
>
> Please post your final patch.
>
> Thanks.
>
> --
> H.J.


From 495f32049c894cfc24984dfceed3f45169bc0128 Mon Sep 17 00:00:00 2001
From: liuhongt <hongtao.liu@intel.com>

Date: Mon, 16 Mar 2020 11:03:12 +0800
Subject: [PATCH] Improve -mlfence-after-load

  1. Implicit load for POP/POPF/POPA/XLATB, no load for Anysize insns.
  2. Add -mlfence-before-ret=shl/yes, adjust operand size of
  or/not/shl according to ret's.
  3. Issue warning for REP CMPS/SCAS since they would affect control
  flow behavior.
  4. Adjust testcases and documents.

gas/ChangeLog:
        * config/tc-i386.c (lfence_before_ret_shl): New member.
        (load_insn_p): Implicit load for POP/POPA/POPF/XLATB, no load
        for Anysize insns.
        (insert_lfence_after): Issue warning for REP CMPS/SCAS.
        (insert_lfence_before): Handle iret.  Handle
        -mlfence-before-ret=shl.  Adjust operand size of or/not/shl to
        ret's.
        (md_parse_option): Change -mlfence-before-ret=[none|not|or] to
        -mlfence-before-ret=[none|not|or|shl|yes].
        Enable -mlfence-before-ret=shl when
        -mlfence-before-indirect-branch=all and no explicit
        -mlfence-before-ret option.
        (md_show_usage): Ditto.
        * doc/c-i386.texi: Ditto.
        * testsuite/gas/i386/i386.exp: Add new testcases.
        * testsuite/gas/i386/lfence-load-b.d: New.
        * testsuite/gas/i386/lfence-load-b.e: New.
        * testsuite/gas/i386/lfence-load.d: Modified.
        * testsuite/gas/i386/lfence-load.e: New.
        * testsuite/gas/i386/lfence-load.s: Modified.
        * testsuite/gas/i386/lfence-ret-a.d: Modified.
        * testsuite/gas/i386/lfence-ret-b.d: Modified.
        * testsuite/gas/i386/lfence-ret-c.d: New.
        * testsuite/gas/i386/lfence-ret-d.d: New.
        * testsuite/gas/i386/lfence-ret.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
        * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
        * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret-e.d: New.
        * testsuite/gas/i386/x86-64-lfence-ret.e: New.
        * testsuite/gas/i386/x86-64-lfence-ret.s: New.
---
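Note for reviewers (not part of the commit): the byte sequences that insert_lfence_before emits ahead of a return can be sketched outside gas as below. This is a minimal illustration under stated assumptions: emit_ret_fence, enum fence_kind and the caller-supplied buffer are hypothetical names standing in for the frag_more-based code in the patch, and prefix is the 0, 0x66 or 0x48 value chosen from the return's operand size.

```c
#include <stddef.h>

/* Hypothetical names; the patch uses lfence_before_ret_not/or/shl.  */
enum fence_kind { FENCE_NOT, FENCE_OR, FENCE_SHL };

/* Write the instruction(s) placed before ret/lret/iret into BUF,
   followed by lfence, and return the number of bytes written.
   BUF must hold at least 11 bytes (two prefixed nots plus lfence).
   PREFIX is 0 for none, 0x66 for 16-bit or 0x48 for 64-bit.  */
static size_t
emit_ret_fence (unsigned char *buf, enum fence_kind kind,
                unsigned char prefix)
{
  unsigned char *p = buf;

  if (kind == FENCE_NOT)
    {
      /* not (%rsp) twice, f7 14 24 each, so the value is restored.  */
      for (int n = 0; n < 2; n++)
        {
          if (prefix)
            *p++ = prefix;
          *p++ = 0xf7;
          *p++ = 0x14;
          *p++ = 0x24;
        }
    }
  else
    {
      if (prefix)
        *p++ = prefix;
      /* or $0,(%rsp) is 83 0c 24 00; shl $0,(%rsp) is c1 24 24 00.  */
      *p++ = kind == FENCE_OR ? 0x83 : 0xc1;
      *p++ = kind == FENCE_OR ? 0x0c : 0x24;
      *p++ = 0x24;
      *p++ = 0x00;
    }

  /* lfence: 0f ae e8.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - buf);
}
```

For example, FENCE_OR with prefix 0x48 produces 48 83 0c 24 00 0f ae e8, which is the orq/lfence pair expected in x86-64-lfence-ret-a.d.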
 gas/config/tc-i386.c                         | 120 ++++++++++++++-----
 gas/doc/c-i386.texi                          |  12 +-
 gas/testsuite/gas/i386/i386.exp              |   5 +
 gas/testsuite/gas/i386/lfence-load.d         |  26 ++++
 gas/testsuite/gas/i386/lfence-load.e         |   3 +
 gas/testsuite/gas/i386/lfence-load.s         |  20 ++++
 gas/testsuite/gas/i386/lfence-ret-a.d        |  18 +++
 gas/testsuite/gas/i386/lfence-ret-b.d        |  24 ++++
 gas/testsuite/gas/i386/lfence-ret-c.d        |  35 ++++++
 gas/testsuite/gas/i386/lfence-ret-d.d        |  36 ++++++
 gas/testsuite/gas/i386/lfence-ret.s          |   6 +
 gas/testsuite/gas/i386/x86-64-lfence-load.d  |  24 ++++
 gas/testsuite/gas/i386/x86-64-lfence-load.s  |  19 +++
 gas/testsuite/gas/i386/x86-64-lfence-ret-a.d |  35 +++++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-b.d |  45 ++++++-
 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d |  48 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d |  49 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d |  49 ++++++++
 gas/testsuite/gas/i386/x86-64-lfence-ret.e   |   3 +
 gas/testsuite/gas/i386/x86-64-lfence-ret.s   |  14 +++
 20 files changed, 551 insertions(+), 40 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/lfence-load.e
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.e
 create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 093497becd..a692c457a5 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -647,7 +647,8 @@ static enum lfence_before_ret_kind
   {
     lfence_before_ret_none = 0,
     lfence_before_ret_not,
-    lfence_before_ret_or
+    lfence_before_ret_or,
+    lfence_before_ret_shl
   }
 lfence_before_ret;

@@ -4350,22 +4351,28 @@ load_insn_p (void)

   if (!any_vex_p)
     {
-      /* lea  */
-      if (i.tm.base_opcode == 0x8d)
+      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,
+         prefetcht1, prefetcht2, prefetchw, bndmk, bndcl, bndcu, bndcn,
+         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */
+      if (i.tm.opcode_modifier.anysize)
         return 0;

-      /* pop  */
-      if ((i.tm.base_opcode & ~7) == 0x58
-          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+      /* pop, popf, popa.   */
+      if (strcmp (i.tm.name, "pop") == 0
+          || i.tm.base_opcode == 0x9d
+          || i.tm.base_opcode == 0x61)
         return 1;

       /* movs, cmps, lods, scas.  */
       if ((i.tm.base_opcode | 0xb) == 0xaf)
         return 1;

-      /* outs */
-      if (base_opcode == 0x6f)
+      /* outs, xlatb.  */
+      if (base_opcode == 0x6f
+          || i.tm.base_opcode == 0xd7)
         return 1;
+      /* NB: For AMD-specific insns with implicit memory operands,
+         they're intentionally not covered.  */
     }

   /* No memory operand.  */
@@ -4506,6 +4513,22 @@ insert_lfence_after (void)
 {
   if (lfence_after_load && load_insn_p ())
     {
+      /* There are also two REP string instructions that require
+         special treatment. Specifically, the compare string (CMPS)
+         and scan string (SCAS) instructions set EFLAGS in a manner
+         that depends on the data being compared/scanned. When used
+         with a REP prefix, the number of iterations may therefore
+         vary depending on this data. If the data is a program secret
+         chosen by the adversary using an LVI method,
+         then this data-dependent behavior may leak some aspect
+         of the secret.  */
+      if (((i.tm.base_opcode | 0x1) == 0xa7
+           || (i.tm.base_opcode | 0x1) == 0xaf)
+          && i.prefix[REP_PREFIX])
+        {
+          as_warn (_("`%s` changes flags which would affect control flow behavior"),
+                   i.tm.name);
+        }
       char *p = frag_more (3);
       *p++ = 0xf;
       *p++ = 0xae;
@@ -4568,12 +4591,13 @@ insert_lfence_before (void)
       return;
     }

-  /* Output or/not and lfence before ret.  */
+  /* Output or/not/shl and lfence before ret/lret/iret.  */
   if (lfence_before_ret != lfence_before_ret_none
       && (i.tm.base_opcode == 0xc2
           || i.tm.base_opcode == 0xc3
           || i.tm.base_opcode == 0xca
-          || i.tm.base_opcode == 0xcb))
+          || i.tm.base_opcode == 0xcb
+          || i.tm.base_opcode == 0xcf))
     {
       if (last_insn.kind != last_insn_other
           && last_insn.seg == now_seg)
@@ -4583,33 +4607,59 @@ insert_lfence_before (void)
                          last_insn.name, i.tm.name);
           return;
         }
-      if (lfence_before_ret == lfence_before_ret_or)
-        {
-          /* orl: 0x830c2400.  */
-          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
-          *p++ = 0x83;
-          *p++ = 0xc;
-          *p++ = 0x24;
-          *p++ = 0x0;
-        }
-      else
-        {
-          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+
+      /* lret or iret.  */
+      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
+      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
+      char prefix = 0x0;
+      /* Default operand size for far return is 32 bits,
+         64 bits for near return.  */
+      /* Near ret ignores operand size override under CPU64.  */
+      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
+        prefix = 0x48;
+      else if (i.prefix[DATA_PREFIX])
+        prefix = 0x66;
+
+      if (lfence_before_ret == lfence_before_ret_not)
+        {
+          /* not: 0xf71424, may add prefix
+             for operand size override or 64-bit code.  */
+          p = frag_more ((prefix ? 2 : 0) + 6 + 3);
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
-          /* notl: 0xf71424.  */
-          if (flag_code == CODE_64BIT)
-            *p++ = 0x48;
+          if (prefix)
+            *p++ = prefix;
           *p++ = 0xf7;
           *p++ = 0x14;
           *p++ = 0x24;
         }
+      else
+        {
+          p = frag_more ((prefix ? 1 : 0) + 4 + 3);
+          if (prefix)
+            *p++ = prefix;
+          if (lfence_before_ret == lfence_before_ret_or)
+            {
+              /* or: 0x830c2400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0x83;
+              *p++ = 0x0c;
+            }
+          else
+            {
+              /* shl: 0xc1242400, may add prefix
+                 for operand size override or 64-bit code.  */
+              *p++ = 0xc1;
+              *p++ = 0x24;
+            }
+
+          *p++ = 0x24;
+          *p++ = 0x0;
+        }
+
       *p++ = 0xf;
       *p++ = 0xae;
       *p = 0xe8;
@@ -12995,7 +13045,11 @@ md_parse_option (int c, const char *arg)

     case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
       if (strcasecmp (arg, "all") == 0)
-        lfence_before_indirect_branch = lfence_branch_all;
+        {
+          lfence_before_indirect_branch = lfence_branch_all;
+          if (lfence_before_ret == lfence_before_ret_none)
+            lfence_before_ret = lfence_before_ret_shl;
+        }
       else if (strcasecmp (arg, "memory") == 0)
         lfence_before_indirect_branch = lfence_branch_memory;
       else if (strcasecmp (arg, "register") == 0)
@@ -13012,6 +13066,8 @@ md_parse_option (int c, const char *arg)
         lfence_before_ret = lfence_before_ret_or;
       else if (strcasecmp (arg, "not") == 0)
         lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)
+        lfence_before_ret = lfence_before_ret_shl;
       else if (strcasecmp (arg, "none") == 0)
         lfence_before_ret = lfence_before_ret_none;
       else
@@ -13382,7 +13438,7 @@ md_show_usage (FILE *stream)
   -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
                           generate lfence before indirect near branch\n"));
   fprintf (stream, _("\
-  -mlfence-before-ret=[none|or|not] (default: none)\n\
+  -mlfence-before-ret=[none|or|not|shl|yes] (default: none)\n\
                           generate lfence before ret\n"));
   fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 628fb1ad5a..4acece4394 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -488,6 +488,8 @@ before indirect near branch instructions.
 @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
 before indirect near branch via register and issue a warning before
 indirect near branch via memory.
+It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when
+there's no explicit @option{-mlfence-before-ret=}.
 @option{-mlfence-before-indirect-branch=@var{register}} will generate
 lfence before indirect near branch via register.
 @option{-mlfence-before-indirect-branch=@var{memory}} will issue a
@@ -501,15 +503,17 @@ after loading branch target register.
 @cindex @samp{-mlfence-before-ret=} option, i386
 @cindex @samp{-mlfence-before-ret=} option, x86-64
 @item -mlfence-before-ret=@var{none}
+@item -mlfence-before-ret=@var{shl}
 @item -mlfence-before-ret=@var{or}
+@item -mlfence-before-ret=@var{yes}
 @itemx -mlfence-before-ret=@var{not}
 These options control whether the assembler should generate lfence
 before ret.  @option{-mlfence-before-ret=@var{or}} will generate
 generate or instruction with lfence.
-@option{-mlfence-before-ret=@var{not}} will generate not instruction
-with lfence.
-@option{-mlfence-before-ret=@var{none}} will not generate lfence,
-which is the default.
+@option{-mlfence-before-ret=@var{shl/yes}} will generate shl instruction
+with lfence. @option{-mlfence-before-ret=@var{not}} will generate not
+instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not
+generate lfence, which is the default.

 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 9dacc11906..3bacb80178 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -535,6 +535,8 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "lfence-indbr-c"
     run_dump_test "lfence-ret-a"
     run_dump_test "lfence-ret-b"
+    run_dump_test "lfence-ret-c"
+    run_dump_test "lfence-ret-d"
     run_dump_test "lfence-byte"

     # These tests require support for 8 and 16 bit relocs,
@@ -1122,6 +1124,9 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-indbr-c"
     run_dump_test "x86-64-lfence-ret-a"
     run_dump_test "x86-64-lfence-ret-b"
+    run_dump_test "x86-64-lfence-ret-c"
+    run_dump_test "x86-64-lfence-ret-d"
+    run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"

     if { ![istarget "*-*-aix*"]
diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d
index cd7e7f76df..0d355df556 100644
--- a/gas/testsuite/gas/i386/lfence-load.d
+++ b/gas/testsuite/gas/i386/lfence-load.d
@@ -1,5 +1,6 @@
 #as: -mlfence-after-load=yes
 #objdump: -dw
+#warning_output: lfence-load.e
 #name: -mlfence-after-load=yes

 .*: +file format .*
@@ -15,6 +16,31 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)
+ +[a-f0-9]+: 1f                    pop    %ds
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popf
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 61                    popa
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e
new file mode 100644
index 0000000000..1ee49da7fd
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-load.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: `scas` changes flags which would affect control flow behavior
+.*:??: Warning: `cmps` changes flags which would affect control flow behavior
diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s
index b417ac644e..4b4aa1610b 100644
--- a/gas/testsuite/gas/i386/lfence-load.s
+++ b/gas/testsuite/gas/i386/lfence-load.s
@@ -4,6 +4,26 @@ _start:
  lgdt (%ebp)
  vmptrld (%ebp)
  vmclear (%ebp)
+ invpcid (%ebp), %edx
+ invlpg (%ebp)
+ clflush (%ebp)
+ clflushopt (%ebp)
+ clwb (%ebp)
+ cldemote (%ebp)
+ bndmk (%ebp), %bnd1
+ bndcl (%ebp), %bnd1
+ bndcu (%ebp), %bnd1
+ bndcn (%ebp), %bnd1
+ bndstx %bnd1, (%ebp)
+ bndldx (%ebp), %bnd1
+ prefetcht0 (%ebp)
+ prefetcht1 (%ebp)
+ prefetcht2 (%ebp)
+ prefetchw (%ebp)
+ pop %ds
+ popf
+ popa
+ xlatb (%ebx)
  fsts (%ebp)
  flds (%ebp)
  fistl (%ebp)
diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d
index 719cf1b472..aa35857664 100644
--- a/gas/testsuite/gas/i386/lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/lfence-ret-a.d
@@ -9,10 +9,28 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    ret
  +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d
index e3914b9c28..77001c425e 100644
--- a/gas/testsuite/gas/i386/lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/lfence-ret-b.d
@@ -9,6 +9,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +25,20 @@ Disassembly of section .text:
  +[a-f0-9]+: f7 14 24              notl   \(%esp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d
new file mode 100644
index 0000000000..fceb0eb182
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-c.d
@@ -0,0 +1,35 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#objdump: -dw
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d
new file mode 100644
index 0000000000..03f8f88fd7
--- /dev/null
+++ b/gas/testsuite/gas/i386/lfence-ret-d.d
@@ -0,0 +1,36 @@
+#source: lfence-ret.s
+#as: -mlfence-before-ret=shl
+#objdump: -dw
+#name: -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                retw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          retw   \$0x14
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    ret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              ret    \$0x1e
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s
index 35c4e6eeaa..f27fa5839e 100644
--- a/gas/testsuite/gas/i386/lfence-ret.s
+++ b/gas/testsuite/gas/i386/lfence-ret.s
@@ -1,4 +1,10 @@
  .text
 _start:
+ retw
+ retw $20
  ret
  ret $30
+ lretw
+ lretw $40
+ lret
+ lret $40
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d
index 4f6cd00edf..5cd764391d 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d
@@ -1,5 +1,6 @@
 #as: -mlfence-after-load=yes
 #objdump: -dw
+#warning_output: lfence-load.e
 #name: x86-64 -mlfence-after-load=yes

 .*: +file format .*
@@ -15,6 +16,29 @@ Disassembly of section .text:
  +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)
+ +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)
+ +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)
+ +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)
+ +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1
+ +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)
+ +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)
+ +[a-f0-9]+: 0f a1                popq   %fs
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 9d                    popfq
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)
+ +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)
  +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)
  +[a-f0-9]+: 0f ae e8              lfence
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s
index 76d0886617..2a3ac6b7d2 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-load.s
+++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s
@@ -4,6 +4,25 @@ _start:
  lgdt (%rbp)
  vmptrld (%rbp)
  vmclear (%rbp)
+ invpcid (%rbp), %rdx
+ invlpg (%eax)
+ clflush (%rbp)
+ clflushopt (%rbp)
+ clwb (%rbp)
+ cldemote (%rbp)
+ bndmk (%rbp), %bnd1
+ bndcl (%rbp), %bnd1
+ bndcu (%rbp), %bnd1
+ bndcn (%rbp), %bnd1
+ bndstx %bnd1, (%rbp)
+ bndldx (%rbp), %bnd1
+ prefetcht0 (%rbp)
+ prefetcht1 (%rbp)
+ prefetcht2 (%rbp)
+ prefetchw (%rbp)
+ pop %fs
+ popf
+ xlatb (%rbx)
  fsts (%rbp)
  flds (%rbp)
  fistl (%rbp)
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
index 26e5b48bec..345217b17c 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d
@@ -1,6 +1,7 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=or
-#objdump: -dw
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
 #name: x86-64 -mlfence-before-ret=or

 .*: +file format .*
@@ -9,10 +10,40 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c3                    retq
  +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
index 340488831d..3947660fea 100644
--- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d
@@ -1,6 +1,7 @@
-#source: lfence-ret.s
+#source: x86-64-lfence-ret.s
 #as: -mlfence-before-ret=not
-#objdump: -dw
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
 #name: x86-64 -mlfence-before-ret=not

 .*: +file format .*
@@ -9,6 +10,14 @@
 Disassembly of section .text:

 0+ <_start>:
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
@@ -17,4 +26,36 @@ Disassembly of section .text:
  +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
  +[a-f0-9]+: 0f ae e8              lfence
  +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: f7 14 24              notl   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
new file mode 100644
index 0000000000..cd89a95bc4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
@@ -0,0 +1,48 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
new file mode 100644
index 0000000000..593b889435
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=shl
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+#name: x86-64 -mlfence-before-ret=shl
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
new file mode 100644
index 0000000000..b4d229654c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
@@ -0,0 +1,49 @@
+#source: x86-64-lfence-ret.s
+#as: -mlfence-before-ret=yes
+#warning_output: x86-64-lfence-ret.e
+#objdump: -dw -Mintel64
+#name: x86-64 -mlfence-before-ret=yes
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c3                data16 retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c3                    retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: c2 1e 00              retq   \$0x1e
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c3              data16 rex.W retq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 cb                lretw
+ +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: cb                    lret
+ +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: ca 28 00              lret   \$0x28
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 cb                lretq
+ +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
+ +[a-f0-9]+: 0f ae e8              lfence
+ +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.e b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
new file mode 100644
index 0000000000..13730e50e6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
@@ -0,0 +1,3 @@
+.*: Assembler messages:
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
+.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.s b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
new file mode 100644
index 0000000000..986239c222
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
@@ -0,0 +1,14 @@
+ .text
+_start:
+ retw
+ retw $20
+ ret
+ ret $30
+ data16 rex.w ret
+ data16 rex.w ret $40
+ lretw
+ lretw $40
+ lret
+ lret $40
+ lretq
+ lretq $40
-- 
2.18.1
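
For reference, the byte sequences expected by the lfence-ret tests above
can be reproduced with a small standalone C sketch.  It is not part of
the patch: the enum and function names are hypothetical, the prefix rule
is taken from the review discussion in this thread (REX.W, or a near ret
in 64-bit mode, selects 0x48; otherwise a data16 prefix selects 0x66),
and 16-bit code mode is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; they are not gas identifiers.  */
enum before_ret_kind { BEFORE_RET_OR, BEFORE_RET_NOT, BEFORE_RET_SHL };

/* Operand-size prefix for the inserted or/not/shl, assuming the rule
   discussed in the thread: a near ret in 64-bit mode ignores data16,
   and REX.W always selects a 64-bit operand.  Returns 0 for no prefix.  */
static uint8_t
before_ret_prefix (int is_far_ret, int is_64bit_code,
                   int has_rex_w, int has_data16)
{
  if ((!is_far_ret && is_64bit_code) || has_rex_w)
    return 0x48;
  if (has_data16)
    return 0x66;
  return 0;
}

/* Write the or/not/shl sequence followed by lfence into BUF and return
   the number of bytes written.  BUF must hold at least 11 bytes.  */
static size_t
emit_before_ret (enum before_ret_kind kind, uint8_t prefix, uint8_t *buf)
{
  static const uint8_t or_insn[]  = { 0x83, 0x0c, 0x24, 0x00 }; /* or  $0x0,(%sp) */
  static const uint8_t not_insn[] = { 0xf7, 0x14, 0x24 };       /* not (%sp)      */
  static const uint8_t shl_insn[] = { 0xc1, 0x24, 0x24, 0x00 }; /* shl $0x0,(%sp) */
  static const uint8_t lfence[]   = { 0x0f, 0xae, 0xe8 };
  const uint8_t *insn;
  size_t len, n = 0;
  int repeat = 1;

  switch (kind)
    {
    case BEFORE_RET_OR:
      insn = or_insn;
      len = sizeof or_insn;
      break;
    case BEFORE_RET_NOT:
      /* The test dumps show not emitted twice, which restores the
         original return address.  */
      insn = not_insn;
      len = sizeof not_insn;
      repeat = 2;
      break;
    default:
      assert (kind == BEFORE_RET_SHL);
      insn = shl_insn;
      len = sizeof shl_insn;
      break;
    }

  for (int r = 0; r < repeat; r++)
    {
      if (prefix)
        buf[n++] = prefix;
      memcpy (buf + n, insn, len);
      n += len;
    }
  memcpy (buf + n, lfence, sizeof lfence);
  return n + sizeof lfence;
}
```

Under these assumptions, emit_before_ret (BEFORE_RET_OR, 0x48, buf)
yields 48 83 0c 24 00 0f ae e8, the orq/lfence pair that
x86-64-lfence-ret-a.d expects before retq.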


-- 
BR,
Hongtao
Fangrui Song via Binutils April 26, 2020, 3:26 a.m. | #24
On Sat, Apr 25, 2020 at 7:03 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Apr 24, 2020 at 9:00 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Fri, Apr 24, 2020 at 12:24 AM Hongtao Liu <crazylht@gmail.com> wrote:
> > >
> > > On Fri, Apr 24, 2020 at 2:01 PM Jan Beulich <jbeulich@suse.com> wrote:
> > > >
> > > > On 24.04.2020 07:30, Hongtao Liu wrote:
> > > > > Change to
> > > > >
> > > > > +      /* lret or iret.  */
> > > > > +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;
> > > > > +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;
> > > > > +      char prefix = 0x0;
> > > > > +      /* Default operand size for far return is 32 bits,
> > > > > +         64 bits for near return.  */
> > > > > +      /* Near ret ignores operand size override under CPU64.  */
> > > > > +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)
> > > > > +        prefix = 0x48;
> > > > >        else
> > > > > +        prefix = i.prefix[DATA_PREFIX] ? 0x66 : 0x0;
> > > >
> > > > One minor remark on this one - I'd suggest to either omit the
> > > > initializer for prefix, or make the last two lines
> > > >
> > > >       else if (i.prefix[DATA_PREFIX])
> > > >         prefix = 0x66;
> > > >
> > > > as there's no point assigning 0 twice.
> > > >
> > >
> > > I'll do this change.
> > >
> > > > > Update total patch:
> > > >
> > > > Looks okay to me now, thanks.
> > > >
> > > > Jan
> > >
> > > Thanks for your review and patience, I'll wait for H.J's approval and
> > > submit my patch.
> > >
> >
> > Please post your final patch.
> >
> > Thanks.
> >
> > --
> > H.J.
>
> From 495f32049c894cfc24984dfceed3f45169bc0128 Mon Sep 17 00:00:00 2001
> From: liuhongt <hongtao.liu@intel.com>
> Date: Mon, 16 Mar 2020 11:03:12 +0800
> Subject: [PATCH] Improve -mlfence-after-load
>
>   1. Implicit load for POP/POPF/POPA/XLATB, no load for Anysize insns.
>   2. Add -mlfence-before-ret=shl/yes, adjust operand size of
>   or/not/shl according to ret's.
>   3. Issue warning for REP CMPS/SCAS since they would affect control
>   flow behavior.
>   4. Adjust testcases and documents.
>
> gas/Changelog:
>         * config/tc-i386.c (lfence_before_ret_shl): New member.
>         (load_insn_p): Implicit load for POP/POPA/POPF/XLATB, no load
>         for Anysize insns.
>         (insert_lfence_after): Issue warning for REP CMPS/SCAS.
>         (insert_lfence_before): Handle iret, handle
>         -mlfence-before-ret=shl, adjust operand size of or/not/shl to ret's.
>         (md_parse_option): Change -mlfence-before-ret=[none|not|or] to
>         -mlfence-before-ret=[none/not/or/shl/yes].
>         Enable -mlfence-before-ret=shl when
>         -mlfence-before-indirect-branch=all and no explicit
>         -mlfence-before-ret option.
>         (md_show_usage): Ditto.
>         * doc/c-i386.texi: Ditto.
>         * testsuite/gas/i386/i386.exp: Add new testcases.
>         * testsuite/gas/i386/lfence-load-b.d: New.
>         * testsuite/gas/i386/lfence-load-b.e: New.
>         * testsuite/gas/i386/lfence-load.d: Modified.
>         * testsuite/gas/i386/lfence-load.e: New.
>         * testsuite/gas/i386/lfence-load.s: Modified.
>         * testsuite/gas/i386/lfence-ret-a.d: Modified.
>         * testsuite/gas/i386/lfence-ret-b.d: Modified.
>         * testsuite/gas/i386/lfence-ret-c.d: New.
>         * testsuite/gas/i386/lfence-ret-d.d: New.
>         * testsuite/gas/i386/lfence-ret.s: Modified.
>         * testsuite/gas/i386/x86-64-lfence-load-b.d: New.
>         * testsuite/gas/i386/x86-64-lfence-load.d: Modified.
>         * testsuite/gas/i386/x86-64-lfence-load.s: Modified.
>         * testsuite/gas/i386/x86-64-lfence-ret-a.d: Modified.
>         * testsuite/gas/i386/x86-64-lfence-ret-b.d: Modified.
>         * testsuite/gas/i386/x86-64-lfence-ret-c.d: New.
>         * testsuite/gas/i386/x86-64-lfence-ret-d.d: New.
>         * testsuite/gas/i386/x86-64-lfence-ret-e.d: New.
>         * testsuite/gas/i386/x86-64-lfence-ret.e: New.
>         * testsuite/gas/i386/x86-64-lfence-ret.s: New.
> ---
>  gas/config/tc-i386.c                         | 120 ++++++++++++++-----
>  gas/doc/c-i386.texi                          |  12 +-
>  gas/testsuite/gas/i386/i386.exp              |   5 +
>  gas/testsuite/gas/i386/lfence-load.d         |  26 ++++
>  gas/testsuite/gas/i386/lfence-load.e         |   3 +
>  gas/testsuite/gas/i386/lfence-load.s         |  20 ++++
>  gas/testsuite/gas/i386/lfence-ret-a.d        |  18 +++
>  gas/testsuite/gas/i386/lfence-ret-b.d        |  24 ++++
>  gas/testsuite/gas/i386/lfence-ret-c.d        |  35 ++++++
>  gas/testsuite/gas/i386/lfence-ret-d.d        |  36 ++++++
>  gas/testsuite/gas/i386/lfence-ret.s          |   6 +
>  gas/testsuite/gas/i386/x86-64-lfence-load.d  |  24 ++++
>  gas/testsuite/gas/i386/x86-64-lfence-load.s  |  19 +++
>  gas/testsuite/gas/i386/x86-64-lfence-ret-a.d |  35 +++++-
>  gas/testsuite/gas/i386/x86-64-lfence-ret-b.d |  45 ++++++-
>  gas/testsuite/gas/i386/x86-64-lfence-ret-c.d |  48 ++++++++
>  gas/testsuite/gas/i386/x86-64-lfence-ret-d.d |  49 ++++++++
>  gas/testsuite/gas/i386/x86-64-lfence-ret-e.d |  49 ++++++++
>  gas/testsuite/gas/i386/x86-64-lfence-ret.e   |   3 +
>  gas/testsuite/gas/i386/x86-64-lfence-ret.s   |  14 +++
>  20 files changed, 551 insertions(+), 40 deletions(-)
>  create mode 100644 gas/testsuite/gas/i386/lfence-load.e
>  create mode 100644 gas/testsuite/gas/i386/lfence-ret-c.d
>  create mode 100644 gas/testsuite/gas/i386/lfence-ret-d.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-c.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-d.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
>  create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.e
>  create mode 100644 gas/testsuite/gas/i386/x86-64-lfence-ret.s
>
> diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c

> index 093497becd..a692c457a5 100644

> --- a/gas/config/tc-i386.c

> +++ b/gas/config/tc-i386.c

> @@ -647,7 +647,8 @@ static enum lfence_before_ret_kind

>    {

>      lfence_before_ret_none = 0,

>      lfence_before_ret_not,

> -    lfence_before_ret_or

> +    lfence_before_ret_or,

> +    lfence_before_ret_shl

>    }

>  lfence_before_ret;

>

> @@ -4350,22 +4351,28 @@ load_insn_p (void)

>

>    if (!any_vex_p)

>      {

> -      /* lea  */

> -      if (i.tm.base_opcode == 0x8d)

> +      /* Anysize insns: lea, invlpg, clflush, prefetchnta, prefetcht0,

> +         prefetcht1, prefetcht2, prefetchw, bndmk, bndcl, bndcu, bndcn,

> +         bndstx, bndldx, prefetchwt1, clflushopt, clwb, cldemote.  */

> +      if (i.tm.opcode_modifier.anysize)

>          return 0;

>

> -      /* pop  */

> -      if ((i.tm.base_opcode & ~7) == 0x58

> -          || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))

> +      /* pop, popf, popa.   */

> +      if (strcmp (i.tm.name, "pop") == 0

> +          || i.tm.base_opcode == 0x9d

> +          || i.tm.base_opcode == 0x61)

>          return 1;

>

>        /* movs, cmps, lods, scas.  */

>        if ((i.tm.base_opcode | 0xb) == 0xaf)

>          return 1;

>

> -      /* outs */

> -      if (base_opcode == 0x6f)

> +      /* outs, xlatb.  */

> +      if (base_opcode == 0x6f

> +          || i.tm.base_opcode == 0xd7)

>          return 1;

> +      /* NB: AMD-specific insns with implicit memory operands are

> +         intentionally not covered.  */

>      }

>

>    /* No memory operand.  */
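
For anyone checking the string-instruction mask kept in this hunk, here
is a minimal standalone sketch of what (base_opcode | 0xb) == 0xaf
accepts.  The helper name is hypothetical and not part of the patch; it
only reproduces the bit test on a one-byte opcode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not in tc-i386.c: mirrors the test
   (i.tm.base_opcode | 0xb) == 0xaf from load_insn_p.
   OR-ing with 0xb forces bits 0, 1 and 3 on, so the comparison only
   constrains the high nibble (must be 0xa) and bit 2 (must be set).
   That leaves 0xa4..0xa7 (movs, cmps) and 0xac..0xaf (lods, scas).  */
static bool
is_string_load_opcode (uint8_t opcode)
{
  return (opcode | 0xb) == 0xaf;
}
```

Under this test stos (0xaa/0xab) is rejected, which is consistent with
it being a store rather than a load.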

> @@ -4506,6 +4513,22 @@ insert_lfence_after (void)

>  {

>    if (lfence_after_load && load_insn_p ())

>      {

> +      /* There are also two REP string instructions that require

> +         special treatment. Specifically, the compare string (CMPS)

> +         and scan string (SCAS) instructions set EFLAGS in a manner

> +         that depends on the data being compared/scanned. When used

> +         with a REP prefix, the number of iterations may therefore

> +         vary depending on this data. If the data is a program secret

> +         chosen by the adversary using an LVI method,

> +         then this data-dependent behavior may leak some aspect

> +         of the secret.  */

> +      if (((i.tm.base_opcode | 0x1) == 0xa7

> +           || (i.tm.base_opcode | 0x1) == 0xaf)

> +          && i.prefix[REP_PREFIX])

> +        {

> +            as_warn (_("`%s` changes flags which would affect control flow behavior"),

> +                     i.tm.name);

> +        }

>        char *p = frag_more (3);

>        *p++ = 0xf;

>        *p++ = 0xae;
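
The condition guarding the new warning can be read as the following
sketch.  The function name and the has_rep parameter are hypothetical
stand-ins for i.tm.base_opcode and i.prefix[REP_PREFIX]; only the bit
test is taken from the hunk above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not in tc-i386.c.  OR-ing with 0x1 folds the
   byte and word/dword forms together: 0xa6/0xa7 are cmps and
   0xae/0xaf are scas.  The warning only fires when a REP-class
   prefix is present, since that is when the iteration count depends
   on the data.  */
static bool
warns_on_rep_string (uint8_t opcode, bool has_rep)
{
  bool cmps_or_scas = (opcode | 0x1) == 0xa7 || (opcode | 0x1) == 0xaf;
  return cmps_or_scas && has_rep;
}
```

Note that the lfence is still emitted after the warning; the warning
does not suppress the mitigation.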

> @@ -4568,12 +4591,13 @@ insert_lfence_before (void)

>        return;

>      }

>

> -  /* Output or/not and lfence before ret.  */

> +  /* Output or/not/shl and lfence before ret/lret/iret.  */

>    if (lfence_before_ret != lfence_before_ret_none

>        && (i.tm.base_opcode == 0xc2

>            || i.tm.base_opcode == 0xc3

>            || i.tm.base_opcode == 0xca

> -          || i.tm.base_opcode == 0xcb))

> +          || i.tm.base_opcode == 0xcb

> +          || i.tm.base_opcode == 0xcf))

>      {

>        if (last_insn.kind != last_insn_other

>            && last_insn.seg == now_seg)

> @@ -4583,33 +4607,59 @@ insert_lfence_before (void)

>                           last_insn.name, i.tm.name);

>            return;

>          }

> -      if (lfence_before_ret == lfence_before_ret_or)

> -        {

> -          /* orl: 0x830c2400.  */

> -          p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);

> -          if (flag_code == CODE_64BIT)

> -            *p++ = 0x48;

> -          *p++ = 0x83;

> -          *p++ = 0xc;

> -          *p++ = 0x24;

> -          *p++ = 0x0;

> -        }

> -      else

> -        {

> -          p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);

> -          /* notl: 0xf71424.  */

> -          if (flag_code == CODE_64BIT)

> -            *p++ = 0x48;

> +

> +      /* lret or iret.  */

> +      bfd_boolean lret = (i.tm.base_opcode | 0x5) == 0xcf;

> +      bfd_boolean has_rexw = i.prefix[REX_PREFIX] & REX_W;

> +      char prefix = 0x0;

> +      /* Default operand size for far return is 32 bits,

> +         64 bits for near return.  */

> +      /* Near ret ignores operand size override under CPU64.  */

> +      if ((!lret && flag_code == CODE_64BIT) || has_rexw)

> +        prefix = 0x48;

> +      else if (i.prefix[DATA_PREFIX])

> +        prefix = 0x66;

> +

> +      if (lfence_before_ret == lfence_before_ret_not)

> +        {

> +          /* not: 0xf71424, may add prefix

> +             for operand size override or 64-bit code.  */

> +          p = frag_more ((prefix ? 2 : 0) + 6 + 3);

> +          if (prefix)

> +            *p++ = prefix;

>            *p++ = 0xf7;

>            *p++ = 0x14;

>            *p++ = 0x24;

> -          /* notl: 0xf71424.  */

> -          if (flag_code == CODE_64BIT)

> -            *p++ = 0x48;

> +          if (prefix)

> +            *p++ = prefix;

>            *p++ = 0xf7;

>            *p++ = 0x14;

>            *p++ = 0x24;

>          }

> +      else

> +        {

> +          p = frag_more ((prefix ? 1 : 0) + 4 + 3);

> +          if (prefix)

> +            *p++ = prefix;

> +          if (lfence_before_ret == lfence_before_ret_or)

> +            {

> +              /* or: 0x830c2400, may add prefix

> +                 for operand size override or 64-bit code.  */

> +              *p++ = 0x83;

> +              *p++ = 0x0c;

> +            }

> +          else

> +            {

> +              /* shl: 0xc1242400, may add prefix

> +                 for operand size override or 64-bit code.  */

> +              *p++ = 0xc1;

> +              *p++ = 0x24;

> +            }

> +

> +          *p++ = 0x24;

> +          *p++ = 0x0;

> +        }

> +

>        *p++ = 0xf;

>        *p++ = 0xae;

>        *p = 0xe8;
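
To cross-check the expected byte sequences in the new .d files against
this hunk, here is a standalone sketch of the prefix selection and
encoding.  All names are hypothetical; the boolean parameters stand in
for flag_code == CODE_64BIT, the lret/iret test, REX.W and the 0x66
data prefix.  It writes into a caller-supplied buffer instead of
calling frag_more.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, not the enum from tc-i386.c.  */
enum before_ret_kind { BEFORE_RET_NOT, BEFORE_RET_OR, BEFORE_RET_SHL };

/* Writes the sequence placed before a return into OUT (at least 11
   bytes) and returns its length.  */
static size_t
emit_before_ret (uint8_t *out, enum before_ret_kind kind, bool code64,
                 bool far_ret, bool rex_w, bool data16)
{
  uint8_t *p = out;
  uint8_t prefix = 0;

  /* Near ret in 64-bit code always uses a 64-bit stack slot; far
     returns default to 32 bits unless REX.W or 0x66 is present.  */
  if ((!far_ret && code64) || rex_w)
    prefix = 0x48;
  else if (data16)
    prefix = 0x66;

  if (kind == BEFORE_RET_NOT)
    {
      /* Two "not (%esp)" in a row leave the return address unchanged.  */
      for (int n = 0; n < 2; n++)
        {
          if (prefix)
            *p++ = prefix;
          *p++ = 0xf7;
          *p++ = 0x14;
          *p++ = 0x24;
        }
    }
  else
    {
      if (prefix)
        *p++ = prefix;
      /* or $0,(%esp) is 83 0c 24 00; shl $0,(%esp) is c1 24 24 00.  */
      *p++ = kind == BEFORE_RET_OR ? 0x83 : 0xc1;
      *p++ = kind == BEFORE_RET_OR ? 0x0c : 0x24;
      *p++ = 0x24;
      *p++ = 0x00;
    }

  /* lfence.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - out);
}
```

With code64 set and a 0x66-prefixed near ret, this yields
48 c1 24 24 00 0f ae e8 for shl, which matches the "data16 retq"
entries in x86-64-lfence-ret-d.d.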

> @@ -12995,7 +13045,11 @@ md_parse_option (int c, const char *arg)

>

>      case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:

>        if (strcasecmp (arg, "all") == 0)

> -        lfence_before_indirect_branch = lfence_branch_all;

> +        {

> +          lfence_before_indirect_branch = lfence_branch_all;

> +          if (lfence_before_ret == lfence_before_ret_none)

> +            lfence_before_ret = lfence_before_ret_shl;

> +        }

>        else if (strcasecmp (arg, "memory") == 0)

>          lfence_before_indirect_branch = lfence_branch_memory;

>        else if (strcasecmp (arg, "register") == 0)

> @@ -13012,6 +13066,8 @@ md_parse_option (int c, const char *arg)

>          lfence_before_ret = lfence_before_ret_or;

>        else if (strcasecmp (arg, "not") == 0)

>          lfence_before_ret = lfence_before_ret_not;

> +      else if (strcasecmp (arg, "shl") == 0 || strcasecmp (arg, "yes") == 0)

> +        lfence_before_ret = lfence_before_ret_shl;

>        else if (strcasecmp (arg, "none") == 0)

>          lfence_before_ret = lfence_before_ret_none;

>        else
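
One consequence of the two md_parse_option hunks above: because
"all" checks lfence_before_ret against lfence_before_ret_none, an
explicit -mlfence-before-ret=none cannot be told apart from the
default, so the result appears to depend on option order.  A minimal
sketch, assuming options are processed left to right; the struct and
function names are hypothetical, and plain strcmp replaces the
strcasecmp used in the patch to stay within ISO C.

```c
#include <string.h>

/* Hypothetical names, not the enum from tc-i386.c.  */
enum ret_opt { RET_OPT_NONE = 0, RET_OPT_NOT, RET_OPT_OR, RET_OPT_SHL };

struct lfence_opts
{
  int branch_all;       /* -mlfence-before-indirect-branch=all seen.  */
  enum ret_opt ret;     /* Current -mlfence-before-ret= setting.  */
};

/* Models the OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH case for "all".  */
static void
apply_indirect_branch (struct lfence_opts *o, const char *arg)
{
  if (strcmp (arg, "all") == 0)
    {
      o->branch_all = 1;
      if (o->ret == RET_OPT_NONE)
        o->ret = RET_OPT_SHL;
    }
}

/* Models the OPTION_MLFENCE_BEFORE_RET case.  */
static void
apply_before_ret (struct lfence_opts *o, const char *arg)
{
  if (strcmp (arg, "or") == 0)
    o->ret = RET_OPT_OR;
  else if (strcmp (arg, "not") == 0)
    o->ret = RET_OPT_NOT;
  else if (strcmp (arg, "shl") == 0 || strcmp (arg, "yes") == 0)
    o->ret = RET_OPT_SHL;
  else if (strcmp (arg, "none") == 0)
    o->ret = RET_OPT_NONE;
}
```

In this model, "all" followed by "none" ends with no lfence before
ret, while "none" followed by "all" ends with shl.  If that is not
intended, a separate "explicitly set" flag would be one way to
distinguish the two cases.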

> @@ -13382,7 +13438,7 @@ md_show_usage (FILE *stream)

>    -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\

>                            generate lfence before indirect near branch\n"));

>    fprintf (stream, _("\

> -  -mlfence-before-ret=[none|or|not] (default: none)\n\

> +  -mlfence-before-ret=[none|or|not|shl|yes] (default: none)\n\

>                            generate lfence before ret\n"));

>    fprintf (stream, _("\

>    -mamd64                 accept only AMD64 ISA [default]\n"));

> diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi

> index 628fb1ad5a..4acece4394 100644

> --- a/gas/doc/c-i386.texi

> +++ b/gas/doc/c-i386.texi

> @@ -488,6 +488,8 @@ before indirect near branch instructions.

>  @option{-mlfence-before-indirect-branch=@var{all}} will generate lfence

>  before indirect near branch via register and issue a warning before

>  indirect near branch via memory.

> +It also implicitly sets @option{-mlfence-before-ret=@var{shl}} when

> +there's no explicit @option{-mlfence-before-ret=}.

>  @option{-mlfence-before-indirect-branch=@var{register}} will generate

>  lfence before indirect near branch via register.

>  @option{-mlfence-before-indirect-branch=@var{memory}} will issue a

> @@ -501,15 +503,17 @@ after loading branch target register.

>  @cindex @samp{-mlfence-before-ret=} option, i386

>  @cindex @samp{-mlfence-before-ret=} option, x86-64

>  @item -mlfence-before-ret=@var{none}

> +@item -mlfence-before-ret=@var{shl}

>  @item -mlfence-before-ret=@var{or}

> +@item -mlfence-before-ret=@var{yes}

>  @itemx -mlfence-before-ret=@var{not}

>  These options control whether the assembler should generate lfence

>  before ret.  @option{-mlfence-before-ret=@var{or}} will generate

>  generate or instruction with lfence.

> -@option{-mlfence-before-ret=@var{not}} will generate not instruction

> -with lfence.

> -@option{-mlfence-before-ret=@var{none}} will not generate lfence,

> -which is the default.

> +@option{-mlfence-before-ret=@var{shl/yes}} will generate shl instruction

> +with lfence. @option{-mlfence-before-ret=@var{not}} will generate not

> +instruction with lfence. @option{-mlfence-before-ret=@var{none}} will not

> +generate lfence, which is the default.

>

>  @cindex @samp{-mx86-used-note=} option, i386

>  @cindex @samp{-mx86-used-note=} option, x86-64

> diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp

> index 9dacc11906..3bacb80178 100644

> --- a/gas/testsuite/gas/i386/i386.exp

> +++ b/gas/testsuite/gas/i386/i386.exp

> @@ -535,6 +535,8 @@ if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]

>      run_dump_test "lfence-indbr-c"

>      run_dump_test "lfence-ret-a"

>      run_dump_test "lfence-ret-b"

> +    run_dump_test "lfence-ret-c"

> +    run_dump_test "lfence-ret-d"

>      run_dump_test "lfence-byte"

>

>      # These tests require support for 8 and 16 bit relocs,

> @@ -1122,6 +1124,9 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t

>      run_dump_test "x86-64-lfence-indbr-c"

>      run_dump_test "x86-64-lfence-ret-a"

>      run_dump_test "x86-64-lfence-ret-b"

> +    run_dump_test "x86-64-lfence-ret-c"

> +    run_dump_test "x86-64-lfence-ret-d"

> +    run_dump_test "x86-64-lfence-ret-e"

>      run_dump_test "x86-64-lfence-byte"

>

>      if { ![istarget "*-*-aix*"]

> diff --git a/gas/testsuite/gas/i386/lfence-load.d b/gas/testsuite/gas/i386/lfence-load.d

> index cd7e7f76df..0d355df556 100644

> --- a/gas/testsuite/gas/i386/lfence-load.d

> +++ b/gas/testsuite/gas/i386/lfence-load.d

> @@ -1,5 +1,6 @@

>  #as: -mlfence-after-load=yes

>  #objdump: -dw

> +#warning_output: lfence-load.e

>  #name: -mlfence-after-load=yes

>

>  .*: +file format .*

> @@ -15,6 +16,31 @@ Disassembly of section .text:

>   +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%ebp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%ebp\)

> + +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%ebp\),%edx

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 0f 01 7d 00          invlpg 0x0\(%ebp\)

> + +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%ebp\)

> + +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%ebp\)

> + +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%ebp\)

> + +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%ebp\)

> + +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%ebp\),%bnd1

> + +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%ebp\),%bnd1

> + +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%ebp\),%bnd1

> + +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%ebp\),%bnd1

> + +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%ebp\)

> + +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%ebp\),%bnd1

> + +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%ebp\)

> + +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%ebp\)

> + +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%ebp\)

> + +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%ebp\)

> + +[a-f0-9]+: 1f                    pop    %ds

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 9d                    popf

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 61                    popa

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: d7                    xlat   %ds:\(%ebx\)

> + +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: d9 55 00              fsts   0x0\(%ebp\)

>   +[a-f0-9]+: d9 45 00              flds   0x0\(%ebp\)

>   +[a-f0-9]+: 0f ae e8              lfence

> diff --git a/gas/testsuite/gas/i386/lfence-load.e b/gas/testsuite/gas/i386/lfence-load.e

> new file mode 100644

> index 0000000000..1ee49da7fd

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/lfence-load.e

> @@ -0,0 +1,3 @@

> +.*: Assembler messages:

> +.*:??: Warning: `scas` changes flags which would affect control flow behavior

> +.*:??: Warning: `cmps` changes flags which would affect control flow behavior

> diff --git a/gas/testsuite/gas/i386/lfence-load.s b/gas/testsuite/gas/i386/lfence-load.s

> index b417ac644e..4b4aa1610b 100644

> --- a/gas/testsuite/gas/i386/lfence-load.s

> +++ b/gas/testsuite/gas/i386/lfence-load.s

> @@ -4,6 +4,26 @@ _start:

>   lgdt (%ebp)

>   vmptrld (%ebp)

>   vmclear (%ebp)

> + invpcid (%ebp), %edx

> + invlpg (%ebp)

> + clflush (%ebp)

> + clflushopt (%ebp)

> + clwb (%ebp)

> + cldemote (%ebp)

> + bndmk (%ebp), %bnd1

> + bndcl (%ebp), %bnd1

> + bndcu (%ebp), %bnd1

> + bndcn (%ebp), %bnd1

> + bndstx %bnd1, (%ebp)

> + bndldx (%ebp), %bnd1

> + prefetcht0 (%ebp)

> + prefetcht1 (%ebp)

> + prefetcht2 (%ebp)

> + prefetchw (%ebp)

> + pop %ds

> + popf

> + popa

> + xlatb (%ebx)

>   fsts (%ebp)

>   flds (%ebp)

>   fistl (%ebp)

> diff --git a/gas/testsuite/gas/i386/lfence-ret-a.d b/gas/testsuite/gas/i386/lfence-ret-a.d

> index 719cf1b472..aa35857664 100644

> --- a/gas/testsuite/gas/i386/lfence-ret-a.d

> +++ b/gas/testsuite/gas/i386/lfence-ret-a.d

> @@ -9,10 +9,28 @@

>  Disassembly of section .text:

>

>  0+ <_start>:

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                retw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          retw   \$0x14

>   +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c3                    ret

>   +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c2 1e 00              ret    \$0x1e

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

>  #pass

> diff --git a/gas/testsuite/gas/i386/lfence-ret-b.d b/gas/testsuite/gas/i386/lfence-ret-b.d

> index e3914b9c28..77001c425e 100644

> --- a/gas/testsuite/gas/i386/lfence-ret-b.d

> +++ b/gas/testsuite/gas/i386/lfence-ret-b.d

> @@ -9,6 +9,14 @@

>  Disassembly of section .text:

>

>  0+ <_start>:

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                retw

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          retw   \$0x14

>   +[a-f0-9]+: f7 14 24              notl   \(%esp\)

>   +[a-f0-9]+: f7 14 24              notl   \(%esp\)

>   +[a-f0-9]+: 0f ae e8              lfence

> @@ -17,4 +25,20 @@ Disassembly of section .text:

>   +[a-f0-9]+: f7 14 24              notl   \(%esp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c2 1e 00              ret    \$0x1e

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: f7 14 24              notl   \(%esp\)

> + +[a-f0-9]+: f7 14 24              notl   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: f7 14 24              notl   \(%esp\)

> + +[a-f0-9]+: f7 14 24              notl   \(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

>  #pass

> diff --git a/gas/testsuite/gas/i386/lfence-ret-c.d b/gas/testsuite/gas/i386/lfence-ret-c.d

> new file mode 100644

> index 0000000000..fceb0eb182

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/lfence-ret-c.d

> @@ -0,0 +1,35 @@

> +#source: lfence-ret.s

> +#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all

> +#objdump: -dw

> +

> +.*: +file format .*

> +

> +

> +Disassembly of section .text:

> +

> +0+ <_start>:

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                retw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          retw   \$0x14

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c3                    ret

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c2 1e 00              ret    \$0x1e

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> +#pass

> diff --git a/gas/testsuite/gas/i386/lfence-ret-d.d b/gas/testsuite/gas/i386/lfence-ret-d.d

> new file mode 100644

> index 0000000000..03f8f88fd7

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/lfence-ret-d.d

> @@ -0,0 +1,36 @@

> +#source: lfence-ret.s

> +#as: -mlfence-before-ret=shl

> +#objdump: -dw

> +#name: -mlfence-before-ret=shl

> +

> +.*: +file format .*

> +

> +

> +Disassembly of section .text:

> +

> +0+ <_start>:

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                retw

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          retw   \$0x14

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c3                    ret

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c2 1e 00              ret    \$0x1e

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%esp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> +#pass

> diff --git a/gas/testsuite/gas/i386/lfence-ret.s b/gas/testsuite/gas/i386/lfence-ret.s

> index 35c4e6eeaa..f27fa5839e 100644

> --- a/gas/testsuite/gas/i386/lfence-ret.s

> +++ b/gas/testsuite/gas/i386/lfence-ret.s

> @@ -1,4 +1,10 @@

>   .text

>  _start:

> + retw

> + retw $20

>   ret

>   ret $30

> + lretw

> + lretw $40

> + lret

> + lret $40

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.d b/gas/testsuite/gas/i386/x86-64-lfence-load.d

> index 4f6cd00edf..5cd764391d 100644

> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.d

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.d

> @@ -1,5 +1,6 @@

>  #as: -mlfence-after-load=yes

>  #objdump: -dw

> +#warning_output: lfence-load.e

>  #name: x86-64 -mlfence-after-load=yes

>

>  .*: +file format .*

> @@ -15,6 +16,29 @@ Disassembly of section .text:

>   +[a-f0-9]+: 0f c7 75 00          vmptrld 0x0\(%rbp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: 66 0f c7 75 00        vmclear 0x0\(%rbp\)

> + +[a-f0-9]+: 66 0f 38 82 55 00    invpcid 0x0\(%rbp\),%rdx

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 67 0f 01 38          invlpg \(%eax\)

> + +[a-f0-9]+: 0f ae 7d 00          clflush 0x0\(%rbp\)

> + +[a-f0-9]+: 66 0f ae 7d 00        clflushopt 0x0\(%rbp\)

> + +[a-f0-9]+: 66 0f ae 75 00        clwb   0x0\(%rbp\)

> + +[a-f0-9]+: 0f 1c 45 00          cldemote 0x0\(%rbp\)

> + +[a-f0-9]+: f3 0f 1b 4d 00        bndmk  0x0\(%rbp\),%bnd1

> + +[a-f0-9]+: f3 0f 1a 4d 00        bndcl  0x0\(%rbp\),%bnd1

> + +[a-f0-9]+: f2 0f 1a 4d 00        bndcu  0x0\(%rbp\),%bnd1

> + +[a-f0-9]+: f2 0f 1b 4d 00        bndcn  0x0\(%rbp\),%bnd1

> + +[a-f0-9]+: 0f 1b 4d 00          bndstx %bnd1,0x0\(%rbp\)

> + +[a-f0-9]+: 0f 1a 4d 00          bndldx 0x0\(%rbp\),%bnd1

> + +[a-f0-9]+: 0f 18 4d 00          prefetcht0 0x0\(%rbp\)

> + +[a-f0-9]+: 0f 18 55 00          prefetcht1 0x0\(%rbp\)

> + +[a-f0-9]+: 0f 18 5d 00          prefetcht2 0x0\(%rbp\)

> + +[a-f0-9]+: 0f 0d 4d 00          prefetchw 0x0\(%rbp\)

> + +[a-f0-9]+: 0f a1                popq   %fs

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 9d                    popfq

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: d7                    xlat   %ds:\(%rbx\)

> + +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: d9 55 00              fsts   0x0\(%rbp\)

>   +[a-f0-9]+: d9 45 00              flds   0x0\(%rbp\)

>   +[a-f0-9]+: 0f ae e8              lfence

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-load.s b/gas/testsuite/gas/i386/x86-64-lfence-load.s

> index 76d0886617..2a3ac6b7d2 100644

> --- a/gas/testsuite/gas/i386/x86-64-lfence-load.s

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-load.s

> @@ -4,6 +4,25 @@ _start:

>   lgdt (%rbp)

>   vmptrld (%rbp)

>   vmclear (%rbp)

> + invpcid (%rbp), %rdx

> + invlpg (%eax)

> + clflush (%rbp)

> + clflushopt (%rbp)

> + clwb (%rbp)

> + cldemote (%rbp)

> + bndmk (%rbp), %bnd1

> + bndcl (%rbp), %bnd1

> + bndcu (%rbp), %bnd1

> + bndcn (%rbp), %bnd1

> + bndstx %bnd1, (%rbp)

> + bndldx (%rbp), %bnd1

> + prefetcht0 (%rbp)

> + prefetcht1 (%rbp)

> + prefetcht2 (%rbp)

> + prefetchw (%rbp)

> + pop %fs

> + popf

> + xlatb (%rbx)

>   fsts (%rbp)

>   flds (%rbp)

>   fistl (%rbp)

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d

> index 26e5b48bec..345217b17c 100644

> --- a/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-a.d

> @@ -1,6 +1,7 @@

> -#source: lfence-ret.s

> +#source: x86-64-lfence-ret.s

>  #as: -mlfence-before-ret=or

> -#objdump: -dw

> +#warning_output: x86-64-lfence-ret.e

> +#objdump: -dw -Mintel64

>  #name: x86-64 -mlfence-before-ret=or

>

>  .*: +file format .*

> @@ -9,10 +10,40 @@

>  Disassembly of section .text:

>

>  0+ <_start>:

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                data16 retq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14

>   +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c3                    retq

>   +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c2 1e 00              retq   \$0x1e

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c3              data16 rex.W retq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 cb                lretq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28

>  #pass

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d

> index 340488831d..3947660fea 100644

> --- a/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-b.d

> @@ -1,6 +1,7 @@

> -#source: lfence-ret.s

> +#source: x86-64-lfence-ret.s

>  #as: -mlfence-before-ret=not

> -#objdump: -dw

> +#warning_output: x86-64-lfence-ret.e

> +#objdump: -dw -Mintel64

>  #name: x86-64 -mlfence-before-ret=not

>

>  .*: +file format .*

> @@ -9,6 +10,14 @@

>  Disassembly of section .text:

>

>  0+ <_start>:

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                data16 retq

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14

>   +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

>   +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

>   +[a-f0-9]+: 0f ae e8              lfence

> @@ -17,4 +26,36 @@ Disassembly of section .text:

>   +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

>   +[a-f0-9]+: 0f ae e8              lfence

>   +[a-f0-9]+: c2 1e 00              retq   \$0x1e

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c3              data16 rex.W retq

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)

> + +[a-f0-9]+: 66 f7 14 24          notw   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: f7 14 24              notl   \(%rsp\)

> + +[a-f0-9]+: f7 14 24              notl   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: f7 14 24              notl   \(%rsp\)

> + +[a-f0-9]+: f7 14 24              notl   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 cb                lretq

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 48 f7 14 24          notq   \(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28

>  #pass

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d

> new file mode 100644

> index 0000000000..cd89a95bc4

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-c.d

> @@ -0,0 +1,48 @@

> +#source: x86-64-lfence-ret.s

> +#as: -mlfence-before-ret=or -mlfence-before-indirect-branch=all

> +#warning_output: x86-64-lfence-ret.e

> +#objdump: -dw -Mintel64

> +

> +.*: +file format .*

> +

> +

> +Disassembly of section .text:

> +

> +0+ <_start>:

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                data16 retq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c3                    retq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c2 1e 00              retq   \$0x1e

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c3              data16 rex.W retq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 83 0c 24 00        orw    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: 83 0c 24 00          orl    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 cb                lretq

> + +[a-f0-9]+: 48 83 0c 24 00        orq    \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28

> +#pass

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d

> new file mode 100644

> index 0000000000..593b889435

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-d.d

> @@ -0,0 +1,49 @@

> +#source: x86-64-lfence-ret.s

> +#as: -mlfence-before-ret=shl

> +#warning_output: x86-64-lfence-ret.e

> +#objdump: -dw -Mintel64

> +#name: x86-64 -mlfence-before-ret=shl

> +

> +.*: +file format .*

> +

> +

> +Disassembly of section .text:

> +

> +0+ <_start>:

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c3                data16 retq

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c3                    retq

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: c2 1e 00              retq   \$0x1e

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c3              data16 rex.W retq

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 cb                lretw

> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: cb                    lret

> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: ca 28 00              lret   \$0x28

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 cb                lretq

> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)

> + +[a-f0-9]+: 0f ae e8              lfence

> + +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28

> +#pass

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
> new file mode 100644
> index 0000000000..b4d229654c
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret-e.d
> @@ -0,0 +1,49 @@
> +#source: x86-64-lfence-ret.s
> +#as: -mlfence-before-ret=shl
> +#warning_output: x86-64-lfence-ret.e
> +#objdump: -dw -Mintel64
> +#name: x86-64 -mlfence-before-ret=yes
> +
> +.*: +file format .*
> +
> +
> +Disassembly of section .text:
> +
> +0+ <_start>:
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 c3                data16 retq
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 c2 14 00          data16 retq \$0x14
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: c3                    retq
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: c2 1e 00              retq   \$0x1e
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 48 c3              data16 rex.W retq
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 48 c2 28 00        data16 rex.W retq \$0x28
> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 cb                lretw
> + +[a-f0-9]+: 66 c1 24 24 00        shlw   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 66 ca 28 00          lretw  \$0x28
> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: cb                    lret
> + +[a-f0-9]+: c1 24 24 00          shll   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: ca 28 00              lret   \$0x28
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 48 cb                lretq
> + +[a-f0-9]+: 48 c1 24 24 00        shlq   \$0x0,\(%rsp\)
> + +[a-f0-9]+: 0f ae e8              lfence
> + +[a-f0-9]+: 48 ca 28 00          lretq  \$0x28
> +#pass

> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.e b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
> new file mode 100644
> index 0000000000..13730e50e6
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.e
> @@ -0,0 +1,3 @@
> +.*: Assembler messages:
> +.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
> +.*:??: Warning: no instruction mnemonic suffix given and no register operands; using default for `lret'
> diff --git a/gas/testsuite/gas/i386/x86-64-lfence-ret.s b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
> new file mode 100644
> index 0000000000..986239c222
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-lfence-ret.s
> @@ -0,0 +1,14 @@
> + .text
> +_start:
> + retw
> + retw $20
> + ret
> + ret $30
> + data16 rex.w ret
> + data16 rex.w ret $40
> + lretw
> + lretw $40
> + lret
> + lret $40
> + lretq
> + lretq $40
> --
> 2.18.1

>

>


OK.

Thanks.


-- 
H.J.
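
For readers following the review, the decision made for indirect near branches in the patch below can be summarised as a small table. The following is only a sketch of that logic as it appears in `insert_lfence_before`. The enum and function names (`branch_opt`, `operand_kind`, `action`, `indirect_branch_action`) are hypothetical and do not exist in gas. The check on the previous prefix or constant directive is deliberately left out.

```c
/* Hypothetical model of -mlfence-before-indirect-branch=; these names
   are illustrative only and are not part of gas.  */
enum branch_opt { BR_NONE, BR_REGISTER, BR_MEMORY, BR_ALL };
enum operand_kind { OP_REGISTER, OP_MEMORY };
enum action { ACT_NOTHING, ACT_LFENCE, ACT_WARN };

/* Return what the assembler would do ahead of an indirect near
   call/jmp, ignoring the "previous item was a prefix or constant
   directive" case.  */
static enum action
indirect_branch_action (enum branch_opt opt, int lfence_after_load,
                        enum operand_kind op)
{
  if (opt == BR_NONE)
    return ACT_NOTHING;

  if (op == OP_REGISTER)
    {
      /* With -mlfence-after-load=yes the load of the target register
         is already followed by lfence, so nothing more is emitted.  */
      if (lfence_after_load || opt == BR_MEMORY)
        return ACT_NOTHING;
      return ACT_LFENCE;
    }

  /* Branch via memory: only a warning is issued, never lfence.  */
  if (opt != BR_REGISTER)
    return ACT_WARN;
  return ACT_NOTHING;
}
```

Under these assumptions, `indirect_branch_action (BR_ALL, 0, OP_REGISTER)` yields `ACT_LFENCE`, while the same call with `lfence_after_load` set to 1 yields `ACT_NOTHING`.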

Patch
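
As a reading aid for the diff that follows, here is a minimal standalone sketch of the byte sequences that `-mlfence-before-ret=or` and `-mlfence-before-ret=not` place ahead of a `ret`. It writes into a plain buffer instead of calling `frag_more`, and the names `ret_fence_kind` and `emit_before_ret` are assumptions made for illustration, not gas identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the -mlfence-before-ret= setting.  */
enum ret_fence_kind { RET_FENCE_OR, RET_FENCE_NOT };

/* Write the bytes emitted ahead of a ret into BUF (at least 11 bytes)
   and return how many were written.  IS_64BIT adds the REX.W prefix.  */
static size_t
emit_before_ret (unsigned char *buf, enum ret_fence_kind kind, int is_64bit)
{
  unsigned char *p = buf;

  if (kind == RET_FENCE_OR)
    {
      /* or $0x0, (%[re]sp): 83 /1 ib with a SIB byte for the stack
         pointer.  */
      if (is_64bit)
        *p++ = 0x48;
      *p++ = 0x83;
      *p++ = 0x0c;
      *p++ = 0x24;
      *p++ = 0x00;
    }
  else
    {
      /* not (%[re]sp), emitted twice so the return address ends up
         unchanged.  */
      for (int n = 0; n < 2; n++)
        {
          if (is_64bit)
            *p++ = 0x48;
          *p++ = 0xf7;
          *p++ = 0x14;
          *p++ = 0x24;
        }
    }

  /* lfence: 0f ae e8.  */
  *p++ = 0x0f;
  *p++ = 0xae;
  *p++ = 0xe8;
  return (size_t) (p - buf);
}
```

The lengths line up with the `frag_more` arguments in the patch: 8 bytes for `or` and 11 for `not` in 64-bit mode, 7 and 9 otherwise.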

diff --git a/gas/ChangeLog b/gas/ChangeLog
index 836cb5c6d9..d581cc3d47 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,3 +1,31 @@ 
+2020-03-10  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* config/tc-i386.c (lfence_after_load): New.
+	(lfence_before_indirect_branch_kind): New.
+	(lfence_before_indirect_branch): New.
+	(lfence_before_ret_kind): New.
+	(lfence_before_ret): New.
+	(last_insn): New.
+	(load_insn_p): New.
+	(insert_lfence_after): New.
+	(insert_lfence_before): New.
+	(md_assemble): Call insert_lfence_before and insert_lfence_after.
+	Set last_insn.
+	(OPTION_MLFENCE_AFTER_LOAD): New.
+	(OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH): New.
+	(OPTION_MLFENCE_BEFORE_RET): New.
+	(md_longopts): Add -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(md_parse_option): Handle -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(md_show_usage): Display -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+	(i386_cons_align): New.
+	* config/tc-i386.h (i386_cons_align): New.
+	(md_cons_align): New.
+	* doc/c-i386.texi: Document -mlfence-after-load=,
+	-mlfence-before-indirect-branch= and -mlfence-before-ret=.
+
 2020-03-10  Alan Modra  <amodra@gmail.com>
 
 	* config/tc-csky.c (get_operand_value): Rewrite 1 << 31 expressions
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index b020f39c86..916fc8b235 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -629,7 +629,29 @@  static int omit_lock_prefix = 0;
    "lock addl $0, (%{re}sp)".  */
 static int avoid_fence = 0;
 
-/* Type of the previous instruction.  */
+/* 1 if lfence should be inserted after every load.  */
+static int lfence_after_load = 0;
+
+/* Non-zero if lfence should be inserted before indirect branch.  */
+static enum lfence_before_indirect_branch_kind
+  {
+    lfence_branch_none = 0,
+    lfence_branch_register,
+    lfence_branch_memory,
+    lfence_branch_all
+  }
+lfence_before_indirect_branch;
+
+/* Non-zero if lfence should be inserted before ret.  */
+static enum lfence_before_ret_kind
+  {
+    lfence_before_ret_none = 0,
+    lfence_before_ret_not,
+    lfence_before_ret_or
+  }
+lfence_before_ret;
+
+/* Whether the previous instruction is a .byte directive or a prefix.  */
 static struct
   {
     segT seg;
@@ -4311,6 +4333,291 @@  optimize_encoding (void)
     }
 }
 
+/* Return non-zero for load instruction.  */
+
+static int
+load_insn_p (void)
+{
+  unsigned int dest;
+  int any_vex_p = is_any_vex_encoding (&i.tm);
+
+  if (!any_vex_p)
+    {
+      /* lea  */
+      if (i.tm.base_opcode == 0x8d)
+	return 0;
+
+      /* pop  */
+      if ((i.tm.base_opcode & 0xfffffff8) == 0x58
+	  || (i.tm.base_opcode == 0x8f && i.tm.extension_opcode == 0))
+	return 1;
+
+      /* movs, cmps, lods, scas.  */
+      if ((i.tm.base_opcode >= 0xa4 && i.tm.base_opcode <= 0xa7)
+	  || (i.tm.base_opcode >= 0xac && i.tm.base_opcode <= 0xaf))
+	return 1;
+
+      /* outs */
+      if (i.tm.base_opcode == 0x6e || i.tm.base_opcode == 0x6f)
+	return 1;
+    }
+
+  /* No memory operand.  */
+  if (!i.mem_operands)
+    return 0;
+
+  if (any_vex_p)
+    {
+      /* vldmxcsr.  */
+      if (i.tm.base_opcode == 0xae
+	  && i.tm.opcode_modifier.vex
+	  && i.tm.opcode_modifier.vexopcode == VEX0F
+	  && i.tm.extension_opcode == 2)
+	return 1;
+    }
+  else
+    {
+      /* test, not, neg, mul, imul, div, idiv.  */
+      if ((i.tm.base_opcode == 0xf6 || i.tm.base_opcode == 0xf7)
+	  && i.tm.extension_opcode != 1)
+	return 1;
+
+      /* inc, dec.  */
+      if ((i.tm.base_opcode == 0xfe || i.tm.base_opcode == 0xff)
+	  && i.tm.extension_opcode <= 1)
+	return 1;
+
+      /* add, or, adc, sbb, and, sub, xor, cmp.  */
+      if (i.tm.base_opcode >= 0x80 && i.tm.base_opcode <= 0x83)
+	return 1;
+
+      /* bt, bts, btr, btc.  */
+      if (i.tm.base_opcode == 0xfba
+	  && (i.tm.extension_opcode >= 4 && i.tm.extension_opcode <= 7))
+	return 1;
+
+      /* rol, ror, rcl, rcr, shl/sal, shr, sar. */
+      if ((i.tm.base_opcode == 0xc0
+	   || i.tm.base_opcode == 0xc1
+	   || (i.tm.base_opcode >= 0xd0 && i.tm.base_opcode <= 0xd3))
+	  && i.tm.extension_opcode != 6)
+	return 1;
+
+      /* cmpxchg8b, cmpxchg16b, xrstors.  */
+      if (i.tm.base_opcode == 0xfc7
+	  && (i.tm.extension_opcode == 1 || i.tm.extension_opcode == 3))
+	return 1;
+
+      /* fxrstor, ldmxcsr, xrstor.  */
+      if (i.tm.base_opcode == 0xfae
+	  && (i.tm.extension_opcode == 1
+	      || i.tm.extension_opcode == 2
+	      || i.tm.extension_opcode == 5))
+	return 1;
+
+      /* lgdt, lidt, lmsw.  */
+      if (i.tm.base_opcode == 0xf01
+	  && (i.tm.extension_opcode == 2
+	      || i.tm.extension_opcode == 3
+	      || i.tm.extension_opcode == 6))
+	return 1;
+
+      /* vmptrld */
+      if (i.tm.base_opcode == 0xfc7
+	  && i.tm.extension_opcode == 6)
+	return 1;
+
+      /* Check for x87 instructions.  */
+      if (i.tm.base_opcode >= 0xd8 && i.tm.base_opcode <= 0xdf)
+	{
+	  /* Skip fst, fstp, fstenv, fstcw.  */
+	  if (i.tm.base_opcode == 0xd9
+	      && (i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fist, fistp, fstp.  */
+	  if (i.tm.base_opcode == 0xdb
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fst, fstp, fsave, fstsw.  */
+	  if (i.tm.base_opcode == 0xdd
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  /* Skip fisttp, fist, fistp, fbstp, fistp.  */
+	  if (i.tm.base_opcode == 0xdf
+	      && (i.tm.extension_opcode == 1
+		  || i.tm.extension_opcode == 2
+		  || i.tm.extension_opcode == 3
+		  || i.tm.extension_opcode == 6
+		  || i.tm.extension_opcode == 7))
+	    return 0;
+
+	  return 1;
+	}
+    }
+
+  dest = i.operands - 1;
+
+  /* Check fake imm8 operand and 3 source operands.  */
+  if ((i.tm.opcode_modifier.immext
+       || i.tm.opcode_modifier.vexsources == VEX3SOURCES)
+      && i.types[dest].bitfield.imm8)
+    dest--;
+
+  /* add, or, adc, sbb, and, sub, xor, cmp, test, xchg, xadd  */
+  if (!any_vex_p
+      && (i.tm.base_opcode == 0x0
+	  || i.tm.base_opcode == 0x1
+	  || i.tm.base_opcode == 0x8
+	  || i.tm.base_opcode == 0x9
+	  || i.tm.base_opcode == 0x10
+	  || i.tm.base_opcode == 0x11
+	  || i.tm.base_opcode == 0x18
+	  || i.tm.base_opcode == 0x19
+	  || i.tm.base_opcode == 0x20
+	  || i.tm.base_opcode == 0x21
+	  || i.tm.base_opcode == 0x28
+	  || i.tm.base_opcode == 0x29
+	  || i.tm.base_opcode == 0x30
+	  || i.tm.base_opcode == 0x31
+	  || i.tm.base_opcode == 0x38
+	  || i.tm.base_opcode == 0x39
+	  || (i.tm.base_opcode >= 0x84 && i.tm.base_opcode <= 0x87)
+	  || i.tm.base_opcode == 0xfc0
+	  || i.tm.base_opcode == 0xfc1))
+    return 1;
+
+  /* Check for load instruction.  */
+  return (i.types[dest].bitfield.class != ClassNone
+	  || i.types[dest].bitfield.instance == Accum);
+}
+
+/* Output lfence, 0xfaee8, after instruction.  */
+
+static void
+insert_lfence_after (void)
+{
+  if (lfence_after_load && load_insn_p ())
+    {
+      char *p = frag_more (3);
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+    }
+}
+
+/* Output lfence, 0xfaee8, before instruction.  */
+
+static void
+insert_lfence_before (void)
+{
+  char *p;
+
+  if (i.tm.base_opcode == 0xff
+      && (i.tm.extension_opcode == 2 || i.tm.extension_opcode == 4))
+    {
+      /* Insert lfence before indirect branch if needed.  */
+
+      if (lfence_before_indirect_branch == lfence_branch_none)
+	return;
+
+      if (i.operands != 1)
+	abort ();
+
+      if (i.reg_operands == 1)
+	{
+	  /* Indirect branch via register.  Don't insert lfence with
+	     -mlfence-after-load=yes.  */
+	  if (lfence_after_load
+	      || lfence_before_indirect_branch == lfence_branch_memory)
+	    return;
+	}
+      else if (i.mem_operands == 1
+	       && lfence_before_indirect_branch != lfence_branch_register)
+	{
+	  as_warn (_("indirect branch `%s` over memory should be avoided"),
+		   i.tm.name);
+	  return;
+	}
+      else
+	return;
+
+      if (last_insn.kind != last_insn_other
+	  && last_insn.seg == now_seg)
+	{
+	  as_warn_where (last_insn.file, last_insn.line,
+			 _("`%s` skips -mlfence-before-indirect-branch on `%s`"),
+			 last_insn.name, i.tm.name);
+	  return;
+	}
+
+      p = frag_more (3);
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+      return;
+    }
+
+  /* Output orl/notl and lfence before ret.  */
+  if (lfence_before_ret != lfence_before_ret_none
+      && (i.tm.base_opcode == 0xc2
+	  || i.tm.base_opcode == 0xc3
+	  || i.tm.base_opcode == 0xca
+	  || i.tm.base_opcode == 0xcb))
+    {
+      if (last_insn.kind != last_insn_other
+	  && last_insn.seg == now_seg)
+	{
+	  as_warn_where (last_insn.file, last_insn.line,
+			 _("`%s` skips -mlfence-before-ret on `%s`"),
+			 last_insn.name, i.tm.name);
+	  return;
+	}
+      if (lfence_before_ret == lfence_before_ret_or)
+	{
+	  /* orl: 0x830c2400.  */
+	  p = frag_more ((flag_code == CODE_64BIT ? 1 : 0) + 4 + 3);
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  *p++ = 0x83;
+	  *p++ = 0xc;
+	  *p++ = 0x24;
+	  *p++ = 0x0;
+	}
+      else
+	{
+	  p = frag_more ((flag_code == CODE_64BIT ? 2 : 0) + 6 + 3);
+	  /* notl: 0xf71424.  */
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  *p++ = 0xf7;
+	  *p++ = 0x14;
+	  *p++ = 0x24;
+	  if (flag_code == CODE_64BIT)
+	    *p++ = 0x48;
+	  /* notl: 0xf71424.  */
+	  *p++ = 0xf7;
+	  *p++ = 0x14;
+	  *p++ = 0x24;
+	}
+      *p++ = 0xf;
+      *p++ = 0xae;
+      *p = 0xe8;
+    }
+}
+
 /* This is the guts of the machine-dependent assembler.  LINE points to a
    machine dependent instruction.  This function is supposed to emit
    the frags/bytes it assembles to.  */
@@ -4628,9 +4935,13 @@  md_assemble (char *line)
   if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
+  insert_lfence_before ();
+
   /* We are ready to output the insn.  */
   output_insn ();
 
+  insert_lfence_after ();
+
   last_insn.seg = now_seg;
 
   if (i.tm.opcode_modifier.isprefix)
@@ -12250,6 +12561,9 @@  const char *md_shortopts = "qnO::";
 #define OPTION_MALIGN_BRANCH_PREFIX_SIZE (OPTION_MD_BASE + 28)
 #define OPTION_MALIGN_BRANCH (OPTION_MD_BASE + 29)
 #define OPTION_MBRANCHES_WITH_32B_BOUNDARIES (OPTION_MD_BASE + 30)
+#define OPTION_MLFENCE_AFTER_LOAD (OPTION_MD_BASE + 31)
+#define OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH (OPTION_MD_BASE + 32)
+#define OPTION_MLFENCE_BEFORE_RET (OPTION_MD_BASE + 33)
 
 struct option md_longopts[] =
 {
@@ -12289,6 +12603,10 @@  struct option md_longopts[] =
   {"malign-branch-prefix-size", required_argument, NULL, OPTION_MALIGN_BRANCH_PREFIX_SIZE},
   {"malign-branch", required_argument, NULL, OPTION_MALIGN_BRANCH},
   {"mbranches-within-32B-boundaries", no_argument, NULL, OPTION_MBRANCHES_WITH_32B_BOUNDARIES},
+  {"mlfence-after-load", required_argument, NULL, OPTION_MLFENCE_AFTER_LOAD},
+  {"mlfence-before-indirect-branch", required_argument, NULL,
+   OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH},
+  {"mlfence-before-ret", required_argument, NULL, OPTION_MLFENCE_BEFORE_RET},
   {"mamd64", no_argument, NULL, OPTION_MAMD64},
   {"mintel64", no_argument, NULL, OPTION_MINTEL64},
   {NULL, no_argument, NULL, 0}
@@ -12668,6 +12986,41 @@  md_parse_option (int c, const char *arg)
         as_fatal (_("invalid -mfence-as-lock-add= option: `%s'"), arg);
       break;
 
+    case OPTION_MLFENCE_AFTER_LOAD:
+      if (strcasecmp (arg, "yes") == 0)
+	lfence_after_load = 1;
+      else if (strcasecmp (arg, "no") == 0)
+	lfence_after_load = 0;
+      else
+        as_fatal (_("invalid -mlfence-after-load= option: `%s'"), arg);
+      break;
+
+    case OPTION_MLFENCE_BEFORE_INDIRECT_BRANCH:
+      if (strcasecmp (arg, "all") == 0)
+	lfence_before_indirect_branch = lfence_branch_all;
+      else if (strcasecmp (arg, "memory") == 0)
+	lfence_before_indirect_branch = lfence_branch_memory;
+      else if (strcasecmp (arg, "register") == 0)
+	lfence_before_indirect_branch = lfence_branch_register;
+      else if (strcasecmp (arg, "none") == 0)
+	lfence_before_indirect_branch = lfence_branch_none;
+      else
+        as_fatal (_("invalid -mlfence-before-indirect-branch= option: `%s'"),
+		  arg);
+      break;
+
+    case OPTION_MLFENCE_BEFORE_RET:
+      if (strcasecmp (arg, "or") == 0)
+	lfence_before_ret = lfence_before_ret_or;
+      else if (strcasecmp (arg, "not") == 0)
+	lfence_before_ret = lfence_before_ret_not;
+      else if (strcasecmp (arg, "none") == 0)
+	lfence_before_ret = lfence_before_ret_none;
+      else
+        as_fatal (_("invalid -mlfence-before-ret= option: `%s'"),
+		  arg);
+      break;
+
     case OPTION_MRELAX_RELOCATIONS:
       if (strcasecmp (arg, "yes") == 0)
         generate_relax_relocations = 1;
@@ -13025,6 +13378,15 @@  md_show_usage (FILE *stream)
   -mbranches-within-32B-boundaries\n\
                           align branches within 32 byte boundary\n"));
   fprintf (stream, _("\
+  -mlfence-after-load=[no|yes] (default: no)\n\
+                          generate lfence after load\n"));
+  fprintf (stream, _("\
+  -mlfence-before-indirect-branch=[none|all|register|memory] (default: none)\n\
+                          generate lfence before indirect near branch\n"));
+  fprintf (stream, _("\
+  -mlfence-before-ret=[none|or|not] (default: none)\n\
+                          generate lfence before ret\n"));
+  fprintf (stream, _("\
   -mamd64                 accept only AMD64 ISA [default]\n"));
   fprintf (stream, _("\
   -mintel64               accept only Intel64 ISA\n"));
@@ -13254,6 +13616,10 @@  i386_cons_align (int ignore ATTRIBUTE_UNUSED)
       last_insn.kind = last_insn_directive;
       last_insn.name = "constant directive";
       last_insn.file = as_where (&last_insn.line);
+      if (lfence_before_ret != lfence_before_ret_none)
+	as_warn (_("constant directive skips -mlfence-before-ret"));
+      if (lfence_before_indirect_branch != lfence_branch_none)
+	as_warn (_("constant directive skips -mlfence-before-indirect-branch"));
     }
 }
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index c536759cb3..1dd99f91bb 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -464,6 +464,49 @@  on an instruction.  It is equivalent to
 @option{-malign-branch-prefix-size=5}.
 The default doesn't align branches.
 
+@cindex @samp{-mlfence-after-load=} option, i386
+@cindex @samp{-mlfence-after-load=} option, x86-64
+@item -mlfence-after-load=@var{no}
+@itemx -mlfence-after-load=@var{yes}
+These options control whether the assembler should generate lfence
+after load instructions.  @option{-mlfence-after-load=@var{yes}} will
+generate lfence.  @option{-mlfence-after-load=@var{no}} will not generate
+lfence, which is the default.
+
+@cindex @samp{-mlfence-before-indirect-branch=} option, i386
+@cindex @samp{-mlfence-before-indirect-branch=} option, x86-64
+@item -mlfence-before-indirect-branch=@var{none}
+@itemx -mlfence-before-indirect-branch=@var{all}
+@itemx -mlfence-before-indirect-branch=@var{register}
+@itemx -mlfence-before-indirect-branch=@var{memory}
+These options control whether the assembler should generate lfence
+before indirect near branch instructions.
+@option{-mlfence-before-indirect-branch=@var{all}} will generate lfence
+before indirect near branch via register and issue a warning before
+indirect near branch via memory.
+@option{-mlfence-before-indirect-branch=@var{register}} will generate
+lfence before indirect near branch via register.
+@option{-mlfence-before-indirect-branch=@var{memory}} will issue a
+warning before indirect near branch via memory.
+@option{-mlfence-before-indirect-branch=@var{none}} will neither generate
+lfence nor issue a warning, which is the default.  Note that lfence won't
+be generated before indirect near branch via register with
+@option{-mlfence-after-load=@var{yes}} since lfence will be generated
+after loading the branch target register.
+
+@cindex @samp{-mlfence-before-ret=} option, i386
+@cindex @samp{-mlfence-before-ret=} option, x86-64
+@item -mlfence-before-ret=@var{none}
+@itemx -mlfence-before-ret=@var{or}
+@itemx -mlfence-before-ret=@var{not}
+These options control whether the assembler should generate lfence
+before ret.  @option{-mlfence-before-ret=@var{or}} will generate an
+or instruction with lfence.
+@option{-mlfence-before-ret=@var{not}} will generate not instructions
+with lfence.
+@option{-mlfence-before-ret=@var{none}} will not generate lfence,
+which is the default.
+
 @cindex @samp{-mx86-used-note=} option, i386
 @cindex @samp{-mx86-used-note=} option, x86-64
 @item -mx86-used-note=@var{no}