V2: [PATCH] x32: Generate 0x67 prefix for VSIB address without base

Message ID 20190226043508.GA6382@gmail.com
State New

Commit Message

H.J. Lu Feb. 26, 2019, 4:35 a.m.
On Mon, Feb 25, 2019 at 02:54:28PM -0800, H.J. Lu wrote:
> Here is the updated patch.  Tested for glibc, GCC, binutils and SPEC
> CPU 2000.
>

This version of the patch changes the error into a warning to accommodate GCC.


H.J.
---
In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
register so that vector index register will be zero-extended to 64 bits.

We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
segment override since address will be segment base + zero-extended to 64
bits of (base + index * scale + disp).  But GCC:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502

generates

	movl	$24, %edx
	movl	%fs:(%edx), %ecx

instead of

	movl	%fs:24, %ecx

So a warning:

Warning: segment `%fs' override with 32-bit address

is issued by default.  -moperand-check=error will turn a warning into
an error.

Error: can't encode segment `%fs' with 32-bit address

	PR gas/24263
	* config/tc-i386.c (output_insn): In x32, add 0x67 address size
	prefix for VSIB address without base register.  Issue a warning
	or an error if there is segment override with ADDR_PREFIX_OPCODE
	prefix.
	* testsuite/gas/i386/ilp32/ilp32.exp: Run x86-64-seg-inval.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.l: New file.
	* testsuite/gas/i386/ilp32/x86-64-seg-inval.s: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg-warn.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg-warn.e: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.d: Likewise.
	* testsuite/gas/i386/ilp32/x86-64-seg.s: Likewise.
---
 gas/config/tc-i386.c                          |  30 +++
 gas/testsuite/gas/i386/ilp32/ilp32.exp        |   1 +
 .../gas/i386/ilp32/x86-64-seg-inval.l         |   7 +
 .../gas/i386/ilp32/x86-64-seg-inval.s         |   9 +
 .../gas/i386/ilp32/x86-64-seg-warn.d          |  17 ++
 .../gas/i386/ilp32/x86-64-seg-warn.e          |   7 +
 gas/testsuite/gas/i386/ilp32/x86-64-seg.d     | 207 ++++++++++++++++++
 gas/testsuite/gas/i386/ilp32/x86-64-seg.s     |   9 +
 8 files changed, 287 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.d
 create mode 100644 gas/testsuite/gas/i386/ilp32/x86-64-seg.s

-- 
2.20.1

Comments

Jan Beulich Feb. 26, 2019, 11:41 a.m. | #1
>>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:

> In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> register so that vector index register will be zero-extended to 64 bits.
>
> We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> segment override since address will be segment base + zero-extended to 64
> bits of (base + index * scale + disp).  But GCC:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502


Neither above nor in the bug you explain what's wrong with the
segment override plus address size override in x32 mode. Since you
keep using the same wording with just slight alterations, it must be
something very obvious to you, but entirely un-obvious to me. Is
this related to the desire of using both negative and positive
offsets into TLS, where (obviously I would say) there's not going
to be any wrapping at the 4Gb boundary? If so, I'd say the TLS
usage model is broken, but it's not the assembler that should
prevent use of otherwise valid constructs. Whether full 64-bit
addresses (and hence full non-zero %fs/%gs bases with no
wrapping at the 4Gb boundary) is intended is the programmer's
choice, not something the assembler should enforce unconditionally.
Optionally emitting a warning is acceptable, but then this shouldn't
be tied to any other, more generically applicable warnings.

In any event, if this is to stay, then at least the code comment
needs to be quite a bit more clear - "we can't have" is not enough
without explicitly saying why that is.

> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -8141,6 +8141,36 @@ output_insn (void)
>  	  i.prefix[LOCK_PREFIX] = 0;
>  	}
>  
> +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> +	{
> +	  /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> +	     without base register so that vector index register will
> +	     be zero-extended to 64 bits.  */
> +	  if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> +	    add_prefix (ADDR_PREFIX_OPCODE);


Just to re-state: There needs to be a way to override this behavior.
And this is already leaving aside that making this the default from
now on has a fair risk of breaking currently working code. (Note
that this is not to say that I can't see that the change will also
help currently broken code.)

> +	  /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> +	     segment override since final address will be segment
> +	     base + zero-extended (base + index * scale + disp).  */
> +	  if (operand_check != check_none
> +	      && i.prefix[ADDR_PREFIX]
> +	      && i.prefix[SEG_PREFIX])
> +	    {
> +	      const seg_entry *seg;
> +	      if (i.seg[0])
> +		seg = i.seg[0];
> +	      else
> +		seg = i.seg[1];
> +	      if (operand_check == check_error)
> +		as_bad (_("can't encode segment `%s%s' with 32-bit address"),
> +			register_prefix, seg->seg_name);
> +	      else
> +		as_warn (_("segment `%s%s' override with 32-bit address"),
> +			 register_prefix, seg->seg_name);
> +	    }
> +	}
> +#endif
> +
>        /* Since the VEX/EVEX prefix contains the implicit prefix, we
>  	 don't need the explicit prefix.  */
>        if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
H.J. Lu Feb. 26, 2019, 1:23 p.m. | #2
On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> > register so that vector index register will be zero-extended to 64 bits.
> >
> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> > segment override since address will be segment base + zero-extended to 64
> > bits of (base + index * scale + disp).  But GCC:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
>
> Neither above nor in the bug you explain what's wrong with the
> segment override plus address size override in x32 mode. Since you


X32 relies on 0x67 prefix to zero-extend address to 64 bits:

zero-extended (base + index * scale + disp)

With segment override, we get

segment base + zero-extended (base + index * scale + disp)

instead of

zero-extended (segment base + base + index * scale + disp)

When base + index * scale + disp is negative, we get the wrong
address.

VSIB address in vgatherdps is

base + sign-extend(index) * scale + disp

With segment override, we get

segment base + zero-extended (base + sign-extend(index) * scale + disp)

175.vpr in SPEC CPU 2000:

VPR FPGA Placement and Routing Program Version 4.00-spec
Source completed August 19, 1997.


General Options:
The circuit will be placed but not routed.

Placer Options:
User annealing schedule selected with:
Initial Temperature: 5
Exit (Final) Temperature: 0.005
Temperature Reduction factor (alpha_t): 0.9412
Number of moves in the inner loop is (num_blocks)^4/3 * 2
Placement cost type is linear congestion.
Placement will be performed once.
Placement channel width factor = 100.
Exponent used in placement cost: 1
Initial random seed: 1

Reading the FPGA architectural description from arch.in.
Successfully read arch.in.
Pins per clb: 6.  Pads per row/column: 2.
Subblocks per clb: 1.  Subblock LUT size: 4.
Fc value is fraction of tracks in a channel.
Fc_output: 1.  Fc_input: 1.  Fc_pad: 1.
Switch block type: Subset.
Distinct types of segments: 3.
Distinct types of user-specified switches: 3.

Reading the circuit netlist from net.in.
Warning:  logic block #368 (n_n13961) has only 1 pin.
Pin is an output -- may be a constant generator.  Non-fatal, but check this.
Successfully read net.in.
8527 blocks, 8445 nets, 1 global nets.
8383 clbs, 62 inputs, 82 outputs.
The circuit will be mapped into a 92 x 92 array of clbs.


Program received signal SIGSEGV, Segmentation fault.
0x004158fd in try_place.isra ()
(gdb) disass 0x004158fd,+32
Dump of assembler code from 0x4158fd to 0x41591d:
=> 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
   0x00415907 <try_place.isra.5+7527>: vandps %ymm12,%ymm7,%ymm0
   0x0041590c <try_place.isra.5+7532>: vpslld $0x2,%ymm1,%ymm10
   0x00415911 <try_place.isra.5+7537>: vmovdqa 0x1cbe7(%rip),%ymm13        # 0x432500
   0x00415919 <try_place.isra.5+7545>: inc    %eax
   0x0041591b <try_place.isra.5+7547>: vpaddd %ymm5,%ymm10,%ymm14
End of assembler dump.
(gdb) p/x $ymm15
$1 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
    0x8000000000000000, 0x8000000000000000, 0x8000000000000000,
    0x8000000000000000}, v32_int8 = {0x10, 0x30, 0xfa, 0xf7, 0x24, 0x30, 0xfa,
    0xf7, 0x38, 0x30, 0xfa, 0xf7, 0x4c, 0x30, 0xfa, 0xf7, 0x60, 0x30, 0xfa,
    0xf7, 0x74, 0x30, 0xfa, 0xf7, 0x88, 0x30, 0xfa, 0xf7, 0x9c, 0x30, 0xfa,
    0xf7}, v16_int16 = {0x3010, 0xf7fa, 0x3024, 0xf7fa, 0x3038, 0xf7fa,
    0x304c, 0xf7fa, 0x3060, 0xf7fa, 0x3074, 0xf7fa, 0x3088, 0xf7fa, 0x309c,
    0xf7fa}, v8_int32 = {0xf7fa3010, 0xf7fa3024, 0xf7fa3038, 0xf7fa304c,
    0xf7fa3060, 0xf7fa3074, 0xf7fa3088, 0xf7fa309c}, v4_int64 = {
    0xf7fa3024f7fa3010, 0xf7fa304cf7fa3038, 0xf7fa3074f7fa3060,
    0xf7fa309cf7fa3088}, v2_int128 = {0xf7fa304cf7fa3038f7fa3024f7fa3010,
    0xf7fa309cf7fa3088f7fa3074f7fa3060}}
(gdb)

Here the indexes are 0xf7fa3010, ....  Before my fix, they were sign-extended to
0xfffffffff7fa3010, which leads to an invalid address in x32.
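
The effect on the first index in the dump can be reproduced with a small C
model of the two address computations.  The function names are hypothetical,
and the model assumes the behaviour described in this thread: a sign-extended
dword index with 64-bit addressing, and a 32-bit effective address that is
zero-extended to 64 bits when the 0x67 prefix is present.

```c
#include <stdint.h>

/* Hypothetical model, not gas code.  64-bit address size, no base
   register: each dword index is sign-extended before scaling.  The
   uint32_t -> int32_t conversion assumes two's complement, which holds
   on all x86 compilers.  */
static uint64_t
vsib_addr_64bit (uint32_t index, uint32_t scale, int32_t disp)
{
  int64_t idx = (int64_t) (int32_t) index;
  return (uint64_t) (idx * (int64_t) scale + (int64_t) disp);
}

/* With the 0x67 prefix: the effective address is computed modulo 2^32
   and then zero-extended to 64 bits.  */
static uint64_t
vsib_addr_with_0x67 (uint32_t index, uint32_t scale, int32_t disp)
{
  uint32_t ea = index * scale + (uint32_t) disp;
  return (uint64_t) ea;
}
```

For the index 0xf7fa3010 with scale 1 and displacement 0xc, the first
function yields 0xfffffffff7fa301c and the second 0xf7fa301c; for an index
below 2G such as 0x1000, both yield 0x100c.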

> keep using the same wording with just slight alterations, it must be
> something very obvious to you, but entirely un-obvious to me. Is
> this related to the desire of using both negative and positive
> offsets into TLS, where (obviously I would say) there's not going
> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS


It won't wrap for x32.

> usage model is broken, but it's not the assembler that should


No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.

> prevent use of otherwise valid constructs. Whether full 64-bit


Assembly is correct for 64-bit mode.  Since it doesn't work for
x32 when offset is negative, we should at least give a warning.

> addresses (and hence full non-zero %fs/%gs bases with no
> wrapping at the 4Gb boundary) is intended is the programmer's
> choice, not something the assembler should enforce unconditionally.
> Optionally emitting a warning is acceptable, but then this shouldn't
> be tied to any other, more generically applicable warnings.


Binutils, including linker, does a few things special for x32 to deal
with address limitation.  This is just one of them.

> In any event, if this is to stay, then at least the code comment
> needs to be quite a bit more clear - "we can't have" is not enough
> without explicitly saying why that is.
>
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -8141,6 +8141,36 @@ output_insn (void)
> >         i.prefix[LOCK_PREFIX] = 0;
> >       }
> >
> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
> > +     {
> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
> > +          without base register so that vector index register will
> > +          be zero-extended to 64 bits.  */
> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
> > +         add_prefix (ADDR_PREFIX_OPCODE);
>
> Just to re-state: There needs to be a way to override this behavior.
> And this is already leaving aside that making this the default from
> now on has a fair risk of breaking currently working code. (Note
> that this is not to say that I can't see that the change will also
> help currently broken code.)


Please see above.  If VSIB index is below 2G, my fix doesn't change
anything.  If VSIB index is above 2G, the program crashes before my fix.

> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
> > +          segment override since final address will be segment
> > +          base + zero-extended (base + index * scale + disp).  */
> > +       if (operand_check != check_none
> > +           && i.prefix[ADDR_PREFIX]
> > +           && i.prefix[SEG_PREFIX])
> > +         {
> > +           const seg_entry *seg;
> > +           if (i.seg[0])
> > +             seg = i.seg[0];
> > +           else
> > +             seg = i.seg[1];
> > +           if (operand_check == check_error)
> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),


How about just

segment `%s%s' override with 32-bit address

> > +                     register_prefix, seg->seg_name);
> > +           else
> > +             as_warn (_("segment `%s%s' override with 32-bit address"),
> > +                      register_prefix, seg->seg_name);
> > +         }
> > +     }
> > +#endif
> > +
> >        /* Since the VEX/EVEX prefix contains the implicit prefix, we
> >        don't need the explicit prefix.  */
> >        if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
>
>
>



-- 
H.J.
Jan Beulich Feb. 26, 2019, 2:45 p.m. | #3
>>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:

> On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
>>
>> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
>> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
>> > register so that vector index register will be zero-extended to 64 bits.
>> >
>> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
>> > segment override since address will be segment base + zero-extended to 64
>> > bits of (base + index * scale + disp).  But GCC:
>> >
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
>>
>> Neither above nor in the bug you explain what's wrong with the
>> segment override plus address size override in x32 mode. Since you
>
> X32 relies on 0x67 prefix to zero-extend address to 64 bits:
>
> zero-extended (base + index * scale + disp)
>
> With segment override, we got
>
> segment base + zero-extended (base + index * scale + disp)
>
> instead of
>
> zero-extended (segment base + base + index * scale + disp)
>
> When base + index * scale + disp is negative, we get the wrong
> address.
>
> VSIB address in vgatherdps is
>
> base + sign-extend(index) * scale + disp
>
> With segment override, we got
>
> segment base + zero-extended (base + sign-extend(index) * scale + disp)


Right. But whether that's what the programmer wanted we don't
know. Also please consider the qword index forms as well, plus
the dword index forms with scaling factor 2, 4, or 8 (allowing for
effective indexes up to 35 bits wide).
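
The 35-bit figure can be checked with a few lines of C.  This is only an
illustrative sketch with made-up helper names, and it treats the dword index
as an unsigned magnitude for simplicity.

```c
#include <stdint.h>

/* Hypothetical helper: a dword index multiplied by the SIB scale
   factor (1, 2, 4 or 8), computed without truncation.  */
static uint64_t
scaled_index (uint32_t index, unsigned int scale)
{
  return (uint64_t) index * scale;
}

/* Number of bits needed to represent VALUE (0 for VALUE == 0).  */
static unsigned int
bit_width (uint64_t value)
{
  unsigned int bits = 0;

  while (value != 0)
    {
      bits++;
      value >>= 1;
    }
  return bits;
}
```

The largest dword index, 0xffffffff, scaled by 8 is 0x7fffffff8, which needs
35 bits and therefore cannot survive an effective address truncated to
32 bits.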

All of this would be acceptable if address space was limited to 4Gb
for x32, but that's not the case according to my reading of the
chapter in the psABI.

> 175.vpr in SPEC CPU 2000:
>[...]
> Program received signal SIGSEGV, Segmentation fault.
> 0x004158fd in try_place.isra ()
> (gdb) disass 0x004158fd,+32
> Dump of assembler code from 0x4158fd to 0x41591d:
> => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12


Okay, this is the special case of the index register actually holding
addresses. What about the case where the displacement is the base
address, and the index register holds indeed indexes?

>> keep using the same wording with just slight alterations, it must be
>> something very obvious to you, but entirely un-obvious to me. Is
>> this related to the desire of using both negative and positive
>> offsets into TLS, where (obviously I would say) there's not going
>> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS
>
> It won't wrap for x32.
>
>> usage model is broken, but it's not the assembler that should
>
> No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.


I trust you that you follow what is written there. The question
though is whether it wasn't a mistake to permit negative offsets in
the first place.

>> prevent use of otherwise valid constructs. Whether full 64-bit
>
> Assembly is correct for 64-bit mode.  Since it doesn't work for
> x32 when offset is negative, we should at least give a warning.


Well, yes, since the ABI can't reasonably be changed, emitting a
warning looks like the only option now. But as said, please don't tie
this to that pre-existing one, not the least because that's also what
is going to control the lack-of-disambiguating-suffix diagnostic in
AT&T mode, the change for which I hope to submit at some point over the
next several months (now that I've mostly completed the prereqs
you had set for this).

>> addresses (and hence full non-zero %fs/%gs bases with no
>> wrapping at the 4Gb boundary) is intended is the programmer's
>> choice, not something the assembler should enforce unconditionally.
>> Optionally emitting a warning is acceptable, but then this shouldn't
>> be tied to any other, more generically applicable warnings.
>
> Binutils, including linker, does a few things special for x32 to deal
> with address limitation.  This is just one of them.


But are there pre-existing cases where in order to make one
thing work a different thing got deliberately broken?

>> In any event, if this is to stay, then at least the code comment
>> needs to be quite a bit more clear - "we can't have" is not enough
>> without explicitly saying why that is.
>>
>> > --- a/gas/config/tc-i386.c
>> > +++ b/gas/config/tc-i386.c
>> > @@ -8141,6 +8141,36 @@ output_insn (void)
>> >         i.prefix[LOCK_PREFIX] = 0;
>> >       }
>> >
>> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
>> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
>> > +     {
>> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
>> > +          without base register so that vector index register will
>> > +          be zero-extended to 64 bits.  */
>> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)
>> > +         add_prefix (ADDR_PREFIX_OPCODE);
>>
>> Just to re-state: There needs to be a way to override this behavior.
>> And this is already leaving aside that making this the default from
>> now on has a fair risk of breaking currently working code. (Note
>> that this is not to say that I can't see that the change will also
>> help currently broken code.)
>
> Please see above.  If VSIB index is below 2G, my fix doesn't change
> anything.  If VSIB index is above 2G, the program crashes before my fix.


Right, and I didn't put under question that you indeed fix one
specific case. I just can't help thinking that you do so by breaking
other cases, as per above. And I am of the opinion that it ought
to be the compiler (or assembly programmer) who ought to
explicitly request 32-bit addressing (e.g. by way of using the
addr32 prefix) in this specific example of yours.

>> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
>> > +          segment override since final address will be segment
>> > +          base + zero-extended (base + index * scale + disp).  */
>> > +       if (operand_check != check_none
>> > +           && i.prefix[ADDR_PREFIX]
>> > +           && i.prefix[SEG_PREFIX])
>> > +         {
>> > +           const seg_entry *seg;
>> > +           if (i.seg[0])
>> > +             seg = i.seg[0];
>> > +           else
>> > +             seg = i.seg[1];
>> > +           if (operand_check == check_error)
>> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),
>
> How about just
>
> segment `%s%s' override with 32-bit address


That's slightly better text indeed.

Jan
H.J. Lu Feb. 26, 2019, 4:07 p.m. | #4
On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:
>

> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:

> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:

> >>

> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:

> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base

> >> > register so that vector index register will be zero-extended to 64 bits.

> >> >

> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is

> >> > segment override since address will be segment base + zero-extended to 64

> >> > bits of (base + index * scale + disp).  But GCC:

> >> >

> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502

> >>

> >> Neither above nor in the bug you explain what's wrong with the

> >> segment override plus address size override in x32 mode. Since you

> >

> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:

> >

> > zero-extended (base + index * scale + disp)

> >

> > With segment override, we got

> >

> > segment base + zero-extended (base + index * scale + disp)

> >

> > instead of

> >

> > zero-extended (segment base + base + index * scale + disp)

> >

> > When base + index * scale + disp is negative, we get the wrong

> > address.

> >

> > VSIB address in vgatherdps is

> >

> > base + sign-extend(index) * scale + disp

> >

> > With segment override, we got

> >

> > segment base + zero-extended (base + sign-extend(index) * scale + disp)

>

> Right. But whether that's what the programmer wanted we don't

> know. Also please consider the qword index forms as well, plus

> the dword index forms with scaling factor 2, 4, or 8 (allowing for

> effective indexes up to 35 bits wide).

>

> All of this would be acceptable if address space was limited to 4Gb

> for x32, but that's not the case according to my reading of the

> chapter in the psABI.


10.4 Kernel Support
Kernel should limit stack and addresses returned from system calls
between 0x00000000 to 0xffffffff.

> > 175.vpr in SPEC CPU 2000:

> >[...]

> > Program received signal SIGSEGV, Segmentation fault.

> > 0x004158fd in try_place.isra ()

> > (gdb) disass 0x004158fd,+32

> > Dump of assembler code from 0x4158fd to 0x41591d:

> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12

>

> Okay, this is the special case of the index register actually holding

> addresses. What about the case where the displacement is the base

> address, and the index register holds indeed indexes?


I will fix it.

> >> keep using the same wording with just slight alterations, it must be

> >> something very obvious to you, but entirely un-obvious to me. Is

> >> this related to the desire of using both negative and positive

> >> offsets into TLS, where (obviously I would say) there's not going

> >> to be any wrapping at the 4Gb boundary? If so, I'd say the TLS

> >

> > It won't wrap for x32.

> >

> >> usage model is broken, but it's not the assembler that should

> >

> > No, it is not.  Please read "ILP32 Programming Model" in x86-64 psABI.

>

> I trust you that you follow what is written there. The question

> though is whether it wasn't a mistake to permit negative offsets in

> the first place.


Negative offset is by design.

> >> prevent use of otherwise valid constructs. Whether full 64-bit

> >

> > Assembly is correct for 64-bit mode.  Since it doesn't work for

> > x32 when offset is negative, we should at least give a warning.

>

> Well, yes, since the ABI can't reasonably be changed, emitting a

> warning looks like the only option now. But as said, please don't tie

> this to that pre-existing one, not the least because that's also what


Existing code will get a warning.

> is going to control the lack-of-disambiguating-suffix diagnostic in

> AT&T mode the change for I hope to submit at some point over the

> next several months (now that I've mostly completed the prereqs

> you had set for this).

>

> >> addresses (and hence full non-zero %fs/%gs bases with no

> >> wrapping at the 4Gb boundary) is intended is the programmer's

> >> choice, not something the assembler should enforce unconditionally.

> >> Optionally emitting a warning is acceptable, but then this shouldn't

> >> be tied to any other, more generically applicable warnings.

> >

> > Binutils, including linker, does a few things special for x32 to deal

> > with address limitation.  This is just one of them.

>

> But are there pre-existing cases where in order to make one

> thing work a different thing got deliberately broken?


It works only if offset isn't negative.

> >> In any event, if this is to stay, then at least the code comment

> >> needs to be quite a bit more clear - "we can't have" is not enough

> >> without explicitly saying why that is.

> >>

> >> > --- a/gas/config/tc-i386.c

> >> > +++ b/gas/config/tc-i386.c

> >> > @@ -8141,6 +8141,36 @@ output_insn (void)

> >> >         i.prefix[LOCK_PREFIX] = 0;

> >> >       }

> >> >

> >> > +#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)

> >> > +      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)

> >> > +     {

> >> > +       /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address

> >> > +          without base register so that vector index register will

> >> > +          be zero-extended to 64 bits.  */

> >> > +       if (!i.base_reg && i.tm.opcode_modifier.vecsib)

> >> > +         add_prefix (ADDR_PREFIX_OPCODE);

> >>

> >> Just to re-state: There needs to be a way to override this behavior.

> >> And this is already leaving aside that making this the default from

> >> now on has a fair risk of breaking currently working code. (Note

> >> that this is not to say that I can't see that the change will also

> >> help currently broken code.)

> >

> > Please see above.  If VSIB index is below 2G, my fix doesn't change

> > anything.  If VSIB index is above 2G, the program crashes before my fix.

>

> Right, and I didn't put under question that you indeed fix one

> specific case. I just can't help thinking that you do so by breaking

> other cases, as per above. And I am of the opinion that it ought

> to be the compiler (or assembly programmer) who ought to

> explicitly request 32-bit addressing (e.g. by way of using the

> addr32 prefix) in this specific example of yours.

>

> >> > +       /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with

> >> > +          segment override since final address will be segment

> >> > +          base + zero-extended (base + index * scale + disp).  */

> >> > +       if (operand_check != check_none

> >> > +           && i.prefix[ADDR_PREFIX]

> >> > +           && i.prefix[SEG_PREFIX])

> >> > +         {

> >> > +           const seg_entry *seg;

> >> > +           if (i.seg[0])

> >> > +             seg = i.seg[0];

> >> > +           else

> >> > +             seg = i.seg[1];

> >> > +           if (operand_check == check_error)

> >> > +             as_bad (_("can't encode segment `%s%s' with 32-bit address"),

> >

> > How about just

> >

> > segment `%s%s' override with 32-bit address

>

> That's slightly better text indeed.

>

> Jan

>



-- 
H.J.
Jan Beulich Feb. 26, 2019, 4:16 p.m. | #5
>>> On 26.02.19 at 17:07, <hjl.tools@gmail.com> wrote:

> On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:

>>

>> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:

>> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:

>> >>

>> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:

>> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base

>> >> > register so that vector index register will be zero-extended to 64 bits.

>> >> >

>> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is

>> >> > segment override since address will be segment base + zero-extended to 64

>> >> > bits of (base + index * scale + disp).  But GCC:

>> >> >

>> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502 

>> >>

>> >> Neither above nor in the bug you explain what's wrong with the

>> >> segment override plus address size override in x32 mode. Since you

>> >

>> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:

>> >

>> > zero-extended (base + index * scale + disp)

>> >

>> > With segment override, we got

>> >

>> > segment base + zero-extended (base + index * scale + disp)

>> >

>> > instead of

>> >

>> > zero-extended (segment base + base + index * scale + disp)

>> >

>> > When base + index * scale + disp is negative, we get the wrong

>> > address.

>> >

>> > VSIB address in vgatherdps is

>> >

>> > base + sign-extend(index) * scale + disp

>> >

>> > With segment override, we got

>> >

>> > segment base + zero-extended (base + sign-extend(index) * scale + disp)

>>

>> Right. But whether that's what the programmer wanted we don't

>> know. Also please consider the qword index forms as well, plus

>> the dword index forms with scaling factor 2, 4, or 8 (allowing for

>> effective indexes up to 35 bits wide).

>>

>> All of this would be acceptable if address space was limited to 4Gb

>> for x32, but that's not the case according to my reading of the

>> chapter in the psABI.

> 

> 10.4 Kernel Support
> Kernel should limit stack and addresses returned from system calls
> between 0x00000000 to 0xffffffff.


Hmm, if that's indeed the case, despite it - according to my
interpretation - contradicting 10.2's wording, and despite it
being an unnecessary restriction imo, then ...

>> > 175.vpr in SPEC CPU 2000:

>> >[...]

>> > Program received signal SIGSEGV, Segmentation fault.

>> > 0x004158fd in try_place.isra ()

>> > (gdb) disass 0x004158fd,+32

>> > Dump of assembler code from 0x4158fd to 0x41591d:

>> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12

>>

>> Okay, this is the special case of the index register actually holding

>> addresses. What about the case where the displacement is the base

>> address, and the index register holds indeed indexes?

> 

> I will fix it.


... there's nothing to fix here, I think.

Jan
H.J. Lu Feb. 26, 2019, 8:33 p.m. | #6
On Tue, Feb 26, 2019 at 8:16 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 17:07, <hjl.tools@gmail.com> wrote:
> > On Tue, Feb 26, 2019 at 6:45 AM Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >> >>> On 26.02.19 at 14:23, <hjl.tools@gmail.com> wrote:
> >> > On Tue, Feb 26, 2019 at 3:41 AM Jan Beulich <JBeulich@suse.com> wrote:
> >> >>
> >> >> >>> On 26.02.19 at 05:35, <hjl.tools@gmail.com> wrote:
> >> >> > In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address without base
> >> >> > register so that vector index register will be zero-extended to 64 bits.
> >> >> >
> >> >> > We can't have ADDR_PREFIX_OPCODE prefix with 32-bit address if there is
> >> >> > segment override since address will be segment base + zero-extended to 64
> >> >> > bits of (base + index * scale + disp).  But GCC:
> >> >> >
> >> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89502
> >> >>
> >> >> Neither above nor in the bug you explain what's wrong with the
> >> >> segment override plus address size override in x32 mode. Since you
> >> >
> >> > X32 relies on 0x67 prefix to zero-extend address to 64 bits:
> >> >
> >> > zero-extended (base + index * scale + disp)
> >> >
> >> > With segment override, we got
> >> >
> >> > segment base + zero-extended (base + index * scale + disp)
> >> >
> >> > instead of
> >> >
> >> > zero-extended (segment base + base + index * scale + disp)
> >> >
> >> > When base + index * scale + disp is negative, we get the wrong
> >> > address.
> >> >
> >> > VSIB address in vgatherdps is
> >> >
> >> > base + sign-extend(index) * scale + disp
> >> >
> >> > With segment override, we got
> >> >
> >> > segment base + zero-extended (base + sign-extend(index) * scale + disp)
> >>
> >> Right. But whether that's what the programmer wanted we don't
> >> know. Also please consider the qword index forms as well, plus
> >> the dword index forms with scaling factor 2, 4, or 8 (allowing for
> >> effective indexes up to 35 bits wide).
> >>
> >> All of this would be acceptable if address space was limited to 4Gb
> >> for x32, but that's not the case according to my reading of the
> >> chapter in the psABI.
> >
> > 10.4 Kernel Support
> > Kernel should limit stack and addresses returned from system calls
> > between 0x00000000 to 0xffffffff.
>
> Hmm, if that's indeed the case, despite it - according to my
> interpretation - contradicting 10.2's wording, and despite it
> being an unnecessary restriction imo, then ...
>
> >> > 175.vpr in SPEC CPU 2000:
> >> >[...]
> >> > Program received signal SIGSEGV, Segmentation fault.
> >> > 0x004158fd in try_place.isra ()
> >> > (gdb) disass 0x004158fd,+32
> >> > Dump of assembler code from 0x4158fd to 0x41591d:
> >> > => 0x004158fd <try_place.isra.5+7517>: vgatherdps %ymm2,0xc(,%ymm15,1),%ymm12
> >>
> >> Okay, this is the special case of the index register actually holding
> >> addresses. What about the case where the displacement is the base
> >> address, and the index register holds indeed indexes?
> >
> > I will fix it.
>
> ... there's nothing to fix here, I think.
>


Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
with Qword indices.  The 0x67 prefix is now added only for a VSIB address
with Dword indices and neither a base register nor a symbol, so that Dword
indices will be zero-extended to 64 bits, unless -moperand-check=none is
passed to the assembler.
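
As a rough illustration of the rule described above, here is a minimal
sketch in C.  It is not the code from the updated patch: the struct, the
enums and the function names are hypothetical, they are not the
tc-i386.c data structures, and treating -moperand-check=none as
suppressing the prefix is only one reading of the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of one memory operand.  */
enum index_width { INDEX_NONE, INDEX_DWORD, INDEX_QWORD };
enum check_level { CHECK_NONE, CHECK_WARNING, CHECK_ERROR };

struct vsib_operand
{
  bool is_vsib;            /* instruction uses a vector index register */
  enum index_width index;  /* width of each index element */
  bool has_base_reg;       /* a base register is present */
  bool disp_is_symbol;     /* displacement is a symbol, not a constant */
};

/* Return true if, under the assumptions above, the assembler would
   add the 0x67 address size prefix in x32 mode.  */
bool
needs_addr32_prefix (const struct vsib_operand *op, enum check_level check)
{
  if (check == CHECK_NONE)
    return false;
  return op->is_vsib
         && op->index == INDEX_DWORD
         && !op->has_base_reg
         && !op->disp_is_symbol;
}

/* A few cases, written as assertions.  */
void
prefix_rule_examples (void)
{
  struct vsib_operand op = { true, INDEX_DWORD, false, false };

  assert (needs_addr32_prefix (&op, CHECK_WARNING));
  assert (!needs_addr32_prefix (&op, CHECK_NONE));

  op.index = INDEX_QWORD;       /* Qword indices are left alone.  */
  assert (!needs_addr32_prefix (&op, CHECK_WARNING));

  op.index = INDEX_DWORD;
  op.has_base_reg = true;       /* So are addresses with a base.  */
  assert (!needs_addr32_prefix (&op, CHECK_WARNING));
}
```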

-- 
H.J.
Jan Beulich Feb. 27, 2019, 8:25 a.m. | #7
>>> On 26.02.19 at 21:33, <hjl.tools@gmail.com> wrote:

> Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
> with Qword indices and add 0x67 prefix only for VSIB address of Dword
> indices without base register nor symbol so that Dword indices will be
> zero-extended to 64 bits unless -moperand-check=none is passed to
> assembler.


A couple of questions still remain:

1) What about a scale factor other than 1? Arguably this is difficult to
use with neither base nor O_symbol displacement, but it's not
impossible. As said before, _if_ qword indices are to be special cased,
I think such scale factors should be, too.

2) Given the wording you had quoted from psABI section 10.4, I did
suggest that special casing of qword indexes may then not be
necessary at all. Could you clarify why you (now) think otherwise?

3) Does the logic work not only with a specified displacement of zero,
but also without any displacement at all? The abort() invocations you
add make me uncertain of this, and the test cases you add don't
cover the case.

4) Why "else if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])"
instead of just if()? Isn't this diagnostic equally applicable to all the
VecSIB cases?

5) In the comment following "case O_constant", could you add
"assuming that the index register actually holds addresses" or
something along these lines? Similarly the other comment is still as
vague as it was before; as said I really think it lacks sufficient
clarity as to the "why", i.e. the non-wrapping behavior at 4Gb
should be mentioned explicitly rather than be implied.

6) You still use the existing operand_check to control the diagnostic.
This being a very special case which one may want to disable without
also disabling diagnostics for other, more generic operand checks,
don't you agree that it should be separately controllable?

7) Would you mind addressing the previously raised point of it (in
my opinion) really being the compiler's / assembly programmer's job
to enforce 32-bit addressing here?
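
To make points 1) and 5) concrete, the address arithmetic discussed in
this thread can be sketched in C as below.  This is only a model of the
formulas quoted earlier (sign-extended dword index, optional truncation
to 32 bits by the 0x67 prefix, segment base added afterwards); the
function names and the sample values are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Linear address of one gather element with a dword index in 64-bit
   mode.  A nonzero addr32 models the 0x67 prefix: the effective
   address is truncated to 32 bits and zero-extended before the
   segment base is added.  */
uint64_t
vsib_linear_address (uint64_t seg_base, uint64_t base, uint32_t index,
                     unsigned int scale, int32_t disp, int addr32)
{
  /* Sign-extend the 32-bit index to 64 bits.  */
  uint64_t idx = index;
  if (index & 0x80000000u)
    idx |= 0xffffffff00000000ull;

  /* Unsigned arithmetic wraps modulo 2^64, like the hardware.  */
  uint64_t ea = base + idx * scale + (uint64_t) (int64_t) disp;
  if (addr32)
    ea &= 0xffffffffull;
  return seg_base + ea;
}

void
vsib_examples (void)
{
  /* Index holds an address above 2GB (assumed value): without the
     prefix the sign extension leaves the 4GB range.  */
  assert (vsib_linear_address (0, 0, 0xf7000000u, 1, 0xc, 0)
          == 0xfffffffff700000cull);
  assert (vsib_linear_address (0, 0, 0xf7000000u, 1, 0xc, 1)
          == 0xf700000cull);

  /* Displacement is the base address and the index is a real,
     negative index: both encodings agree.  */
  assert (vsib_linear_address (0, 0, 0xffffffffu, 4, 0x1000, 0) == 0xffcull);
  assert (vsib_linear_address (0, 0, 0xffffffffu, 4, 0x1000, 1) == 0xffcull);

  /* Scale 8 lets a positive dword index reach past 4GB; the prefix
     wraps it back.  */
  assert (vsib_linear_address (0, 0, 0x7fffffffu, 8, 0, 0)
          == 0x3fffffff8ull);
  assert (vsib_linear_address (0, 0, 0x7fffffffu, 8, 0, 1)
          == 0xfffffff8ull);

  /* Segment base plus a negative offset: the prefix changes the
     result, because truncation happens before the base is added.  */
  assert (vsib_linear_address (0x10000, 0, 0xfffffff8u, 1, 0, 0)
          == 0xfff8ull);
  assert (vsib_linear_address (0x10000, 0, 0xfffffff8u, 1, 0, 1)
          == 0x10000fff8ull);
}
```

The second pair of assertions is the case where the index register holds
real indexes; the first and third pairs are where the two encodings
diverge.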

Jan
H.J. Lu Feb. 27, 2019, 6:09 p.m. | #8
On Wed, Feb 27, 2019 at 12:26 AM Jan Beulich <JBeulich@suse.com> wrote:
>
> >>> On 26.02.19 at 21:33, <hjl.tools@gmail.com> wrote:
> > Here is the updated patch.  I added VecSIBQword to mark VSIB instructions
> > with Qword indices and add 0x67 prefix only for VSIB address of Dword
> > indices without base register nor symbol so that Dword indices will be
> > zero-extended to 64 bits unless -moperand-check=none is passed to
> > assembler.
>
> A couple of questions still remain:
>
> 1) What about a scale factor other than 1? Arguably this is difficult to
> use with neither base nor O_symbol displacement, but it's not
> impossible. As said before, _if_ qword indices are to be special cased,
> I think such scale factors should be, too.
>
> 2) Given the wording you had quoted from psABI section 10.4, I did
> suggest that special casing of qword indexes may then not be
> necessary at all. Could you clarify why you (now) think otherwise?
>
> 3) Does the logic work not only with a specified displacement of zero,
> but also without any displacement at all? The abort() invocations you
> add make me uncertain of this, and the test cases you add don't
> cover the case.
>
> 4) Why "else if (i.prefix[ADDR_PREFIX] && i.prefix[SEG_PREFIX])"
> instead of just if()? Isn't this diagnostic equally applicable to all the
> VecSIB cases?
>
> 5) In the comment following "case O_constant", could you add
> "assuming that the index register actually holds addresses" or
> something along these lines? Similarly the other comment is still as
> vague as it was before; as said I really think it lacks sufficient
> clarity as to the "why", i.e. the non-wrapping behavior at 4Gb
> should be mentioned explicitly rather than be implied.
>
> 6) You still use the existing operand_check to control the diagnostic.
> This being a very special case which one may want to disable without
> also disabling diagnostics for other, more generic operand checks,
> don't you agree that it should be separately controllable?
>
> 7) Would you mind addressing the previously raised point of it (in
> my opinion) really being the compiler's / assembly programmer's job
> to enforce 32-bit addressing here?
>


Good point.  I withdraw my patch.  I opened:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89523


-- 
H.J.

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index d31ee6abdd..df7c152cc4 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -8141,6 +8141,36 @@  output_insn (void)
 	  i.prefix[LOCK_PREFIX] = 0;
 	}
 
+#if defined (OBJ_MAYBE_ELF) || defined (OBJ_ELF)
+      if (flag_code == CODE_64BIT && x86_elf_abi == X86_64_X32_ABI)
+	{
+	  /* In x32, add ADDR_PREFIX_OPCODE prefix for VSIB address
+	     without base register so that vector index register will
+	     be zero-extended to 64 bits.  */
+	  if (!i.base_reg && i.tm.opcode_modifier.vecsib)
+	    add_prefix (ADDR_PREFIX_OPCODE);
+	  /* In x32, we can't have ADDR_PREFIX_OPCODE prefix with
+	     segment override since final address will be segment
+	     base + zero-extended (base + index * scale + disp).  */
+	  if (operand_check != check_none
+	      && i.prefix[ADDR_PREFIX]
+	      && i.prefix[SEG_PREFIX])
+	    {
+	      const seg_entry *seg;
+	      if (i.seg[0])
+		seg = i.seg[0];
+	      else
+		seg = i.seg[1];
+	      if (operand_check == check_error)
+		as_bad (_("can't encode segment `%s%s' with 32-bit address"),
+			register_prefix, seg->seg_name);
+	      else
+		as_warn (_("segment `%s%s' override with 32-bit address"),
+			 register_prefix, seg->seg_name);
+	    }
+	}
+#endif
+
       /* Since the VEX/EVEX prefix contains the implicit prefix, we
 	 don't need the explicit prefix.  */
       if (!i.tm.opcode_modifier.vex && !i.tm.opcode_modifier.evex)
diff --git a/gas/testsuite/gas/i386/ilp32/ilp32.exp b/gas/testsuite/gas/i386/ilp32/ilp32.exp
index d3a7190ac5..fe1e9ea5df 100644
--- a/gas/testsuite/gas/i386/ilp32/ilp32.exp
+++ b/gas/testsuite/gas/i386/ilp32/ilp32.exp
@@ -38,6 +38,7 @@  if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_x32_check] &
     }
 
     run_list_test "reloc64" "--defsym _bad_=1"
+    run_list_test "x86-64-seg-inval" "-moperand-check=error"
 
     set ASFLAGS "$old_ASFLAGS"
 }
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
new file mode 100644
index 0000000000..7ec3f4d14b
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.l
@@ -0,0 +1,7 @@ 
+.*: Assembler messages:
+.*:4: Error: can't encode segment `%fs' with 32-bit address
+.*:5: Error: can't encode segment `%gs' with 32-bit address
+.*:6: Error: can't encode segment `%fs' with 32-bit address
+.*:7: Error: can't encode segment `%fs' with 32-bit address
+.*:8: Error: can't encode segment `%gs' with 32-bit address
+.*:9: Error: can't encode segment `%gs' with 32-bit address
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
new file mode 100644
index 0000000000..8117c68ec2
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-inval.s
@@ -0,0 +1,9 @@ 
+	.text
+	.allow_index_reg
+_start:
+	vgatherdps %ymm12,%fs:0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps %ymm12,%gs:0xc(,%ymm15,1),%ymm11
+	movl	%fs:(%eax), %eax
+	movl	%fs:(,%eax,1), %eax
+	movl	%gs:(,%eiz,1), %eax
+	movl	%gs:(%eip), %eax
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
new file mode 100644
index 0000000000..7c317c2d6b
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.d
@@ -0,0 +1,17 @@ 
+#source: x86-64-seg-inval.s
+#warning_output: x86-64-seg-warn.e
+#objdump: -dw
+#name: x86-64 (ILP32) segment (warning)
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	64 67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	65 67 c4 22 1d 92 1c 3d 0c 00 00 00 	vgatherdps %ymm12,%gs:0xc\(,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 67 8b 00          	mov    %fs:\(%eax\),%eax
+ +[a-f0-9]+:	64 67 8b 04 05 00 00 00 00 	mov    %fs:0x0\(,%eax,1\),%eax
+ +[a-f0-9]+:	65 67 8b 04 25 00 00 00 00 	mov    %gs:0x0\(,%eiz,1\),%eax
+ +[a-f0-9]+:	65 67 8b 05 00 00 00 00 	mov    %gs:0x0\(%eip\),%eax        # [a-f0-9]+ <_start\+0x[a-f0-9]+>
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
new file mode 100644
index 0000000000..f5a030f220
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg-warn.e
@@ -0,0 +1,7 @@ 
+.*: Assembler messages:
+.*:4: Warning: segment `%fs' override with 32-bit address
+.*:5: Warning: segment `%gs' override with 32-bit address
+.*:6: Warning: segment `%fs' override with 32-bit address
+.*:7: Warning: segment `%fs' override with 32-bit address
+.*:8: Warning: segment `%gs' override with 32-bit address
+.*:9: Warning: segment `%gs' override with 32-bit address
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.d b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
new file mode 100644
index 0000000000..86e5526676
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.d
@@ -0,0 +1,207 @@ 
+#as: -I$srcdir/$subdir
+#objdump: -dw
+#name: x86-64 (ILP32) segment
+
+.*: +file format .*
+
+Disassembly of section .text:
+
+0+ <_start>:
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 e9 92 4c 7d 00 	vgatherdpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 93 4c 7d 00 	vgatherqpd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 92 4c 7d 00 	vgatherdpd %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 93 4c 7d 00 	vgatherqpd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 92 5c 75 00 	vgatherdpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 93 5c 75 00 	vgatherqpd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 92 5c 75 00 	vgatherdpd %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 93 5c 75 00 	vgatherqpd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 25 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 92 34 e5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 35 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 08 00 00 00 	vgatherdpd %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 f8 ff ff ff 	vgatherdpd %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 00 00 00 00 	vgatherdpd %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 92 34 f5 98 02 00 00 	vgatherdpd %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	c4 e2 69 92 4c 7d 00 	vgatherdps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 92 4c 7d 00 	vgatherdps %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 93 4c 7d 00 	vgatherqps %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 92 5c 75 00 	vgatherdps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 92 5c 75 00 	vgatherdps %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 93 5c 75 00 	vgatherqps %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 25 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 92 34 e5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 35 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 08 00 00 00 	vgatherdps %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 f8 ff ff ff 	vgatherdps %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 00 00 00 00 	vgatherdps %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 92 34 f5 98 02 00 00 	vgatherdps %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 69 90 4c 7d 00 	vpgatherdd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 69 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 6d 90 4c 7d 00 	vpgatherdd %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 6d 91 4c 7d 00 	vpgatherqd %xmm2,0x0\(%rbp,%ymm7,2\),%xmm1
+ +[a-f0-9]+:	c4 02 19 90 5c 75 00 	vpgatherdd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 19 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 1d 90 5c 75 00 	vpgatherdd %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 1d 91 5c 75 00 	vpgatherqd %xmm12,0x0\(%r13,%ymm14,2\),%xmm11
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 25 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,1\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 e2 51 90 34 e5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm4,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 35 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,1\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 08 00 00 00 	vpgatherdd %xmm5,0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 f8 ff ff ff 	vpgatherdd %xmm5,-0x8\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 00 00 00 00 	vpgatherdd %xmm5,0x0\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	67 c4 a2 51 90 34 f5 98 02 00 00 	vpgatherdd %xmm5,0x298\(,%xmm14,8\),%xmm6
+ +[a-f0-9]+:	c4 e2 e9 90 4c 7d 00 	vpgatherdq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 e9 91 4c 7d 00 	vpgatherqq %xmm2,0x0\(%rbp,%xmm7,2\),%xmm1
+ +[a-f0-9]+:	c4 e2 ed 90 4c 7d 00 	vpgatherdq %ymm2,0x0\(%rbp,%xmm7,2\),%ymm1
+ +[a-f0-9]+:	c4 e2 ed 91 4c 7d 00 	vpgatherqq %ymm2,0x0\(%rbp,%ymm7,2\),%ymm1
+ +[a-f0-9]+:	c4 02 99 90 5c 75 00 	vpgatherdq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 99 91 5c 75 00 	vpgatherqq %xmm12,0x0\(%r13,%xmm14,2\),%xmm11
+ +[a-f0-9]+:	c4 02 9d 90 5c 75 00 	vpgatherdq %ymm12,0x0\(%r13,%xmm14,2\),%ymm11
+ +[a-f0-9]+:	c4 02 9d 91 5c 75 00 	vpgatherqq %ymm12,0x0\(%r13,%ymm14,2\),%ymm11
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 25 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,1\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 e2 d5 90 34 e5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm4,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 35 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,1\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 08 00 00 00 	vpgatherdq %ymm5,0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 f8 ff ff ff 	vpgatherdq %ymm5,-0x8\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 00 00 00 00 	vpgatherdq %ymm5,0x0\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	67 c4 a2 d5 90 34 f5 98 02 00 00 	vpgatherdq %ymm5,0x298\(,%xmm14,8\),%ymm6
+ +[a-f0-9]+:	64 8b 04 25 00 00 00 00 	mov    %fs:0x0,%eax
+ +[a-f0-9]+:	65 8b 05 00 00 00 00 	mov    %gs:0x0\(%rip\),%eax        # [a-f0-9]+ <_start\+0x[a-f0-9]+>
+ +[a-f0-9]+:	65 8b 00             	mov    %gs:\(%rax\),%eax
+ +[a-f0-9]+:	67 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,0xc\(%eax,%ymm15,1\),%ymm11
+ +[a-f0-9]+:	64 c4 22 1d 92 5c 38 0c 	vgatherdps %ymm12,%fs:0xc\(%rax,%ymm15,1\),%ymm11
+#pass
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-seg.s b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
new file mode 100644
index 0000000000..7ad33e498c
--- /dev/null
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-seg.s
@@ -0,0 +1,9 @@ 
+.include "../x86-64-avx-gather.s"
+
+	.text
+	.att_syntax
+	movl	%fs:0, %eax
+	movl	%gs:(%rip), %eax
+	movl	%gs:(%rax), %eax
+	vgatherdps	%ymm12,0xc(%eax,%ymm15,1),%ymm11
+	vgatherdps	%ymm12,%fs:0xc(%rax,%ymm15,1),%ymm11
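
For readers checking the expected encodings in the test files above, the
following C sketch scans the leading legacy prefix bytes of an encoded
instruction and reports whether both a %fs/%gs override (0x64/0x65) and
the 0x67 address size prefix are present, which is the combination the
patch diagnoses.  It is an illustration only, not part of the patch; the
names are hypothetical and it does not decode anything beyond the
legacy prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct legacy_prefixes
{
  uint8_t seg;   /* 0x64 (%fs), 0x65 (%gs), or 0 if none was seen */
  int addr32;    /* nonzero if the 0x67 prefix was seen */
};

/* Scan legacy prefix bytes until the first non-prefix byte.  */
struct legacy_prefixes
scan_legacy_prefixes (const uint8_t *insn, size_t len)
{
  struct legacy_prefixes p = { 0, 0 };

  for (size_t i = 0; i < len; i++)
    switch (insn[i])
      {
      case 0x64: case 0x65:
        p.seg = insn[i];
        break;
      case 0x67:
        p.addr32 = 1;
        break;
      /* Other legacy prefixes: skipped, not recorded.  */
      case 0x26: case 0x2e: case 0x36: case 0x3e:
      case 0x66: case 0xf0: case 0xf2: case 0xf3:
        break;
      default:
        return p;
      }
  return p;
}

/* Nonzero if the encoding combines %fs/%gs with the 0x67 prefix.  */
int
seg_with_addr32 (const uint8_t *insn, size_t len)
{
  struct legacy_prefixes p = scan_legacy_prefixes (insn, len);
  return p.seg != 0 && p.addr32;
}

void
prefix_examples (void)
{
  /* Byte sequences copied from the expected disassembly above.  */
  static const uint8_t both[] =
    { 0x64, 0x67, 0xc4, 0x22, 0x1d, 0x92, 0x5c, 0x38, 0x0c };
  static const uint8_t addr_only[] =
    { 0x67, 0xc4, 0xe2, 0xd5, 0x92, 0x34, 0x25, 0x08, 0x00, 0x00, 0x00 };
  static const uint8_t seg_only[] =
    { 0x64, 0xc4, 0x22, 0x1d, 0x92, 0x5c, 0x38, 0x0c };

  assert (seg_with_addr32 (both, sizeof both));
  assert (!seg_with_addr32 (addr_only, sizeof addr_only));
  assert (!seg_with_addr32 (seg_only, sizeof seg_only));
}
```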