x86: Swap destination/source to encode VEX only if possible

Message ID CAMe9rOrQtWW00Ozt3Ho8d8nUmzaE_i27rJhx_fZig1iPm5c7ag@mail.gmail.com
State New
Headers show
Series
  • x86: Swap destination/source to encode VEX only if possible
Related show

Commit Message

H.J. Lu Sept. 13, 2018, 1:19 p.m.
On Thu, Sep 13, 2018 at 5:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Sep 5, 2018 at 6:05 AM, Jan Beulich <JBeulich@suse.com> wrote:

>> Various moves come in load and store forms, and just like on the GPR

>> and FPU sides there would better be only one pattern. In some cases this

>> is not feasible because the opcodes are too different, but quite a few

>> cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD and

>> Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse

>> operand handling there simply needs to match "straight" operand one),

>> and fix a long standing, but so far only latent bug with when to zap

>> found_reverse_match.

>>

>> Also once again drop IgnoreSize where pointlessly applied to templates

>> touched anyway as well as *word when redundant with Reg*.

>>

>> gas/

>> 2018-09-05  Jan Beulich  <jbeulich@suse.com>

>>

>>         * config/tc-i386.c (operand_size_match): Mirror

>>         .reg/.regsimd/.acc handling from forward to reverse case.

>>         (build_vex_prefix): Check first and last operand types are equal

>>         and also consider .d for swapping operands for VEX2 encoding.

>>         (match_template): Clear found_reverse_match on every iteration.

>>         Use Opcode_SIMD_FloatD and Opcode_SIMD_IntD.

>>         * testsuite/gas/i386/pseudos.s,

>>         testsuite/gas/i386/x86-64-pseudos.s: Add kmov* tests.

>>         * testsuite/gas/i386/pseudos.d,

>>         testsuite/gas/i386/x86-64-pseudos.d: Adjust expectations.

>>

>> opcodes/

>> 2018-09-05  Jan Beulich  <jbeulich@suse.com>

>>

>>         * i386-opc.tbl (bndmov, kmovb, kmovd, kmovq, kmovw, movapd,

>>         movaps, movd, movdqa, movdqu, movhpd, movhps, movlpd, movlps,

>>         movq, movsd, movss, movupd, movups, vmovapd, vmovaps, vmovd,

>>         vmovdqa, vmovdqa32, vmovdqa64, vmovdqu, vmovdqu16, vmovdqu32,

>>         vmovdqu64, vmovdqu8, vmovq, vmovsd, vmovss, vmovupd, vmovups):

>>         Fold load and store templates where possible, adding D. Drop

>>         IgnoreSize where it was pointlessly present. Drop redundant

>>         *word.

>>         * i386-tbl.h: Re-generate.

>

> On Linux/x86-64, this caused

>

> FAIL: i386 arch 10

> FAIL: i386 arch 10 (lzcnt)

> FAIL: i386 arch 10 (prefetchw)

> FAIL: i386 arch 10 (bdver1)

> FAIL: i386 arch 10 (bdver2)

> FAIL: i386 arch 10 (bdver3)

> FAIL: i386 arch 10 (bdver4)

> FAIL: i386 arch 10 (btver1)

> FAIL: i386 arch 10 (btver2)

> FAIL: i386 noavx-1

> FAIL: i386 noavx-3

> FAIL: i386 AVX

> FAIL: i386 AVX (Intel disassembly)

> FAIL: x86-64 arch 2

> FAIL: x86-64 arch 2 (lzcnt)

> FAIL: x86-64 arch 2 (prefetchw)

> FAIL: x86-64 arch 2 (bdver1)

> FAIL: x86-64 arch 2 (bdver2)

> FAIL: x86-64 arch 2 (bdver3)

> FAIL: x86-64 arch 2 (bdver4)

> FAIL: x86-64 arch 2 (btver1)

> FAIL: x86-64 arch 2 (btver2)

> FAIL: x86-64 AVX

> FAIL: x86-64 AVX (Intel mode)

>

> Can you fix it today?

>


as fails to assemble

vzeroall

I checked in the following patch to fix it.

-- 
H.J.

Comments

Jan Beulich Sept. 13, 2018, 1:33 p.m. | #1
>>> On 13.09.18 at 15:19, <hjl.tools@gmail.com> wrote:

> On Thu, Sep 13, 2018 at 5:57 AM, H.J. Lu <hjl.tools@gmail.com> wrote:

>> On Wed, Sep 5, 2018 at 6:05 AM, Jan Beulich <JBeulich@suse.com> wrote:

>>> Various moves come in load and store forms, and just like on the GPR

>>> and FPU sides there would better be only one pattern. In some cases this

>>> is not feasible because the opcodes are too different, but quite a few

>>> cases follow a similar standard scheme. Introduce Opcode_SIMD_FloatD and

>>> Opcode_SIMD_IntD, generalize handling in operand_size_match() (reverse

>>> operand handling there simply needs to match "straight" operand one),

>>> and fix a long standing, but so far only latent bug with when to zap

>>> found_reverse_match.

>>>

>>> Also once again drop IgnoreSize where pointlessly applied to templates

>>> touched anyway as well as *word when redundant with Reg*.

>>>

>>> gas/

>>> 2018-09-05  Jan Beulich  <jbeulich@suse.com>

>>>

>>>         * config/tc-i386.c (operand_size_match): Mirror

>>>         .reg/.regsimd/.acc handling from forward to reverse case.

>>>         (build_vex_prefix): Check first and last operand types are equal

>>>         and also consider .d for swapping operands for VEX2 encoding.

>>>         (match_template): Clear found_reverse_match on every iteration.

>>>         Use Opcode_SIMD_FloatD and Opcode_SIMD_IntD.

>>>         * testsuite/gas/i386/pseudos.s,

>>>         testsuite/gas/i386/x86-64-pseudos.s: Add kmov* tests.

>>>         * testsuite/gas/i386/pseudos.d,

>>>         testsuite/gas/i386/x86-64-pseudos.d: Adjust expectations.

>>>

>>> opcodes/

>>> 2018-09-05  Jan Beulich  <jbeulich@suse.com>

>>>

>>>         * i386-opc.tbl (bndmov, kmovb, kmovd, kmovq, kmovw, movapd,

>>>         movaps, movd, movdqa, movdqu, movhpd, movhps, movlpd, movlps,

>>>         movq, movsd, movss, movupd, movups, vmovapd, vmovaps, vmovd,

>>>         vmovdqa, vmovdqa32, vmovdqa64, vmovdqu, vmovdqu16, vmovdqu32,

>>>         vmovdqu64, vmovdqu8, vmovq, vmovsd, vmovss, vmovupd, vmovups):

>>>         Fold load and store templates where possible, adding D. Drop

>>>         IgnoreSize where it was pointlessly present. Drop redundant

>>>         *word.

>>>         * i386-tbl.h: Re-generate.

>>

>> On Linux/x86-64, this caused

>>

>> FAIL: i386 arch 10

>> FAIL: i386 arch 10 (lzcnt)

>> FAIL: i386 arch 10 (prefetchw)

>> FAIL: i386 arch 10 (bdver1)

>> FAIL: i386 arch 10 (bdver2)

>> FAIL: i386 arch 10 (bdver3)

>> FAIL: i386 arch 10 (bdver4)

>> FAIL: i386 arch 10 (btver1)

>> FAIL: i386 arch 10 (btver2)

>> FAIL: i386 noavx-1

>> FAIL: i386 noavx-3

>> FAIL: i386 AVX

>> FAIL: i386 AVX (Intel disassembly)

>> FAIL: x86-64 arch 2

>> FAIL: x86-64 arch 2 (lzcnt)

>> FAIL: x86-64 arch 2 (prefetchw)

>> FAIL: x86-64 arch 2 (bdver1)

>> FAIL: x86-64 arch 2 (bdver2)

>> FAIL: x86-64 arch 2 (bdver3)

>> FAIL: x86-64 arch 2 (bdver4)

>> FAIL: x86-64 arch 2 (btver1)

>> FAIL: x86-64 arch 2 (btver2)

>> FAIL: x86-64 AVX

>> FAIL: x86-64 AVX (Intel mode)

>>

>> Can you fix it today?

>>

> 

> as fails to assemble

> 

> vzeroall


Interesting. I have absolutely no idea why it looks to have worked for
me.

> I checked in the following patch to fix it.


Thanks a lot!

Jan

Patch

From 79f0fa25b95fd82122ee76d61fded661cc3ece87 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 13 Sep 2018 06:12:31 -0700
Subject: [PATCH] x86: Swap destination/source to encode VEX only if possible

When encoding VEX, we can swap destination and source only if there are
more than 1 register operand.

	* config/tc-i386.c (build_vex_prefix): Swap destination and
	source only if there are more than 1 register operand.
---
 gas/ChangeLog        | 5 +++++
 gas/config/tc-i386.c | 7 ++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gas/ChangeLog b/gas/ChangeLog
index 7a82555a41..fc17226bb0 100644
--- a/gas/ChangeLog
+++ b/gas/ChangeLog
@@ -1,3 +1,8 @@ 
+2018-09-13  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* config/tc-i386.c (build_vex_prefix): Swap destination and
+	source only if there are more than 1 register operand.
+
 2018-09-13  Jan Beulich  <jbeulich@suse.com>
 
 	* config/tc-i386.c (operand_size_match): Also deal with three
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2bff48a778..40b458333f 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -3364,9 +3364,10 @@  build_vex_prefix (const insn_template *t)
   else
     register_specifier = 0xf;
 
-  /* Use 2-byte VEX prefix by swapping destination and source
-     operand.  */
-  if (i.vec_encoding != vex_encoding_vex3
+  /* Use 2-byte VEX prefix by swapping destination and source operand
+     if there are more than 1 register operand.  */
+  if (i.reg_operands > 1
+      && i.vec_encoding != vex_encoding_vex3
       && i.dir_encoding == dir_encoding_default
       && i.operands == i.reg_operands
       && operand_type_equal (&i.types[0], &i.types[i.operands - 1])
-- 
2.17.1