x86: Add pmovzx/pmovsx patterns with SI/DI operands

Message ID 20180929220425.4714-1-hjl.tools@gmail.com
State New
Headers show
Series
  • x86: Add pmovzx/pmovsx patterns with SI/DI operands
Related show

Commit Message

H.J. Lu Sept. 29, 2018, 10:04 p.m.
Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
instructions which only read the low 4 or 8 bytes from the source.

gcc/

	PR target/87317
	* config/i386/sse.md (*sse4_1_<code>v8qiv8hi2<mask_name>): New
	pattern.
	(*sse4_1_<code>v4qiv4si2<mask_name>): Likewise.
	(*sse4_1_<code>v4hiv4si2<mask_name>): Likewise.
	(*sse4_1_<code>v2hiv2di2<mask_name>): Likewise.
	(*sse4_1_<code>v2siv2di2<mask_name>): Likewise.

gcc/testsuite/

	PR target/87317
	* gcc.target/i386/pr87317-1.c: New file.
	* gcc.target/i386/pr87317-2.c: Likewise.
	* gcc.target/i386/pr87317-3.c: Likewise.
	* gcc.target/i386/pr87317-4.c: Likewise.
	* gcc.target/i386/pr87317-5.c: Likewise.
	* gcc.target/i386/pr87317-6.c: Likewise.
	* gcc.target/i386/pr87317-7.c: Likewise.
	* gcc.target/i386/pr87317-8.c: Likewise.
	* gcc.target/i386/pr87317-9.c: Likewise.
	* gcc.target/i386/pr87317-10.c: Likewise.
---
 gcc/config/i386/sse.md                     | 98 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr87317-1.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-10.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-2.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-3.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-4.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-5.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-6.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-7.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-8.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-9.c  | 13 +++
 11 files changed, 228 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c

-- 
2.17.1

Comments

Marc Glisse Sept. 30, 2018, 8:53 a.m. | #1
On Sat, 29 Sep 2018, H.J. Lu wrote:

> Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx

> instructions which only read the low 4 or 8 bytes from the source.


Hello,

I am wondering a few things (these are questions, I am not asking for 
changes):

Should we change the builtin and make it take a shorter argument, so it is 
visible to gimple optimizers that the high part is unused? But then would 
that shorter type be v8qi (we don't really have that type) or di (risks 
trying to use general regs?)?

> +(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"

> +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")

> +	(any_extend:V8HI

> +	  (vec_select:V8QI

> +	    (subreg:V16QI

> +	      (vec_concat:V2DI

> +	        (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")

> +		(const_int 0)) 0)

> +	    (parallel [(const_int 0) (const_int 1)

> +		       (const_int 2) (const_int 3)

> +		       (const_int 4) (const_int 5)

> +		       (const_int 6) (const_int 7)]))))]


There is code in simplify-rtx.c that handles (vec_select (vec_concat x
y) z) when vec_select only picks from x. We could extend it to handle an
intermediate subreg/cast, which would yield something like:
(any_extend:V8HI (subreg:V8QI (match_operand:DI)))
or maybe even
(any_extend:V8HI (match_operand:V8QI))
Would this be likely to work? Is it desirable?

-- 
Marc Glisse
H.J. Lu Sept. 30, 2018, 3 p.m. | #2
On Sun, Sep 30, 2018 at 1:53 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>

> On Sat, 29 Sep 2018, H.J. Lu wrote:

>

> > Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx

> > instructions which only read the low 4 or 8 bytes from the source.

>

> Hello,

>

> I am wondering a few things (these are questions, I am not asking for

> changes):

>

> Should we change the builtin and make it take a shorter argument, so it is

> visible to gimple optimizers that the high part is unused? But then would

> that shorter type be v8qi (we don't really have that type) or di (risks

> trying to use general regs?)?


I think this may lead to other issues you pointed out.

> > +(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"

> > +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")

> > +     (any_extend:V8HI

> > +       (vec_select:V8QI

> > +         (subreg:V16QI

> > +           (vec_concat:V2DI

> > +             (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")

> > +             (const_int 0)) 0)

> > +         (parallel [(const_int 0) (const_int 1)

> > +                    (const_int 2) (const_int 3)

> > +                    (const_int 4) (const_int 5)

> > +                    (const_int 6) (const_int 7)]))))]

>

> There is code in simplify-rtx.c that handles (vec_select (vec_concat x

> y) z) when vec_select only picks from x. We could extend it to handle an

> intermediate subreg/cast, which would yield something like:

> (any_extend:V8HI (subreg:V8QI (match_operand:DI)))

> or maybe even

> (any_extend:V8HI (match_operand:V8QI))

> Would this be likely to work? Is it desirable?

>


We need vector instructions with source memory or XMM operand in SI/DImode
only for these patterns.  Exposed it to simplify-rtx.c may lead to unexpected
results.  But we don't know for sure unless we try.

-- 
H.J.

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d2722fdfcd0..c8ff35b125c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15521,6 +15521,26 @@ 
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v8qiv8hi2<mask_name>"
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V8HI
+	  (vec_select:V8QI
+	    (subreg:V16QI
+	      (vec_concat:V2DI
+	        (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)
+		       (const_int 4) (const_int 5)
+		       (const_int 6) (const_int 7)]))))]
+  "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "<mask_codefor>avx512f_<code>v16qiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
@@ -15562,6 +15582,28 @@ 
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v4qiv4si2<mask_name>"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V4SI
+	  (vec_select:V4QI
+	    (subreg:V16QI
+	      (vec_merge:V4SI
+	        (vec_duplicate:V4SI
+		  (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+		(const_vector:V4SI
+		   [(const_int 0) (const_int 0)
+		    (const_int 0) (const_int 0)])
+		(const_int 1)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>bd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v16hiv16si2<mask_name>"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
 	(any_extend:V16SI
@@ -15598,6 +15640,24 @@ 
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v4hiv4si2<mask_name>"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V4SI
+	  (vec_select:V4HI
+	    (subreg:V8HI
+	      (vec_concat:V2DI
+		(match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)
+		       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>wd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v8qiv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
@@ -15679,6 +15739,27 @@ 
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v2hiv2di2<mask_name>"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V2DI
+	  (vec_select:V2HI
+	    (subreg:V8HI
+	      (vec_merge:V4SI
+	        (vec_duplicate:V4SI
+	          (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+		(const_vector:V4SI
+		   [(const_int 0) (const_int 0)
+		    (const_int 0) (const_int 0)])
+		(const_int 1)) 0)
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>wq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_<code>v8siv8di2<mask_name>"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
 	(any_extend:V8DI
@@ -15714,6 +15795,23 @@ 
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_<code>v2siv2di2<mask_name>"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+	(any_extend:V2DI
+	  (vec_select:V2SI
+	    (subreg:V4SI
+	      (vec_concat:V2DI
+		(match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+		(const_int 0)) 0)
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && <mask_avx512vl_condition>"
+  "%vpmov<extsuffix>dq\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; ptestps/ptestpd are very similar to comiss and ucomiss when
 ;; setting FLAGS_REG. But it is not a really compare instruction.
 (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-1.c b/gcc/testsuite/gcc.target/i386/pr87317-1.c
new file mode 100644
index 00000000000..91f00368293
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-1.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-10.c b/gcc/testsuite/gcc.target/i386/pr87317-10.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-10.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-2.c b/gcc/testsuite/gcc.target/i386/pr87317-2.c
new file mode 100644
index 00000000000..e21f00334e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-2.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi16_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-3.c b/gcc/testsuite/gcc.target/i386/pr87317-3.c
new file mode 100644
index 00000000000..d4483f9c134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-3.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi32_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-4.c b/gcc/testsuite/gcc.target/i386/pr87317-4.c
new file mode 100644
index 00000000000..dff24a9d657
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-4.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepu8_epi16(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-5.c b/gcc/testsuite/gcc.target/i386/pr87317-5.c
new file mode 100644
index 00000000000..574395894b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-5.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi16_epi32(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-6.c b/gcc/testsuite/gcc.target/i386/pr87317-6.c
new file mode 100644
index 00000000000..9d27648d433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-6.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi32_epi64 (y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-7.c b/gcc/testsuite/gcc.target/i386/pr87317-7.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-7.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-8.c b/gcc/testsuite/gcc.target/i386/pr87317-8.c
new file mode 100644
index 00000000000..c688e3e6d08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-8.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include <immintrin.h>
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu16_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-9.c b/gcc/testsuite/gcc.target/i386/pr87317-9.c
new file mode 100644
index 00000000000..4311ed3ceb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-9.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmovq" } } */
+
+#include <immintrin.h>
+
+int
+f (void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  return _mm_cvtsi128_si32(data);
+}