[14/43] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

Message ID 20190210001947.27278-15-hjl.tools@gmail.com
State Superseded
Headers show
Series
  • V3: Emulate MMX intrinsics with SSE
Related show

Commit Message

H.J. Lu Feb. 10, 2019, 12:19 a.m.
Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

	PR target/89021
	* config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.
	(sse_cvttps2pi): Likewise.
---
 gcc/config/i386/sse.md | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

-- 
2.20.1

Comments

Uros Bizjak Feb. 10, 2019, 10:48 a.m. | #1
On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

>

> 	PR target/89021

> 	* config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.

> 	(sse_cvttps2pi): Likewise.


It looks to me that this description is wrong. We don't have V4SF
modes here, but V2SF, so we have to fake 64bit load in case of MMX.
The cvtps2dq will access memory in true 128bit width, so this is
wrong.

We have to fix the description to not fake wide mode.

Uros.

> ---

>  gcc/config/i386/sse.md | 30 ++++++++++++++++++------------

>  1 file changed, 18 insertions(+), 12 deletions(-)

>

> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md

> index 7d2c0367911..ad31ac3d9e6 100644

> --- a/gcc/config/i386/sse.md

> +++ b/gcc/config/i386/sse.md

> @@ -4668,26 +4668,32 @@

>     (set_attr "mode" "V4SF")])

>

>  (define_insn "sse_cvtps2pi"

> -  [(set (match_operand:V2SI 0 "register_operand" "=y")

> +  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")

>  	(vec_select:V2SI

> -	  (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]

> +	  (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")]

>  		       UNSPEC_FIX_NOTRUNC)

>  	  (parallel [(const_int 0) (const_int 1)])))]

> -  "TARGET_SSE"

> -  "cvtps2pi\t{%1, %0|%0, %q1}"

> -  [(set_attr "type" "ssecvt")

> -   (set_attr "unit" "mmx")

> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"

> +  "@

> +   cvtps2pi\t{%1, %0|%0, %q1}

> +   %vcvtps2dq\t{%1, %0|%0, %1}"

> +  [(set_attr "mmx_isa" "native,x64")

> +   (set_attr "type" "ssecvt")

> +   (set_attr "unit" "mmx,*")

>     (set_attr "mode" "DI")])

>

>  (define_insn "sse_cvttps2pi"

> -  [(set (match_operand:V2SI 0 "register_operand" "=y")

> +  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")

>  	(vec_select:V2SI

> -	  (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))

> +	  (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm"))

>  	  (parallel [(const_int 0) (const_int 1)])))]

> -  "TARGET_SSE"

> -  "cvttps2pi\t{%1, %0|%0, %q1}"

> -  [(set_attr "type" "ssecvt")

> -   (set_attr "unit" "mmx")

> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"

> +  "@

> +   cvttps2pi\t{%1, %0|%0, %q1}

> +   %vcvttps2dq\t{%1, %0|%0, %1}"

> +  [(set_attr "mmx_isa" "native,x64")

> +   (set_attr "type" "ssecvt")

> +   (set_attr "unit" "mmx,*")

>     (set_attr "prefix_rep" "0")

>     (set_attr "mode" "SF")])

>

> --

> 2.20.1

>

>
H.J. Lu Feb. 11, 2019, 7:07 p.m. | #2
On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:

> > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

> >

> >       PR target/89021

> >       * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.

> >       (sse_cvttps2pi): Likewise.

>

> It looks to me that this description is wrong. We don't have V4SF

> modes here, but V2SF, so we have to fake 64bit load in case of MMX.

> The cvtps2dq will access memory in true 128bit width, so this is

> wrong.

>

> We have to fix the description to not fake wide mode.

>


What do you propose to implement

__m64 _mm_cvtps_pi32 (__m128 __A);

We also have

(define_insn "sse2_cvtps2pd<mask_name>"
  [(set (match_operand:V2DF 0 "register_operand" "=v")
        (float_extend:V2DF
          (vec_select:V2SF
            (match_operand:V4SF 1 "vector_operand" "vm")
            (parallel [(const_int 0) (const_int 1)]))))]
  "TARGET_SSE2 && <mask_avx512vl_condition>"
  "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"

These aren't new problems introduced by my MMX work.

-- 
H.J.
Uros Bizjak Feb. 11, 2019, 7:52 p.m. | #3
On Mon, Feb 11, 2019 at 8:08 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> >

> > On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:

> > > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

> > >

> > >       PR target/89021

> > >       * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.

> > >       (sse_cvttps2pi): Likewise.

> >

> > It looks to me that this description is wrong. We don't have V4SF

> > modes here, but V2SF, so we have to fake 64bit load in case of MMX.

> > The cvtps2dq will access memory in true 128bit width, so this is

> > wrong.

> >

> > We have to fix the description to not fake wide mode.

> >

>

> What do you propose to implement

>

> __m64 _mm_cvtps_pi32 (__m128 __A);


Hm...

In your original patch, we *do* have V4SF memory access, but the
original insn accesses it in __m64 mode. This should be OK, but then
accessing this memory in __m128 mode should also be OK. So, on a more
detailed look, the original patch looks OK to me. Luckily, a false
alarm...

>

> We also have

>

> (define_insn "sse2_cvtps2pd<mask_name>"

>   [(set (match_operand:V2DF 0 "register_operand" "=v")

>         (float_extend:V2DF

>           (vec_select:V2SF

>             (match_operand:V4SF 1 "vector_operand" "vm")

>             (parallel [(const_int 0) (const_int 1)]))))]

>   "TARGET_SSE2 && <mask_avx512vl_condition>"

>   "%vcvtps2pd\t{%1, %0<mask_operand2>|%0<mask_operand2>, %q1}"

>

> These aren't new problems introduced by my MMX work.


This one is not problematic, since the instruction accesses memory in
__m64 mode, which is narrower that V4SFmode.

Uros.

Patch

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7d2c0367911..ad31ac3d9e6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4668,26 +4668,32 @@ 
    (set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
 	(vec_select:V2SI
-	  (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+	  (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")]
 		       UNSPEC_FIX_NOTRUNC)
 	  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvtps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvtps2pi\t{%1, %0|%0, %q1}
+   %vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
    (set_attr "mode" "DI")])
 
 (define_insn "sse_cvttps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
 	(vec_select:V2SI
-	  (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))
+	  (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm"))
 	  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvttps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvttps2pi\t{%1, %0|%0, %q1}
+   %vcvttps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
    (set_attr "prefix_rep" "0")
    (set_attr "mode" "SF")])