[12/43] i386: Emulate MMX vec_dupv2si with SSE

Message ID 20190210001947.27278-13-hjl.tools@gmail.com
State New
Series
  • V3: Emulate MMX intrinsics with SSE

Commit Message

H.J. Lu Feb. 10, 2019, 12:19 a.m.
Emulate MMX vec_dupv2si with SSE.  Only SSE register source operand is
allowed.

	PR target/89021
	* config/i386/mmx.md (*vec_dupv2si): Changed to
	define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
	support SSE emulation.
	* config/i386/sse.md (*vec_dupv4si): Renamed to ...
	(vec_dupv4si): This.
---
 gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
 gcc/config/i386/sse.md |  2 +-
 2 files changed, 21 insertions(+), 8 deletions(-)

-- 
2.20.1

Comments

Uros Bizjak Feb. 10, 2019, 10:36 a.m. | #1
On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> Emulate MMX vec_dupv2si with SSE.  Only SSE register source operand is
> allowed.
>
> 	PR target/89021
> 	* config/i386/mmx.md (*vec_dupv2si): Changed to
> 	define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
> 	support SSE emulation.
> 	* config/i386/sse.md (*vec_dupv4si): Renamed to ...
> 	(vec_dupv4si): This.
> ---
>  gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
>  gcc/config/i386/sse.md |  2 +-
>  2 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index d360e97c98b..1ee51c5deb7 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1420,14 +1420,27 @@
>     (set_attr "length_immediate" "1")
>     (set_attr "mode" "DI")])
>
> -(define_insn "*vec_dupv2si"
> -  [(set (match_operand:V2SI 0 "register_operand" "=y")
> +(define_insn_and_split "*vec_dupv2si"
> +  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
>  	(vec_duplicate:V2SI
> -	  (match_operand:SI 1 "register_operand" "0")))]
> -  "TARGET_MMX"
> -  "punpckldq\t%0, %0"
> -  [(set_attr "type" "mmxcvt")
> -   (set_attr "mode" "DI")])
> +	  (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> +  "@
> +   punpckldq\t%0, %0
> +   #
> +   #"
> +  "&& reload_completed && TARGET_MMX_WITH_SSE"

Please fix above.

> +  [(const_int 0)]
> +{
> +  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
> +  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> +  rtx insn = gen_vec_dupv4si (op0, operands[1]);
> +  emit_insn (insn);
> +  DONE;

Please write this simple RTX explicitly in the place of (const_int 0) above.

Uros.

H.J. Lu Feb. 10, 2019, 9 p.m. | #2
On Sun, Feb 10, 2019 at 2:36 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:

> > +  "&& reload_completed && TARGET_MMX_WITH_SSE"
>
> Please fix above.

I will use

"TARGET_MMX_WITH_SSE && reload_completed"

> > +  [(const_int 0)]
> > +{
> > +  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
> > +  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > +  rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > +  emit_insn (insn);
> > +  DONE;
>
> Please write this simple RTX explicitly in the place of (const_int 0) above.

rtx insn = gen_vec_dupv4si (op0, operands[1]);

is easy.   How do I write

rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));

in place of  (const_int 0)?


-- 
H.J.
Uros Bizjak Feb. 10, 2019, 9:45 p.m. | #3
On Sun, Feb 10, 2019 at 10:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > +  [(const_int 0)]
> > > +{
> > > +  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
> > > +  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > +  rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > +  emit_insn (insn);
> > > +  DONE;
> >
> > Please write this simple RTX explicitly in the place of (const_int 0) above.
>
> rtx insn = gen_vec_dupv4si (op0, operands[1]);
>
> is easy.   How do I write
>
> rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
>
> in place of  (const_int 0)?

  [(set (match_dup 2)
    (vec_duplicate:V4SI (match_dup 1)))]

with

"operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"

or even better:

"operands[2] = gen_lowpart (V4SImode, operands[0]);"

in the preparation statement.

Uros.
Uros Bizjak Feb. 10, 2019, 9:49 p.m. | #4
On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:

> > > > +  [(const_int 0)]
> > > > +{
> > > > +  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
> > > > +  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > +  rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > +  emit_insn (insn);
> > > > +  DONE;
> > >
> > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> >
> > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> >
> > is easy.   How do I write
> >
> > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> >
> > in place of  (const_int 0)?
>
>   [(set (match_dup 2)
>     (vec_duplicate:V4SI (match_dup 1)))]
>
> with
>
> "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
>
> or even better:
>
> "operands[2] = gen_lowpart (V4SImode, operands[0]);"
>
> in the preparation statement.

Even shorter is

"operands[0] = gen_lowpart (V4SImode, operands[0]);"

and use (match_dup 0) instead of (match_dup 2) in the RTX.

There are plenty of examples throughout sse.md.

Uros.
H.J. Lu Feb. 11, 2019, 1:03 a.m. | #5
On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> Even shorter is
>
> "operands[0] = gen_lowpart (V4SImode, operands[0]);"
>
> and use (match_dup 0) instead of (match_dup 2) in the RTX.
>
> There are plenty of examples throughout sse.md.

This works:

(define_insn_and_split "*vec_dupv2si"
  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
        (vec_duplicate:V2SI
          (match_operand:SI 1 "register_operand" "0,0,Yv")))]
  "TARGET_MMX || TARGET_MMX_WITH_SSE"
  "@
   punpckldq\t%0, %0
   #
   #"
  "TARGET_MMX_WITH_SSE && reload_completed"
  [(set (match_dup 0)
        (vec_duplicate:V4SI (match_dup 1)))]
  "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
   (set_attr "type" "mmxcvt,ssemov,ssemov")
   (set_attr "mode" "DI,TI,TI")])

Thanks.

-- 
H.J.
Uros Bizjak Feb. 11, 2019, 7:25 a.m. | #6
On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> This works:
>
> (define_insn_and_split "*vec_dupv2si"
>   [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
>         (vec_duplicate:V2SI
>           (match_operand:SI 1 "register_operand" "0,0,Yv")))]
>   "TARGET_MMX || TARGET_MMX_WITH_SSE"
>   "@
>    punpckldq\t%0, %0
>    #
>    #"
>   "TARGET_MMX_WITH_SSE && reload_completed"
>   [(set (match_dup 0)
>         (vec_duplicate:V4SI (match_dup 1)))]
>   "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
>   [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
>    (set_attr "type" "mmxcvt,ssemov,ssemov")
>    (set_attr "mode" "DI,TI,TI")])

If it works, then gen_lowpart is preferred due to extra checks.
However, it would result in a paradoxical subreg, so I wonder if these
extra checks allow this transformation.

Uros.
H.J. Lu Feb. 11, 2019, 12:26 p.m. | #7
On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> If it works, then gen_lowpart is preferred due to extra checks.
> However, it would result in a paradoxical subreg, so I wonder if these
> extra checks allow this transformation.

gen_lowpart doesn't work:

#include <mmintrin.h>

__m64
foo (int i)
{
  __v2si x = { i, i };
  return (__m64) x;
}

(gdb) f 1
#1  0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
1155   gcc_assert (can_create_pseudo_p ());
(gdb) bt
#0  fancy_abort (
    file=0x22180e0 "/export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c",
    line=1155,
    function=0x22193a8 <gen_reg_rtx(machine_mode)::__FUNCTION__> "gen_reg_rtx")
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/diagnostic.c:1607
#1  0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
#2  0x0000000000bd3044 in copy_to_reg (x=0x7fffea99b528)
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/explow.c:594
#3  0x00000000010c7c0a in gen_lowpart_general (mode=E_V4SImode,
    x=0x7fffea99b528)
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/rtlhooks.c:56
...
#1  0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
    at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
1155   gcc_assert (can_create_pseudo_p ());
(gdb)

-- 
H.J.
Uros Bizjak Feb. 11, 2019, 12:51 p.m. | #8
On Mon, Feb 11, 2019 at 1:26 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> gen_lowpart doesn't work:


Ah, we need lowpart_subreg after reload.

Uros.
H.J. Lu Feb. 11, 2019, 1:11 p.m. | #9
On Mon, Feb 11, 2019 at 4:51 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> Ah, we need lowpart_subreg after reload.
>
> Uros.

 "operands[0] = lowpart_subreg (V4SImode, operands[0],
                                 GET_MODE (operands[0]));"

works.

-- 
H.J.
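
For reference, combining the pattern posted in #5 with the lowpart_subreg
preparation statement from #9 would give roughly the following.  This is a
sketch assembled from the messages in this thread, not necessarily the text
that was eventually committed; the ";;" comments are added here for
explanation and do not appear in any of the messages.

```
(define_insn_and_split "*vec_dupv2si"
  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
	(vec_duplicate:V2SI
	  (match_operand:SI 1 "register_operand" "0,0,Yv")))]
  "TARGET_MMX || TARGET_MMX_WITH_SSE"
  "@
   punpckldq\t%0, %0
   #
   #"
  ;; Split once register allocation is done, when MMX is emulated with SSE.
  "TARGET_MMX_WITH_SSE && reload_completed"
  ;; The replacement RTX is written out explicitly, as requested in #1.
  [(set (match_dup 0)
	(vec_duplicate:V4SI (match_dup 1)))]
  ;; View the V2SI destination as V4SI.  gen_lowpart asserted in
  ;; gen_reg_rtx here (see the backtrace in #7), because no new pseudo
  ;; can be created after reload; lowpart_subreg was reported to work.
  "operands[0] = lowpart_subreg (V4SImode, operands[0],
				 GET_MODE (operands[0]));"
  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
   (set_attr "type" "mmxcvt,ssemov,ssemov")
   (set_attr "mode" "DI,TI,TI")])
```

With the new RTX given in the split template, the C body from the original
patch (gen_vec_dupv4si, emit_insn, DONE) is no longer needed in this pattern.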

Patch

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d360e97c98b..1ee51c5deb7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1420,14 +1420,27 @@ 
    (set_attr "length_immediate" "1")
    (set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv2si"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
 	(vec_duplicate:V2SI
-	  (match_operand:SI 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+	  (match_operand:SI 1 "register_operand" "0,0,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #"
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
+  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  rtx insn = gen_vec_dupv4si (op0, operands[1]);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2si"
   [(set (match_operand:V2SI 0 "register_operand"     "=y,y")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5dc0930ac1f..7d2c0367911 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18976,7 +18976,7 @@ 
    (set_attr "prefix" "maybe_evex,maybe_evex,orig")
    (set_attr "mode" "V4SF")])
 
-(define_insn "*vec_dupv4si"
+(define_insn "vec_dupv4si"
   [(set (match_operand:V4SI 0 "register_operand"     "=v,v,x")
 	(vec_duplicate:V4SI
 	  (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]