[00/41] V8: Emulate MMX intrinsics with SSE

Message ID 20190216224032.4889-1-hjl.tools@gmail.com

H.J. Lu Feb. 16, 2019, 10:39 p.m.
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added

 #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)

;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
  (const_string "base"))

         (eq_attr "mmx_isa" "native")
           (symbol_ref "!TARGET_MMX_WITH_SSE")
         (eq_attr "mmx_isa" "x64")
           (symbol_ref "TARGET_MMX_WITH_SSE")
         (eq_attr "mmx_isa" "x64_avx")
           (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
         (eq_attr "mmx_isa" "x64_noavx")
           (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")

We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.
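
As a concrete illustration (the function name is ours, not part of the
patch), a plain MMX intrinsic such as

  #include <mmintrin.h>

  __m64
  add_pi16 (__m64 a, __m64 b)
  {
    return _mm_add_pi16 (a, b);
  }

should now compile on x86-64 to a single SSE2 instruction on XMM
registers (roughly "paddw %xmm1, %xmm0"), with no MMX state touched
and hence no emms needed.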

Most MMX instructions have equivalent SSE versions, and the results of
some SSE versions need to be reshuffled into the right order for MMX.
There are a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand, and we handle the case where bits 64:127 of the 16-byte
access could touch an unmapped address by adjusting the source and mask
operands together with the memory address.
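
Ignoring that address adjustment, the core of the emulation can be
sketched with intrinsics (illustrative only; the helper name is ours
and the real expansion lives in the machine description):

  #include <emmintrin.h>

  static void
  maskmovq_via_maskmovdqu (__m128i src, __m128i mask, char *p)
  {
    /* movq zero-extends, clearing bits 64:127 of the mask so that
       maskmovdqu never stores through the upper 8 bytes.  */
    mask = _mm_move_epi64 (mask);
    _mm_maskmoveu_si128 (src, mask, p);
  }

Here the __m64 source and mask are assumed to sit in the low halves of
the XMM operands, which is where the x86-64 ABI places them.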

2. MMX movntq is emulated with SSE2 DImode movnti, which is only
available in 64-bit mode.
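
At the intrinsics level this corresponds roughly to (sketch; the
helper name is ours):

  #include <emmintrin.h>

  static void
  movntq_via_movnti (long long *p, long long v)
  {
    _mm_stream_si64 (p, v);  /* DImode movnti, 64-bit mode only.  */
  }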

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear the fourth bit (bit 3) of each byte in the
shuffle control mask.
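
A sketch of that masking step (assuming the __m64 values sit in the
low halves of the XMM operands; the helper name is ours):

  #include <tmmintrin.h>

  static __m128i
  pshufb_mmx_via_sse (__m128i x, __m128i ctrl)
  {
    /* Clear bit 3 (0x08) of every control byte so each index stays
       within the low 8 bytes; bit 7, the zeroing flag, is kept.  */
    __m128i m = _mm_set1_epi8 ((char) 0xf7);
    return _mm_shuffle_epi8 (x, _mm_and_si128 (ctrl, m));
  }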

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of the destination XMM register.
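
One way to picture the required merge, as an intrinsics sketch (the
helper name is ours; the actual expansion differs in detail):

  #include <emmintrin.h>

  static __m128
  cvtpi2ps_via_cvtdq2ps (__m128 a, __m128i b)
  {
    __m128 lo = _mm_cvtepi32_ps (b);  /* cvtdq2ps */
    /* Take the two converted floats for the low half and keep A's
       elements 2 and 3 as the upper 64 bits.  */
    return _mm_shuffle_ps (lo, a, _MM_SHUFFLE (3, 2, 1, 0));
  }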

Tests are also added to check each SSE emulation of MMX intrinsics.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (40):
  i386: Allow MMX register modes in SSE registers
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
  i386: Emulate MMX <any_logic><mode>3 with SSE
  i386: Emulate MMX mmx_andnot<mode>3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Make _mm_empty () as NOP for TARGET_MMX_WITH_SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign<mode>3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs<mode>2 with SSE
  i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
  i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
  i386: Allow MMX intrinsic emulation with SSE
  i386: Enable TM MMX intrinsics with SSE2
  i386: Add tests for MMX intrinsic emulations with SSE

Uros Bizjak (1):
  Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE

 gcc/config/i386/constraints.md                |   6 +
 gcc/config/i386/i386-builtin.def              | 126 +--
 gcc/config/i386/i386-c.c                      |   2 +
 gcc/config/i386/i386-protos.h                 |   4 +
 gcc/config/i386/i386.c                        | 189 +++-
 gcc/config/i386/i386.h                        |   2 +
 gcc/config/i386/i386.md                       |  17 +
 gcc/config/i386/mmintrin.h                    |  12 +-
 gcc/config/i386/mmx.md                        | 984 ++++++++++++------
 gcc/config/i386/predicates.md                 |   7 +
 gcc/config/i386/sse.md                        | 359 +++++--
 gcc/config/i386/xmmintrin.h                   |  61 ++
 gcc/testsuite/gcc.target/i386/mmx-vals.h      |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c     |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c     |   2 +-
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  43 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  42 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  31 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c   |  36 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c   |  51 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-18a.c  |  14 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-18b.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-18c.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19a.c  |  14 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19b.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19c.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19d.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19e.c  |   7 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c   |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c   |  13 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-22.c   |  14 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c    |  13 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c    |   4 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c    |  11 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c    |  11 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c    |  13 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c    |   4 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c    |  79 ++
 .../gcc.target/i386/sse2-mmx-cvtpi2ps.c       |  43 +
 .../gcc.target/i386/sse2-mmx-cvtps2pi.c       |  36 +
 .../gcc.target/i386/sse2-mmx-cvttps2pi.c      |  36 +
 .../gcc.target/i386/sse2-mmx-maskmovq.c       |  99 ++
 .../gcc.target/i386/sse2-mmx-packssdw.c       |  52 +
 .../gcc.target/i386/sse2-mmx-packsswb.c       |  52 +
 .../gcc.target/i386/sse2-mmx-packuswb.c       |  52 +
 .../gcc.target/i386/sse2-mmx-paddb.c          |  48 +
 .../gcc.target/i386/sse2-mmx-paddd.c          |  48 +
 .../gcc.target/i386/sse2-mmx-paddq.c          |  43 +
 .../gcc.target/i386/sse2-mmx-paddsb.c         |  48 +
 .../gcc.target/i386/sse2-mmx-paddsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-paddusb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-paddusw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-paddw.c          |  48 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c |  44 +
 .../gcc.target/i386/sse2-mmx-pandn.c          |  44 +
 .../gcc.target/i386/sse2-mmx-pavgb.c          |  52 +
 .../gcc.target/i386/sse2-mmx-pavgw.c          |  52 +
 .../gcc.target/i386/sse2-mmx-pcmpeqb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpeqd.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpeqw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtd.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pextrw.c         |  59 ++
 .../gcc.target/i386/sse2-mmx-pinsrw.c         |  61 ++
 .../gcc.target/i386/sse2-mmx-pmaddwd.c        |  47 +
 .../gcc.target/i386/sse2-mmx-pmaxsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pmaxub.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pminsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pminub.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pmovmskb.c       |  46 +
 .../gcc.target/i386/sse2-mmx-pmulhuw.c        |  51 +
 .../gcc.target/i386/sse2-mmx-pmulhw.c         |  53 +
 .../gcc.target/i386/sse2-mmx-pmullw.c         |  52 +
 .../gcc.target/i386/sse2-mmx-pmuludq.c        |  47 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c  |  44 +
 .../gcc.target/i386/sse2-mmx-psadbw.c         |  58 ++
 .../gcc.target/i386/sse2-mmx-pshufw.c         | 248 +++++
 .../gcc.target/i386/sse2-mmx-pslld.c          |  52 +
 .../gcc.target/i386/sse2-mmx-pslldi.c         | 153 +++
 .../gcc.target/i386/sse2-mmx-psllq.c          |  47 +
 .../gcc.target/i386/sse2-mmx-psllqi.c         | 245 +++++
 .../gcc.target/i386/sse2-mmx-psllw.c          |  52 +
 .../gcc.target/i386/sse2-mmx-psllwi.c         | 105 ++
 .../gcc.target/i386/sse2-mmx-psrad.c          |  52 +
 .../gcc.target/i386/sse2-mmx-psradi.c         | 153 +++
 .../gcc.target/i386/sse2-mmx-psraw.c          |  52 +
 .../gcc.target/i386/sse2-mmx-psrawi.c         | 105 ++
 .../gcc.target/i386/sse2-mmx-psrld.c          |  52 +
 .../gcc.target/i386/sse2-mmx-psrldi.c         | 153 +++
 .../gcc.target/i386/sse2-mmx-psrlq.c          |  47 +
 .../gcc.target/i386/sse2-mmx-psrlqi.c         | 245 +++++
 .../gcc.target/i386/sse2-mmx-psrlw.c          |  52 +
 .../gcc.target/i386/sse2-mmx-psrlwi.c         | 105 ++
 .../gcc.target/i386/sse2-mmx-psubb.c          |  48 +
 .../gcc.target/i386/sse2-mmx-psubd.c          |  48 +
 .../gcc.target/i386/sse2-mmx-psubq.c          |  43 +
 .../gcc.target/i386/sse2-mmx-psubusb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-psubusw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-psubw.c          |  48 +
 .../gcc.target/i386/sse2-mmx-punpckhbw.c      |  53 +
 .../gcc.target/i386/sse2-mmx-punpckhdq.c      |  47 +
 .../gcc.target/i386/sse2-mmx-punpckhwd.c      |  49 +
 .../gcc.target/i386/sse2-mmx-punpcklbw.c      |  53 +
 .../gcc.target/i386/sse2-mmx-punpckldq.c      |  47 +
 .../gcc.target/i386/sse2-mmx-punpcklwd.c      |  49 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c |  44 +
 gcc/testsuite/gcc.target/i386/sse2-mmx.c      |   1 -
 112 files changed, 6418 insertions(+), 493 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mmx-vals.h
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19d.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19e.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtpi2ps.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtps2pi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvttps2pi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-maskmovq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packssdw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packsswb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packuswb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pandn.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pextrw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxub.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminub.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmovmskb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhuw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmullw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmuludq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psadbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pshufw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslld.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslldi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllqi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllwi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrad.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psradi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psraw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrawi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrld.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrldi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlqi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlwi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhdq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckldq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c

-- 
2.20.1

Comments

Uros Bizjak Feb. 17, 2019, 10:33 a.m. | #1
On 2/16/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> On x86-64, since __m64 is returned and passed in XMM registers, we can
> emulate MMX intrinsics with SSE instructions. To support it, we added
>
>  #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
>
> [...]
>
> There are no regressions on i686 and x86-64.  For x86-64, GCC is also
> tested with
>
> --with-arch=native --with-cpu=native
>
> on AVX2 and AVX512F machines.

An idea that would take the patch a step further, also on 32-bit targets:

*Assuming* that operations on XMM registers are as fast as (or perhaps
faster than) operations on MMX registers, we can change the mmx_isa
attribute in e.g.

+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")

to:

[(set_attr "isa" "*,noavx,avx")
 (set_attr "mmx_isa" "native,*,*")]

So, for x86_64 everything stays the same, but for x86_32 we now allow
intrinsics to use XMM registers in addition to MMX registers. We can't
disable MMX for x86_32 anyway due to ISA constraints (and some tricky
cases, e.g. movnti, which works only for 64-bit targets, and e.g.
maskmovq and similar, which are more efficient with MMX regs), but the
RA has much more freedom to allocate the most effective register set
even for 32-bit targets.

WDYT?

Uros.

H.J. Lu Feb. 17, 2019, 1:41 p.m. | #2
On Sun, Feb 17, 2019 at 2:33 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> An idea that would take the patch a step further, also on 32-bit targets:
>
> *Assuming* that operations on XMM registers are as fast as (or perhaps
> faster than) operations on MMX registers, we can change the mmx_isa
> attribute [...] to:
>
> [(set_attr "isa" "*,noavx,avx")
>  (set_attr "mmx_isa" "native,*,*")]
>
> So, for x86_64 everything stays the same, but for x86_32 we now allow
> intrinsics to use XMM registers in addition to MMX registers. [...]
> the RA has much more freedom to allocate the most effective register
> set even for 32-bit targets.
>
> WDYT?

Since MMX registers are used to pass and return __m64 values,
we can't really get rid of MMX instructions in 32-bit mode.  If people
have to stay with 32-bit mode, they need MMX.  I don't think we should
extend TARGET_MMX_WITH_SSE to 32-bit mode.

-- 
H.J.
Uros Bizjak Feb. 17, 2019, 3:53 p.m. | #3
On Sun, Feb 17, 2019 at 2:42 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> > > WDYT?
>
> Since MMX registers are used to pass and return __m64 values,
> we can't really get rid of MMX instructions in 32-bit mode.  If people
> have to stay with 32-bit mode, they need MMX.  I don't think we should
> extend TARGET_MMX_WITH_SSE to 32-bit mode.

No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets. We
should not *disable* SSE alternatives on 32-bit targets.

Uros.
Uros Bizjak Feb. 17, 2019, 3:57 p.m. | #4
On Sun, Feb 17, 2019 at 4:53 PM Uros Bizjak <ubizjak@gmail.com> wrote:

>
> > Since MMX registers are used to pass and return __m64 values,
> > we can't really get rid of MMX instructions in 32-bit mode.  [...]
>
> No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets. We
> should not *disable* SSE alternatives on 32-bit targets.

The correct isa attribute definition would be:

[(set_attr "isa" "*,sse2_noavx,avx")
 (set_attr "mmx_isa" "native,*,*")]

Uros.
H.J. Lu Feb. 17, 2019, 5:10 p.m. | #5
On Sun, Feb 17, 2019 at 7:57 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > > Since MMX registers are used to pass and return __m64 values,
> > > we can't really get rid of MMX instructions in 32-bit mode.  [...]
> >
> > No, TARGET_MMX_WITH_SSE is still enabled only for 64-bit targets. We
> > should not *disable* SSE alternatives on 32-bit targets.

I don't think my patch set disables any SSE alternatives in 32-bit
mode.  However, it DOES NOT enable any SSE alternatives in 32-bit mode
either.  To really enable the SSE alternatives in

(define_insn "*mmx_<code><mode>3"
  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
        (any_logic:MMXMODEI
          (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
          (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<logic>\t{%2, %0|%0, %2}
   p<logic>\t{%2, %0|%0, %2}
   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
   (set_attr "type" "mmxadd,sselog,sselog")
   (set_attr "mode" "DI,TI,TI")])

register_mmxmem_operand must return true for SSE alternatives:

;; Match register operands, but include memory operands for
;; !TARGET_MMX_WITH_SSE.
(define_predicate "register_mmxmem_operand"
  (ior (match_operand 0 "register_operand")
       (and (not (match_test "TARGET_MMX_WITH_SSE"))
            (match_operand 0 "memory_operand"))))

How do you enable SSE alternatives in 32-bit mode without enabling
TARGET_MMX_WITH_SSE for 32-bit mode?

> The correct isa attribute definition would be:
>
> [(set_attr "isa" "*,sse2_noavx,avx")
>  (set_attr "mmx_isa" "native,*,*")]
>
> Uros.

-- 
H.J.
Uros Bizjak Feb. 17, 2019, 5:27 p.m. | #6
On Sun, Feb 17, 2019 at 6:10 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I don't think my patch set disables any SSE alternatives in 32-bit
> mode.  However, it DOES NOT enable any SSE alternatives in 32-bit mode
> either.  To really enable the SSE alternatives in
>
> [...]
>
> register_mmxmem_operand must return true for SSE alternatives:

It returns true for register and memory operands for 32-bit targets, because

#define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)

> ;; Match register operands, but include memory operands for
> ;; !TARGET_MMX_WITH_SSE.
> (define_predicate "register_mmxmem_operand"
>   (ior (match_operand 0 "register_operand")
>        (and (not (match_test "TARGET_MMX_WITH_SSE"))
>             (match_operand 0 "memory_operand"))))
>
> How do you enable SSE alternatives in 32-bit mode without enabling
> TARGET_MMX_WITH_SSE for 32-bit mode?

Check the new attribute definitions below:

> > The correct isa attribute definition would be:
> >
> > [(set_attr "isa" "*,sse2_noavx,avx")
> >  (set_attr "mmx_isa" "native,*,*")]

Uros.
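
For concreteness, a sketch of the logic pattern with the corrected attribute
split applied.  It merely combines the *mmx_<code><mode>3 pattern and the
isa/mmx_isa attributes quoted above, as an illustration rather than code
from a posted patch:

(define_insn "*mmx_<code><mode>3"
  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
        (any_logic:MMXMODEI
          (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
          (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<logic>\t{%2, %0|%0, %2}
   p<logic>\t{%2, %0|%0, %2}
   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
  ;; "isa" gates the two SSE alternatives on SSE2/AVX on any target, while
  ;; "mmx_isa" ties only the first (MMX-register) alternative to native MMX,
  ;; so on 32-bit targets the RA may also pick the XMM alternatives.
  [(set_attr "isa" "*,sse2_noavx,avx")
   (set_attr "mmx_isa" "native,*,*")
   (set_attr "type" "mmxadd,sselog,sselog")
   (set_attr "mode" "DI,TI,TI")])

Whether the XMM alternatives actually win on 32-bit targets then becomes a
pure register-allocation question, which is the point of the proposal.
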
H.J. Lu Feb. 17, 2019, 5:36 p.m. | #7
On Sun, Feb 17, 2019 at 9:27 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> [earlier quotes trimmed]
>
> > register_mmxmem_operand must return true for SSE alternatives:
>
> It returns true for register and memory operands for 32-bit targets, because
>
> #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)

Will

(match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]

work well with RA?  I got some wrong code before register_mmxmem_operand
was added to match "ym,x,Yv".
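
To make the question concrete, the same loosening applied to the logic
pattern quoted earlier would look roughly as follows; this is a hypothetical
sketch, and the V4HI operand quoted above comes from a different pattern in
the series:

(define_insn "*mmx_<code><mode>3"
  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
        (any_logic:MMXMODEI
          ;; nonimmediate_operand admits memory on every target; only the
          ;; per-alternative constraints ("ym" on the MMX alternative, "x"
          ;; and "Yv" on the SSE ones) keep memory away from the SSE forms.
          (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
          (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
  ;; condition, templates and attributes unchanged from the pattern above
  ...)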

> > ;; Match register operands, but include memory operands for
> > ;; !TARGET_MMX_WITH_SSE.
> > (define_predicate "register_mmxmem_operand"
> >   (ior (match_operand 0 "register_operand")
> >        (and (not (match_test "TARGET_MMX_WITH_SSE"))
> >             (match_operand 0 "memory_operand"))))
> >
> > How do you enable SSE alternatives in 32-bit mode without enabling
> > TARGET_MMX_WITH_SSE for 32-bit mode?
>
> Check the new attribute definitions below:
> > > The correct isa attribute definition would be:
> > >
> > > [(set_attr "isa" "*,sse2_noavx,avx")
> > >  (set_attr "mmx_isa" "native,*,*")]
>
> Uros.

-- 
H.J.
Uros Bizjak Feb. 17, 2019, 6:49 p.m. | #8
On Sun, Feb 17, 2019 at 6:37 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> [earlier quotes trimmed]
>
> Will
>
> (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
>
> work well with RA?  I got some wrong code before register_mmxmem_operand
> was added to match "ym,x,Yv".

I see no reason why it shouldn't.

Uros.
H.J. Lu Feb. 17, 2019, 8:46 p.m. | #9
On Sun, Feb 17, 2019 at 10:49 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> [earlier quotes trimmed]
>
> > Will
> >
> > (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv"))))]
> >
> > work well with RA?  I got some wrong code before register_mmxmem_operand
> > was added to match "ym,x,Yv".
>
> I see no reason why it shouldn't.

This will be equivalent to replacing register_operand in

[(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")

with nonimmediate_operand.  If it should work, I can do it in i386.md and
sse.md to check it out.

-- 
H.J.
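
Spelled out, the experiment described above amounts to mechanical rewrites
of this shape (illustrative only):

;; before
(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
;; after the mechanical predicate replacement
(match_operand:VI1_AVX512VLBW 1 "nonimmediate_operand" "v")

The "v" constraint still allows only SSE registers, so any memory operand
the looser predicate lets through has to be reloaded into a register.
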
H.J. Lu Feb. 18, 2019, 2:22 p.m. | #10
On Sun, Feb 17, 2019 at 12:46 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> [earlier quotes trimmed]
>
> This will be equivalent to replacing register_operand in
>
> [(match_operand:VI1_AVX512VLBW 1 "register_operand" "v")
>
> with nonimmediate_operand.  If it should work, I can do it in i386.md and
> sse.md to check it out.

I tried:

sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md

and got

(gdb) call debug_rtx (insn)
(insn 65 19 67 2 (parallel [
            (set (reg/f:SI 97)
                (plus:SI (mem/u/c:SI (plus:SI (reg:SI 82)
                            (const:SI (unspec:SI [
                                        (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl 0x7fffea6c5e10 gomp_tls_data>)
                                    ] UNSPEC_GOTNTPOFF))) [17  S4 A8])
                    (mem/u/c:SI (const_int 0 [0]) [0  S4 A8 AS2])))
            (clobber (reg:CC 17 flags))
        ]) "/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c":139:7 -1
     (expr_list:REG_DEAD (reg:SI 82)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_EQUIV (symbol_ref:SI ("gomp_tls_data") [flags 0x62] <var_decl 0x7fffea6c5e10 gomp_tls_data>)
                (nil)))))
(gdb) c
Continuing.
during RTL pass: ira
/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c: In function ‘gomp_test_nest_lock_25’:
/export/gnu/import/git/gitlab/x86-gcc/libgomp/config/linux/lock.c:149:1: internal compiler error: in elimination_costs_in_insn, at reload1.c:3640
  149 | }
      | ^
0x108b258 elimination_costs_in_insn
/export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:3637
0x108596f calculate_elim_costs_all_insns()
/export/gnu/import/git/gitlab/x86-gcc/gcc/reload1.c:1609
0xe61a7a ira_costs()
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira-costs.c:2298
0xe56613 ira_build()
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira-build.c:3432
0xe4b31d ira
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5346
0xe4bba0 execute
/export/gnu/import/git/gitlab/x86-gcc/gcc/ira.c:5657
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.

-- 
H.J.
Uros Bizjak Feb. 18, 2019, 2:37 p.m. | #11
On Mon, Feb 18, 2019 at 3:22 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> [earlier quotes trimmed]
>
> I tried:
>
> sed -i -e "s/\"register_operand\"[ \t]\+\(\"[^=^\+^f]\+\"[^=]\+$\)/\"nonimmediate_operand\" \1/" i386.md

I don't know what the point is in changing these operands, but

(match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")

should work without problems.

Uros.

H.J. Lu Feb. 18, 2019, 2:47 p.m. | #12
On Mon, Feb 18, 2019 at 6:37 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> [earlier quotes trimmed]
>
> I don't know what the point is in changing these operands, but

The point is that we can't replace register_operand with nonimmediate_operand
in all places.

> (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
>
> should work without problems.

32-bit MMX has very low priority.  I will try it in the second phase.

-- 
H.J.