V2 [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY

Message ID CAMe9rOr3UtJ4_bGNWQ7nQTiboX1Lhxxzim6yd5boozKR8AeaRw@mail.gmail.com
State New
Headers show
Series
  • V2 [PATCH] x86: Don't generate vzeroupper if caller is AVX_U128_DIRTY
Related show

Commit Message

H.J. Lu Jan. 8, 2019, 4:16 p.m.
On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> >

> > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak <ubizjak@gmail.com> wrote:

> > >

> > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu <hongjiu.lu@intel.com> wrote:

> > > >

> > > > There is no need to generate vzeroupper if caller uses upper bits of

> > > > AVX/AVX512 registers,  We track caller's avx_u128_state and avoid

> > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.

> > > >

> > > > Tested on i686 and x86-64 with and without --with-arch=native.

> > > >

> > > > OK for trunk?

> > >

> > > In principle OK, but I think we don't have to cache the result of

> > > ix86_avx_u128_mode_entry. Simply call the function from

> > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can

> > > afford to re-call it one more time per function.

> >

> > Do we really need ix86_avx_u128_mode_entry?  We can just

> > set entry state to AVX_U128_CLEAN and set exit state to

> > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes

> > AVX/AVX512 registers to callee.

> >

> > Does this patch look OK?

>

> No, the compiler is then free to move optimal insertion point at the

> beginning of the function.

>


Here is the updated patch.  OK for trunk?

Thanks.

-- 
H.J.

Comments

Uros Bizjak Jan. 8, 2019, 5:29 p.m. | #1
On Tue, Jan 8, 2019 at 5:17 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>

> On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> >

> > On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> > >

> > > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak <ubizjak@gmail.com> wrote:

> > > >

> > > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu <hongjiu.lu@intel.com> wrote:

> > > > >

> > > > > There is no need to generate vzeroupper if caller uses upper bits of

> > > > > AVX/AVX512 registers,  We track caller's avx_u128_state and avoid

> > > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.

> > > > >

> > > > > Tested on i686 and x86-64 with and without --with-arch=native.

> > > > >

> > > > > OK for trunk?

> > > >

> > > > In principle OK, but I think we don't have to cache the result of

> > > > ix86_avx_u128_mode_entry. Simply call the function from

> > > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can

> > > > afford to re-call it one more time per function.

> > >

> > > Do we really need ix86_avx_u128_mode_entry?  We can just

> > > set entry state to AVX_U128_CLEAN and set exit state to

> > > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes

> > > AVX/AVX512 registers to callee.

> > >

> > > Does this patch look OK?

> >

> > No, the compiler is then free to move optimal insertion point at the

> > beginning of the function.

> >

>

> Here is the updated patch.  OK for trunk?


OK with the comment fix.

Thanks,
Uros.

-  return AVX_U128_CLEAN;
+  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit

s/Entry/Exit/

+     modes used in function arguments.  */

... , otherwise return AVX_U128_CLEAN.

+  return ix86_avx_u128_mode_entry ();
 }
H.J. Lu Jan. 8, 2019, 5:36 p.m. | #2
On Tue, Jan 8, 2019 at 9:29 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>

> On Tue, Jan 8, 2019 at 5:17 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> >

> > On Tue, Jan 8, 2019 at 6:54 AM Uros Bizjak <ubizjak@gmail.com> wrote:

> > >

> > > On Tue, Jan 8, 2019 at 3:39 PM H.J. Lu <hjl.tools@gmail.com> wrote:

> > > >

> > > > On Mon, Jan 7, 2019 at 11:12 PM Uros Bizjak <ubizjak@gmail.com> wrote:

> > > > >

> > > > > On Mon, Jan 7, 2019 at 6:40 PM H.J. Lu <hongjiu.lu@intel.com> wrote:

> > > > > >

> > > > > > There is no need to generate vzeroupper if caller uses upper bits of

> > > > > > AVX/AVX512 registers,  We track caller's avx_u128_state and avoid

> > > > > > vzeroupper when caller's avx_u128_state is AVX_U128_DIRTY.

> > > > > >

> > > > > > Tested on i686 and x86-64 with and without --with-arch=native.

> > > > > >

> > > > > > OK for trunk?

> > > > >

> > > > > In principle OK, but I think we don't have to cache the result of

> > > > > ix86_avx_u128_mode_entry. Simply call the function from

> > > > > ix86_avx_u128_mode_exit; it is a simple function, so I guess we can

> > > > > afford to re-call it one more time per function.

> > > >

> > > > Do we really need ix86_avx_u128_mode_entry?  We can just

> > > > set entry state to AVX_U128_CLEAN and set exit state to

> > > > AVX_U128_DIRTY if caller returns AVX/AVX512 register or passes

> > > > AVX/AVX512 registers to callee.

> > > >

> > > > Does this patch look OK?

> > >

> > > No, the compiler is then free to move optimal insertion point at the

> > > beginning of the function.

> > >

> >

> > Here is the updated patch.  OK for trunk?

>

> OK with the comment fix.

>

> Thanks,

> Uros.

>

> -  return AVX_U128_CLEAN;

> +  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit

>

> s/Entry/Exit/

>

> +     modes used in function arguments.  */

>

> ... , otherwise return AVX_U128_CLEAN.

>

> +  return ix86_avx_u128_mode_entry ();

>  }


This is what I am checking in.

Thanks.

-- 
H.J.
From 315e6eadf7021748de375c59da9cf451351c9597 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 7 Jan 2019 06:56:44 -0800
Subject: [PATCH] x86: Don't generate vzeroupper if caller passes AVX/AVX512
 registers

There is no need to generate vzeroupper if caller passes arguments in
AVX/AVX512 registers.

Tested on i686 and x86-64 with and without --with-arch=native.

gcc/

	PR target/88717
	* config/i386/i386.c (ix86_avx_u128_mode_exit): Call
	ix86_avx_u128_mode_entry.

gcc/testsuite/

	PR target/88717
	* gcc.target/i386/pr88717.c: New test.
---
 gcc/config/i386/i386.c                  |  5 ++++-
 gcc/testsuite/gcc.target/i386/pr88717.c | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..bd48e080f46 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19137,7 +19137,10 @@ ix86_avx_u128_mode_exit (void)
   if (reg && ix86_check_avx_upper_register (reg))
     return AVX_U128_DIRTY;
 
-  return AVX_U128_CLEAN;
+  /* Exit mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
+     modes used in function arguments, otherwise return AVX_U128_CLEAN.
+   */
+  return ix86_avx_u128_mode_entry ();
 }
 
 /* Return a mode that ENTITY is assumed to be
diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c
new file mode 100644
index 00000000000..01680998f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88717.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
+
+#include <immintrin.h>
+
+__m128
+foo1 (__m256 x)
+{
+  return _mm256_castps256_ps128 (x);
+}
+
+void
+foo2 (float *p, __m256 x)
+{
+  *p = ((__v8sf)x)[0];
+}
+
+void
+foo3 (float *p, __m512 x)
+{
+  *p = ((__v16sf)x)[0];
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */

Patch

From 702ece14923f9922be5a6ed835a8efbe24e890ba Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 7 Jan 2019 06:56:44 -0800
Subject: [PATCH] x86: Don't generate vzeroupper if caller passes AVX/AVX512
 registers

There is no need to generate vzeroupper if caller passes arguments in
AVX/AVX512 registers.

Tested on i686 and x86-64 with and without --with-arch=native.

gcc/

	PR target/88717
	* config/i386/i386.c (ix86_avx_u128_mode_exit): Call
	ix86_avx_u128_mode_entry.

gcc/testsuite/

	PR target/88717
	* gcc.target/i386/pr88717.c: New test.
---
 gcc/config/i386/i386.c                  |  4 +++-
 gcc/testsuite/gcc.target/i386/pr88717.c | 24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88717.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d866f..7d82a241143 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19137,7 +19137,9 @@  ix86_avx_u128_mode_exit (void)
   if (reg && ix86_check_avx_upper_register (reg))
     return AVX_U128_DIRTY;
 
-  return AVX_U128_CLEAN;
+  /* Entry mode is set to AVX_U128_DIRTY if there are 256bit or 512bit
+     modes used in function arguments.  */
+  return ix86_avx_u128_mode_entry ();
 }
 
 /* Return a mode that ENTITY is assumed to be
diff --git a/gcc/testsuite/gcc.target/i386/pr88717.c b/gcc/testsuite/gcc.target/i386/pr88717.c
new file mode 100644
index 00000000000..01680998f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88717.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mvzeroupper" } */
+
+#include <immintrin.h>
+
+__m128
+foo1 (__m256 x)
+{
+  return _mm256_castps256_ps128 (x);
+}
+
+void
+foo2 (float *p, __m256 x)
+{
+  *p = ((__v8sf)x)[0];
+}
+
+void
+foo3 (float *p, __m512 x)
+{
+  *p = ((__v16sf)x)[0];
+}
+
+/* { dg-final { scan-assembler-not "vzeroupper" } } */
-- 
2.20.1