aarch64: Change costs for TX2 to expose more vectorization opportunities

Message ID 20200706160849.GC29652@bell-sw.com
State New

Commit Message

Anton Youdkevitch July 6, 2020, 4:08 p.m.
This patch changes some vector and register-move costs for TX2
so that more vectorizations that are beneficial on the TX2 chip
can happen.

The new cost model makes the x264 benchmark of CPU2017
7% faster with no negative performance impact on other
benchmarks.

Bootstrapped on linux-aarch64.

2020-07-06  Anton Youdkevitch  <anton.youdkevitch@bell-sw.com>

gcc/
	* config/aarch64/aarch64.c (thunderx2t99_regmove_cost): Change
	instruction cost.
	(thunderx2t99_vector_cost): Likewise.
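
As a rough illustration of why lower vector costs can expose more vectorization, here is a toy profitability check. It is only a sketch under stated assumptions: the struct, the function names (all prefixed toy_) and the statement mix (one load, one FP operation, one store, plus one permute on the vector side) are hypothetical, and GCC's real vectorizer cost computation involves much more (prologue/epilogue costs, alignment handling and so on). Only the numeric cost values are taken from the diff.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-in for GCC's cpu_vector_cost.
   It keeps only the fields this example needs, named after the
   comments in the patch.  */
struct toy_vector_cost
{
  int scalar_fp_stmt_cost;
  int scalar_load_cost;
  int scalar_store_cost;
  int vec_fp_stmt_cost;
  int vec_permute_cost;
  int vec_unalign_load_cost;
  int vec_unalign_store_cost;
};

/* TX2 values before the patch, copied from the diff.  */
const struct toy_vector_cost tx2_old = {
  .scalar_fp_stmt_cost = 6, .scalar_load_cost = 4, .scalar_store_cost = 1,
  .vec_fp_stmt_cost = 6, .vec_permute_cost = 10,
  .vec_unalign_load_cost = 8, .vec_unalign_store_cost = 4
};

/* TX2 values after the patch, copied from the diff.  */
const struct toy_vector_cost tx2_new = {
  .scalar_fp_stmt_cost = 6, .scalar_load_cost = 4, .scalar_store_cost = 1,
  .vec_fp_stmt_cost = 5, .vec_permute_cost = 10,
  .vec_unalign_load_cost = 4, .vec_unalign_store_cost = 1
};

/* Cost of VF scalar iterations of: one load, one FP operation,
   one store.  */
int
toy_scalar_cost (const struct toy_vector_cost *c, int vf)
{
  assert (vf > 0);
  return vf * (c->scalar_load_cost + c->scalar_fp_stmt_cost
	       + c->scalar_store_cost);
}

/* Cost of one vector iteration of the same body, assuming
   (hypothetically) that it needs one extra permute.  */
int
toy_vector_body_cost (const struct toy_vector_cost *c)
{
  return (c->vec_unalign_load_cost + c->vec_fp_stmt_cost
	  + c->vec_permute_cost + c->vec_unalign_store_cost);
}

/* Nonzero if the toy model prefers the vector form.  */
int
toy_vectorization_profitable (const struct toy_vector_cost *c, int vf)
{
  return toy_vector_body_cost (c) < toy_scalar_cost (c, vf);
}
```

With vf = 2 (for example two doubles in a 128-bit vector), the scalar side costs 22 under either table. The vector side costs 28 with the old values and 20 with the new ones, so in this toy model only the new table favours the vector form.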

Comments

Richard Sandiford July 6, 2020, 5:47 p.m. | #1
Anton Youdkevitch <anton.youdkevitch@bell-sw.com> writes:
> This patch changes some vector costs for TX2 so that
> more vectorizations beneficial for TX2 chip can happen.
>
> The new cost model makes the x264 benchmark of CPU2017
> 7% faster with no negative performance impact on other
> benchmarks.
>
> Bootstrapped on linux-aarch64
>
> 	2020-07-06 Anton Youdkevitch <anton.youdkevitch@bell-sw.com>
> gcc/
>     * config/aarch64/aarch64.c (thunderx2t99_regmove_cost):
>     Change instruction cost
>     (thunderx2t99_vector_cost): Likewise

OK if Andrew agrees.

Thanks,
Richard

Joel Jones via Gcc-patches July 6, 2020, 5:54 p.m. | #2
I approve of this patch. I'm responsible for GCC for TX2 at Marvell. Andrew Pinski should certainly chime in if he wants.

Joel

Richard Sandiford July 6, 2020, 6:04 p.m. | #3
Joel Jones <joelj@marvell.com> writes:
> I approve of this patch. I'm responsible for GCC for TX2 at Marvell. Andrew Pinski should certainly chime in if he wants.


Ah, in that case, the patch is OK.

Thanks,
Richard
Anton Youdkevitch July 7, 2020, 3:06 p.m. | #4
As I don't have commit privileges, could someone commit it for me
if this approval is sufficient?

-- 
   Thanks,
   Anton


On 06.7.2020 21:04, Richard Sandiford wrote:
> Joel Jones <joelj@marvell.com> writes:
>> I approve of this patch. I'm responsible for GCC for TX2 at Marvell. Andrew Pinski should certainly chime in if he wants.
> Ah, in that case, the patch is OK.
>
> Thanks,
> Richard
Anton Youdkevitch July 9, 2020, 3:27 p.m. | #5
Richard,

Can you approve backporting the patch to GCC 10?
Also, since I don't have commit permission, can you push
it if approved?

-- 
   Thanks,
   Anton

On 06.7.2020 21:04, Richard Sandiford wrote:
> Joel Jones <joelj@marvell.com> writes:
>> I approve of this patch. I'm responsible for GCC for TX2 at Marvell. Andrew Pinski should certainly chime in if he wants.
> Ah, in that case, the patch is OK.
>
> Thanks,
> Richard
Richard Sandiford July 10, 2020, 6:14 p.m. | #6
Anton Youdkevitch <anton.youdkevitch@bell-sw.com> writes:
> Richard,
>
> Can you approve the backporting of the patch to GCC10?
> Also, since I don't have the commit permission can you push
> it if approved?


Yeah, that's fine.  Now pushed there too.

Thanks,
Richard

Patch

From 3440e019c05fe5b565041cad549c6eefa2004a2b Mon Sep 17 00:00:00 2001
From: Anton Youdkevitch <anton.youdkevitch@bell-sw.com>
Date: Tue, 26 May 2020 04:23:04 -0700
Subject: [PATCH] Change costs for TX2 to expose more vectorization opportunities

Make the costs such that they do not exactly reflect
the actual instruction costs from the manual but make
the codegen emit the code we want it to.
---
 gcc/config/aarch64/aarch64.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e92c7e6..18c2251 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -535,9 +535,9 @@  static const struct cpu_regmove_cost thunderx2t99_regmove_cost =
 {
   1, /* GP2GP  */
   /* Avoid the use of int<->fp moves for spilling.  */
-  8, /* GP2FP  */
-  8, /* FP2GP  */
-  4  /* FP2FP  */
+  5, /* GP2FP  */
+  6, /* FP2GP  */
+  3, /* FP2FP  */
 };
 
 static const struct cpu_regmove_cost thunderx3t110_regmove_cost =
@@ -704,15 +704,15 @@  static const struct cpu_vector_cost thunderx2t99_vector_cost =
   6, /* scalar_fp_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
-  5, /* vec_int_stmt_cost  */
-  6, /* vec_fp_stmt_cost  */
+  4, /* vec_int_stmt_cost  */
+  5, /* vec_fp_stmt_cost  */
   10, /* vec_permute_cost  */
   6, /* vec_to_scalar_cost  */
   5, /* scalar_to_vec_cost  */
-  8, /* vec_align_load_cost  */
-  8, /* vec_unalign_load_cost  */
-  4, /* vec_unalign_store_cost  */
-  4, /* vec_store_cost  */
+  4, /* vec_align_load_cost  */
+  4, /* vec_unalign_load_cost  */
+  1, /* vec_unalign_store_cost  */
+  1, /* vec_store_cost  */
   2, /* cond_taken_branch_cost  */
   1  /* cond_not_taken_branch_cost  */
 };
-- 
2.7.4
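
For readers unfamiliar with the positional initializers in the first hunk, the following sketch spells out the register-move table with named fields. The struct and function names (prefixed toy_) are hypothetical and the field types are an assumption. The field names come from the comments in the patch and the values come from the diff.

```c
/* Hypothetical mirror of the cpu_regmove_cost fields, named after
   the comments in the patch.  */
struct toy_regmove_cost
{
  int GP2GP;	/* general-purpose register to general-purpose register */
  int GP2FP;	/* general-purpose register to FP/SIMD register */
  int FP2GP;	/* FP/SIMD register to general-purpose register */
  int FP2FP;	/* FP/SIMD register to FP/SIMD register */
};

/* TX2 values before the patch, copied from the diff.  */
const struct toy_regmove_cost tx2_regmove_old = {
  .GP2GP = 1, .GP2FP = 8, .FP2GP = 8, .FP2FP = 4
};

/* TX2 values after the patch, copied from the diff.  */
const struct toy_regmove_cost tx2_regmove_new = {
  .GP2GP = 1, .GP2FP = 5, .FP2GP = 6, .FP2FP = 3
};

/* Modelled cost of moving a value from a general-purpose register
   to an FP/SIMD register and back again.  */
int
toy_round_trip_cost (const struct toy_regmove_cost *c)
{
  return c->GP2FP + c->FP2GP;
}
```

Under these tables the modelled int-to-fp round trip drops from 16 to 11, while GP2GP stays at 1, so cross-file moves remain clearly more expensive than same-file moves.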