Defer pow (C, x) folding until after vectorization always (PR middle-end/82004)

Message ID 20180219220250.GX5867@tucnak
State New

Commit Message

Jakub Jelinek Feb. 19, 2018, 10:02 p.m.
Hi!

While I've over-simplified the testcase and so this patch doesn't help
the 628.pop2_s miscompare, I still believe it is beneficial to defer this
folding until late for these reasons:
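1) if a constant is later propagated into the second pow argument too,
   folding pow (C, x) directly will likely be more precise than going
   through exp (log (C) * x)
2) except when C is M_E, pow (C, x) is a single call, whereas
   exp (log (C) * x) needs a multiplication plus a call, so deferring
   keeps the IL smaller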

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-02-19  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/82004
	* match.pd (pow(C,x) -> exp(log(C)*x)): Delay all folding until
	after vectorization.

	* gfortran.dg/pr82004.f90: New test.


	Jakub

Comments

Richard Biener Feb. 20, 2018, 1:23 p.m. | #1
On February 19, 2018 11:02:50 PM GMT+01:00, Jakub Jelinek <jakub@redhat.com> wrote:
>Hi!
>
>While I've over-simplified the testcase and so this patch doesn't help
>the 628.pop2_s miscompare, I still believe it is beneficial to defer this
>folding until late for these reasons:
>1) if we propagate a constant into the second pow argument too, it will
>   be likely more precise than going through the exp (cst * x) way
>2) except when C is M_E, pow is fewer operations and thus smaller IL
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK. 

Richard. 


Patch

--- gcc/match.pd.jj	2018-02-15 12:15:51.655780636 +0100
+++ gcc/match.pd	2018-02-19 17:38:06.390763194 +0100
@@ -4006,7 +4006,14 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (simplify
    (pows REAL_CST@0 @1)
    (if (real_compare (GT_EXPR, TREE_REAL_CST_PTR (@0), &dconst0)
-	&& real_isfinite (TREE_REAL_CST_PTR (@0)))
+	&& real_isfinite (TREE_REAL_CST_PTR (@0))
+	/* As libmvec doesn't have a vectorized exp2, defer optimizing
+	   the use_exp2 case until after vectorization.  It seems actually
+	   beneficial for all constants to postpone this until later,
+	   because exp(log(C)*x), while faster, will have worse precision
+	   and if x folds into a constant too, that is unnecessary
+	   pessimization.  */
+	&& canonicalize_math_after_vectorization_p ())
     (with {
        const REAL_VALUE_TYPE *const value = TREE_REAL_CST_PTR (@0);
        bool use_exp2 = false;
@@ -4021,10 +4028,7 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      }
      (if (!use_exp2)
       (exps (mult (logs @0) @1))
-      /* As libmvec doesn't have a vectorized exp2, defer optimizing
-	 this until after vectorization.  */
-      (if (canonicalize_math_after_vectorization_p ())
-	(exp2s (mult (log2s @0) @1))))))))
+      (exp2s (mult (log2s @0) @1)))))))
 
  (for sqrts (SQRT)
       cbrts (CBRT)
--- gcc/testsuite/gfortran.dg/pr82004.f90.jj	2018-02-19 17:58:57.435682156 +0100
+++ gcc/testsuite/gfortran.dg/pr82004.f90	2018-02-19 17:58:34.127684892 +0100
@@ -0,0 +1,18 @@ 
+! PR middle-end/82004
+! { dg-do run }
+! { dg-options "-Ofast" }
+
+  integer, parameter :: r8 = selected_real_kind(13), i4 = kind(1)
+  integer (i4), parameter :: a = 400, b = 2
+  real (r8), parameter, dimension(b) :: c = (/ .001_r8, 10.00_r8 /)
+  real (r8) :: d, e, f, g, h
+  real (r8), parameter :: j &
+    = 10**(log10(c(1))-(log10(c(b))-log10(c(1)))/real(a))
+
+  d = c(1)
+  e = c(b)
+  f = (log10(e)-log10(d))/real(a)
+  g = log10(d) - f
+  h = 10**(g)
+  if (h.ne.j) stop 1
+end
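
The precision concern behind the new test can be looked at outside the compiler. The following is only a rough sketch in Python, not GCC code: it redoes the arithmetic of pr82004.f90 at run time and compares a direct pow (10, g) with the folded form exp (log (10) * g). The helper names (pow_direct, pow_via_exp, rel_diff_in_eps, testcase_exponent) are made up for this illustration. Python's math module forwards to the platform C library, so whether the two results differ in the last bits depends on the libm in use, and the sketch does not model GCC's compile-time evaluation of the parameter j.

```python
import math
import sys


def pow_direct(c, x):
    """pow (C, x) evaluated as a single library call."""
    return math.pow(c, x)


def pow_via_exp(c, x):
    """The folded form exp (log (C) * x); valid for finite C > 0."""
    return math.exp(math.log(c) * x)


def rel_diff_in_eps(a, b):
    """Relative difference between a and b, in units of machine epsilon."""
    return abs(a - b) / abs(a) / sys.float_info.epsilon


def testcase_exponent():
    """Exponent g from pr82004.f90: c = (/ .001, 10.00 /), a = 400."""
    d, e, a = 0.001, 10.0, 400
    f = (math.log10(e) - math.log10(d)) / a
    return math.log10(d) - f


if __name__ == "__main__":
    g = testcase_exponent()
    h_pow = pow_direct(10.0, g)
    h_exp = pow_via_exp(10.0, g)
    print("g                  =", repr(g))
    print("pow (10, g)        =", repr(h_pow))
    print("exp (log (10) * g) =", repr(h_exp))
    print("difference (eps)   =", rel_diff_in_eps(h_pow, h_exp))
```

A nonzero difference on the last line would be the kind of mismatch that makes an exact comparison such as `h.ne.j` in the test fail when only one side goes through the exp (log (C) * x) form.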