[2/2,v2] PPC64: Add libmvec SIMD double-precision power function.

Message ID 20190626145245.6571-1-shawn@git.icu
State New
Headers show
Series
  • Untitled series #14554
Related show

Commit Message

Shawn Landden June 26, 2019, 2:52 p.m.
Based off the ./sysdeps/ieee754/dbl-64/pow.c implementation,
and provides identical results.

Unlike other libmvec functions, this sets the underflow and overflow bits.
The caller can check these flags, and possibly re-run the calculations with
scalar pow to figure out what is causing the overflow or underflow.

I may have not normalized the data for benchmarking this properly,
but operating only on integers betwee 0-2^32 and floats between 0.5 and 1 I get the following:

Running 20 times over 32MiB
vector: mean 535.824919 (sd 0.246088)
scalar: mean 286.384220 (sd 0.027630)

Which is a very impressive speed boost.

v2: Allow 1 ulp error, which matches non-SIMD version, and makes this pass all tests.

2019-05-11  Shawn Landden  <shawn@git.icu>

        [BZ #24210]
        * NEWS: Noted the addition of PPC64 vector pow function.
        * sysdeps/powerpc/bits/math-vector.h: Added entry for vector pow.
        * sysdeps/powerpc/powerpc64/fpu/Versions: Added vector pow entry.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile:
        (libmvec-sysdep_routines, CFLAGS-vec_d_pow2_vsx.c):
        (CFLAGS-d_pow_log2_data.c): Added build of VSX SIMD pow
        function and tests.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c:
        Added entry for vector pow.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_pow2_vsx.c: New file.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c: Likewise.
        * sysdeps/powerpc/powerpc64/fpu/multiarch/d_powf_log2_data.c: Likewise.
        * sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist: SIMD pow
        added.
---
 sysdeps/powerpc/bits/math-vector.h            |   2 +
 sysdeps/powerpc/fpu/libm-test-ulps            |   3 +
 sysdeps/powerpc/powerpc64/fpu/Versions        |   2 +-
 .../powerpc/powerpc64/fpu/multiarch/Makefile  |   4 +-
 .../powerpc64/fpu/multiarch/math_config_dbl.h |  28 +-
 .../powerpc64/fpu/multiarch/s_pow_log2_data.c | 197 ++++++++
 .../multiarch/test-double-vlen2-wrappers.c    |   1 +
 .../powerpc64/fpu/multiarch/vec_d_pow2_vsx.c  | 429 ++++++++++++++++++
 .../powerpc64/fpu/multiarch/vec_math_err.c    |  22 +-
 .../linux/powerpc/powerpc64/libmvec.abilist   |   1 +
 10 files changed, 678 insertions(+), 11 deletions(-)
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/s_pow_log2_data.c
 create mode 100644 sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_pow2_vsx.c

-- 
2.20.1

Comments

Tulio Magno Quites Machado Filho July 3, 2019, 9:20 p.m. | #1
Shawn Landden <shawn@git.icu> writes:

> Based off the ./sysdeps/ieee754/dbl-64/pow.c implementation,

> and provides identical results.

>

> Unlike other libmvec functions, this sets the underflow and overflow bits.

> The caller can check these flags, and possibly re-run the calculations with

> scalar pow to figure out what is causing the overflow or underflow.

>

> I may have not normalized the data for benchmarking this properly,

> but operating only on integers betwee 0-2^32 and floats between 0.5 and 1 I get the following:

>

> Running 20 times over 32MiB

> vector: mean 535.824919 (sd 0.246088)

> scalar: mean 286.384220 (sd 0.027630)

>

> Which is a very impressive speed boost.


This patch looks good to me with some fixes that I'm listing here.
I'm going to merge it as soon as we fix the issues with the single precision
implementation.  So, I don't think we need a v3 here.

> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile

> index 25d29b9a4a..43fa8505b8 100644

> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile

> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile

> @@ -52,6 +52,7 @@ libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \

>                             vec_s_powf4_vsx s_powf_log2_data \

>  			   vec_math_errf vec_math_err \

>  			   vec_d_exp2_vsx vec_d_exp_data \

> +                           vec_d_pow2_vsx s_pow_log2_data \


Replace 8 spaces with tabs.
Fixed.

>  			   vec_d_sincos2_vsx vec_s_sincosf4_vsx

>  CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector

>  CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector

> @@ -69,6 +70,7 @@ CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx

>  CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector

>  CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx

>  CFLAGS-vec_s_powf4_vsx.c += -mabi=altivec -maltivec -mvsx

> +CFLAGS-vec_s_pow2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector


s/vec_s/vec_d/
Fixed.

s_pow_log2_data.c includes altivec.h indirectly, which also requires to use:

    CFLAGS-s_pow_log2_data.c += -mabi=altivec -maltivec -mvsx

Fixed.

> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c

> index 7162d06e0c..f3f351093b 100644

> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c

> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c

> @@ -1,5 +1,5 @@

> -/* Double-precision math error handling.

> -   Copyright (C) 2019 Free Software Foundation, Inc.

> +/* Single-precision math error handling.

> +   Copyright (C) 2017-2019 Free Software Foundation, Inc.


I think this change got into the patch by mistake.  Please, let me know if I'm
mistaken.
Removed.

> @@ -38,3 +36,15 @@ __math_oflow (uint32_t sign)

>  {

>    return xflow (sign, 0x1p769);

>  }

> +

> +

> +double __math_invalid(double x)

> +{

> +  return (x - x) / (x - x);

> +}

> +

> +double __math_divzero(uint32_t sign)

> +{

> +  return (double)(sign ? -1.0 : 1.0) / 0.0;

> +}

> +


Missing attribute_hidden.
Fixed.

-- 
Tulio Magno

Patch

diff --git a/sysdeps/powerpc/bits/math-vector.h b/sysdeps/powerpc/bits/math-vector.h
index 5709efbae0..ce10dc4bb7 100644
--- a/sysdeps/powerpc/bits/math-vector.h
+++ b/sysdeps/powerpc/bits/math-vector.h
@@ -50,6 +50,8 @@ 
 #  define __DECL_SIMD_exp __DECL_SIMD_PPC64
 #  undef __DECL_SIMD_powf
 #  define __DECL_SIMD_powf __DECL_SIMD_PPC64
+#  undef __DECL_SIMD_pow
+#  define __DECL_SIMD_pow __DECL_SIMD_PPC64
 
 # endif
 #endif
diff --git a/sysdeps/powerpc/fpu/libm-test-ulps b/sysdeps/powerpc/fpu/libm-test-ulps
index 35b13ea374..a28982a2ed 100644
--- a/sysdeps/powerpc/fpu/libm-test-ulps
+++ b/sysdeps/powerpc/fpu/libm-test-ulps
@@ -2533,6 +2533,9 @@  ifloat128: 2
 ildouble: 1
 ldouble: 1
 
+Function: "pow_vlen2":
+double: 1
+
 Function: "sin":
 double: 1
 float: 1
diff --git a/sysdeps/powerpc/powerpc64/fpu/Versions b/sysdeps/powerpc/powerpc64/fpu/Versions
index 225f7c8475..8cb0176c6d 100644
--- a/sysdeps/powerpc/powerpc64/fpu/Versions
+++ b/sysdeps/powerpc/powerpc64/fpu/Versions
@@ -2,6 +2,6 @@  libmvec {
   GLIBC_2.30 {
     _ZGVbN2v_cos; _ZGVbN4v_cosf; _ZGVbN2v_sin; _ZGVbN4v_sinf;
     _ZGVbN2v_log; _ZGVbN4v_logf; _ZGVbN2vvv_sincos; _ZGVbN4vvv_sincosf;
-    _ZGVbN4v_expf; _ZGVbN2v_exp; _ZGVbN4vv_powf;
+    _ZGVbN4v_expf; _ZGVbN2v_exp; _ZGVbN4vv_powf; _ZGVbN2vv_pow;
   }
 }
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
index 25d29b9a4a..43fa8505b8 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
@@ -52,6 +52,7 @@  libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
                            vec_s_powf4_vsx s_powf_log2_data \
 			   vec_math_errf vec_math_err \
 			   vec_d_exp2_vsx vec_d_exp_data \
+                           vec_d_pow2_vsx s_pow_log2_data \
 			   vec_d_sincos2_vsx vec_s_sincosf4_vsx
 CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
@@ -69,6 +70,7 @@  CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx
 CFLAGS-vec_s_powf4_vsx.c += -mabi=altivec -maltivec -mvsx
+CFLAGS-vec_s_pow2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
 CFLAGS-vec_math_err.c += -mabi=altivec -maltivec -mvsx
 endif
 
@@ -77,7 +79,7 @@  ifeq ($(subdir),math)
 ifeq ($(build-mathvec),yes)
 libmvec-tests += double-vlen2 float-vlen4
 
-double-vlen2-funcs = cos sin sincos log exp
+double-vlen2-funcs = cos sin sincos log exp pow
 float-vlen4-funcs = cos sin sincos log exp pow
 
 double-vlen2-arch-ext-cflags = -mabi=altivec -maltivec -mvsx -DREQUIRE_VSX
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h
index 7c94f5b9b2..b42cd5953a 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/math_config_dbl.h
@@ -70,6 +70,17 @@  asuint64 (double f)
   return u.i;
 }
 
+static inline vector unsigned long long
+vasuint64 (vector double f)
+{
+  union
+  {
+    vector double f;
+    vector unsigned long long i;
+  } u = {f};
+  return u.i;
+}
+
 static inline double
 asdouble (uint64_t i)
 {
@@ -81,6 +92,17 @@  asdouble (uint64_t i)
   return u.f;
 }
 
+static inline vector double
+vasdouble (vector unsigned long long i)
+{
+  union
+  {
+    vector unsigned long long i;
+    vector double f;
+  } u = {i};
+  return u.f;
+}
+
 static inline int
 issignaling_inline (double x)
 {
@@ -179,9 +201,9 @@  extern const struct log2_data
 #define POW_LOG_POLY_ORDER 8
 extern const struct pow_log_data
 {
-  double ln2hi;
-  double ln2lo;
-  double poly[POW_LOG_POLY_ORDER - 1]; /* First coefficient is 1.  */
+  vector double ln2hi;
+  vector double ln2lo;
+  vector double poly[POW_LOG_POLY_ORDER - 1]; /* First coefficient is 1.  */
   /* Note: the pad field is unused, but allows slightly faster indexing.  */
   /* See e_pow_log_data.c for details.  */
   struct {double invc, pad, logc, logctail;} tab[1 << POW_LOG_TABLE_BITS];
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/s_pow_log2_data.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/s_pow_log2_data.c
new file mode 100644
index 0000000000..0985025423
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/s_pow_log2_data.c
@@ -0,0 +1,197 @@ 
+/* Data for the log part of pow, adjusted for libmvec.
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+/* Based on sysdeps/ieee754/dbl-64/e_pow_log_data.c,
+   which originally came from Szabolcs Nagy at ARM Ltd. */
+#include "math_config_dbl.h"
+
+#define N (1 << POW_LOG_TABLE_BITS)
+
+const struct pow_log_data __pow_log_data = {
+.ln2hi = {0x1.62e42fefa3800p-1, 0x1.62e42fefa3800p-1},
+.ln2lo = {0x1.ef35793c76730p-45, 0x1.ef35793c76730p-45},
+.poly = {
+#if N == 128 && POW_LOG_POLY_ORDER == 8
+// relative error: 0x1.11922ap-70
+// in -0x1.6bp-8 0x1.6bp-8
+// Coefficients are scaled to match the scaling during evaluation.
+{-0x1p-1, -0x1p-1},
+{0x1.555555555556p-2 * -2, 0x1.555555555556p-2 * -2},
+{-0x1.0000000000006p-2 * -2, -0x1.0000000000006p-2 * -2},
+{0x1.999999959554ep-3 * 4, 0x1.999999959554ep-3 * 4},
+{-0x1.555555529a47ap-3 * 4, -0x1.555555529a47ap-3 * 4},
+{0x1.2495b9b4845e9p-3 * -8, 0x1.2495b9b4845e9p-3 * -8},
+{-0x1.0002b8b263fc3p-3 * -8, -0x1.0002b8b263fc3p-3 * -8},
+#endif
+},
+/* Algorithm:
+
+	x = 2^k z
+	log(x) = k ln2 + log(c) + log(z/c)
+	log(z/c) = poly(z/c - 1)
+
+where z is in [0x1.69555p-1; 0x1.69555p0] which is split into N subintervals
+and z falls into the ith one, then table entries are computed as
+
+	tab[i].invc = 1/c
+	tab[i].logc = round(0x1p43*log(c))/0x1p43
+	tab[i].logctail = (double)(log(c) - logc)
+
+where c is chosen near the center of the subinterval such that 1/c has only a
+few precision bits so z/c - 1 is exactly representible as double:
+
+	1/c = center < 1 ? round(N/center)/N : round(2*N/center)/N/2
+
+Note: |z/c - 1| < 1/N for the chosen c, |log(c) - logc - logctail| < 0x1p-97,
+the last few bits of logc are rounded away so k*ln2hi + logc has no rounding
+error and the interval for z is selected such that near x == 1, where log(x)
+is tiny, large cancellation error is avoided in logc + poly(z/c - 1).  */
+.tab = {
+#if N == 128
+#define A(a,b,c) {a,0,b,c},
+A(0x1.6a00000000000p+0, -0x1.62c82f2b9c800p-2, 0x1.ab42428375680p-48)
+A(0x1.6800000000000p+0, -0x1.5d1bdbf580800p-2, -0x1.ca508d8e0f720p-46)
+A(0x1.6600000000000p+0, -0x1.5767717455800p-2, -0x1.362a4d5b6506dp-45)
+A(0x1.6400000000000p+0, -0x1.51aad872df800p-2, -0x1.684e49eb067d5p-49)
+A(0x1.6200000000000p+0, -0x1.4be5f95777800p-2, -0x1.41b6993293ee0p-47)
+A(0x1.6000000000000p+0, -0x1.4618bc21c6000p-2, 0x1.3d82f484c84ccp-46)
+A(0x1.5e00000000000p+0, -0x1.404308686a800p-2, 0x1.c42f3ed820b3ap-50)
+A(0x1.5c00000000000p+0, -0x1.3a64c55694800p-2, 0x1.0b1c686519460p-45)
+A(0x1.5a00000000000p+0, -0x1.347dd9a988000p-2, 0x1.5594dd4c58092p-45)
+A(0x1.5800000000000p+0, -0x1.2e8e2bae12000p-2, 0x1.67b1e99b72bd8p-45)
+A(0x1.5600000000000p+0, -0x1.2895a13de8800p-2, 0x1.5ca14b6cfb03fp-46)
+A(0x1.5600000000000p+0, -0x1.2895a13de8800p-2, 0x1.5ca14b6cfb03fp-46)
+A(0x1.5400000000000p+0, -0x1.22941fbcf7800p-2, -0x1.65a242853da76p-46)
+A(0x1.5200000000000p+0, -0x1.1c898c1699800p-2, -0x1.fafbc68e75404p-46)
+A(0x1.5000000000000p+0, -0x1.1675cababa800p-2, 0x1.f1fc63382a8f0p-46)
+A(0x1.4e00000000000p+0, -0x1.1058bf9ae4800p-2, -0x1.6a8c4fd055a66p-45)
+A(0x1.4c00000000000p+0, -0x1.0a324e2739000p-2, -0x1.c6bee7ef4030ep-47)
+A(0x1.4a00000000000p+0, -0x1.0402594b4d000p-2, -0x1.036b89ef42d7fp-48)
+A(0x1.4a00000000000p+0, -0x1.0402594b4d000p-2, -0x1.036b89ef42d7fp-48)
+A(0x1.4800000000000p+0, -0x1.fb9186d5e4000p-3, 0x1.d572aab993c87p-47)
+A(0x1.4600000000000p+0, -0x1.ef0adcbdc6000p-3, 0x1.b26b79c86af24p-45)
+A(0x1.4400000000000p+0, -0x1.e27076e2af000p-3, -0x1.72f4f543fff10p-46)
+A(0x1.4200000000000p+0, -0x1.d5c216b4fc000p-3, 0x1.1ba91bbca681bp-45)
+A(0x1.4000000000000p+0, -0x1.c8ff7c79aa000p-3, 0x1.7794f689f8434p-45)
+A(0x1.4000000000000p+0, -0x1.c8ff7c79aa000p-3, 0x1.7794f689f8434p-45)
+A(0x1.3e00000000000p+0, -0x1.bc286742d9000p-3, 0x1.94eb0318bb78fp-46)
+A(0x1.3c00000000000p+0, -0x1.af3c94e80c000p-3, 0x1.a4e633fcd9066p-52)
+A(0x1.3a00000000000p+0, -0x1.a23bc1fe2b000p-3, -0x1.58c64dc46c1eap-45)
+A(0x1.3a00000000000p+0, -0x1.a23bc1fe2b000p-3, -0x1.58c64dc46c1eap-45)
+A(0x1.3800000000000p+0, -0x1.9525a9cf45000p-3, -0x1.ad1d904c1d4e3p-45)
+A(0x1.3600000000000p+0, -0x1.87fa06520d000p-3, 0x1.bbdbf7fdbfa09p-45)
+A(0x1.3400000000000p+0, -0x1.7ab890210e000p-3, 0x1.bdb9072534a58p-45)
+A(0x1.3400000000000p+0, -0x1.7ab890210e000p-3, 0x1.bdb9072534a58p-45)
+A(0x1.3200000000000p+0, -0x1.6d60fe719d000p-3, -0x1.0e46aa3b2e266p-46)
+A(0x1.3000000000000p+0, -0x1.5ff3070a79000p-3, -0x1.e9e439f105039p-46)
+A(0x1.3000000000000p+0, -0x1.5ff3070a79000p-3, -0x1.e9e439f105039p-46)
+A(0x1.2e00000000000p+0, -0x1.526e5e3a1b000p-3, -0x1.0de8b90075b8fp-45)
+A(0x1.2c00000000000p+0, -0x1.44d2b6ccb8000p-3, 0x1.70cc16135783cp-46)
+A(0x1.2c00000000000p+0, -0x1.44d2b6ccb8000p-3, 0x1.70cc16135783cp-46)
+A(0x1.2a00000000000p+0, -0x1.371fc201e9000p-3, 0x1.178864d27543ap-48)
+A(0x1.2800000000000p+0, -0x1.29552f81ff000p-3, -0x1.48d301771c408p-45)
+A(0x1.2600000000000p+0, -0x1.1b72ad52f6000p-3, -0x1.e80a41811a396p-45)
+A(0x1.2600000000000p+0, -0x1.1b72ad52f6000p-3, -0x1.e80a41811a396p-45)
+A(0x1.2400000000000p+0, -0x1.0d77e7cd09000p-3, 0x1.a699688e85bf4p-47)
+A(0x1.2400000000000p+0, -0x1.0d77e7cd09000p-3, 0x1.a699688e85bf4p-47)
+A(0x1.2200000000000p+0, -0x1.fec9131dbe000p-4, -0x1.575545ca333f2p-45)
+A(0x1.2000000000000p+0, -0x1.e27076e2b0000p-4, 0x1.a342c2af0003cp-45)
+A(0x1.2000000000000p+0, -0x1.e27076e2b0000p-4, 0x1.a342c2af0003cp-45)
+A(0x1.1e00000000000p+0, -0x1.c5e548f5bc000p-4, -0x1.d0c57585fbe06p-46)
+A(0x1.1c00000000000p+0, -0x1.a926d3a4ae000p-4, 0x1.53935e85baac8p-45)
+A(0x1.1c00000000000p+0, -0x1.a926d3a4ae000p-4, 0x1.53935e85baac8p-45)
+A(0x1.1a00000000000p+0, -0x1.8c345d631a000p-4, 0x1.37c294d2f5668p-46)
+A(0x1.1a00000000000p+0, -0x1.8c345d631a000p-4, 0x1.37c294d2f5668p-46)
+A(0x1.1800000000000p+0, -0x1.6f0d28ae56000p-4, -0x1.69737c93373dap-45)
+A(0x1.1600000000000p+0, -0x1.51b073f062000p-4, 0x1.f025b61c65e57p-46)
+A(0x1.1600000000000p+0, -0x1.51b073f062000p-4, 0x1.f025b61c65e57p-46)
+A(0x1.1400000000000p+0, -0x1.341d7961be000p-4, 0x1.c5edaccf913dfp-45)
+A(0x1.1400000000000p+0, -0x1.341d7961be000p-4, 0x1.c5edaccf913dfp-45)
+A(0x1.1200000000000p+0, -0x1.16536eea38000p-4, 0x1.47c5e768fa309p-46)
+A(0x1.1000000000000p+0, -0x1.f0a30c0118000p-5, 0x1.d599e83368e91p-45)
+A(0x1.1000000000000p+0, -0x1.f0a30c0118000p-5, 0x1.d599e83368e91p-45)
+A(0x1.0e00000000000p+0, -0x1.b42dd71198000p-5, 0x1.c827ae5d6704cp-46)
+A(0x1.0e00000000000p+0, -0x1.b42dd71198000p-5, 0x1.c827ae5d6704cp-46)
+A(0x1.0c00000000000p+0, -0x1.77458f632c000p-5, -0x1.cfc4634f2a1eep-45)
+A(0x1.0c00000000000p+0, -0x1.77458f632c000p-5, -0x1.cfc4634f2a1eep-45)
+A(0x1.0a00000000000p+0, -0x1.39e87b9fec000p-5, 0x1.502b7f526feaap-48)
+A(0x1.0a00000000000p+0, -0x1.39e87b9fec000p-5, 0x1.502b7f526feaap-48)
+A(0x1.0800000000000p+0, -0x1.f829b0e780000p-6, -0x1.980267c7e09e4p-45)
+A(0x1.0800000000000p+0, -0x1.f829b0e780000p-6, -0x1.980267c7e09e4p-45)
+A(0x1.0600000000000p+0, -0x1.7b91b07d58000p-6, -0x1.88d5493faa639p-45)
+A(0x1.0400000000000p+0, -0x1.fc0a8b0fc0000p-7, -0x1.f1e7cf6d3a69cp-50)
+A(0x1.0400000000000p+0, -0x1.fc0a8b0fc0000p-7, -0x1.f1e7cf6d3a69cp-50)
+A(0x1.0200000000000p+0, -0x1.fe02a6b100000p-8, -0x1.9e23f0dda40e4p-46)
+A(0x1.0200000000000p+0, -0x1.fe02a6b100000p-8, -0x1.9e23f0dda40e4p-46)
+A(0x1.0000000000000p+0, 0x0.0000000000000p+0, 0x0.0000000000000p+0)
+A(0x1.0000000000000p+0, 0x0.0000000000000p+0, 0x0.0000000000000p+0)
+A(0x1.fc00000000000p-1, 0x1.0101575890000p-7, -0x1.0c76b999d2be8p-46)
+A(0x1.f800000000000p-1, 0x1.0205658938000p-6, -0x1.3dc5b06e2f7d2p-45)
+A(0x1.f400000000000p-1, 0x1.8492528c90000p-6, -0x1.aa0ba325a0c34p-45)
+A(0x1.f000000000000p-1, 0x1.0415d89e74000p-5, 0x1.111c05cf1d753p-47)
+A(0x1.ec00000000000p-1, 0x1.466aed42e0000p-5, -0x1.c167375bdfd28p-45)
+A(0x1.e800000000000p-1, 0x1.894aa149fc000p-5, -0x1.97995d05a267dp-46)
+A(0x1.e400000000000p-1, 0x1.ccb73cdddc000p-5, -0x1.a68f247d82807p-46)
+A(0x1.e200000000000p-1, 0x1.eea31c006c000p-5, -0x1.e113e4fc93b7bp-47)
+A(0x1.de00000000000p-1, 0x1.1973bd1466000p-4, -0x1.5325d560d9e9bp-45)
+A(0x1.da00000000000p-1, 0x1.3bdf5a7d1e000p-4, 0x1.cc85ea5db4ed7p-45)
+A(0x1.d600000000000p-1, 0x1.5e95a4d97a000p-4, -0x1.c69063c5d1d1ep-45)
+A(0x1.d400000000000p-1, 0x1.700d30aeac000p-4, 0x1.c1e8da99ded32p-49)
+A(0x1.d000000000000p-1, 0x1.9335e5d594000p-4, 0x1.3115c3abd47dap-45)
+A(0x1.cc00000000000p-1, 0x1.b6ac88dad6000p-4, -0x1.390802bf768e5p-46)
+A(0x1.ca00000000000p-1, 0x1.c885801bc4000p-4, 0x1.646d1c65aacd3p-45)
+A(0x1.c600000000000p-1, 0x1.ec739830a2000p-4, -0x1.dc068afe645e0p-45)
+A(0x1.c400000000000p-1, 0x1.fe89139dbe000p-4, -0x1.534d64fa10afdp-45)
+A(0x1.c000000000000p-1, 0x1.1178e8227e000p-3, 0x1.1ef78ce2d07f2p-45)
+A(0x1.be00000000000p-1, 0x1.1aa2b7e23f000p-3, 0x1.ca78e44389934p-45)
+A(0x1.ba00000000000p-1, 0x1.2d1610c868000p-3, 0x1.39d6ccb81b4a1p-47)
+A(0x1.b800000000000p-1, 0x1.365fcb0159000p-3, 0x1.62fa8234b7289p-51)
+A(0x1.b400000000000p-1, 0x1.4913d8333b000p-3, 0x1.5837954fdb678p-45)
+A(0x1.b200000000000p-1, 0x1.527e5e4a1b000p-3, 0x1.633e8e5697dc7p-45)
+A(0x1.ae00000000000p-1, 0x1.6574ebe8c1000p-3, 0x1.9cf8b2c3c2e78p-46)
+A(0x1.ac00000000000p-1, 0x1.6f0128b757000p-3, -0x1.5118de59c21e1p-45)
+A(0x1.aa00000000000p-1, 0x1.7898d85445000p-3, -0x1.c661070914305p-46)
+A(0x1.a600000000000p-1, 0x1.8beafeb390000p-3, -0x1.73d54aae92cd1p-47)
+A(0x1.a400000000000p-1, 0x1.95a5adcf70000p-3, 0x1.7f22858a0ff6fp-47)
+A(0x1.a000000000000p-1, 0x1.a93ed3c8ae000p-3, -0x1.8724350562169p-45)
+A(0x1.9e00000000000p-1, 0x1.b31d8575bd000p-3, -0x1.c358d4eace1aap-47)
+A(0x1.9c00000000000p-1, 0x1.bd087383be000p-3, -0x1.d4bc4595412b6p-45)
+A(0x1.9a00000000000p-1, 0x1.c6ffbc6f01000p-3, -0x1.1ec72c5962bd2p-48)
+A(0x1.9600000000000p-1, 0x1.db13db0d49000p-3, -0x1.aff2af715b035p-45)
+A(0x1.9400000000000p-1, 0x1.e530effe71000p-3, 0x1.212276041f430p-51)
+A(0x1.9200000000000p-1, 0x1.ef5ade4dd0000p-3, -0x1.a211565bb8e11p-51)
+A(0x1.9000000000000p-1, 0x1.f991c6cb3b000p-3, 0x1.bcbecca0cdf30p-46)
+A(0x1.8c00000000000p-1, 0x1.07138604d5800p-2, 0x1.89cdb16ed4e91p-48)
+A(0x1.8a00000000000p-1, 0x1.0c42d67616000p-2, 0x1.7188b163ceae9p-45)
+A(0x1.8800000000000p-1, 0x1.1178e8227e800p-2, -0x1.c210e63a5f01cp-45)
+A(0x1.8600000000000p-1, 0x1.16b5ccbacf800p-2, 0x1.b9acdf7a51681p-45)
+A(0x1.8400000000000p-1, 0x1.1bf99635a6800p-2, 0x1.ca6ed5147bdb7p-45)
+A(0x1.8200000000000p-1, 0x1.214456d0eb800p-2, 0x1.a87deba46baeap-47)
+A(0x1.7e00000000000p-1, 0x1.2bef07cdc9000p-2, 0x1.a9cfa4a5004f4p-45)
+A(0x1.7c00000000000p-1, 0x1.314f1e1d36000p-2, -0x1.8e27ad3213cb8p-45)
+A(0x1.7a00000000000p-1, 0x1.36b6776be1000p-2, 0x1.16ecdb0f177c8p-46)
+A(0x1.7800000000000p-1, 0x1.3c25277333000p-2, 0x1.83b54b606bd5cp-46)
+A(0x1.7600000000000p-1, 0x1.419b423d5e800p-2, 0x1.8e436ec90e09dp-47)
+A(0x1.7400000000000p-1, 0x1.4718dc271c800p-2, -0x1.f27ce0967d675p-45)
+A(0x1.7200000000000p-1, 0x1.4c9e09e173000p-2, -0x1.e20891b0ad8a4p-45)
+A(0x1.7000000000000p-1, 0x1.522ae0738a000p-2, 0x1.ebe708164c759p-45)
+A(0x1.6e00000000000p-1, 0x1.57bf753c8d000p-2, 0x1.fadedee5d40efp-46)
+A(0x1.6c00000000000p-1, 0x1.5d5bddf596000p-2, -0x1.a0b2a08a465dcp-47)
+#endif
+},
+};
+
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
index f368e47947..fd10f16a1f 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/test-double-vlen2-wrappers.c
@@ -25,4 +25,5 @@  VECTOR_WRAPPER (WRAPPER_NAME (cos), _ZGVbN2v_cos)
 VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin)
 VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log)
 VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp)
+VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow)
 VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincos), _ZGVbN2vvv_sincos)
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_pow2_vsx.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_pow2_vsx.c
new file mode 100644
index 0000000000..9ee0e77456
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_d_pow2_vsx.c
@@ -0,0 +1,429 @@ 
+/* Double-precision vector pow function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+/* Based on sysdeps/ieee754/dbl-64/e_pow.c which came from
+   Szabolcs Nagy at ARM Ltd. */
+#include <altivec.h>
+#include <math.h>
+#include <stdbool.h>
+#include <math-barriers.h>
+
+#include "math_config_dbl.h"
+
+typedef vector long long unsigned v64u;
+
+/*
+Worst-case error: 0.54 ULP (~= ulperr_exp + 1024*Ln2*relerr_log*2^53)
+relerr_log: 1.3 * 2^-68 (Relative error of log, 1.5 * 2^-68 without fma)
+ulperr_exp: 0.509 ULP (ULP error of exp, 0.511 ULP without fma)
+*/
+
+#define T __pow_log_data.tab
+#define A __pow_log_data.poly
+#define Ln2hi __pow_log_data.ln2hi
+#define Ln2lo __pow_log_data.ln2lo
+#define N (1 << POW_LOG_TABLE_BITS)
+#define OFF 0x3fe6955500000000
+
+/* Compute y+TAIL = log(x) where the rounded result is y and TAIL has about
+   additional 15 bits precision.  IX is the bit representation of x, but
+   normalized in the subnormal range using the sign bit for the exponent.  */
+struct two_v_doubles {
+  vector double y;
+  vector double tail;
+};
+static inline struct two_v_doubles log_inline(v64u ix)
+{
+  /* double_t for better performance on targets with FLT_EVAL_METHOD==2.  */
+  vector double z, r, y, kd, hi, t1, t2, lo, lo1, lo2, p;
+  v64u iz, tmp, i;
+  vector signed long long k;
+
+  /* x = 2^k z; where z is in range [OFF,2*OFF) and exact.
+     The range is split into N subintervals.
+     The ith subinterval contains z and c is near its center.  */
+  tmp = ix - OFF;
+  i = (tmp >> (52 - POW_LOG_TABLE_BITS)) % N;
+  k = ((vector signed long long)tmp >> 52); /* arithmetic shift */
+  iz = ix - (tmp & 0xfffULL << 52);
+  z = vasdouble(iz);
+  // we don't use __builtin_vectorconvert as we want to support old gcc
+  kd[0] = (double)k[0];
+  kd[1] = (double)k[1];
+
+  /* log(x) = k*Ln2 + log(c) + log1p(z/c-1).  */
+  vector double invc = {T[i[0]].invc, T[i[1]].invc};
+  vector double logc = {T[i[0]].logc, T[i[1]].logc};
+  vector double logctail = {T[i[0]].logctail, T[i[1]].logctail};
+
+  /* Note: 1/c is j/N or j/N/2 where j is an integer in [N,2N) and
+     |z/c - 1| < 1/N, so r = z/c - 1 is exactly representible.  */
+  // This is so that the results are identical to the non-SIMD version
+#if __FP_FAST_FMA
+  vector double negone = {-1.0, -1.0};
+  r = vec_madd(z, invc, negone);
+#else
+  vector double zhi = vasdouble((iz + (1ULL << 31)) & (-1ULL << 32));
+  vector double zlo = z - zhi;
+  vector double rhi = zhi * invc - 1.0;
+  vector double rlo = zlo * invc;
+  r = rhi + rlo;
+#endif
+  /* k*Ln2 + log(c) + r.  */
+  t1 = kd * Ln2hi + logc;
+  t2 = t1 + r;
+  lo1 = kd * Ln2lo + logctail;
+  lo2 = t1 - t2 + r;
+
+  /* Evaluation is optimized assuming superscalar pipelined execution.  */
+  vector double ar, ar2, ar3, lo3, lo4;
+  ar = A[0] * r; /* A[0] = -0.5.  */
+  ar2 = r * ar;
+  ar3 = r * ar2;
+  /* k*Ln2 + log(c) + r + A[0]*r*r.  */
+#if __FP_FAST_FMA
+  hi = t2 + ar2;
+  lo3 = vec_madd(ar, r, -ar2);
+  lo4 = t2 - hi + ar2;
+#else
+  vector double arhi = A[0] * rhi;
+  vector double arhi2 = rhi * arhi;
+  hi = t2 + arhi2;
+  lo3 = rlo * (ar + arhi);
+  lo4 = t2 - hi + arhi2;
+#endif
+  /* p = log1p(r) - r - A[0]*r*r.  */
+  p = (ar3 * (A[1] + r * A[2] +
+    ar2 * (A[3] + r * A[4] + ar2 * (A[5] + r * A[6]))));
+  lo = lo1 + lo2 + lo3 + lo4 + p;
+  y = hi + lo;
+  struct two_v_doubles ret;
+  ret.tail = hi - y + lo;
+  ret.y = y;
+  return ret;
+}
+
+
+/* Handle cases that may overflow or underflow when computing the result that
+   is scale*(1+TMP) without intermediate rounding.  The bit representation of
+   scale is in SBITS, however it has a computed exponent that may have
+   overflown into the sign bit so that needs to be adjusted before using it as
+   a double.  (int32_t)KI is the k used in the argument reduction and exponent
+   adjustment of scale, positive k here means the result may overflow and
+   negative k means the result may underflow.  */
+static inline double
+specialcase (double_t tmp, uint64_t sbits, uint64_t ki)
+{
+  double_t scale, y;
+
+  if ((ki & 0x80000000) == 0)
+    {
+      /* k > 0, the exponent of scale might have overflowed by <= 460.  */
+      sbits -= 1009ull << 52;
+      scale = asdouble (sbits);
+      y = 0x1p1009 * (scale + scale * tmp);
+      return y;
+    }
+  /* k < 0, need special care in the subnormal range.  */
+  sbits += 1022ull << 52;
+  /* Note: sbits is signed scale.  */
+  scale = asdouble (sbits);
+  y = scale + scale * tmp;
+  if (fabs (y) < 1.0)
+    {
+      /* Round y to the right precision before scaling it into the subnormal
+	 range to avoid double rounding that can cause 0.5+E/2 ulp error where
+	 E is the worst-case ulp error outside the subnormal range.  So this
+	 is only useful if the goal is better than 1 ulp worst-case error.  */
+      double_t hi, lo, one = 1.0;
+      if (y < 0.0)
+	one = -1.0;
+      lo = scale - y + scale * tmp;
+      hi = one + y;
+      lo = one - hi + y + lo;
+      y = math_narrow_eval (hi + lo) - one;
+      /* Fix the sign of 0.  */
+      if (y == 0.0)
+	y = asdouble (sbits & 0x8000000000000000);
+      /* The underflow exception needs to be signaled explicitly.  */
+      math_force_eval (math_opt_barrier (0x1p-1022) * 0x1p-1022);
+    }
+  y = 0x1p-1022 * y;
+  return y;
+}
+
+#undef N
+#undef T
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+#define C6 __exp_data.poly[9 - EXP_POLY_ORDER]
+
+#define SIGN_BIAS (0x800 << EXP_TABLE_BITS)
+
+/* Computes sign*exp(x+xtail) where |xtail| < 2^-8/N and |xtail| <= |x|.
+   The sign_bias argument is SIGN_BIAS or 0 and sets the sign to -1 or 1.  */
+static inline
+vector double
+exp_inline(vector double x, vector double xtail, v64u sign_bias)
+{
+  v64u zero = {0, 0};
+  v64u ki, idx, top, sbits, res, res_mask = zero;
+  /* double_t for better performance on targets with FLT_EVAL_METHOD==2.  */
+  vector double kd, z, r, r2, scale, tmp;
+
+  v64u abstop = vasuint64 (x) & 0x7ff0000000000000;
+  v64u comp_one = (v64u)vec_cmpge ((abstop - asuint64 (0x1p-54)) & 0xfff0000000000000,
+    zero + ((asuint64 (512.0) & 0xfff0000000000000) - (asuint64 (0x1p-54) & 0xfff0000000000000)));
+  if (!vec_all_eq (comp_one, zero))
+    {
+      v64u comp_tiny = (v64u)vec_cmpge (zero + asuint64(0x1p-54), abstop);
+      if (!vec_all_eq (comp_tiny, zero))
+	{
+	  /* Avoid spurious underflow for tiny x.  */
+	  /* Note: 0 is common input.  */
+	  vector double one = WANT_ROUNDING ? 1.0 + x + vasdouble(zero) : 1.0 + vasdouble(zero);
+	  res = vasuint64 (vec_sel (-one, one, (v64u)(vec_cmpgt (sign_bias, zero + 1))));
+	}
+      v64u comp_xflow = (v64u)vec_cmpge (abstop, zero + (asuint64(1024.0) & 0xfff0000000000000));
+      comp_xflow &= ~res_mask;
+      if (!vec_all_eq (comp_xflow, zero))
+	{
+	  vector double inf = {INFINITY, INFINITY};
+	  /* Note: inf and nan are already handled.  */
+	  v64u is_uflow = (v64u) vec_cmpne (vasuint64 (x) >> 63, zero);
+	  if (!vec_all_eq (is_uflow, zero))
+	      (void)__math_uflow (0/* this only determines output */);
+	  if (!vec_all_eq (comp_xflow & ~is_uflow, zero))
+	      (void)__math_oflow (0/* this only determines output */);
+	  res = vasuint64(inf);
+	  res = vec_sel (res, vasuint64(-vasdouble(res)), ~vec_cmpeq(sign_bias, zero));
+	  res = vec_sel (res, zero, is_uflow);
+	  res = vec_sel (res, asuint64(-0.0) + zero, is_uflow & ~vec_cmpeq(sign_bias, zero));
+	  res_mask |= comp_xflow;
+	}
+      if (vec_all_eq (res_mask, ~zero))
+	  return vasdouble(res);
+      /* Large x is special cased below.  */
+      abstop = zero;
+    }
+
+  /* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)].  */
+  /* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N].  */
+  z = InvLn2N * x;
+  /* z - kd is in [-0.5-2^-16, 0.5] in all rounding modes.  */
+  kd = (vector double)(z + Shift);
+  ki = vasuint64(kd);
+  kd -= Shift;
+  r = x + kd * NegLn2hiN + kd * NegLn2loN;
+  /* The code assumes 2^-200 < |xtail| < 2^-8/N.  */
+  r += xtail;
+  /* 2^(k/N) ~= scale * (1 + tail).  */
+  idx = 2 * (ki % N);
+  top = (ki + sign_bias) << (52 - EXP_TABLE_BITS);
+  // we can't use __builtin_vectorconvert as we want to support old gcc
+  vector double tail = {asdouble(T[idx[0]]), asdouble(T[idx[1]])};
+  /* This is only a valid scale when -1023*N < k < 1024*N.  */
+  v64u Tadd = {T[idx[0] + 1], T[idx[1] + 1]};
+  sbits = Tadd + top;
+  /* exp(x) = 2^(k/N) * exp(r) ~= scale + scale * (tail + exp(r) - 1).  */
+  /* Evaluation is optimized assuming superscalar pipelined execution.  */
+  r2 = r * r;
+  /* Without fma the worst case error is 0.25/N ulp larger.  */
+  /* Worst case error is less than 0.5+1.11/N+(abs poly error * 2^53) ulp.  */
+  tmp = tail + r + r2 * (C2 + r * C3) + r2 * r2 * (C4 + r * C5);
+  v64u is_abstop_zero = (v64u)vec_cmpeq (abstop, zero);
+  is_abstop_zero &= ~res_mask;
+  if (!vec_all_eq(is_abstop_zero, zero))
+    {
+      for (int i=0;i<2;i++)
+	{
+	  if (is_abstop_zero[i] == 0)
+	      continue;
+	  res[i] = asuint64 (specialcase (tmp[i], sbits[i], ki[i]));
+	  res_mask |= is_abstop_zero;
+	}
+      if (vec_all_eq (res_mask, ~zero))
+	  return vasdouble(res);
+    }
+  scale = vasdouble(sbits);
+  /* Note: tmp == 0 or |tmp| > 2^-200 and scale > 2^-739, so there
+     is no spurious underflow here even without fma.  */
+  return vec_sel(scale + scale * tmp, vasdouble(res), res_mask);
+}
+
+/* Returns 0 if not int, 1 if odd int, 2 if even int.  The argument is
+   the bit representation of a non-zero finite floating-point value.  */
+static inline int checkint(uint64_t iy)
+{
+  int e = iy >> 52 & 0x7ff;
+  if (e < 0x3ff)
+      return 0;
+  if (e > 0x3ff + 52)
+      return 2;
+  if (iy & ((1ULL << (0x3ff + 52 - e)) - 1))
+      return 0;
+  if (iy & (1ULL << (0x3ff + 52 - e)))
+      return 1;
+  return 2;
+}
+
+/* Returns 1 if input is the bit representation of 0, infinity or nan.  */
+static inline int
+zeroinfnan (uint64_t i)
+{
+  return 2 * i - 1 >= 2 * asuint64 (INFINITY) - 1;
+}
+
+/* Top 12 bits of a double (sign and exponent bits).  */
+static inline uint32_t
+top12 (double x)
+{
+  return asuint64 (x) >> 52;
+}
+
+static
+double
+mainspecialcase (double x, double y, bool *is_subnormal, uint64_t *sign_bias)
+{
+  uint64_t iy = asuint64 (y);
+  uint64_t ix = asuint64 (x);
+  uint32_t topx = top12 (x);
+  uint32_t topy = top12 (y);
+      /* Note: if |y| > 1075 * ln2 * 2^53 ~= 0x1.749p62 then pow(x,y) = inf/0
+	 and if |y| < 2^-54 / 1075 ~= 0x1.e7b6p-65 then pow(x,y) = +-1.  */
+      /* Special cases: (x < 0x1p-126 or inf or nan) or
+	 (|y| < 0x1p-65 or |y| >= 0x1p63 or nan).  */
+  if (__glibc_unlikely (zeroinfnan (iy)))
+    {
+      if (2 * iy == 0)
+        return issignaling_inline (x) ? x + y : 1.0;
+      if (ix == asuint64 (1.0))
+        return issignaling_inline (y) ? x + y : 1.0;
+      if (2 * ix > 2 * asuint64 (INFINITY)
+          || 2 * iy > 2 * asuint64 (INFINITY))
+        return x + y;
+      if (2 * ix == 2 * asuint64 (1.0))
+        return 1.0;
+      if ((2 * ix < 2 * asuint64 (1.0)) == !(iy >> 63))
+        return 0.0; /* |x|<1 && y==inf or |x|>1 && y==-inf.  */
+      return y * y;
+    }
+  if (__glibc_unlikely (zeroinfnan (ix)))
+    {
+      double_t x2 = x * x;
+      if (ix >> 63 && checkint (iy) == 1)
+        {
+          x2 = -x2;
+          *sign_bias = 1;
+        }
+      if (WANT_ERRNO && 2 * ix == 0 && iy >> 63)
+        return __math_divzero (*sign_bias);
+      /* Without the barrier some versions of clang hoist the 1/x2 and
+         thus division by zero exception can be signaled spuriously.  */
+      return iy >> 63 ? math_opt_barrier (1 / x2) : x2;
+    }
+      /* Here x and y are non-zero finite.  */
+  if (ix >> 63)
+    {
+      /* Finite x < 0.  */
+      int yint = checkint (iy);
+      if (yint == 0)
+        return __math_invalid (x);
+      if (yint == 1)
+        *sign_bias = SIGN_BIAS;
+      ix &= 0x7fffffffffffffff;
+      topx &= 0x7ff;
+    }
+  if ((topy & 0x7ff) - 0x3be >= 0x43e - 0x3be)
+    {
+      /* Note: sign_bias == 0 here because y is not odd.  */
+      if (ix == asuint64 (1.0))
+        return 1.0;
+      if ((topy & 0x7ff) < 0x3be)
+        {
+          /* |y| < 2^-65, x^y ~= 1 + y*log(x).  */
+          if (WANT_ROUNDING)
+	      return ix > asuint64 (1.0) ? 1.0 + y : 1.0 - y;
+          else
+	      return 1.0;
+        }
+      return (ix > asuint64 (1.0)) == (topy < 0x800) ? __math_oflow (0)
+						     : __math_uflow (0);
+    }
+  if (topx == 0)
+    {
+      /* Normalize subnormal x so exponent becomes negative.  */
+      ix = asuint64 (x * 0x1p52);
+      ix &= 0x7fffffffffffffff;
+      ix -= 52ULL << 52;
+    }
+  *is_subnormal = true;
+  return asdouble(ix);
+}
+
+vector double
+_ZGVbN2vv_pow (vector double x, vector double y)
+{
+  v64u zero = {0, 0};
+  v64u sign_bias = zero;
+  v64u res, res_mask = zero;
+
+  v64u is_special1 = (v64u)vec_cmpge (vasuint64(x) - 0x0010000000000000, zero + 0x7ff0000000000000 - 0x0010000000000000);
+  v64u is_special2 = (v64u)vec_cmpge ((vasuint64(y) & 0x7ff0000000000000) - 0x3be0000000000000, zero + 0x43e0000000000000 - 0x3be0000000000000);
+  v64u is_special = is_special1 | is_special2;
+  if (!vec_all_eq (is_special, zero))
+    {
+      for (int i=0;i<2;i++)
+	{
+	  if (is_special[i] == 0)
+	      continue;
+	  bool is_subnormal = false;
+	  double r = mainspecialcase(x[i], y[i], &is_subnormal, &sign_bias[i]);
+	  if (!is_subnormal)
+	    {
+	      res[i] = asuint64(r);
+	      res_mask[i] = 0xffffffffffffffff;
+	    }
+	      else
+	    {
+	      x[i] = r;
+	    }
+	}
+      if (!vec_any_eq (res_mask, zero))
+	  return vasdouble(res);
+  }
+
+  struct two_v_doubles logres = log_inline (vasuint64 (x));
+  vector double ehi, elo;
+#if __FP_FAST_FMA
+  ehi = y * logres.y;
+  elo = y * logres.tail + vec_madd (y, logres.y, -ehi);
+#else
+  vector double yhi = vasdouble(vasuint64(y) & -1ULL << 27);
+  vector double ylo = y - yhi;
+  vector double lhi = vasdouble(vasuint64(logres.y) & -1ULL << 27);
+  vector double llo = logres.y - lhi + logres.tail;
+  ehi = yhi * lhi;
+  elo = ylo * lhi + y * llo; /* |elo| < |ehi| * 2^-25.  */
+#endif
+  return vec_sel (exp_inline (ehi, elo, sign_bias), vasdouble (res), res_mask);
+}
diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
index 7162d06e0c..f3f351093b 100644
--- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
+++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
@@ -1,5 +1,5 @@ 
-/* Double-precision math error handling.
-   Copyright (C) 2019 Free Software Foundation, Inc.
+/* Single-precision math error handling.
+   Copyright (C) 2017-2019 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -16,14 +16,12 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include <fpu/math-barriers.h>
 #include "math_config_dbl.h"
-
-/* NOINLINE reduces code size.  */
+/* NOINLINE prevents fenv semantics breaking optimizations.  */
 NOINLINE static double
 xflow (uint32_t sign, double y)
 {
-  y = math_opt_barrier (sign ? -y : y) * y;
+  y = (sign ? -y : y) * y;
   return y;
 }
 
@@ -38,3 +36,15 @@  __math_oflow (uint32_t sign)
 {
   return xflow (sign, 0x1p769);
 }
+
+
+double __math_invalid(double x)
+{
+  return (x - x) / (x - x);
+}
+
+double __math_divzero(uint32_t sign)
+{
+  return (double)(sign ? -1.0 : 1.0) / 0.0;
+}
+
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
index eedc6c84de..c28e99ef0b 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/libmvec.abilist
@@ -2,6 +2,7 @@  GLIBC_2.30 _ZGVbN2v_cos F
 GLIBC_2.30 _ZGVbN2v_exp F
 GLIBC_2.30 _ZGVbN2v_log F
 GLIBC_2.30 _ZGVbN2v_sin F
+GLIBC_2.30 _ZGVbN2vv_pow F
 GLIBC_2.30 _ZGVbN2vvv_sincos F
 GLIBC_2.30 _ZGVbN4v_cosf F
 GLIBC_2.30 _ZGVbN4v_expf F