AArch64+SVE: Add support for unpacked unary ops and BIC

Message ID BA0660D5-C52E-49C6-9A2B-8CF995AE7CA7@arm.com
State New
Series
  • AArch64+SVE: Add support for unpacked unary ops and BIC

Commit Message

Joe Ramsay June 8, 2020, 8:51 a.m.
Hi!

MD patterns extended for unary ops ABS, CLS, CLZ, CNT, NEG and NOT
to support unpacked vectors. Also extended patterns for BIC to
support unpacked vectors where input elements are of the same width.
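
As a rough illustration of why BIC extends naturally to unpacked vectors (this sketch is not part of the patch, and the function names are hypothetical): a bitwise AND-NOT gives the same bits whether it is applied to a whole 64-bit lane or to each narrower element inside it, so a single `.d` form can serve every element width as long as both inputs use the same layout.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane of a vector register.
   BIC computes a & ~b across the whole lane.  */
uint64_t
bic_lane (uint64_t a, uint64_t b)
{
  return a & ~b;
}

/* The same operation applied independently to each 8-bit chunk of the
   lane, as if the lane held narrower elements.  */
uint64_t
bic_bytewise (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
    {
      uint8_t a_byte = (uint8_t) (a >> (8 * i));
      uint8_t b_byte = (uint8_t) (b >> (8 * i));
      uint8_t r_byte = (uint8_t) (a_byte & ~b_byte);
      result |= (uint64_t) r_byte << (8 * i);
    }
  return result;
}
```

Both functions return the same value for any pair of inputs, which is the property the unpacked BIC patterns rely on in this model.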

Tested on x86_64-linux and aarch64-linux hosts.

Thanks,
Joe

gcc/ChangeLog:

2020-05-27  Joe Ramsay  <joe.ramsay@arm.com>

        * config/aarch64/aarch64-sve.md (<optab><mode>2): Add support for unpacked vectors.
        * config/aarch64/aarch64-sve.md (@aarch64_pred_<optab><mode>): Add support for unpacked vectors.
        * config/aarch64/aarch64-sve.md (@cond_<optab><mode>): Add support for unpacked vectors.
        * config/aarch64/aarch64-sve.md (@aarch64_bic<mode>): Enable unpacked BIC.
        * config/aarch64/aarch64-sve.md (*bic<mode>3): Enable unpacked BIC.

gcc/testsuite/ChangeLog:

2020-05-27  Joe Ramsay  <joe.ramsay@arm.com>

        * gcc.target/aarch64/sve/logical_unpacked_abs.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_bic_1.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_bic_2.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_bic_3.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_bic_4.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_neg.c: New test.
        * gcc.target/aarch64/sve/logical_unpacked_not.c: New test.
---
gcc/config/aarch64/aarch64-sve.md                  | 48 +++++++++++-----------
.../gcc.target/aarch64/sve/logical_unpacked_abs.c  | 16 ++++++++
.../aarch64/sve/logical_unpacked_bic_1.c           | 15 +++++++
.../aarch64/sve/logical_unpacked_bic_2.c           | 15 +++++++
.../aarch64/sve/logical_unpacked_bic_3.c           | 15 +++++++
.../aarch64/sve/logical_unpacked_bic_4.c           | 15 +++++++
.../gcc.target/aarch64/sve/logical_unpacked_neg.c  | 16 ++++++++
.../gcc.target/aarch64/sve/logical_unpacked_not.c  | 16 ++++++++
8 files changed, 132 insertions(+), 24 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_abs.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_neg.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_not.c

--
2.7.4

Comments

Richard Sandiford June 8, 2020, 9:43 a.m. | #1
Hi Joe,

Joe Ramsay <Joe.Ramsay@arm.com> writes:
> Hi!
>
> MD patterns extended for unary ops ABS, CLS, CLZ, CNT, NEG and NOT
> to support unpacked vectors. Also extended patterns for BIC to
> support unpacked vectors where input elements are of the same width.

Thanks for the patch.  Looks good, but...

> @@ -2848,12 +2848,12 @@
>
> ;; Predicated integer unary arithmetic with merging.
> (define_expand "@cond_<optab><mode>"
> -  [(set (match_operand:SVE_FULL_I 0 "register_operand")
> -               (unspec:SVE_FULL_I
> +  [(set (match_operand:SVE_I 0 "register_operand")
> +              (unspec:SVE_I
>                   [(match_operand:<VPRED> 1 "register_operand")
> -                  (SVE_INT_UNARY:SVE_FULL_I
> -                    (match_operand:SVE_FULL_I 2 "register_operand"))
> -                 (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")]
> +                 (SVE_INT_UNARY:SVE_I
> +                   (match_operand:SVE_I 2 "register_operand"))
> +                 (match_operand:SVE_I 3 "aarch64_simd_reg_or_zero")]
>                   UNSPEC_SEL))]
>    "TARGET_SVE"
> )

...it shouldn't be necessary to change the @cond_<optab><mode>
expander to get this to work.  If we did change it, we'd also need
to change the associated define_insns (*cond_<optab><mode>_2 and
*cond_<optab><mode>_any), otherwise the expanders would generate
unrecognisable instructions.  So I think we should drop this part
of the patch and leave it as future work.

The other SVE_INT_UNARYs are clrsb, clz, popcount, ss_abs and ss_neg.
I agree that we can't easily test those, but that the change should in
principle be correct for them too.  (In particular, all of them need
the element type (Vetype) rather than the container type (Vctype),
which is one of the main questions when extending these patterns.)

Thanks,
Richard
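
The element-type versus container-type point can be sketched with a scalar model. This is only an illustration under stated assumptions: the function names are hypothetical, the model covers a single 32-bit container, and it assumes the upper container bits were zero-filled by the load (as LD1B does). It compares ABS applied at element width (`.b`, which is what `<Vetype>` gives for an unpacked byte vector) with ABS applied at container width (`.s`), each followed by the SXTB that the new tests expect.

```c
#include <stdint.h>

/* ABS at element width (.b): every byte of the container is treated as
   a signed 8-bit value.  */
uint32_t
abs_bytes (uint32_t container)
{
  uint32_t result = 0;
  for (int i = 0; i < 4; ++i)
    {
      int8_t elt = (int8_t) (uint8_t) (container >> (8 * i));
      uint8_t mag = (uint8_t) (elt < 0 ? -elt : elt);
      result |= (uint32_t) mag << (8 * i);
    }
  return result;
}

/* ABS at container width (.s): the whole 32-bit value is treated as
   signed, so a zero-extended negative byte looks positive.  */
uint32_t
abs_word (uint32_t container)
{
  int32_t value = (int32_t) container;
  return value < 0 ? 0u - container : container;
}

/* SXTB on a .s lane: sign-extend the low byte to 32 bits.  */
uint32_t
sxtb_word (uint32_t container)
{
  return (uint32_t) (int32_t) (int8_t) (uint8_t) container;
}
```

With a source byte of -3 (container value 0x000000FD), `sxtb_word (abs_bytes (...))` yields 3, whereas `sxtb_word (abs_word (...))` yields 0xFFFFFFFD, i.e. -3 again. In this model only the element-width form gives the result the C source asks for.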

Patch

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 8f0944c..f7100a2 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -2822,11 +2822,11 @@ 

;; Unpredicated integer unary arithmetic.
(define_expand "<optab><mode>2"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-               (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+              (unspec:SVE_I
                  [(match_dup 2)
-                  (SVE_INT_UNARY:SVE_FULL_I
-                    (match_operand:SVE_FULL_I 1 "register_operand"))]
+                 (SVE_INT_UNARY:SVE_I
+                   (match_operand:SVE_I 1 "register_operand"))]
                  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
@@ -2836,11 +2836,11 @@ 

;; Integer unary arithmetic predicated with a PTRUE.
(define_insn "@aarch64_pred_<optab><mode>"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
-               (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+              (unspec:SVE_I
                  [(match_operand:<VPRED> 1 "register_operand" "Upl")
-                  (SVE_INT_UNARY:SVE_FULL_I
-                    (match_operand:SVE_FULL_I 2 "register_operand" "w"))]
+                 (SVE_INT_UNARY:SVE_I
+                   (match_operand:SVE_I 2 "register_operand" "w"))]
                  UNSPEC_PRED_X))]
   "TARGET_SVE"
   "<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
@@ -2848,12 +2848,12 @@ 

;; Predicated integer unary arithmetic with merging.
(define_expand "@cond_<optab><mode>"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-               (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+              (unspec:SVE_I
                  [(match_operand:<VPRED> 1 "register_operand")
-                  (SVE_INT_UNARY:SVE_FULL_I
-                    (match_operand:SVE_FULL_I 2 "register_operand"))
-                 (match_operand:SVE_FULL_I 3 "aarch64_simd_reg_or_zero")]
+                 (SVE_INT_UNARY:SVE_I
+                   (match_operand:SVE_I 2 "register_operand"))
+                 (match_operand:SVE_I 3 "aarch64_simd_reg_or_zero")]
                  UNSPEC_SEL))]
   "TARGET_SVE"
)
@@ -4234,13 +4234,13 @@ 

;; Unpredicated BIC.
(define_expand "@aarch64_bic<mode>"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-               (and:SVE_FULL_I
-                 (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+              (and:SVE_I
+                (unspec:SVE_I
                    [(match_dup 3)
-                    (not:SVE_FULL_I (match_operand:SVE_FULL_I 2 "register_operand"))]
+                   (not:SVE_I (match_operand:SVE_I 2 "register_operand"))]
                    UNSPEC_PRED_X)
-                 (match_operand:SVE_FULL_I 1 "register_operand")))]
+                (match_operand:SVE_I 1 "register_operand")))]
   "TARGET_SVE"
   {
     operands[3] = CONSTM1_RTX (<VPRED>mode);
@@ -4249,14 +4249,14 @@ 

;; Predicated BIC.
(define_insn_and_rewrite "*bic<mode>3"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w")
-               (and:SVE_FULL_I
-                 (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+              (and:SVE_I
+                (unspec:SVE_I
                    [(match_operand 3)
-                    (not:SVE_FULL_I
-                      (match_operand:SVE_FULL_I 2 "register_operand" "w"))]
+                   (not:SVE_I
+                     (match_operand:SVE_I 2 "register_operand" "w"))]
                    UNSPEC_PRED_X)
-                 (match_operand:SVE_FULL_I 1 "register_operand" "w")))]
+                (match_operand:SVE_I 1 "register_operand" "w")))]
   "TARGET_SVE"
   "bic\t%0.d, %1.d, %2.d"
   "&& !CONSTANT_P (operands[3])"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_abs.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_abs.c
new file mode 100644
index 0000000..814e44c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_abs.c
@@ -0,0 +1,16 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+void
+f (uint32_t *restrict dst, int8_t *restrict src)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] = (int8_t) abs(src[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tabs\tz[0-9]+\.b, p[0-9]+/m, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsxtb\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_1.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_1.c
new file mode 100644
index 0000000..2460305
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_1.c
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+void
+f (uint64_t *restrict dst, uint32_t *restrict src1, uint32_t *restrict src2)
+{
+  for (int i = 0; i < 3; ++i)
+    dst[i] = (uint32_t) (src1[i] & ~src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.d,} 2 } } */
+/* { dg-final { scan-assembler-times {\tbic\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuxtw\tz[0-9]+\.d,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_2.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_2.c
new file mode 100644
index 0000000..61066a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_2.c
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+void
+f (uint64_t *restrict dst, uint8_t *restrict src1, uint8_t *restrict src2)
+{
+  for (int i = 0; i < 3; ++i)
+    dst[i] = (uint8_t) (src1[i] & ~src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.d,} 2 } } */
+/* { dg-final { scan-assembler-times {\tbic\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuxtb\tz[0-9]+\.d,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_3.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_3.c
new file mode 100644
index 0000000..2c9586a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_3.c
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+void
+f (uint32_t *restrict dst, uint16_t *restrict src1, uint16_t *restrict src2)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] = (uint16_t) (src1[i] & ~src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tbic\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuxth\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_4.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_4.c
new file mode 100644
index 0000000..5fca214
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_bic_4.c
@@ -0,0 +1,15 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+
+void
+f (uint16_t *restrict dst, uint8_t *restrict src1, uint8_t *restrict src2)
+{
+  for (int i = 0; i < 15; ++i)
+    dst[i] = (uint8_t) (src1[i] & ~src2[i]);
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tbic\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tuxtb\tz[0-9]+\.h,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_neg.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_neg.c
new file mode 100644
index 0000000..1f8b3d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_neg.c
@@ -0,0 +1,16 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+void
+f (uint32_t *restrict dst, int8_t *restrict src)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] = (int8_t) -src[i];
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tneg\tz[0-9]+\.b, p[0-9]+/m, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsxtb\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_not.c b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_not.c
new file mode 100644
index 0000000..a9d36b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/logical_unpacked_not.c
@@ -0,0 +1,16 @@ 
+/* { dg-options "-O3 -msve-vector-bits=256" } */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+void
+f (uint32_t *restrict dst, int8_t *restrict src)
+{
+  for (int i = 0; i < 7; ++i)
+    dst[i] = (int8_t) ~src[i];
+}
+
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tnot\tz[0-9]+\.b, p[0-9]+/m, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tsxtb\tz[0-9]+\.s,} 1 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s,} 1 } } */