middle-end: Fold popcount(x&4) to (x>>2)&1 and friends.

Message ID 013901d65e96$68255230$386ff690$@nextmovesoftware.com
State New

Commit Message

Roger Sayle July 20, 2020, 1:05 p.m.
This patch complements one from June 12th which is still awaiting
review: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html

This patch optimizes popcount and parity of an argument known to have
at most a single bit set, to be that single bit.  Hence, popcount(x&8)
is simplified to (x>>3)&1.   This generalizes the existing optimization
of popcount(x&1) being simplified to x&1, which is moved with this
patch to avoid a duplicate pattern warning in match.pd.

This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.  If this is approved after
(or at the same time) as the patch above, I'm happy to resolve the
conflicts and retest before committing.


2020-07-20  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* match.pd (popcount(x) -> x>>C): New simplification.

gcc/testsuite
	* gcc.dg/fold-popcount-5.c: New test.
	* gcc.dg/fold-parity-5.c: Likewise.


Ok for mainline?
Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

Comments

Bill Schmidt via Gcc-patches July 20, 2020, 1:39 p.m. | #1
On Mon, Jul 20, 2020 at 3:06 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> This patch complements one from June 12th which is still awaiting
> review: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html
>
> This patch optimizes popcount and parity of an argument known to have
> at most a single bit set, to be that single bit.  Hence, popcount(x&8)
> is simplified to (x>>3)&1.   This generalizes the existing optimization
> of popcount(x&1) being simplified to x&1, which is moved with this
> patch to avoid a duplicate pattern warning in match.pd.
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.  If this is approved after
> (or at the same time) as the patch above, I'm happy to resolve the
> conflicts and retest before committing.

Given you know the constant bit position of the possibly nonzero
bit you can elide the conversion to unsigned for all but the case
of a possibly negative input (IIRC GCC doesn't yet take advantage
of negative right shift undefinedness - but maybe sanitizers complain).
Also the shift amount doesn't need to be in the same type as
the shifted amount so using either size_int() or integer_type_node
for that argument should reduce INTEGER_CST waste.

Any reason you are not tackling IFN_POPCOUNT/PARITY?
You could use

(for pfun (POPCOUNT PARITY)
 ...

and automagically get all builtins and the IFN.

Thanks,
Richard.



>
> 2020-07-20  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * match.pd (popcount(x) -> x>>C): New simplification.
>
> gcc/testsuite
>         * gcc.dg/fold-popcount-5.c: New test.
>         * gcc.dg/fold-parity-5.c: Likewise.
>
> Ok for mainline?
> Thanks in advance,
> Roger
> --
> Roger Sayle
> NextMove Software
> Cambridge, UK
Bill Schmidt via Gcc-patches July 20, 2020, 2:38 p.m. | #2
* Richard Biener via Gcc-patches:

> Given you know the constant bit position of the possibly nonzero
> bit you can elide the conversion to unsigned for all but the case
> of a possibly negative input (IIRC GCC doesn't yet take advantage
> of negative right shift undefinedness - but maybe sanitizers complain).

It's not undefined: “Signed '>>' acts on negative numbers by sign
extension.”

Thanks,
Florian
Roger Sayle July 22, 2020, 8:03 p.m. | #3
Hi Richard,

Many thanks for the peer review and feedback.  I completely agree that the POPCOUNT
and PARITY iterators simplify things and handle the IFN_ variants.  Likewise, using
integer_type_node as the type of shift constants also matches the idioms used
elsewhere in match.pd and fold.  The following (combined) patch implements those
suggestions, for both submitted patches and the existing POPCOUNT simplifications,
cleaning up this logic and making this part of match.pd more consistent.
[I hadn't appreciated that using POPCOUNT/PARITY avoids the need for an explicit for.]

I've kept the shiftrt unsigned, which is a requirement for the transformation (when
the single bit is the sign bit), and which also matches the (canonicalization) preference
in the middle-end that unsigned logical shifts are preferred over arithmetic shifts when
the distinction isn't important [lshrdi3 is sometimes cheaper than ashrdi3].

This revised patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check" with no new failures.
Ok for mainline?


2020-07-22  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* match.pd (popcount(x)&1 -> parity(x)): New simplification.
	(parity(~x) -> parity(x)): New simplification.
	(parity(x)^parity(y) -> parity(x^y)): New simplification.
	(parity(x&1) -> x&1): New simplification.
	(popcount(x) -> x>>C): New simplification.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-popcount-5.c: New test.
	* gcc.dg/fold-parity-1.c: Likewise.
	* gcc.dg/fold-parity-2.c: Likewise.
	* gcc.dg/fold-parity-3.c: Likewise.
	* gcc.dg/fold-parity-4.c: Likewise.
	* gcc.dg/fold-parity-5.c: Likewise.


Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 

Sent: 20 July 2020 14:40
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] middle-end: Fold popcount(x&4) to (x>>2)&1 and friends.

On Mon, Jul 20, 2020 at 3:06 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> This patch complements one from June 12th which is still awaiting review:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547937.html
>
> This patch optimizes popcount and parity of an argument known to have
> at most a single bit set, to be that single bit.  Hence, popcount(x&8)
> is simplified to (x>>3)&1.   This generalizes the existing optimization
> of popcount(x&1) being simplified to x&1, which is moved with this
> patch to avoid a duplicate pattern warning in match.pd.
>
> This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.  If this is approved after
> (or at the same time) as the patch above, I'm happy to resolve the
> conflicts and retest before committing.


Given you know the constant bit position of the possibly nonzero bit you can elide the conversion to unsigned for all but the case of a possibly negative input (IIRC GCC doesn't yet take advantage of negative right shift undefinedness - but maybe sanitizers complain).
Also the shift amount doesn't need to be in the same type as the shifted amount so using either size_int() or integer_type_node for that argument should reduce INTEGER_CST waste.

Any reason you are not tackling IFN_POPCOUNT/PARITY?
You could use

(for pfun (POPCOUNT PARITY)
 ...

and automagically get all builtins and the IFN.

Thanks,
Richard.
diff --git a/gcc/match.pd b/gcc/match.pd
index c6ae7a7..a096a17 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5964,25 +5964,55 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (IFN_FMA @0 @1 @2))))
 
 /* POPCOUNT simplifications.  */
-(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
-	       BUILT_IN_POPCOUNTIMAX)
-  /* popcount(X&1) is nop_expr(X&1).  */
-  (simplify
-    (popcount @0)
-    (if (tree_nonzero_bits (@0) == 1)
-      (convert @0)))
-  /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
-  (simplify
-    (plus (popcount:s @0) (popcount:s @1))
-    (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
-      (popcount (bit_ior @0 @1))))
-  /* popcount(X) == 0 is X == 0, and related (in)equalities.  */
+/* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
+(simplify
+  (plus (POPCOUNT:s @0) (POPCOUNT:s @1))
+  (if (wi::bit_and (tree_nonzero_bits (@0), tree_nonzero_bits (@1)) == 0)
+    (POPCOUNT (bit_ior @0 @1))))
+
+/* popcount(X) == 0 is X == 0, and related (in)equalities.  */
+(for popcount (POPCOUNT)
   (for cmp (le eq ne gt)
        rep (eq eq ne ne)
     (simplify
       (cmp (popcount @0) integer_zerop)
       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
 
+/* Canonicalize POPCOUNT(x)&1 as PARITY(X).  */
+(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
+	       BUILT_IN_POPCOUNTIMAX)
+     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
+	     BUILT_IN_PARITYIMAX)
+  (simplify
+    (bit_and (popcount @0) integer_onep)
+    (parity @0)))
+
+/* PARITY simplifications.  */
+/* parity(~X) is parity(X).  */
+(simplify
+  (PARITY (bit_not @0))
+  (PARITY @0))
+
+/* parity(X)^parity(Y) is parity(X^Y).  */
+(simplify
+  (bit_xor (PARITY:s @0) (PARITY:s @1))
+  (PARITY (bit_xor @0 @1)))
+
+/* Common POPCOUNT/PARITY simplifications.  */
+/* popcount(X&C1) is (X>>C2)&1 when C1 == 1<<C2.  Same for parity(X&C1).  */
+(for pfun (POPCOUNT PARITY)
+  (simplify
+    (pfun @0)
+    (with { wide_int nz = tree_nonzero_bits (@0); }
+      (switch
+	(if (nz == 1)
+	  (convert @0))
+	(if (wi::popcount (nz) == 1)
+	  (with { tree utype = unsigned_type_for (TREE_TYPE (@0)); }
+	    (convert (rshift:utype (convert:utype @0)
+				   { build_int_cst (integer_type_node,
+						    wi::ctz (nz)); }))))))))
+
 #if GIMPLE
 /* 64- and 32-bits branchless implementations of popcount are detected:
 
diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c
new file mode 100644
index 0000000..943726f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int test_and4(unsigned int a)
+{
+  return __builtin_popcount(a&4);
+}
+
+int test_and4l(unsigned long b)
+{
+  return __builtin_popcountl(b&4);
+}
+
+int test_and4ll(unsigned long long c)
+{
+  return __builtin_popcountll(c&4);
+}
+
+int test_shift(unsigned int d)
+{
+  int bits = 8*sizeof(unsigned int)-1;
+  return __builtin_popcount(d<<31);
+}
+
+int test_shiftl(unsigned long e)
+{
+  int bits = 8*sizeof(unsigned long)-1;
+  return __builtin_popcountl(e<<bits);
+}
+
+int test_shiftll(unsigned long long f)
+{
+  int bits = 8*sizeof(unsigned long long)-1;
+  return __builtin_popcountll(f<<bits);
+}
+
+/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-1.c b/gcc/testsuite/gcc.dg/fold-parity-1.c
new file mode 100644
index 0000000..3ba56c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-original" } */
+
+int foo(unsigned int x)
+{
+  return __builtin_popcount(x) & 1;
+}
+
+int fool(unsigned long x)
+{
+  return __builtin_popcountl(x) & 1;
+}
+
+int fooll(unsigned long long x)
+{
+  return __builtin_popcountll(x) & 1;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "original" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_parity" 3 "original" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-2.c b/gcc/testsuite/gcc.dg/fold-parity-2.c
new file mode 100644
index 0000000..8c7acbf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned int x)
+{
+  return __builtin_parity(~x);
+}
+
+int fool(unsigned long x)
+{
+  return __builtin_parityl(~x);
+}
+
+int fooll(unsigned long long x)
+{
+  return __builtin_parityll(~x);
+}
+
+/* { dg-final { scan-tree-dump-times "~" 0 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-3.c b/gcc/testsuite/gcc.dg/fold-parity-3.c
new file mode 100644
index 0000000..e0355cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned int x)
+{
+  return __builtin_parity(x&1);
+}
+
+int fool(unsigned long x)
+{
+  return __builtin_parityl(x&1);
+}
+
+int fooll(unsigned long long x)
+{
+  return __builtin_parityll(x&1);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_parity" 0 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-4.c b/gcc/testsuite/gcc.dg/fold-parity-4.c
new file mode 100644
index 0000000..5dfedab
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned int x, unsigned int y)
+{
+  return __builtin_parity(x) ^ __builtin_parity(y);
+}
+
+int fool(unsigned long x, unsigned long y)
+{
+  return __builtin_parityl(x) ^ __builtin_parityl(y);
+}
+
+int fooll(unsigned long long x, unsigned long long y)
+{
+  return __builtin_parityll(x) ^ __builtin_parityll(y);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_parity" 3 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-5.c b/gcc/testsuite/gcc.dg/fold-parity-5.c
new file mode 100644
index 0000000..69d3a6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-5.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int test_and4(unsigned int a)
+{
+  return __builtin_parity(a&4);
+}
+
+int test_and4l(unsigned long b)
+{
+  return __builtin_parityl(b&4);
+}
+
+int test_and4ll(unsigned long long c)
+{
+  return __builtin_parityll(c&4);
+}
+
+int test_shift(unsigned int d)
+{
+  int bits = 8*sizeof(unsigned int)-1;
+  return __builtin_parity(d<<31);
+}
+
+int test_shiftl(unsigned long e)
+{
+  int bits = 8*sizeof(unsigned long)-1;
+  return __builtin_parityl(e<<bits);
+}
+
+int test_shiftll(unsigned long long f)
+{
+  int bits = 8*sizeof(unsigned long long)-1;
+  return __builtin_parityll(f<<bits);
+}
+
+/* { dg-final { scan-tree-dump-times "parity" 0 "optimized" } } */
+
Marc Glisse July 23, 2020, 8:15 a.m. | #4
On Wed, 22 Jul 2020, Roger Sayle wrote:

> Many thanks for the peer review and feedback.  I completely agree that POPCOUNT
> and PARITY iterators simplifies things and handle the IFN_ variants.


Is there a reason why the iterators cannot be used for this one?

+(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
+	       BUILT_IN_POPCOUNTIMAX)
+     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
+	     BUILT_IN_PARITYIMAX)
+  (simplify
+    (bit_and (popcount @0) integer_onep)
+    (parity @0)))


-- 
Marc Glisse
Bill Schmidt via Gcc-patches July 23, 2020, 9:01 a.m. | #5
On Thu, Jul 23, 2020 at 10:15 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Wed, 22 Jul 2020, Roger Sayle wrote:
>
> > Many thanks for the peer review and feedback.  I completely agree that POPCOUNT
> > and PARITY iterators simplifies things and handle the IFN_ variants.
>
> Is there a reason why the iterators cannot be used for this one?
>
> +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
> +              BUILT_IN_POPCOUNTIMAX)
> +     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
> +            BUILT_IN_PARITYIMAX)
> +  (simplify
> +    (bit_and (popcount @0) integer_onep)
> +    (parity @0)))


It should indeed work fine to even do

 (simplify
   (bit_and (POPCOUNT @0) integer_onep)
   (PARITY @0))

Likewise for the existing

+/* popcount(X) == 0 is X == 0, and related (in)equalities.  */
+(for popcount (POPCOUNT)
   (for cmp (le eq ne gt)
        rep (eq eq ne ne)
     (simplify
       (cmp (popcount @0) integer_zerop)
       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))

you can elide the (for ...)

OK with those changes.

Richard.

>
> --
> Marc Glisse
Marc Glisse July 23, 2020, 9:10 a.m. | #6
On Thu, 23 Jul 2020, Marc Glisse wrote:

> On Wed, 22 Jul 2020, Roger Sayle wrote:
>
>> Many thanks for the peer review and feedback.  I completely agree that
>> POPCOUNT
>> and PARITY iterators simplifies things and handle the IFN_ variants.
>
> Is there a reason why the iterators cannot be used for this one?
>
> +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
> +	       BUILT_IN_POPCOUNTIMAX)
> +     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
> +	     BUILT_IN_PARITYIMAX)
> +  (simplify
> +    (bit_and (popcount @0) integer_onep)
> +    (parity @0)))

Ah, maybe because we may have platforms/modes where the optab for popcount 
is supported but not the one for parity, and we are not allowed to create 
internal calls if the optab is not supported? I think expand_parity tries 
to expand parity as popcount&1, so it should be fine, but I didn't 
actually try it...

-- 
Marc Glisse
Bill Schmidt via Gcc-patches July 23, 2020, 10:05 a.m. | #7
On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Thu, 23 Jul 2020, Marc Glisse wrote:
>
> > On Wed, 22 Jul 2020, Roger Sayle wrote:
> >
> >> Many thanks for the peer review and feedback.  I completely agree that
> >> POPCOUNT
> >> and PARITY iterators simplifies things and handle the IFN_ variants.
> >
> > Is there a reason why the iterators cannot be used for this one?
> >
> > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
> > +            BUILT_IN_POPCOUNTIMAX)
> > +     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
> > +          BUILT_IN_PARITYIMAX)
> > +  (simplify
> > +    (bit_and (popcount @0) integer_onep)
> > +    (parity @0)))
>
> Ah, maybe because we may have platforms/modes where the optab for popcount
> is supported but not the one for parity, and we are not allowed to create
> internal calls if the optab is not supported? I think expand_parity tries
> to expand parity as popcount&1, so it should be fine, but I didn't
> actually try it...


Hmm, that might indeed be a problem.  An explicit
direct_internal_fn_supported_p guard would help here, like
maybe

  (if (PARITY != IFN_PARITY
       || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH))
  (PARITY @0)))

magic that eventually could be auto-generated by genmatch ... (but only if
iterating, so it would be a bit iffy)

Richard.

> --
> Marc Glisse
Richard Sandiford July 23, 2020, 10:58 a.m. | #8
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Thu, 23 Jul 2020, Marc Glisse wrote:
>>
>> > On Wed, 22 Jul 2020, Roger Sayle wrote:
>> >
>> >> Many thanks for the peer review and feedback.  I completely agree that
>> >> POPCOUNT
>> >> and PARITY iterators simplifies things and handle the IFN_ variants.
>> >
>> > Is there a reason why the iterators cannot be used for this one?
>> >
>> > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
>> > +            BUILT_IN_POPCOUNTIMAX)
>> > +     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
>> > +          BUILT_IN_PARITYIMAX)
>> > +  (simplify
>> > +    (bit_and (popcount @0) integer_onep)
>> > +    (parity @0)))
>>
>> Ah, maybe because we may have platforms/modes where the optab for popcount
>> is supported but not the one for parity, and we are not allowed to create
>> internal calls if the optab is not supported? I think expand_parity tries
>> to expand parity as popcount&1, so it should be fine, but I didn't
>> actually try it...
>
> Hmm, that might indeed be a problem.  An explicit
> direct_internal_fn_supported_p guard would help here, like
> maybe
>
>   (if (PARITY != IFN_PARITY
>        || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH))
>   (PARITY @0)))
>
> magic that eventually could be auto-generated by genmatch ... (but only if
> iterating, so it would be a bit iffy)


The matching code should already reject folds to IFN_PARITY calls that
aren't supported by the target (gimple-match-head.c:build_call_internal).

Thanks,
Richard
Bill Schmidt via Gcc-patches July 23, 2020, 11:04 a.m. | #9
On Thu, Jul 23, 2020 at 12:58 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Thu, Jul 23, 2020 at 11:11 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Thu, 23 Jul 2020, Marc Glisse wrote:
> >>
> >> > On Wed, 22 Jul 2020, Roger Sayle wrote:
> >> >
> >> >> Many thanks for the peer review and feedback.  I completely agree that
> >> >> POPCOUNT
> >> >> and PARITY iterators simplifies things and handle the IFN_ variants.
> >> >
> >> > Is there a reason why the iterators cannot be used for this one?
> >> >
> >> > +(for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
> >> > +            BUILT_IN_POPCOUNTIMAX)
> >> > +     parity (BUILT_IN_PARITY BUILT_IN_PARITYL BUILT_IN_PARITYLL
> >> > +          BUILT_IN_PARITYIMAX)
> >> > +  (simplify
> >> > +    (bit_and (popcount @0) integer_onep)
> >> > +    (parity @0)))
> >>
> >> Ah, maybe because we may have platforms/modes where the optab for popcount
> >> is supported but not the one for parity, and we are not allowed to create
> >> internal calls if the optab is not supported? I think expand_parity tries
> >> to expand parity as popcount&1, so it should be fine, but I didn't
> >> actually try it...
> >
> > Hmm, that might indeed be a problem.  An explicit
> > direct_internal_fn_supported_p guard would help here, like
> > maybe
> >
> >   (if (PARITY != IFN_PARITY
> >        || direct_internal_fn_supported_p (IFN_PARITY, type, OPTIMIZE_FOR_BOTH))
> >   (PARITY @0)))
> >
> > magic that eventually could be auto-generated by genmatch ... (but only if
> > iterating, so it would be a bit iffy)
>
> The matching code should already reject folds to IFN_PARITY calls that
> aren't supported by the target (gimple-match-head.c:build_call_internal).


Ah, indeed, so it should work unchanged even.

Richard.

> Thanks,
> Richard
Roger Sayle July 23, 2020, 12:01 p.m. | #10
On Thu, Jul 23, 2020 at 10:02 Richard Biener wrote:
> Likewise for the existing
>
>+/* popcount(X) == 0 is X == 0, and related (in)equalities.  */
>+(for popcount (POPCOUNT)
>   (for cmp (le eq ne gt)
>       rep (eq eq ne ne)
>   (simplify
>       (cmp (popcount @0) integer_zerop)
>       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
>
> you can elide the (for ...)


Unfortunately, I tried that myself when preparing this patch:

Attempt #1:

   (for cmp (le eq ne gt)
        rep (eq eq ne ne)
     (simplify
       (cmp (POPCOUNT @0) integer_zerop)
       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))

results in:
build/genmatch --gimple ../../patcha3/gcc/match.pd \
    > tmp-gimple-match.c

../../patcha3/gcc/match.pd:5977:11 error: operator-list POPCOUNT cannot be expanded inside 'for'
    (cmp (POPCOUNT @0) integer_zerop)
          ^
Attempt #2:

(for popcount (POPCOUNT)
     cmp (le eq ne gt)
     rep (eq eq ne ne)
  (simplify
    (cmp (popcount @0) integer_zerop)
    (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))

results in:
    > tmp-gimple-match.c

../../patcha3/gcc/match.pd:5975:22 error: All user-defined identifiers must have a multiple number of operator substitutions of the smallest number of substitutions
     cmp (le eq ne gt)
                     ^

I'll leave it the way it is with the nested FORs, and investigate your other suggestions.

> OK with those changes.
>
> Richard.



Roger
--
Bill Schmidt via Gcc-patches July 23, 2020, 12:05 p.m. | #11
On Thu, Jul 23, 2020 at 2:01 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
> On Thu, Jul 23, 2020 at 10:02 Richard Biener wrote:
> > Likewise for the existing
> >
> >+/* popcount(X) == 0 is X == 0, and related (in)equalities.  */
> >+(for popcount (POPCOUNT)
> >   (for cmp (le eq ne gt)
> >       rep (eq eq ne ne)
> >   (simplify
> >       (cmp (popcount @0) integer_zerop)
> >       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
> >
> > you can elide the (for ...)
>
> Unfortunately, I tried that myself when preparing this patch:
>
> Attempt #1:
>
>    (for cmp (le eq ne gt)
>         rep (eq eq ne ne)
>      (simplify
>        (cmp (POPCOUNT @0) integer_zerop)
>        (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
>
> results in:
> build/genmatch --gimple ../../patcha3/gcc/match.pd \
>     > tmp-gimple-match.c
> ../../patcha3/gcc/match.pd:5977:11 error: operator-list POPCOUNT cannot be expanded inside 'for'
>     (cmp (POPCOUNT @0) integer_zerop)
>           ^
>
> Attempt #2:
>
> (for popcount (POPCOUNT)
>      cmp (le eq ne gt)
>      rep (eq eq ne ne)
>   (simplify
>     (cmp (popcount @0) integer_zerop)
>     (rep @0 { build_zero_cst (TREE_TYPE (@0)); })))
>
> results in:
>     > tmp-gimple-match.c
> ../../patcha3/gcc/match.pd:5975:22 error: All user-defined identifiers must have a multiple number of operator substitutions of the smallest number of substitutions
>      cmp (le eq ne gt)
>                      ^


Oh yeah - I failed to spot the outer (for ..), indeed it doesn't work then.
We're not magically creating a nest from the first variant, which we could
make work - I erred on the side of "too much implicitness" caution here.

> I'll leave it the way it is with the nested FORs, and investigate your other suggestions.


Yeah, fine.

> > OK with those changes.
> >
> > Richard.
>
> Roger
> --

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index c6ae7a7..0b3b626 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5966,11 +5966,6 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* POPCOUNT simplifications.  */
 (for popcount (BUILT_IN_POPCOUNT BUILT_IN_POPCOUNTL BUILT_IN_POPCOUNTLL
 	       BUILT_IN_POPCOUNTIMAX)
-  /* popcount(X&1) is nop_expr(X&1).  */
-  (simplify
-    (popcount @0)
-    (if (tree_nonzero_bits (@0) == 1)
-      (convert @0)))
   /* popcount(X) + popcount(Y) is popcount(X|Y) when X&Y must be zero.  */
   (simplify
     (plus (popcount:s @0) (popcount:s @1))
@@ -5983,6 +5978,23 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       (cmp (popcount @0) integer_zerop)
       (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
 
+/* popcount(X&C1) is (X>>C2)&1 when C1 == 1<<C2.  Same for parity(X&C1).  */
+(for pfun (BUILT_IN_POPCOUNT BUILT_IN_PARITY
+	   BUILT_IN_POPCOUNTL BUILT_IN_PARITYL
+	   BUILT_IN_POPCOUNTLL BUILT_IN_PARITYLL
+	   BUILT_IN_POPCOUNTIMAX BUILT_IN_PARITYIMAX)
+  (simplify
+    (pfun @0)
+    (with { wide_int nz = tree_nonzero_bits (@0); }
+      (switch
+	(if (nz == 1)
+	  (convert @0))
+	(if (wi::popcount (nz) == 1)
+	  (with { tree utype = unsigned_type_for (TREE_TYPE (@0));
+		  int bits = wi::ctz (nz); }
+	    (convert (rshift:utype (convert:utype @0)
+				   { build_int_cst (utype, bits); }))))))))
+
 #if GIMPLE
 /* 64- and 32-bits branchless implementations of popcount are detected:
 
diff --git a/gcc/testsuite/gcc.dg/fold-popcount-5.c b/gcc/testsuite/gcc.dg/fold-popcount-5.c
new file mode 100644
index 0000000..943726f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-popcount-5.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int test_and4(unsigned int a)
+{
+  return __builtin_popcount(a&4);
+}
+
+int test_and4l(unsigned long b)
+{
+  return __builtin_popcountl(b&4);
+}
+
+int test_and4ll(unsigned long long c)
+{
+  return __builtin_popcountll(c&4);
+}
+
+int test_shift(unsigned int d)
+{
+  int bits = 8*sizeof(unsigned int)-1;
+  return __builtin_popcount(d<<31);
+}
+
+int test_shiftl(unsigned long e)
+{
+  int bits = 8*sizeof(unsigned long)-1;
+  return __builtin_popcountl(e<<bits);
+}
+
+int test_shiftll(unsigned long long f)
+{
+  int bits = 8*sizeof(unsigned long long)-1;
+  return __builtin_popcountll(f<<bits);
+}
+
+/* { dg-final { scan-tree-dump-times "popcount" 0 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/fold-parity-5.c b/gcc/testsuite/gcc.dg/fold-parity-5.c
new file mode 100644
index 0000000..69d3a6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-5.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int test_and4(unsigned int a)
+{
+  return __builtin_parity(a&4);
+}
+
+int test_and4l(unsigned long b)
+{
+  return __builtin_parityl(b&4);
+}
+
+int test_and4ll(unsigned long long c)
+{
+  return __builtin_parityll(c&4);
+}
+
+int test_shift(unsigned int d)
+{
+  int bits = 8*sizeof(unsigned int)-1;
+  return __builtin_parity(d<<31);
+}
+
+int test_shiftl(unsigned long e)
+{
+  int bits = 8*sizeof(unsigned long)-1;
+  return __builtin_parityl(e<<bits);
+}
+
+int test_shiftll(unsigned long long f)
+{
+  int bits = 8*sizeof(unsigned long long)-1;
+  return __builtin_parityll(f<<bits);
+}
+
+/* { dg-final { scan-tree-dump-times "parity" 0 "optimized" } } */
+