Fix PR66552, Missed optimization when shift amount is result of signed modulus

Message ID 20200217074133.57269-1-helijia@linux.ibm.com
State New
Headers show
Series
  • Fix PR66552, Missed optimization when shift amount is result of signed modulus
Related show

Commit Message

Li Jia He Feb. 17, 2020, 7:41 a.m.
Hi,

This patch wants to fix PR66552 on gimple and optimizes (x shift (n mod C))
to (x shift (n bit_and (C - 1))) when C is a constant and power of two as
discussed in PR66552.

The regression testing for the patch was done on GCC mainline on

    powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions.  Is it OK for GCC11 ?

Thanks,
Lijia He

gcc/ChangeLog
2020-02-17  Li Jia He  <helijia@linux.ibm.com>

	PR tree-optimization/66552
	* match.pd (x shift (n mod pow2(c))): Optimizes to
	(x shift (n bit_and (pow2(c) - 1)).

testsuite/ChangeLog
2019-02-17  Li Jia He  <helijia@linux.ibm.com>

	PR tree-optimization/66552
	* testsuite/gcc.dg/pr66552.c: New testcase.
---
 gcc/match.pd                   |  6 ++++++
 gcc/testsuite/gcc.dg/pr66552.c | 14 ++++++++++++++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr66552.c

-- 
2.17.1

Comments

Richard Biener Feb. 17, 2020, 8:01 a.m. | #1
On Mon, 17 Feb 2020, Li Jia He wrote:

> Hi,

> 

> This patch wants to fix PR66552 on gimple and optimizes (x shift (n mod C))

> to (x shift (n bit_and (C - 1))) when C is a constant and power of two as

> discussed in PR66552.

> 

> The regression testing for the patch was done on GCC mainline on

> 

>     powerpc64le-unknown-linux-gnu (Power 9 LE)

> 

> with no regressions.  Is it OK for GCC11 ?


I fail to see the connection with a shift operation, can you explain?

Thanks,
Richard.

> Thanks,

> Lijia He

> 

> gcc/ChangeLog

> 2020-02-17  Li Jia He  <helijia@linux.ibm.com>

> 

> 	PR tree-optimization/66552

> 	* match.pd (x shift (n mod pow2(c))): Optimizes to

> 	(x shift (n bit_and (pow2(c) - 1)).

> 

> testsuite/ChangeLog

> 2019-02-17  Li Jia He  <helijia@linux.ibm.com>

> 

> 	PR tree-optimization/66552

> 	* testsuite/gcc.dg/pr66552.c: New testcase.

> ---

>  gcc/match.pd                   |  6 ++++++

>  gcc/testsuite/gcc.dg/pr66552.c | 14 ++++++++++++++

>  2 files changed, 20 insertions(+)

>  create mode 100644 gcc/testsuite/gcc.dg/pr66552.c

> 

> diff --git a/gcc/match.pd b/gcc/match.pd

> index 73834c25593..1d74f7dba7f 100644

> --- a/gcc/match.pd

> +++ b/gcc/match.pd

> @@ -546,6 +546,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>   (simplify

>    (mod (mod@2 @0 @1) @1)

>    @2)

> + /* Optimize (x shift (n mod C)) to (x shift (n bit_and (C - 1))) when C is a

> +    constant and power of two.  */

> + (for shift (lshift rshift)

> +  (simplify

> +   (shift @0 (mod @1 integer_pow2p@2))

> +   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); })))))

>   /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */

>   (simplify

>    (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)

> diff --git a/gcc/testsuite/gcc.dg/pr66552.c b/gcc/testsuite/gcc.dg/pr66552.c

> new file mode 100644

> index 00000000000..7583c9ad25a

> --- /dev/null

> +++ b/gcc/testsuite/gcc.dg/pr66552.c

> @@ -0,0 +1,14 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -fdump-tree-lower" } */

> +

> +unsigned a(unsigned x, int n)

> +{

> +  return x >> (n % 32);

> +}

> +

> +unsigned b(unsigned x, int n)

> +{

> +  return x << (n % 32);

> +}

> +

> +/* { dg-final { scan-tree-dump-not " % " "lower" } } */

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Jakub Jelinek Feb. 17, 2020, 8:07 a.m. | #2
On Mon, Feb 17, 2020 at 09:01:13AM +0100, Richard Biener wrote:
> On Mon, 17 Feb 2020, Li Jia He wrote:

> > This patch wants to fix PR66552 on gimple and optimizes (x shift (n mod C))

> > to (x shift (n bit_and (C - 1))) when C is a constant and power of two as

> > discussed in PR66552.

> > 

> > The regression testing for the patch was done on GCC mainline on

> > 

> >     powerpc64le-unknown-linux-gnu (Power 9 LE)

> > 

> > with no regressions.  Is it OK for GCC11 ?

> 

> I fail to see the connection with a shift operation, can you explain?


As mentioned in the PR, the reason why we can only optimize unsigned
mod C into bit_and C-1 (where C is pow2p) is that signed modulo can be negative for
negative values (is non-positive).  For shifts negative values would be UB
and so we can assume it will not be negative (i.e. x will be either positive
or a negative multiple of C) and can use bit_and then.

	Jakub
Richard Biener Feb. 17, 2020, 8:30 a.m. | #3
On Mon, 17 Feb 2020, Jakub Jelinek wrote:

> On Mon, Feb 17, 2020 at 09:01:13AM +0100, Richard Biener wrote:

> > On Mon, 17 Feb 2020, Li Jia He wrote:

> > > This patch wants to fix PR66552 on gimple and optimizes (x shift (n mod C))

> > > to (x shift (n bit_and (C - 1))) when C is a constant and power of two as

> > > discussed in PR66552.

> > > 

> > > The regression testing for the patch was done on GCC mainline on

> > > 

> > >     powerpc64le-unknown-linux-gnu (Power 9 LE)

> > > 

> > > with no regressions.  Is it OK for GCC11 ?

> > 

> > I fail to see the connection with a shift operation, can you explain?

> 

> As mentioned in the PR, the reason why we can only optimize unsigned

> mod C into bit_and C-1 (where C is pow2p) is that signed modulo can be negative for

> negative values (is non-positive).  For shifts negative values would be UB

> and so we can assume it will not be negative (i.e. x will be either positive

> or a negative multiple of C) and can use bit_and then.


That's clearly information that should be in the comment before the 
pattern.  We could also generally do the transform if we know
the other operand of the modulo is nonnegative.

Also the pattern doing the standalone transform does

/* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
   i.e. "X % C" into "X & (C - 1)", if X and C are positive.
   Also optimize A % (C << N)  where C is a power of 2,
   to A & ((C << N) - 1).  */
(match (power_of_two_cand @1)
 INTEGER_CST@1)
(match (power_of_two_cand @1)
 (lshift INTEGER_CST@1 @2))
(for mod (trunc_mod floor_mod)
 (simplify
  (mod @0 (convert?@3 (power_of_two_cand@1 @2)))
  (if ((TYPE_UNSIGNED (type)
        || tree_expr_nonnegative_p (@0))
        && tree_nop_conversion_p (type, TREE_TYPE (@3))
        && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)
   (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1); 
}))))))

so it also checks whether @2 is not negative, the new pattern lacks
this and could make use of power_of_two_cand as well (in fact I'd
place it next to the pattern above.

Also the above applies for trunc_mod and floor_mod but the proposed
patch applies the transform for ceil_mod and round_mod as well.

Richard.
Jiufu Guo Feb. 17, 2020, 12:10 p.m. | #4
Li Jia He <helijia@linux.ibm.com> writes:

> Hi,

> @@ -546,6 +546,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>   (simplify

>    (mod (mod@2 @0 @1) @1)

>    @2)

> + /* Optimize (x shift (n mod C)) to (x shift (n bit_and (C - 1))) when C is a

> +    constant and power of two.  */

> + (for shift (lshift rshift)

> +  (simplify

> +   (shift @0 (mod @1 integer_pow2p@2))

> +   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); })))))

>   /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */

>   (simplify

>    (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)

> +unsigned a(unsigned x, int n)

> +{

> +  return x >> (n % 32);

> +}

> +


With this patch, the behavior will change on negtive @2. While this
would not be an issue because it is UB.

In below foo, r and r1 are not same after patch.
---------
int __attribute__ ((noinline))
foo (unsigned x, int n, int t)
{
  int r = x >> (n % 4);
  int r1 = x >> t;
  __builtin_printf ("%d : %d\n", r, r1);  
  return r == r1;
}

int main()
{
  unsigned x = 0x12c;
  int n = -3;
  return foo (x, n, n %4);
}
---------
For: "x=0x12c, n=-3; x >> (n % 4)"  without patch, it is 0; with patch
it is 0x96.
Li Jia He Feb. 18, 2020, 2:54 a.m. | #5
Hi,

On 2020/2/17 4:30 PM, Richard Biener wrote:
> On Mon, 17 Feb 2020, Jakub Jelinek wrote:

> 

>> On Mon, Feb 17, 2020 at 09:01:13AM +0100, Richard Biener wrote:

>>> On Mon, 17 Feb 2020, Li Jia He wrote:

>>>> This patch wants to fix PR66552 on gimple and optimizes (x shift (n mod C))

>>>> to (x shift (n bit_and (C - 1))) when C is a constant and power of two as

>>>> discussed in PR66552.

>>>>

>>>> The regression testing for the patch was done on GCC mainline on

>>>>

>>>>      powerpc64le-unknown-linux-gnu (Power 9 LE)

>>>>

>>>> with no regressions.  Is it OK for GCC11 ?

>>>

>>> I fail to see the connection with a shift operation, can you explain?

>>

>> As mentioned in the PR, the reason why we can only optimize unsigned

>> mod C into bit_and C-1 (where C is pow2p) is that signed modulo can be negative for

>> negative values (is non-positive).  For shifts negative values would be UB

>> and so we can assume it will not be negative (i.e. x will be either positive

>> or a negative multiple of C) and can use bit_and then.

> 

> That's clearly information that should be in the comment before the

> pattern.  We could also generally do the transform if we know

> the other operand of the modulo is nonnegative.

> 

> Also the pattern doing the standalone transform does

> 

> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>     i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>     Also optimize A % (C << N)  where C is a power of 2,

>     to A & ((C << N) - 1).  */

> (match (power_of_two_cand @1)

>   INTEGER_CST@1)

> (match (power_of_two_cand @1)

>   (lshift INTEGER_CST@1 @2))

> (for mod (trunc_mod floor_mod)

>   (simplify

>    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>    (if ((TYPE_UNSIGNED (type)

>          || tree_expr_nonnegative_p (@0))

>          && tree_nop_conversion_p (type, TREE_TYPE (@3))

>          && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)

>     (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1);

> }))))))

> 

> so it also checks whether @2 is not negative, the new pattern lacks

> this and could make use of power_of_two_cand as well (in fact I'd

> place it next to the pattern above.

> 


Thank you for your suggestions.  I have modified the code according to your
suggestions. But power_of_two_cand is not used, it will cause pr70354-2.c
failed ((0x1234ULL << (i % 54)) will transfer to (0x1234ULL << (i & 54))).

The regression testing for the patch was done on GCC mainline on powerpc64le
with no regressions.  Would you like to help to see if there are other 
problems?
Thanks in advance.  The code is in the attachment.

> Also the above applies for trunc_mod and floor_mod but the proposed

> patch applies the transform for ceil_mod and round_mod as well.

> 

> Richard.

> 


-- 
BR,
Lijia He
diff --git a/gcc/match.pd b/gcc/match.pd
index 73834c25593..a8fc7918621 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -598,12 +598,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
    i.e. "X % C" into "X & (C - 1)", if X and C are positive.
    Also optimize A % (C << N)  where C is a power of 2,
-   to A & ((C << N) - 1).  */
+   to A & ((C << N) - 1).  And optimize "A shift (N % C)" where C
+   is a power of 2 and shift operation included "<<" and ">>",
+   to  "A shift (N & (C - 1))".  */
 (match (power_of_two_cand @1)
  INTEGER_CST@1)
 (match (power_of_two_cand @1)
  (lshift INTEGER_CST@1 @2))
 (for mod (trunc_mod floor_mod)
+ (for shift (lshift rshift)
+  (simplify
+   (shift @0 (mod @1 integer_pow2p@2))
+   (if (tree_expr_nonnegative_p (@2))
+   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); }))))))
  (simplify
   (mod @0 (convert?@3 (power_of_two_cand@1 @2)))
   (if ((TYPE_UNSIGNED (type)
diff --git a/gcc/testsuite/gcc.dg/pr66552.c b/gcc/testsuite/gcc.dg/pr66552.c
new file mode 100644
index 00000000000..7583c9ad25a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr66552.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lower" } */
+
+unsigned a(unsigned x, int n)
+{
+  return x >> (n % 32);
+}
+
+unsigned b(unsigned x, int n)
+{
+  return x << (n % 32);
+}
+
+/* { dg-final { scan-tree-dump-not " % " "lower" } } */
Marc Glisse Feb. 22, 2020, 3:12 p.m. | #6
On Tue, 18 Feb 2020, Li Jia He wrote:

>> Also the pattern doing the standalone transform does

>> 

>> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>>     i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>>     Also optimize A % (C << N)  where C is a power of 2,

>>     to A & ((C << N) - 1).  */

>> (match (power_of_two_cand @1)

>>   INTEGER_CST@1)

>> (match (power_of_two_cand @1)

>>   (lshift INTEGER_CST@1 @2))

>> (for mod (trunc_mod floor_mod)

>>   (simplify

>>    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>>    (if ((TYPE_UNSIGNED (type)

>>          || tree_expr_nonnegative_p (@0))

>>          && tree_nop_conversion_p (type, TREE_TYPE (@3))

>>          && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)

>>     (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1);

>> }))))))

>> 

>> so it also checks whether @2 is not negative, the new pattern lacks

>> this and could make use of power_of_two_cand as well (in fact I'd

>> place it next to the pattern above.

>> 

>

> Thank you for your suggestions.  I have modified the code according to your

> suggestions. But power_of_two_cand is not used, it will cause pr70354-2.c

> failed ((0x1234ULL << (i % 54)) will transfer to (0x1234ULL << (i & 54))).


How exactly did you use power_of_two_cand? As shown in the transformation 
above, it combines with integer_pow2p, it doesn't replace it.

-- 
Marc Glisse
Li Jia He Feb. 24, 2020, 5:58 a.m. | #7
Hi,

On 2020/2/22 11:12 PM, Marc Glisse wrote:
> On Tue, 18 Feb 2020, Li Jia He wrote:

> 

>>> Also the pattern doing the standalone transform does

>>>

>>> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>>>     i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>>>     Also optimize A % (C << N)  where C is a power of 2,

>>>     to A & ((C << N) - 1).  */

>>> (match (power_of_two_cand @1)

>>>   INTEGER_CST@1)

>>> (match (power_of_two_cand @1)

>>>   (lshift INTEGER_CST@1 @2))

>>> (for mod (trunc_mod floor_mod)

>>>   (simplify

>>>    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>>>    (if ((TYPE_UNSIGNED (type)

>>>          || tree_expr_nonnegative_p (@0))

>>>          && tree_nop_conversion_p (type, TREE_TYPE (@3))

>>>          && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)

>>>     (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1);

>>> }))))))

>>>

>>> so it also checks whether @2 is not negative, the new pattern lacks

>>> this and could make use of power_of_two_cand as well (in fact I'd

>>> place it next to the pattern above.

>>>

>>

>> Thank you for your suggestions.  I have modified the code according to 

>> your

>> suggestions. But power_of_two_cand is not used, it will cause pr70354-2.c

>> failed ((0x1234ULL << (i % 54)) will transfer to (0x1234ULL << (i & 

>> 54))).

> 

> How exactly did you use power_of_two_cand? As shown in the 

> transformation above, it combines with integer_pow2p, it doesn't replace 

> it.

> 


I used power_of_two_cand as follows:

diff --git a/gcc/match.pd b/gcc/match.pd
index 73834c25593..a90e93d8af0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -598,12 +598,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
     i.e. "X % C" into "X & (C - 1)", if X and C are positive.
     Also optimize A % (C << N)  where C is a power of 2,
-   to A & ((C << N) - 1).  */
+   to A & ((C << N) - 1).  And optimize "A shift (N % C)" where C
+   is a power of 2 and shift operation included "<<" and ">>",
+   to  "A shift (N & (C - 1))".  */
  (match (power_of_two_cand @1)
   INTEGER_CST@1)
  (match (power_of_two_cand @1)
   (lshift INTEGER_CST@1 @2))
  (for mod (trunc_mod floor_mod)
+ (for shift (lshift rshift)
+  (simplify
+   (shift @0 (mod @1 (power_of_two_cand @2)))
+   (if (tree_expr_nonnegative_p (@2))
+   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); 
}))))))
   (simplify
    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))
    (if ((TYPE_UNSIGNED (type)

I found that there will be a generated tree_power_of_two_cand function in
generic-match.c and gimple_power_of_two_cand function in gimple-match.c.

bool
tree_power_of_two_cand (tree t, tree *res_ops)
{
   const tree type = TREE_TYPE (t);
   if (TREE_SIDE_EFFECTS (t)) return false;
   switch (TREE_CODE (t))
     {
     case INTEGER_CST:
       {
         {
/* #line 604 "../../gcc-mirror/gcc/match.pd" */
           tree captures[1] ATTRIBUTE_UNUSED = { t };
           if (__builtin_expect (dump_file && (dump_flags & 
TDF_FOLDING), 0)) fprintf (dump_file, "Matching expression %s:%d, 
%s:%d\n", "match.pd", 604, __FILE__, __LINE__);
           res_ops[0] = captures[0];
           return true;
         }
         break;
       }
     case LSHIFT_EXPR:
       {
         tree _p0 = TREE_OPERAND (t, 0);
         tree _p1 = TREE_OPERAND (t, 1);
         switch (TREE_CODE (_p0))
           {
           case INTEGER_CST:
             {
               {
/* #line 606 "../../gcc-mirror/gcc/match.pd" */
                 tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 };
                 if (__builtin_expect (dump_file && (dump_flags & 
TDF_FOLDING), 0)) fprintf (dump_file, "Matching expression %s:%d, 
%s:%d\n", "match.pd", 606, __FILE__, __LINE__);
                 res_ops[0] = captures[0];
                 return true;
               }
               break;
             }
           default:;
           }
         break;
       }
     default:;
     }
   return false;
}

In the tree_power_of_two_cand function, we can see that if t is an
INTEGER_CST, it will be captured directly, so using power_of_two_cand
may not be appropriate here.

-- 
BR,
Lijia He
Marc Glisse Feb. 24, 2020, 8:16 a.m. | #8
On Mon, 24 Feb 2020, Li Jia He wrote:

> Hi,

>

> On 2020/2/22 11:12 PM, Marc Glisse wrote:

>> On Tue, 18 Feb 2020, Li Jia He wrote:

>> 

>>>> Also the pattern doing the standalone transform does

>>>> 

>>>> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>>>>     i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>>>>     Also optimize A % (C << N)  where C is a power of 2,

>>>>     to A & ((C << N) - 1).  */

>>>> (match (power_of_two_cand @1)

>>>>   INTEGER_CST@1)

>>>> (match (power_of_two_cand @1)

>>>>   (lshift INTEGER_CST@1 @2))

>>>> (for mod (trunc_mod floor_mod)

>>>>   (simplify

>>>>    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>>>>    (if ((TYPE_UNSIGNED (type)

>>>>          || tree_expr_nonnegative_p (@0))

>>>>          && tree_nop_conversion_p (type, TREE_TYPE (@3))

>>>>          && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)

>>>>     (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 1);

>>>> }))))))

>>>> 

>>>> so it also checks whether @2 is not negative, the new pattern lacks

>>>> this and could make use of power_of_two_cand as well (in fact I'd

>>>> place it next to the pattern above.

>>>> 

>>> 

>>> Thank you for your suggestions.  I have modified the code according to 

>>> your

>>> suggestions. But power_of_two_cand is not used, it will cause pr70354-2.c

>>> failed ((0x1234ULL << (i % 54)) will transfer to (0x1234ULL << (i & 54))).

>> 

>> How exactly did you use power_of_two_cand? As shown in the transformation 

>> above, it combines with integer_pow2p, it doesn't replace it.

>> 

>

> I used power_of_two_cand as follows:

>

> diff --git a/gcc/match.pd b/gcc/match.pd

> index 73834c25593..a90e93d8af0 100644

> --- a/gcc/match.pd

> +++ b/gcc/match.pd

> @@ -598,12 +598,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>    i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>    Also optimize A % (C << N)  where C is a power of 2,

> -   to A & ((C << N) - 1).  */

> +   to A & ((C << N) - 1).  And optimize "A shift (N % C)" where C

> +   is a power of 2 and shift operation included "<<" and ">>",

> +   to  "A shift (N & (C - 1))".  */

> (match (power_of_two_cand @1)

>  INTEGER_CST@1)

> (match (power_of_two_cand @1)

>  (lshift INTEGER_CST@1 @2))

> (for mod (trunc_mod floor_mod)

> + (for shift (lshift rshift)

> +  (simplify

> +   (shift @0 (mod @1 (power_of_two_cand @2)))

> +   (if (tree_expr_nonnegative_p (@2))

> +   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); 

> }))))))

>  (simplify

>   (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>   (if ((TYPE_UNSIGNED (type)

>

> I found that there will be a generated tree_power_of_two_cand function in

> generic-match.c and gimple_power_of_two_cand function in gimple-match.c.

>

> bool

> tree_power_of_two_cand (tree t, tree *res_ops)

> {

>  const tree type = TREE_TYPE (t);

>  if (TREE_SIDE_EFFECTS (t)) return false;

>  switch (TREE_CODE (t))

>    {

>    case INTEGER_CST:

>      {

>        {

> /* #line 604 "../../gcc-mirror/gcc/match.pd" */

>          tree captures[1] ATTRIBUTE_UNUSED = { t };

>          if (__builtin_expect (dump_file && (dump_flags & TDF_FOLDING), 0)) 

> fprintf (dump_file, "Matching expression %s:%d, %s:%d\n", "match.pd", 604, 

> __FILE__, __LINE__);

>          res_ops[0] = captures[0];

>          return true;

>        }

>        break;

>      }

>    case LSHIFT_EXPR:

>      {

>        tree _p0 = TREE_OPERAND (t, 0);

>        tree _p1 = TREE_OPERAND (t, 1);

>        switch (TREE_CODE (_p0))

>          {

>          case INTEGER_CST:

>            {

>              {

> /* #line 606 "../../gcc-mirror/gcc/match.pd" */

>                tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 };

>                if (__builtin_expect (dump_file && (dump_flags & 

> TDF_FOLDING), 0)) fprintf (dump_file, "Matching expression %s:%d, %s:%d\n", 

> "match.pd", 606, __FILE__, __LINE__);

>                res_ops[0] = captures[0];

>                return true;

>              }

>              break;

>            }

>          default:;

>          }

>        break;

>      }

>    default:;

>    }

>  return false;

> }

>

> In the tree_power_of_two_cand function, we can see that if t is an

> INTEGER_CST, it will be captured directly, so using power_of_two_cand

> may not be appropriate here.


Please look at the one transformation that already uses power_of_two_cand. 
It matches (power_of_two_cand@1 @2), then tests integer_pow2p (@2) && 
tree_int_cst_sgn (@2) > 0, and uses @1 in the output. In some sense, 
tree_power_of_two_cand prepares the argument for a call to integer_pow2p, 
it doesn't replace it (although we could move the extra tests into 
power_of_two_cand if we expect all users will need exactly the same).

-- 
Marc Glisse
Li Jia He Feb. 25, 2020, 11:05 a.m. | #9
Hi,

On 2020/2/24 4:16 PM, Marc Glisse wrote:
> On Mon, 24 Feb 2020, Li Jia He wrote:

> 

>> Hi,

>>

>> On 2020/2/22 11:12 PM, Marc Glisse wrote:

>>> On Tue, 18 Feb 2020, Li Jia He wrote:

>>>

>>>>> Also the pattern doing the standalone transform does

>>>>>

>>>>> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>>>>>     i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>>>>>     Also optimize A % (C << N)  where C is a power of 2,

>>>>>     to A & ((C << N) - 1).  */

>>>>> (match (power_of_two_cand @1)

>>>>>   INTEGER_CST@1)

>>>>> (match (power_of_two_cand @1)

>>>>>   (lshift INTEGER_CST@1 @2))

>>>>> (for mod (trunc_mod floor_mod)

>>>>>   (simplify

>>>>>    (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>>>>>    (if ((TYPE_UNSIGNED (type)

>>>>>          || tree_expr_nonnegative_p (@0))

>>>>>          && tree_nop_conversion_p (type, TREE_TYPE (@3))

>>>>>          && integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0)

>>>>>     (bit_and @0 (convert (minus @1 { build_int_cst (TREE_TYPE (@1), 

>>>>> 1);

>>>>> }))))))

>>>>>

>>>>> so it also checks whether @2 is not negative, the new pattern lacks

>>>>> this and could make use of power_of_two_cand as well (in fact I'd

>>>>> place it next to the pattern above.

>>>>>

>>>>

>>>> Thank you for your suggestions.  I have modified the code according 

>>>> to your

>>>> suggestions. But power_of_two_cand is not used, it will cause 

>>>> pr70354-2.c

>>>> failed ((0x1234ULL << (i % 54)) will transfer to (0x1234ULL << (i & 

>>>> 54))).

>>>

>>> How exactly did you use power_of_two_cand? As shown in the 

>>> transformation above, it combines with integer_pow2p, it doesn't 

>>> replace it.

>>>

>>

>> I used power_of_two_cand as follows:

>>

>> diff --git a/gcc/match.pd b/gcc/match.pd

>> index 73834c25593..a90e93d8af0 100644

>> --- a/gcc/match.pd

>> +++ b/gcc/match.pd

>> @@ -598,12 +598,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>> /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,

>>    i.e. "X % C" into "X & (C - 1)", if X and C are positive.

>>    Also optimize A % (C << N)  where C is a power of 2,

>> -   to A & ((C << N) - 1).  */

>> +   to A & ((C << N) - 1).  And optimize "A shift (N % C)" where C

>> +   is a power of 2 and shift operation included "<<" and ">>",

>> +   to  "A shift (N & (C - 1))".  */

>> (match (power_of_two_cand @1)

>>  INTEGER_CST@1)

>> (match (power_of_two_cand @1)

>>  (lshift INTEGER_CST@1 @2))

>> (for mod (trunc_mod floor_mod)

>> + (for shift (lshift rshift)

>> +  (simplify

>> +   (shift @0 (mod @1 (power_of_two_cand @2)))

>> +   (if (tree_expr_nonnegative_p (@2))

>> +   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 

>> 1); }))))))

>>  (simplify

>>   (mod @0 (convert?@3 (power_of_two_cand@1 @2)))

>>   (if ((TYPE_UNSIGNED (type)

>>

>> I found that there will be a generated tree_power_of_two_cand function in

>> generic-match.c and gimple_power_of_two_cand function in gimple-match.c.

>>

>> bool

>> tree_power_of_two_cand (tree t, tree *res_ops)

>> {

>>  const tree type = TREE_TYPE (t);

>>  if (TREE_SIDE_EFFECTS (t)) return false;

>>  switch (TREE_CODE (t))

>>    {

>>    case INTEGER_CST:

>>      {

>>        {

>> /* #line 604 "../../gcc-mirror/gcc/match.pd" */

>>          tree captures[1] ATTRIBUTE_UNUSED = { t };

>>          if (__builtin_expect (dump_file && (dump_flags & 

>> TDF_FOLDING), 0)) fprintf (dump_file, "Matching expression %s:%d, 

>> %s:%d\n", "match.pd", 604, __FILE__, __LINE__);

>>          res_ops[0] = captures[0];

>>          return true;

>>        }

>>        break;

>>      }

>>    case LSHIFT_EXPR:

>>      {

>>        tree _p0 = TREE_OPERAND (t, 0);

>>        tree _p1 = TREE_OPERAND (t, 1);

>>        switch (TREE_CODE (_p0))

>>          {

>>          case INTEGER_CST:

>>            {

>>              {

>> /* #line 606 "../../gcc-mirror/gcc/match.pd" */

>>                tree captures[2] ATTRIBUTE_UNUSED = { _p0, _p1 };

>>                if (__builtin_expect (dump_file && (dump_flags & 

>> TDF_FOLDING), 0)) fprintf (dump_file, "Matching expression %s:%d, 

>> %s:%d\n", "match.pd", 606, __FILE__, __LINE__);

>>                res_ops[0] = captures[0];

>>                return true;

>>              }

>>              break;

>>            }

>>          default:;

>>          }

>>        break;

>>      }

>>    default:;

>>    }

>>  return false;

>> }

>>

>> In the tree_power_of_two_cand function, we can see that if t is an

>> INTEGER_CST, it will be captured directly, so using power_of_two_cand

>> may not be appropriate here.

> 

> Please look at the one transformation that already uses 

> power_of_two_cand. It matches (power_of_two_cand@1 @2), then tests 

> integer_pow2p (@2) && tree_int_cst_sgn (@2) > 0, and uses @1 in the 

> output. In some sense, tree_power_of_two_cand prepares the argument for 

> a call to integer_pow2p, it doesn't replace it (although we could move 

> the extra tests into power_of_two_cand if we expect all users will need 

> exactly the same).

> 


Thank you for your guidance.  I have made some changes based on my
understanding.  Would you like to see it again? Thank you in advance.
The code is in the attachment.

-- 
BR,
Lijia He
diff --git a/gcc/match.pd b/gcc/match.pd
index 73834c25593..9e7619c5be8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -598,12 +598,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR,
    i.e. "X % C" into "X & (C - 1)", if X and C are positive.
    Also optimize A % (C << N)  where C is a power of 2,
-   to A & ((C << N) - 1).  */
+   to A & ((C << N) - 1).  And optimize "A shift (B % C)" where C
+   is a power of 2, shift operation included "<<" and ">>" and assume
+   (B % C) will not be negative as shifts negative values would be UB,
+   to  "A shift (B & (C - 1))".  */
 (match (power_of_two_cand @1)
  INTEGER_CST@1)
 (match (power_of_two_cand @1)
  (lshift INTEGER_CST@1 @2))
 (for mod (trunc_mod floor_mod)
+ (for shift (lshift rshift)
+  (simplify
+   (shift @0 (mod @1 (power_of_two_cand@2 @3)))
+   (if (integer_pow2p (@3) && tree_int_cst_sgn (@3) > 0)
+    (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2),
+						      1); }))))))
  (simplify
   (mod @0 (convert?@3 (power_of_two_cand@1 @2)))
   (if ((TYPE_UNSIGNED (type)
diff --git a/gcc/testsuite/gcc.dg/pr66552.c b/gcc/testsuite/gcc.dg/pr66552.c
new file mode 100644
index 00000000000..7583c9ad25a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr66552.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lower" } */
+
+unsigned a(unsigned x, int n)
+{
+  return x >> (n % 32);
+}
+
+unsigned b(unsigned x, int n)
+{
+  return x << (n % 32);
+}
+
+/* { dg-final { scan-tree-dump-not " % " "lower" } } */

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 73834c25593..1d74f7dba7f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -546,6 +546,12 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (simplify
   (mod (mod@2 @0 @1) @1)
   @2)
+ /* Optimize (x shift (n mod C)) to (x shift (n bit_and (C - 1))) when C is a
+    constant and power of two.  */
+ (for shift (lshift rshift)
+  (simplify
+   (shift @0 (mod @1 integer_pow2p@2))
+   (shift @0 (bit_and @1 (minus @2 { build_int_cst (TREE_TYPE (@2), 1); })))))
  /* From extract_muldiv_1: (X * C1) % C2 is zero if C1 is a multiple of C2.  */
  (simplify
   (mod (mult @0 INTEGER_CST@1) INTEGER_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr66552.c b/gcc/testsuite/gcc.dg/pr66552.c
new file mode 100644
index 00000000000..7583c9ad25a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr66552.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lower" } */
+
+unsigned a(unsigned x, int n)
+{
+  return x >> (n % 32);
+}
+
+unsigned b(unsigned x, int n)
+{
+  return x << (n % 32);
+}
+
+/* { dg-final { scan-tree-dump-not " % " "lower" } } */