[PR84660] Fix combine bug with SHIFT_COUNT_TRUNCATED.

Message ID 20180320221023.9712-1-jimw@sifive.com

Commit Message

Jim Wilson March 20, 2018, 10:10 p.m.
This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
any SHIFT_COUNT_TRUNCATED target.

The testcase computes 46 or 47 (0x2e or 0x2f), then ANDs the value with 0xf,
and then SHIFTs the value.  On a SHIFT_COUNT_TRUNCATED target, the AND can get
optimized away because for a 32-bit shift we can use the 46/47 directly and
still get the correct result.  Combine then tries to convert the 32-bit shift
into a 64-bit shift, and then we have a problem, as the AND 0xf is no longer
redundant.  So we must prevent combine from converting a 32-bit shift to a
64-bit shift on a SHIFT_COUNT_TRUNCATED target when there are non-zero bits
in the shift count that matter for the larger shift mode.

Combine already has code to handle the case where shifts are being narrowed
and this accidentally changes the shift amount.  This patch adds additional
code to handle the case where shifts are being widened.

This was tested with a cross riscv64-linux toolchain build and make check.
The new testcase fails without the patch, and works with the patch.  This was
also tested with an x86_64-linux build and make check.  There were no
regressions.

OK?

Jim

	gcc/
	PR rtl-optimization/84660
	* combine.c (force_int_to_mode) <ASHIFT>: If SHIFT_COUNT_TRUNCATED,
	call nonzero_bits and compare against xmode precision.

	gcc/testsuite/
	PR rtl-optimization/84660
	* gcc.dg/pr84660.c: New.
---
 gcc/combine.c                  | 10 ++++++++--
 gcc/testsuite/gcc.dg/pr84660.c | 17 +++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr84660.c

-- 
2.14.1

Comments

Richard Biener March 21, 2018, 9:45 a.m. | #1
On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
> any SHIFT_COUNT_TRUNCATED target.
>
> The testcase computes 46 or 47 (0x2e or 0x2f), then ANDs the value with 0xf,
> and then SHIFTs the value.  On a SHIFT_COUNT_TRUNCATED target, the AND can get
> optimized away because for a 32-bit shift we can use the 46/47 directly and
> still get the correct result.  Combine then tries to convert the 32-bit shift
> into a 64-bit shift, and then we have a problem, as the AND 0xf is no longer
> redundant.  So we must prevent combine from converting a 32-bit shift to a
> 64-bit shift on a SHIFT_COUNT_TRUNCATED target when there are non-zero bits
> in the shift count that matter for the larger shift mode.
>
> Combine already has code to handle the case where shifts are being narrowed
> and this accidentally changes the shift amount.  This patch adds additional
> code to handle the case where shifts are being widened.
>
> This was tested with a cross riscv64-linux toolchain build and make check.
> The new testcase fails without the patch, and works with the patch.  This was
> also tested with an x86_64-linux build and make check.  There were no
> regressions.
>
> OK?

IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used to optimize
away the and early.  We then rely on the compiler not assuming anything
about the value.  If you did the same thing on GIMPLE, VRP would come
along and second-guess you, since it assumes the shift value cannot be
too large.  This combine issue sounds very similar.

So I'm suggesting again to instead provide patterns that
match (shft A (and B cst))

Richard.

> Jim
>
>         gcc/
>         PR rtl-optimization/84660
>         * combine.c (force_int_to_mode) <ASHIFT>: If SHIFT_COUNT_TRUNCATED,
>         call nonzero_bits and compare against xmode precision.
>
>         gcc/testsuite/
>         PR rtl-optimization/84660
>         * gcc.dg/pr84660.c: New.
> ---
>  gcc/combine.c                  | 10 ++++++++--
>  gcc/testsuite/gcc.dg/pr84660.c | 17 +++++++++++++++++
>  2 files changed, 25 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr84660.c
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index ff672ad2adb..4ed59eb88c8 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -8897,14 +8897,20 @@ force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
>          However, we cannot do anything with shifts where we cannot
>          guarantee that the counts are smaller than the size of the mode
>          because such a count will have a different meaning in a
> -        wider mode.  */
> +        different mode.  If we are narrowing the mode, the shift count must
> +        be compared against mode.  If we are widening the mode, and shift
> +        counts are truncated, then the shift count must be compared against
> +        xmode.  */
>
>        if (! (CONST_INT_P (XEXP (x, 1))
>              && INTVAL (XEXP (x, 1)) >= 0
>              && INTVAL (XEXP (x, 1)) < GET_MODE_PRECISION (mode))
>           && ! (GET_MODE (XEXP (x, 1)) != VOIDmode
>                 && (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
> -                   < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))))
> +                   < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))
> +               && (! SHIFT_COUNT_TRUNCATED
> +                   || (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
> +                       < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (xmode)))))
>         break;
>
>        /* If the shift count is a constant and we can do arithmetic in
> diff --git a/gcc/testsuite/gcc.dg/pr84660.c b/gcc/testsuite/gcc.dg/pr84660.c
> new file mode 100644
> index 00000000000..a87fa0a914d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr84660.c
> @@ -0,0 +1,17 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +extern void abort (void);
> +extern void exit (int);
> +
> +unsigned int __attribute__ ((noinline, noclone))
> +foo(unsigned int i) {
> +
> +  return 0xFFFF & (0xd066 << (((i & 0x1) ^ 0x2f) & 0xf));
> +}
> +
> +int main() {
> +  if (foo (1) != 0x8000)
> +    abort ();
> +  exit (0);
> +}
> --
> 2.14.1

Jim Wilson March 22, 2018, 5:58 p.m. | #2
On Wed, Mar 21, 2018 at 2:45 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
>> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
>> any SHIFT_COUNT_TRUNCATED target.
>
> IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used for
> optimizing away the and early.  We then rely on the compiler not
> assuming anything on the value.  Like if you'd do that on GIMPLE
> VRP would come along and second-guess you because the shift
> value may not be too large.  This combine issue sounds very similar.
>
> So I'm suggesting again to instead provide patterns that
> match (shft A (and B cst))

I found a message from 2013 that talks about this.
    https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00169.html
I can try looking at this.

I'd question the timing though.  I don't think that trying to rewrite
shift patterns in the RISC-V port at this point in the release process
is a good idea.  I think we should fix the combine bug now with a
simple patch, and try reworking the shift support after the gcc-8
release.

Jim
Richard Biener March 23, 2018, 12:33 p.m. | #3
On Thu, Mar 22, 2018 at 6:58 PM, Jim Wilson <jimw@sifive.com> wrote:
> On Wed, Mar 21, 2018 at 2:45 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, Mar 20, 2018 at 11:10 PM, Jim Wilson <jimw@sifive.com> wrote:
>>> This fixes a wrong-code issue on RISC-V, but in theory could be a problem for
>>> any SHIFT_COUNT_TRUNCATED target.
>
>> IMHO the real issue is that SHIFT_COUNT_TRUNCATED is used for
>> optimizing away the and early.  We then rely on the compiler not
>> assuming anything on the value.  Like if you'd do that on GIMPLE
>> VRP would come along and second-guess you because the shift
>> value may not be too large.  This combine issue sounds very similar.
>>
>> So I'm suggesting again to instead provide patterns that
>> match (shft A (and B cst))
>
> I found a message from 2013 that talks about this.
>     https://gcc.gnu.org/ml/gcc-patches/2013-11/msg00169.html
> I can try looking at this.
>
> I'd question the timing though.  I don't think that trying to rewrite
> shift patterns in the RISC-V port at this point in the release process
> is a good idea.  I think we should fix the combine bug now with a
> simple patch, and try reworking the shift support after the gcc-8
> release.

I'm leaving the "simple" combiner patch to review by others
but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
to fix the issue.  Then add patterns if it turns out to be required
to avoid regressions.  For example x86 has

;; Avoid useless masking of count operand.
(define_insn_and_split "*<shift_insn><mode>3_mask"
  [(set (match_operand:SWI48 0 "nonimmediate_operand")
        (any_shiftrt:SWI48
          (match_operand:SWI48 1 "nonimmediate_operand")
          (subreg:QI
            (and:SI
              (match_operand:SI 2 "register_operand" "c,r")
              (match_operand:SI 3 "const_int_operand")) 0)))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)
   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
      == GET_MODE_BITSIZE (<MODE>mode)-1
   && can_create_pseudo_p ()"
  "#"
  "&& 1"
  [(parallel
     [(set (match_dup 0)
           (any_shiftrt:SWI48 (match_dup 1)
                              (match_dup 2)))
      (clobber (reg:CC FLAGS_REG))])]
  "operands[2] = gen_lowpart (QImode, operands[2]);"
  [(set_attr "isa" "*,bmi2")])

not sure why it uses define_insn_and_split here.  The splitter exactly
re-introduces the issue you hit with SHIFT_COUNT_TRUNCATED
simplification...

I suppose shift expanders of the x86 backends ensure QImode
shift counts here to more reliably match the above.

Richard.

> Jim
Jim Wilson April 2, 2018, 10:48 p.m. | #4
On Fri, Mar 23, 2018 at 5:33 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> I'm leaving the "simple" combiner patch to review by others
> but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
> to fix the issue.  Then add patterns if it turns out to be required
> to avoid regressions.  For example x86 has

I checked in a patch to do this.  It took me quite a few iterations to
work out how to do it without regressions, but it looks OK in the end.
6 additional patterns and changing the shift count mode to QImode in
all existing shift patterns and I get effectively the same code as
before with SHIFT_COUNT_TRUNCATED turned off.  With DImode shift
counts, I ran into problems with unnecessary sign/zero extensions from
SImode for shift counts.  With SImode shift counts, I ran into
problems with unnecessary truncations from DImode.  Using QImode shift
counts avoided both of those problems.

That leaves the other targets still using SHIFT_COUNT_TRUNCATED as
potentially broken.  I count 16 of them.  Though this particular
testcase might not be triggerable on most of them, since it requires
shift patterns for two different modes, and most 32-bit targets
probably only have one mode that shift is supported for.

Jim
Richard Biener April 3, 2018, 11:47 a.m. | #5
On Tue, Apr 3, 2018 at 12:48 AM, Jim Wilson <jimw@sifive.com> wrote:
> On Fri, Mar 23, 2018 at 5:33 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> I'm leaving the "simple" combiner patch to review by others
>> but for RISC-V you could simply #define SHIFT_COUNT_TRUNCATED to zero
>> to fix the issue.  Then add patterns if it turns out to be required
>> to avoid regressions.  For example x86 has
>
> I checked in a patch to do this.  It took me quite a few iterations to
> work out how to do it without regressions, but it looks OK in the end.
> 6 additional patterns and changing the shift count mode to QImode in
> all existing shift patterns and I get effectively the same code as
> before with SHIFT_COUNT_TRUNCATED turned off.  With DImode shift
> counts, I ran into problems with unnecessary sign/zero extensions from
> SImode for shift counts.  With SImode shift counts, I ran into
> problems with unnecessary truncations from DImode.  Using QImode shift
> counts avoided both of those problems.
>
> That leaves the other targets still using SHIFT_COUNT_TRUNCATED as
> potentially broken.  I count 16 of them.  Though this particular
> testcase might not be triggerable on most of them, since it requires
> shift patterns for two different modes, and most 32-bit targets
> probably only have one mode that shift is supported for.

Thanks a lot for going this route for the fix.  I only count sparc as an
important platform that unconditionally defines it to 1; the rest are
obscure embedded targets or targets with at least one mode where it is
set to zero.  OK, maybe add mips to the list of important targets.

So I'd be happy to just kill it off early during GCC 9 development...

But I guess I proposed this before and people weren't entirely
happy.

Richard.

> Jim

Patch

diff --git a/gcc/combine.c b/gcc/combine.c
index ff672ad2adb..4ed59eb88c8 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -8897,14 +8897,20 @@  force_int_to_mode (rtx x, scalar_int_mode mode, scalar_int_mode xmode,
 	 However, we cannot do anything with shifts where we cannot
 	 guarantee that the counts are smaller than the size of the mode
 	 because such a count will have a different meaning in a
-	 wider mode.  */
+	 different mode.  If we are narrowing the mode, the shift count must
+	 be compared against mode.  If we are widening the mode, and shift
+	 counts are truncated, then the shift count must be compared against
+	 xmode.  */
 
       if (! (CONST_INT_P (XEXP (x, 1))
 	     && INTVAL (XEXP (x, 1)) >= 0
 	     && INTVAL (XEXP (x, 1)) < GET_MODE_PRECISION (mode))
 	  && ! (GET_MODE (XEXP (x, 1)) != VOIDmode
 		&& (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
-		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))))
+		    < (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (mode))
+		&& (! SHIFT_COUNT_TRUNCATED
+		    || (nonzero_bits (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
+			< (unsigned HOST_WIDE_INT) GET_MODE_PRECISION (xmode)))))
 	break;
 
       /* If the shift count is a constant and we can do arithmetic in
diff --git a/gcc/testsuite/gcc.dg/pr84660.c b/gcc/testsuite/gcc.dg/pr84660.c
new file mode 100644
index 00000000000..a87fa0a914d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr84660.c
@@ -0,0 +1,17 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort (void);
+extern void exit (int);
+
+unsigned int __attribute__ ((noinline, noclone))
+foo(unsigned int i) {
+
+  return 0xFFFF & (0xd066 << (((i & 0x1) ^ 0x2f) & 0xf));
+}
+
+int main() {
+  if (foo (1) != 0x8000)
+    abort ();
+  exit (0);
+}