[1/2,expr.c] Optimize switch with sign-extended index.

Message ID 20180502220516.6521-1-jimw@sifive.com
State New
Headers show
Series
  • [1/2,expr.c] Optimize switch with sign-extended index.
Related show

Commit Message

Jim Wilson May 2, 2018, 10:05 p.m.
This improves the code for a switch statement on targets that sign-extend
function arguments, such as RISC-V.  Given a simple testcase

extern void asdf(int);
void foo(int x) {
  switch (x) {
  case 0: asdf(10); break;
  case 1: asdf(11); break;
  case 2: asdf(12); break;
  case 3: asdf(13); break;
  case 4: asdf(14); break;
  }
}

Compiled for a 64-bit target, we get for the tablejump

	li	a5,4
	bgtu	a0,a5,.L1
	slli	a0,a0,32
	lui	a5,%hi(.L4)
	addi	a5,a5,%lo(.L4)
	srli	a0,a0,30
	add	a0,a0,a5
	lw	a5,0(a0)
	jr	a5

There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then
shifted right by 30 to zero-extend it and multiply by 4 for the table index.
However, after the unsigned greater than branch, we know the value is between
0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended
long for this target.  Thus we get the same exact value if we sign-extend
instead of zero-extend, and the code is one instruction shorter.  We get a slli
by 2 instead of the slli 32/srli 30.

The following patch implements this optimization.  It checks for a range that
does not have the sign-bit set, and an index value that is already sign
extended, and then does a sign extend instead of an zero extend.

This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite runs.
There were no regressions.  It was also tested with an x86_64-linux build and
testsuite run.

Ok?

Jim

	gcc/
	* expr.c (do_tablejump): When converting index to Pmode, if we have a
	sign extended promoted subreg, and the range does not have the sign bit
	set, then do a sign extend.
---
 gcc/expr.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

-- 
2.14.1

Comments

Richard Biener May 3, 2018, 7:29 a.m. | #1
On Thu, May 3, 2018 at 12:05 AM, Jim Wilson <jimw@sifive.com> wrote:
> This improves the code for a switch statement on targets that sign-extend

> function arguments, such as RISC-V.  Given a simple testcase

>

> extern void asdf(int);

> void foo(int x) {

>   switch (x) {

>   case 0: asdf(10); break;

>   case 1: asdf(11); break;

>   case 2: asdf(12); break;

>   case 3: asdf(13); break;

>   case 4: asdf(14); break;

>   }

> }

>

> Compiled for a 64-bit target, we get for the tablejump

>

>         li      a5,4

>         bgtu    a0,a5,.L1

>         slli    a0,a0,32

>         lui     a5,%hi(.L4)

>         addi    a5,a5,%lo(.L4)

>         srli    a0,a0,30

>         add     a0,a0,a5

>         lw      a5,0(a0)

>         jr      a5

>

> There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then

> shifted right by 30 to zero-extend it and multiply by 4 for the table index.

> However, after the unsigned greater than branch, we know the value is between

> 0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended

> long for this target.  Thus we get the same exact value if we sign-extend

> instead of zero-extend, and the code is one instruction shorter.  We get a slli

> by 2 instead of the slli 32/srli 30.

>

> The following patch implements this optimization.  It checks for a range that

> does not have the sign-bit set, and an index value that is already sign

> extended, and then does a sign extend instead of an zero extend.

>

> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite runs.

> There were no regressions.  It was also tested with an x86_64-linux build and

> testsuite run.

>

> Ok?


Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile
- I remember
Eric fixing things up a bit but some verification would be nice to
have (instrumentation
at RTL level that for SUBREG_PROMOTED_* the bits are as expected).

Richard.

> Jim

>

>         gcc/

>         * expr.c (do_tablejump): When converting index to Pmode, if we have a

>         sign extended promoted subreg, and the range does not have the sign bit

>         set, then do a sign extend.

> ---

>  gcc/expr.c | 19 +++++++++++++++++--

>  1 file changed, 17 insertions(+), 2 deletions(-)

>

> diff --git a/gcc/expr.c b/gcc/expr.c

> index 9dd0e60d24d..919e20a22f7 100644

> --- a/gcc/expr.c

> +++ b/gcc/expr.c

> @@ -11782,11 +11782,26 @@ do_tablejump (rtx index, machine_mode mode, rtx range, rtx table_label,

>      emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,

>                              default_label, default_probability);

>

> -

>    /* If index is in range, it must fit in Pmode.

>       Convert to Pmode so we can index with it.  */

>    if (mode != Pmode)

> -    index = convert_to_mode (Pmode, index, 1);

> +    {

> +      unsigned int width;

> +

> +      /* We know the value of INDEX is between 0 and RANGE.  If we have a

> +        sign-extended subreg, and RANGE does not have the sign bit set, then

> +        we have a value that is valid for both sign and zero extension.  In

> +        this case, we get better code if we sign extend.  */

> +      if (GET_CODE (index) == SUBREG

> +         && SUBREG_PROMOTED_VAR_P (index)

> +         && SUBREG_PROMOTED_SIGNED_P (index)

> +         && ((width = GET_MODE_PRECISION (as_a <scalar_int_mode> (mode)))

> +             <= HOST_BITS_PER_WIDE_INT)

> +         && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1))))

> +       index = convert_to_mode (Pmode, index, 0);

> +      else

> +       index = convert_to_mode (Pmode, index, 1);

> +    }

>

>    /* Don't let a MEM slip through, because then INDEX that comes

>       out of PIC_CASE_VECTOR_ADDRESS won't be a valid address,

> --

> 2.14.1

>
Jim Wilson May 3, 2018, 9:29 p.m. | #2
On Thu, May 3, 2018 at 12:29 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile

> - I remember

> Eric fixing things up a bit but some verification would be nice to

> have (instrumentation

> at RTL level that for SUBREG_PROMOTED_* the bits are as expected).


If you are using SUBREG_PROMOTED_* in a late optimization pass like
combine, then this requires that all earlier optimization passes
propagate the info correctly.  I suppose there could be issues there.
But that isn't what this patch is doing.  This is code called during
initial RTL generation.  The SUBREG_PROMOTED_* bits are set during
this process because we know that arguments are passed sign-extended
to full register size.  We are then consuming the info while still in
the RTL generation phase.  I think that there is little that can go
wrong here.

Verifying this info in RTL generation would effectively be verifying
that the argument passing conventions are implemented correctly, and
we already have other ways to do that.

It might be useful to try to verify this info before combine where it
is more likely to be wrong.  I don't think there is any easy way to
verify this at compile time.  This would probably require emitting
code to check at application run-time that a promoted subreg actually
has a properly promoted value, and call abort if it doesn't.  This
would likely be an expensive check that we don't want enabled by
default, but might be useful for debugging purposes.  I don't think we
have any --enable-checking code like this at present.  We have
compiler compile-time checking and compiler run-time checking, but I
don't think that we have application run-time checking.  This would be
more like a sanitizer option, except to validate info in the RTL.  Is
this what you are asking for?

Jim
Richard Biener May 4, 2018, 7:18 a.m. | #3
On Thu, May 3, 2018 at 11:29 PM, Jim Wilson <jimw@sifive.com> wrote:
> On Thu, May 3, 2018 at 12:29 AM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> Just as a note, IIRC all the SUBREG_PROMOTED_* stuff is quite fragile

>> - I remember

>> Eric fixing things up a bit but some verification would be nice to

>> have (instrumentation

>> at RTL level that for SUBREG_PROMOTED_* the bits are as expected).

>

> If you are using SUBREG_PROMOTED_* in a late optimization pass like

> combine, then this requires that all earlier optimization passes

> propagate the info correctly.  I suppose there could be issues there.

> But that isn't what this patch is doing.  This is code called during

> initial RTL generation.  The SUBREG_PROMOTED_* bits are set during

> this process because we know that arguments are passed sign-extended

> to full register size.  We are then consuming the info while still in

> the RTL generation phase.  I think that there is little that can go

> wrong here.


Indeed.  But IIRC the info stays around and people might be tempted to use it...
I do see various uses in later RTL optimizers.

> Verifying this info in RTL generation would effectively be verifying

> that the argument passing conventions are implemented correctly, and

> we already have other ways to do that.

>

> It might be useful to try to verify this info before combine where it

> is more likely to be wrong.  I don't think there is any easy way to

> verify this at compile time.  This would probably require emitting

> code to check at application run-time that a promoted subreg actually

> has a properly promoted value, and call abort if it doesn't.  This

> would likely be an expensive check that we don't want enabled by

> default, but might be useful for debugging purposes.  I don't think we

> have any --enable-checking code like this at present.  We have

> compiler compile-time checking and compiler run-time checking, but I

> don't think that we have application run-time checking.  This would be

> more like a sanitizer option, except to validate info in the RTL.  Is

> this what you are asking for?


Yes, that's what I was suggesting.  I guess similarly "sanitizing"
on-the-side info we have at GIMPLE level would be interesting, like
verifying range-info.

Just an idea - I'm not actually expecting you to implemen this,
esp. since doing the actual instrumentation on RTL can be a bit
tricky (best not emit a call to abort but use the targets trap
instruction for simplicity).

Richard.

>

> Jim
Jim Wilson May 9, 2018, 9:20 p.m. | #4
On Wed, May 2, 2018 at 3:05 PM, Jim Wilson <jimw@sifive.com> wrote:
> This improves the code for a switch statement on targets that sign-extend

> function arguments, such as RISC-V.  Given a simple testcase

> ...

>         gcc/

>         * expr.c (do_tablejump): When converting index to Pmode, if we have a

>         sign extended promoted subreg, and the range does not have the sign bit

>         set, then do a sign extend.


Ping.

Jim
Jim Wilson May 16, 2018, 4:57 p.m. | #5
On Wed, May 9, 2018 at 2:20 PM, Jim Wilson <jimw@sifive.com> wrote:
> On Wed, May 2, 2018 at 3:05 PM, Jim Wilson <jimw@sifive.com> wrote:

>> This improves the code for a switch statement on targets that sign-extend

>> function arguments, such as RISC-V.  Given a simple testcase

>> ...

>>         gcc/

>>         * expr.c (do_tablejump): When converting index to Pmode, if we have a

>>         sign extended promoted subreg, and the range does not have the sign bit

>>         set, then do a sign extend.


ping^2

https://gcc.gnu.org/ml/gcc-patches/2018-05/msg00118.html

Jim
Eric Botcazou May 17, 2018, 7:25 a.m. | #6
> The following patch implements this optimization.  It checks for a range

> that does not have the sign-bit set, and an index value that is already

> sign extended, and then does a sign extend instead of an zero extend.

> 

> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite

> runs. There were no regressions.  It was also tested with an x86_64-linux

> build and testsuite run.

> 

> Ok?

> 

> Jim

> 

> 	gcc/

> 	* expr.c (do_tablejump): When converting index to Pmode, if we have a

> 	sign extended promoted subreg, and the range does not have the sign bit

> 	set, then do a sign extend.


Richard dragged me into this so I feel somewhat entitled to step up...

The patch looks OK to me, modulo:

> +      /* We know the value of INDEX is between 0 and RANGE.  If we have a

> +	 sign-extended subreg, and RANGE does not have the sign bit set, then

> +	 we have a value that is valid for both sign and zero extension.  In

> +	 this case, we get better code if we sign extend.  */

> +      if (GET_CODE (index) == SUBREG

> +	  && SUBREG_PROMOTED_VAR_P (index)

> +	  && SUBREG_PROMOTED_SIGNED_P (index)

> +	  && ((width = GET_MODE_PRECISION (as_a <scalar_int_mode> (mode)))

> +	      <= HOST_BITS_PER_WIDE_INT)

> +	  && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1))))


I'd use UINTVAL instead of INTVAL here.

-- 
Eric Botcazou
Jim Wilson May 17, 2018, 10:41 p.m. | #7
On Thu, May 17, 2018 at 12:25 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> The patch looks OK to me, modulo:

>

>> +       && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1))))

>

> I'd use UINTVAL instead of INTVAL here.


Thanks.  Committed with that change.

Jim
Jeff Law May 19, 2018, 1:38 p.m. | #8
On 05/02/2018 04:05 PM, Jim Wilson wrote:
> This improves the code for a switch statement on targets that sign-extend

> function arguments, such as RISC-V.  Given a simple testcase

> 

> extern void asdf(int);

> void foo(int x) {

>   switch (x) {

>   case 0: asdf(10); break;

>   case 1: asdf(11); break;

>   case 2: asdf(12); break;

>   case 3: asdf(13); break;

>   case 4: asdf(14); break;

>   }

> }

> 

> Compiled for a 64-bit target, we get for the tablejump

> 

> 	li	a5,4

> 	bgtu	a0,a5,.L1

> 	slli	a0,a0,32

> 	lui	a5,%hi(.L4)

> 	addi	a5,a5,%lo(.L4)

> 	srli	a0,a0,30

> 	add	a0,a0,a5

> 	lw	a5,0(a0)

> 	jr	a5

> 

> There is some unnecessary shifting here.  a0 (x) gets shifted left by 32 then

> shifted right by 30 to zero-extend it and multiply by 4 for the table index.

> However, after the unsigned greater than branch, we know the value is between

> 0 and 4.  We also know that a 32-bit int is passed as a 64-bit sign-extended

> long for this target.  Thus we get the same exact value if we sign-extend

> instead of zero-extend, and the code is one instruction shorter.  We get a slli

> by 2 instead of the slli 32/srli 30.

> 

> The following patch implements this optimization.  It checks for a range that

> does not have the sign-bit set, and an index value that is already sign

> extended, and then does a sign extend instead of an zero extend.

> 

> This has been tested with a riscv{32,64}-{elf,linux} builds and testsuite runs.

> There were no regressions.  It was also tested with an x86_64-linux build and

> testsuite run.

> 

> Ok?

> 

> Jim

> 

> 	gcc/

> 	* expr.c (do_tablejump): When converting index to Pmode, if we have a

> 	sign extended promoted subreg, and the range does not have the sign bit

> 	set, then do a sign extend.

OK.

Jeff

Patch

diff --git a/gcc/expr.c b/gcc/expr.c
index 9dd0e60d24d..919e20a22f7 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11782,11 +11782,26 @@  do_tablejump (rtx index, machine_mode mode, rtx range, rtx table_label,
     emit_cmp_and_jump_insns (index, range, GTU, NULL_RTX, mode, 1,
 			     default_label, default_probability);
 
-
   /* If index is in range, it must fit in Pmode.
      Convert to Pmode so we can index with it.  */
   if (mode != Pmode)
-    index = convert_to_mode (Pmode, index, 1);
+    {
+      unsigned int width;
+
+      /* We know the value of INDEX is between 0 and RANGE.  If we have a
+	 sign-extended subreg, and RANGE does not have the sign bit set, then
+	 we have a value that is valid for both sign and zero extension.  In
+	 this case, we get better code if we sign extend.  */
+      if (GET_CODE (index) == SUBREG
+	  && SUBREG_PROMOTED_VAR_P (index)
+	  && SUBREG_PROMOTED_SIGNED_P (index)
+	  && ((width = GET_MODE_PRECISION (as_a <scalar_int_mode> (mode)))
+	      <= HOST_BITS_PER_WIDE_INT)
+	  && ! (INTVAL (range) & (HOST_WIDE_INT_1U << (width - 1))))
+	index = convert_to_mode (Pmode, index, 0);
+      else
+	index = convert_to_mode (Pmode, index, 1);
+    }
 
   /* Don't let a MEM slip through, because then INDEX that comes
      out of PIC_CASE_VECTOR_ADDRESS won't be a valid address,