[v2,Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

Message ID 44d5392a-f033-ed0d-d679-116b3eafa0b9@arm.com
State New

Commit Message

Sam Tebbs Aug. 1, 2018, 3:07 p.m.
Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL
instruction when OR-ing the results of two bitwise AND operations
with non-overlapping bitmasks
(e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}

void read(unsigned long long a, unsigned long long b, unsigned long long *c) {
   *c = combine(a, b);
}

Without this patch, compiling with -O2 produces the following code for read:

read:
   and   x5, x1, #0xffffffff
   and   x4, x0, #0xffffffff00000000
   orr   x4, x4, x5
   str   x4, [x2]
   ret

With this patch, the result is:

read:
   mov   x4, x0
   bfxil x4, x1, 0, 32
   str   x4, [x2]
   ret

Bootstrapped and regtested on aarch64-none-linux-gnu and
aarch64-none-elf with no regressions.
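
To make the transformation concrete, here is a minimal C sketch that models the effect of BFXIL on 64-bit values and compares it with the mask-and-OR form from the example. The helper names (bfxil_model, combine_masks, check_bfxil_equivalence) are illustrative assumptions and are not part of the patch. The model reflects the behaviour as I understand it: copy "width" bits starting at bit "lsb" of the source into the low bits of the destination and leave the other destination bits unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "bfxil dst, src, lsb, width" on 64-bit registers:
   extract WIDTH bits of SRC starting at bit LSB and insert them into the
   low WIDTH bits of DST, keeping the remaining bits of DST.  */
static uint64_t
bfxil_model (uint64_t dst, uint64_t src, unsigned lsb, unsigned width)
{
  assert (width >= 1 && width <= 64 && lsb + width <= 64);
  /* Build a mask of WIDTH low ones without ever shifting by 64.  */
  uint64_t low_mask = (width == 64) ? UINT64_MAX
                                    : ((UINT64_C (1) << width) - 1);
  uint64_t field = (src >> lsb) & low_mask;
  return (dst & ~low_mask) | field;
}

/* The source-level form from the example: high half of A, low half of B.  */
static uint64_t
combine_masks (uint64_t a, uint64_t b)
{
  return (a & UINT64_C (0xffffffff00000000))
         | (b & UINT64_C (0x00000000ffffffff));
}

/* Return 1 if both forms agree on the sample inputs, 0 otherwise.  */
static int
check_bfxil_equivalence (void)
{
  uint64_t a = UINT64_C (0x0123456789abcdef);
  uint64_t b = UINT64_C (0xfedcba9876543210);
  return combine_masks (a, b) == bfxil_model (a, b, 0, 32)
         && combine_masks (b, a) == bfxil_model (b, a, 0, 32);
}
```

In the generated code above, the "mov x4, x0" supplies the destination (a) and the BFXIL then overwrites its low 32 bits with those of x1 (b), which corresponds to bfxil_model (a, b, 0, 32) in this sketch.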


gcc/
2018-08-01  Sam Tebbs  <sam.tebbs@arm.com>

         PR target/85628
         * config/aarch64/aarch64.md (*aarch64_bfxil):
         Define.
         * config/aarch64/constraints.md (Ulc): Define.
         * config/aarch64/aarch64-protos.h
         (aarch64_is_left_consecutive): Define.
         * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
         New function.

gcc/testsuite
2018-08-01  Sam Tebbs  <sam.tebbs@arm.com>

         PR target/85628
         * gcc.target/aarch64/combine_bfxil.c: New file.
         * gcc.target/aarch64/combine_bfxil_2.c: New file.

Comments

Sam Tebbs Aug. 28, 2018, 12:28 p.m. | #1
ping

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00108.html


James Greenhalgh Aug. 28, 2018, 10:53 p.m. | #2
On Wed, Aug 01, 2018 at 10:07:23AM -0500, Sam Tebbs wrote:

> gcc/
> 2018-08-01  Sam Tebbs<sam.tebbs@arm.com>
>
>          PR target/85628
>          * config/aarch64/aarch64.md (*aarch64_bfxil):
>          Define.
>          * config/aarch64/constraints.md (Ulc): Define
>          * config/aarch64/aarch64-protos.h
>          (aarch64_is_left_consecutive): Define.

Hm, I'm not very sure about the naming here; "left consecutive" isn't a
common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
if I'm reading right), and it is misleading: 0x0000ffff is 'left consecutive'
too, just with zeroes rather than ones.

>          * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
>          New function.
>
> gcc/testsuite
> 2018-08-01  Sam Tebbs<sam.tebbs@arm.com>
>
>          PR target/85628
>          * gcc.target/aarch64/combine_bfxil.c: New file.
>          * gcc.target/aarch64/combine_bfxil_2.c: New file.
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index af5db9c595385f7586692258f750b6aceb3ed9c8..01d9e1bd634572fcfa60208ba4dc541805af5ccd 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -574,4 +574,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
>  
>  poly_uint64 aarch64_regmode_natural_size (machine_mode);
>  
> +bool aarch64_is_left_consecutive (HOST_WIDE_INT);
> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1454,6 +1454,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
>      return SImode;
>  }
>  
> +/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive


What is IS_LEFT_CONSECUTIVE - I don't see it elsewhere in the GCC code, so
what does the comment refer to implementing?

> +   ones from the MSB.  */
> +bool
> +aarch64_is_left_consecutive (HOST_WIDE_INT i)
> +{
> +  return (i | (i - 1)) == HOST_WIDE_INT_M1;


exact_log2(-i) != HOST_WIDE_INT_M1?

I don't have issues with the rest of the patch, but please try to find a more
descriptive name for this.

Thanks,
James
Sam Tebbs Aug. 30, 2018, 3:53 p.m. | #3
On 08/28/2018 11:53 PM, James Greenhalgh wrote:
> Hm, I'm not very sure about the naming here; "left consecutive" isn't a
> common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
> if I'm reading right), and is misleading 0x0000ffff is 'left consecutive'
> too, just with zeroes rather than ones.
>

I think you're right about it not being the best naming. Do you have any
suggestions for a better name?

>> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
>> index af5db9c595385f7586692258f750b6aceb3ed9c8..01d9e1bd634572fcfa60208ba4dc541805af5ccd 100644
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +++ b/gcc/config/aarch64/aarch64-protos.h
>> @@ -574,4 +574,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
>>   
>>   poly_uint64 aarch64_regmode_natural_size (machine_mode);
>>   
>> +bool aarch64_is_left_consecutive (HOST_WIDE_INT);
>> +
>>   #endif /* GCC_AARCH64_PROTOS_H */
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1454,6 +1454,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
>>       return SImode;
>>   }
>>   
>> +/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive
> What is IS_LEFT_CONSECUTIVE - I don't see it elsewhere in the GCC code, so
> what does the comment refer to implementing?

Thanks for pointing out this mistake; it should read
"AARCH64_IS_LEFT_CONSECUTIVE" to refer to the function definition in
aarch64-protos.h. This will of course change once a better name is
thought of.

Thanks,
Sam
Kyrill Tkachov Aug. 31, 2018, 10:59 a.m. | #4
On 30/08/18 16:53, Sam Tebbs wrote:
> On 08/28/2018 11:53 PM, James Greenhalgh wrote:
> > Hm, I'm not very sure about the naming here; "left consecutive" isn't a
> > common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
> > if I'm reading right), and is misleading 0x0000ffff is 'left consecutive'
> > too, just with zeroes rather than ones.
> >
> I think you're right about it not being the best naming. Do you have any
> suggestions for a better name?


Naming things is hard... :(

How about aarch64_hi_bits_all_ones_p ?

Kyrill

Sam Tebbs Aug. 31, 2018, 3:27 p.m. | #5
On 08/31/2018 11:59 AM, Kyrill Tkachov wrote:
> On 30/08/18 16:53, Sam Tebbs wrote:
>> On 08/28/2018 11:53 PM, James Greenhalgh wrote:
>> > Hm, I'm not very sure about the naming here; "left consecutive" isn't a
>> > common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
>> > if I'm reading right), and is misleading 0x0000ffff is 'left consecutive'
>> > too, just with zeroes rather than ones.
>> >
>> I think you're right about it not being the best naming. Do you have any
>> suggestions for a better name?
>
> Naming things is hard... :(
>
> How about aarch64_hi_bits_all_ones_p ?
>
> Kyrill


That sounds good! Although I'm confused about what the 'p' means.
Kyrill Tkachov Aug. 31, 2018, 3:37 p.m. | #6
On 31/08/18 16:27, Sam Tebbs wrote:
> On 08/31/2018 11:59 AM, Kyrill Tkachov wrote:
> > On 30/08/18 16:53, Sam Tebbs wrote:
> >> On 08/28/2018 11:53 PM, James Greenhalgh wrote:
> >> > Hm, I'm not very sure about the naming here; "left consecutive" isn't a
> >> > common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
> >> > if I'm reading right), and is misleading 0x0000ffff is 'left consecutive'
> >> > too, just with zeroes rather than ones.
> >> >
> >> I think you're right about it not being the best naming. Do you have any
> >> suggestions for a better name?
> >
> > Naming things is hard... :(
> >
> > How about aarch64_hi_bits_all_ones_p ?
> >
> > Kyrill
>
> That sounds good! Although I'm confused about what the 'p' means.


It stands for "predicate" meaning a boolean function with no side-effects.
It's the preferred way to name these kinds of functions in GCC (though I can't seem to find the documentation mandate for it).
IIRC it's a remnant from the LISP days. You'll find a lot of *_p functions in GCC.

Kyrill
Alexander Monakov Aug. 31, 2018, 3:51 p.m. | #7
On Fri, 31 Aug 2018, Kyrill Tkachov wrote:
> > That sounds good! Although I'm confused about what the 'p' means.
>
> It stands for "predicate" meaning a boolean function with no side-effects.
> It's the preferred way to name these kinds of functions in GCC (though I can't
> seem to find the documentation mandate for it).
> IIRC it's a remnant from the LISP days. You'll find a lot of *_p functions in
> GCC.


It's mentioned in the jargon file as "The -P convention" (see e.g. at
https://www.dourish.com/goodies/jargon.html ).

Alexander
Sam Tebbs Aug. 31, 2018, 4:38 p.m. | #8
On 08/28/2018 11:53 PM, James Greenhalgh wrote:
> On Wed, Aug 01, 2018 at 10:07:23AM -0500, Sam Tebbs wrote:
>> +   ones from the MSB.  */
>> +bool
>> +aarch64_is_left_consecutive (HOST_WIDE_INT i)
>> +{
>> +  return (i | (i - 1)) == HOST_WIDE_INT_M1;
> exact_log2(-i) != HOST_WIDE_INT_M1?


I could change this, but I'm not sure what benefit it would have over the
original implementation. Please explain if there is one, as I could be
missing something.
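
As a rough illustration of the two formulations under discussion, the following C sketch implements both over 64-bit values. exact_log2_model is a stand-in written for this example (GCC's real exact_log2 is declared in hwint.h), returning the bit index for an exact power of two and -1 otherwise; the other names are likewise hypothetical. Unsigned arithmetic is used to avoid signed overflow. On my reading, the two predicates agree on every nonzero input and differ only for 0, which the OR form accepts and the exact_log2 form rejects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for GCC's exact_log2 (an assumption for this sketch): return
   log2 (X) if X is an exact power of two, otherwise -1.  */
static int
exact_log2_model (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while ((x >>= 1) != 0)
    n++;
  return n;
}

/* Formulation from the first version of the patch.  */
static bool
high_ones_or_form (uint64_t i)
{
  return (i | (i - 1)) == UINT64_MAX;
}

/* Formulation suggested in review.  */
static bool
high_ones_log2_form (uint64_t i)
{
  return exact_log2_model (0 - i) != -1;
}

/* Count the sample masks on which the two forms disagree.  */
static int
count_disagreements (void)
{
  static const uint64_t samples[] = {
    UINT64_C (0xffffffff00000000), UINT64_C (0xfffffffffffffffe),
    UINT64_C (0xffffffffffffffff), UINT64_C (0x8000000000000000),
    UINT64_C (0x00000000ffffffff), UINT64_C (0xfffffff200f00000),
    UINT64_C (0x0000000000000001), UINT64_C (0x0000000000000000),
  };
  int n = 0;
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    if (high_ones_or_form (samples[k]) != high_ones_log2_form (samples[k]))
      n++;
  return n;
}
```

Whether a zero mask can ever reach the predicate in practice is not something this thread establishes, so the difference may be purely theoretical.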
Sam Tebbs Sept. 4, 2018, 3:13 p.m. | #9
Hi James,

Thanks for the feedback. Here is an update with the changes you proposed 
and an updated changelog.

gcc/
2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>

         PR target/85628
         * config/aarch64/aarch64.md (*aarch64_bfxil):
         Define.
         * config/aarch64/constraints.md (Ulc): Define.
         * config/aarch64/aarch64-protos.h (aarch64_high_bits_all_ones_p):
         Define.
         * config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): New function.

gcc/testsuite
2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>

         PR target/85628
         * gcc.target/aarch64/combine_bfxil.c: New file.
         * gcc.target/aarch64/combine_bfxil_2.c: New file.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef95fc829b83886e2ff00e4664e31af916e99b0c..b43e70285b3e6c45b830ce3790c38307b2294f81 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -575,4 +575,6 @@ rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1de76e075471acaa68584595023e3878b10538e2..352515e9fc92c7b14a631c76bbe3d25030174bcf 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1474,6 +1474,13 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
     return SImode;
 }
 
+/* Return true if I's bits are consecutive ones from the MSB.  */
+bool
+aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
+{
+  return exact_log2(-i) != HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
    that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 955769a64d2030839cdb337321a808626188190e..88f66104db31320389f05cdd5d161db9992a77b8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5336,6 +5336,31 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil<mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		    (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	    (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+		    (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
+  && (aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
+    || aarch64_high_bits_all_ones_p (INTVAL (operands[4])))"
+  {
+    switch (which_alternative)
+    {
+      case 0:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
+	return "bfxil\\t%<w>0, %<w>1, 0, %3";
+      case 1:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
+	return "bfxil\\t%<w>0, %<w>2, 0, %3";
+      default:
+	gcc_unreachable ();
+    }
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..31fc3eafd8bba03cc773e226223a6293c6dde8d4 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+      (match_test "aarch64_high_bits_all_ones_p (ival)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 0000000000000000000000000000000000000000..3bc1dd5b216477efe7494dbcdac7a5bf465af218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort(void);
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
+}
+
+unsigned long long
+combine_minimal (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffffffffffffe) | (b & 0x0000000000000001);
+}
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffffff000000ll) | (b & 0x0000000000ffffffll);
+}
+
+unsigned int
+combine_balanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffff0000ll) | (b & 0x0000ffffll);
+}
+
+unsigned int
+combine_unbalanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffffff00ll) | (b & 0x000000ffll);
+}
+
+__attribute__ ((noinline)) void
+foo (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_minimal(a, b);
+  *d = combine_minimal(b, a);
+}
+
+__attribute__ ((noinline)) void
+foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_balanced (a, b);
+  *d = combine_balanced (b, a);
+}
+
+__attribute__ ((noinline)) void
+foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_unbalanced (a, b);
+  *d = combine_unbalanced (b, a);
+}
+
+void
+foo4 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_balanced_int(a, b);
+  *d = combine_balanced_int(b, a);
+}
+
+void
+foo5 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_unbalanced_int(a, b);
+  *d = combine_unbalanced_int(b, a);
+}
+
+int
+main(void)
+{
+  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
+  foo3(a, b, &c, &d);
+  if(c != 0x0123456789543210) abort();
+  if(d != 0xfedcba9876abcdef) abort();
+  foo2(a, b, &c, &d);
+  if(c != 0x0123456776543210) abort();
+  if(d != 0xfedcba9889abcdef) abort();
+  foo(a, b, &c, &d);
+  if(c != 0x0123456789abcdee) abort();
+  if(d != 0xfedcba9876543211) abort();
+
+  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98, c2, d2;
+  foo4(a2, b2, &c2, &d2);
+  if(c2 != 0x0123ba98) abort();
+  if(d2 != 0xfedc4567) abort();
+  foo5(a2, b2, &c2, &d2);
+  if(c2 != 0x01234598) abort();
+  if(d2 != 0xfedcba67) abort();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bfxil\\t" 10 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..0fc140443bc67bcf12b93d72b7970e095620021e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_non_consecutive (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffff200f00000ll) | (b & 0x00001000ffffffffll);
+}
+
+void
+foo4 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d) {
+  /* { dg-final { scan-assembler-not "bfxil\\t" } } */
+  *c = combine_non_consecutive (a, b);
+  *d = combine_non_consecutive (b, a);
+}
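
The condition and output template of the *aarch64_bfxil pattern can be paraphrased in plain C. This is an approximation written for illustration only: bfxil_width_for_masks, high_bits_all_ones and count_trailing_zeros64 are hypothetical names, 64-bit unsigned values stand in for HOST_WIDE_INT, and the register alternatives are ignored. The sketch returns the BFXIL width when the two masks are complements of each other and one of them is a run of ones from the MSB, and -1 when the pattern would not match.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for ctz_hwi; X must be nonzero.  */
static int
count_trailing_zeros64 (uint64_t x)
{
  assert (x != 0);
  int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* True if X is a nonzero run of ones starting at the MSB, i.e. -X is a
   power of two.  */
static int
high_bits_all_ones (uint64_t x)
{
  uint64_t neg = 0 - x;
  return neg != 0 && (neg & (neg - 1)) == 0;
}

/* Width of the low field to insert, or -1 if the masks do not qualify.
   The degenerate all-ones/zero pair is not treated specially here.  */
static int
bfxil_width_for_masks (uint64_t mask_a, uint64_t mask_b)
{
  if (mask_a != ~mask_b)
    return -1;
  if (high_bits_all_ones (mask_a))
    return count_trailing_zeros64 (mask_a);
  if (high_bits_all_ones (mask_b))
    return count_trailing_zeros64 (mask_b);
  return -1;
}
```

With the masks from combine_bfxil.c this gives widths of 32, 1 and 24 for the 64-bit cases, while the masks from combine_bfxil_2.c are not complementary and yield -1, which is consistent with that test expecting no bfxil.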
James Greenhalgh Sept. 11, 2018, 3:20 p.m. | #10
On Tue, Sep 04, 2018 at 10:13:43AM -0500, Sam Tebbs wrote:

<snip>

> +/* Return true if I's bits are consecutive ones from the MSB.  */
> +bool
> +aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
> +{
> +  return exact_log2(-i) != HOST_WIDE_INT_M1;
> +}


You need a space in here between the function name and the bracket:

  exact_log2 (-i)


> +extern void abort(void);


The same comment applies in multiple places in this file.

Likewise, use "if (" with a space.

Otherwise, OK, please apply with those fixes.

Thanks,
James

Sam Tebbs Sept. 13, 2018, 9:25 a.m. | #11
On 09/11/2018 04:20 PM, James Greenhalgh wrote:
> On Tue, Sep 04, 2018 at 10:13:43AM -0500, Sam Tebbs wrote:
>> Hi James,
>>
>> Thanks for the feedback. Here is an update with the changes you proposed
>> and an updated changelog.
>>
>> gcc/
>> 2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>
>>
>>           PR target/85628
>>           * config/aarch64/aarch64.md (*aarch64_bfxil):
>>           Define.
>>           * config/aarch64/constraints.md (Ulc): Define
>>           * config/aarch64/aarch64-protos.h (aarch64_high_bits_all_ones_p):
>>           Define.
>>           * config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): New function.
>>
>> gcc/testsuite
>> 2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>
>>
>>           PR target/85628
>>           * gcc.target/aarch64/combine_bfxil.c: New file.
>>           * gcc.target/aarch64/combine_bfxil_2.c: New file.
>>
> <snip>
>
>> +/* Return true if I's bits are consecutive ones from the MSB.  */
>> +bool
>> +aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
>> +{
>> +  return exact_log2(-i) != HOST_WIDE_INT_M1;
>> +}
> You need a space in here between the function name and the bracket:
>
>    exact_log2 (-i)
>
>> +extern void abort(void);
> The same comment applies multiple places in this file.
>
> Likewise; if (
>
> Otherwise, OK, please apply with those fixes.
>
> Thanks,
> James


Thanks for noticing that; here's the fixed version.

Sam
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index aae1db45ed69c14e306ccce056861a58d9acd834..b26e46f81a414bf71762527f84fd9ac38b81b829 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -624,4 +624,6 @@ rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_high_bits_all_ones_p (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d088ef7ee0f256ad0d4f59d2735121de2dd67eba..34acc58510110fbc2cb4abd19ec9d7a04bad3f4c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1432,6 +1432,13 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
     return SImode;
 }
 
+/* Return true if I's bits are consecutive ones from the MSB.  */
+bool
+aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
+{
+  return exact_log2 (-i) != HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
    that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 955769a64d2030839cdb337321a808626188190e..88f66104db31320389f05cdd5d161db9992a77b8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5336,6 +5336,31 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil<mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		    (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	    (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+		    (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
+  && (aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
+    || aarch64_high_bits_all_ones_p (INTVAL (operands[4])))"
+  {
+    switch (which_alternative)
+    {
+      case 0:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
+	return "bfxil\\t%<w>0, %<w>1, 0, %3";
+      case 1:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
+	return "bfxil\\t%<w>0, %<w>2, 0, %3";
+      default:
+	gcc_unreachable ();
+    }
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..31fc3eafd8bba03cc773e226223a6293c6dde8d4 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+      (match_test "aarch64_high_bits_all_ones_p (ival)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 0000000000000000000000000000000000000000..adb0582ed9d8207f7b52c8912d03345369747448
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
+}
+
+unsigned long long
+combine_minimal (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffffffffffffe) | (b & 0x0000000000000001);
+}
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffffff000000ll) | (b & 0x0000000000ffffffll);
+}
+
+unsigned int
+combine_balanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffff0000ll) | (b & 0x0000ffffll);
+}
+
+unsigned int
+combine_unbalanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffffff00ll) | (b & 0x000000ffll);
+}
+
+__attribute__ ((noinline)) void
+foo (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_minimal (a, b);
+  *d = combine_minimal (b, a);
+}
+
+__attribute__ ((noinline)) void
+foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_balanced (a, b);
+  *d = combine_balanced (b, a);
+}
+
+__attribute__ ((noinline)) void
+foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_unbalanced (a, b);
+  *d = combine_unbalanced (b, a);
+}
+
+void
+foo4 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_balanced_int (a, b);
+  *d = combine_balanced_int (b, a);
+}
+
+void
+foo5 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_unbalanced_int (a, b);
+  *d = combine_unbalanced_int (b, a);
+}
+
+int
+main (void)
+{
+  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
+  foo3 (a, b, &c, &d);
+  if (c != 0x0123456789543210) abort ();
+  if (d != 0xfedcba9876abcdef) abort ();
+  foo2 (a, b, &c, &d);
+  if (c != 0x0123456776543210) abort ();
+  if (d != 0xfedcba9889abcdef) abort ();
+  foo (a, b, &c, &d);
+  if (c != 0x0123456789abcdee) abort ();
+  if (d != 0xfedcba9876543211) abort ();
+
+  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98, c2, d2;
+  foo4 (a2, b2, &c2, &d2);
+  if (c2 != 0x0123ba98) abort ();
+  if (d2 != 0xfedc4567) abort ();
+  foo5 (a2, b2, &c2, &d2);
+  if (c2 != 0x01234598) abort ();
+  if (d2 != 0xfedcba67) abort ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bfxil\\t" 10 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..0fc140443bc67bcf12b93d72b7970e095620021e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_non_consecutive (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffff200f00000ll) | (b & 0x00001000ffffffffll);
+}
+
+void
+foo4 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d) {
+  /* { dg-final { scan-assembler-not "bfxil\\t" } } */
+  *c = combine_non_consecutive (a, b);
+  *d = combine_non_consecutive (b, a);
+}
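
For readers following the insn condition and output template above, here is a minimal standalone sketch in plain C (not GCC internals) of why the and/and/or form equals a low-bits insert. The helper names bfxil_model, combine_model and bfxil_width are hypothetical and do not appear in the patch. __builtin_ctzll is a GCC/Clang builtin used here as a stand-in for ctz_hwi, and the 64-bit width is an assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical model of "bfxil dst, src, 0, width": copy the low WIDTH bits
   of SRC into DST and keep the remaining bits of DST.  */
static uint64_t
bfxil_model (uint64_t dst, uint64_t src, unsigned width)
{
  uint64_t low = width >= 64 ? ~UINT64_C (0) : (UINT64_C (1) << width) - 1;
  return (dst & ~low) | (src & low);
}

/* The source-level form the pattern matches: HIGH_MASK selects bits of A,
   its complement selects bits of B.  */
static uint64_t
combine_model (uint64_t a, uint64_t b, uint64_t high_mask)
{
  return (a & high_mask) | (b & ~high_mask);
}

/* Width operand as the output template computes it: the number of trailing
   zeros in the high mask.  HIGH_MASK must be nonzero.  */
static unsigned
bfxil_width (uint64_t high_mask)
{
  return (unsigned) __builtin_ctzll (high_mask);
}
```

With the inputs from combine_bfxil.c, bfxil_width (0xffffffff00000000) is 32, and bfxil_model (0x0123456789ABCDEF, 0xFEDCBA9876543210, 32) gives 0x0123456776543210, the value main checks after foo2.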
Kyrill Tkachov Sept. 13, 2018, 9:49 a.m. | #12
On 13/09/18 10:25, Sam Tebbs wrote:
>

> On 09/11/2018 04:20 PM, James Greenhalgh wrote:

> > On Tue, Sep 04, 2018 at 10:13:43AM -0500, Sam Tebbs wrote:

> >> Hi James,

> >>

> >> Thanks for the feedback. Here is an update with the changes you proposed

> >> and an updated changelog.

> >>

> >> gcc/

> >> 2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>

> >>

> >>           PR target/85628

> >>           * config/aarch64/aarch64.md (*aarch64_bfxil):

> >>           Define.

> >>           * config/aarch64/constraints.md (Ulc): Define

> >>           * config/aarch64/aarch64-protos.h (aarch64_high_bits_all_ones_p):

> >>           Define.

> >>           * config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): New function.

> >>

> >> gcc/testsuite

> >> 2018-09-04  Sam Tebbs  <sam.tebbs@arm.com>

> >>

> >>           PR target/85628

> >>           * gcc.target/aarch64/combine_bfxil.c: New file.

> >>           * gcc.target/aarch64/combine_bfxil_2.c: New file.

> >>

> >>

> > <snip>

> >

> >> +/* Return true if I's bits are consecutive ones from the MSB.  */

> >> +bool

> >> +aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)

> >> +{

> >> +  return exact_log2(-i) != HOST_WIDE_INT_M1;

> >> +}

> > You need a space in here between the function name and the bracket:

> >

> >    exact_log2 (-i)

> >

> >

> >> +extern void abort(void);

> > The same comment applies multiple places in this file.

> >

> > Likewise; if (

> >

> > Otherwise, OK, please apply with those fixes.

> >

> > Thanks,

> > James

>

> Thanks for noticing that, here's the fixed version.

>


Thanks Sam, I've committed the patch on your behalf with r264264.
If you want write-after-approval access to the SVN repo so that you can commit patches yourself in future,
please fill out the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi, giving my address from the MAINTAINERS file as the approver.

Kyrill

> Sam
Christophe Lyon Sept. 18, 2018, 10 p.m. | #13
On Thu, 13 Sep 2018 at 11:49, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>

>

> Thanks Sam, I've committed the patch on your behalf with r264264.

> If you want to get write-after-approval access to the SVN repo to commit patches yourself in the future

> please fill out the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi putting my address from the MAINTAINERS file as the approver.

>


Hi,

You've probably already noticed by now, since you fixed the
combine_bfi_1 issue introduced by this commit, but it adds another
regression:
FAIL: gcc.target/aarch64/copysign-bsl.c scan-assembler b(sl|it|if)\tv[0-9]

Christophe

> Kyrill

>

> > Sam

>
Kyrill Tkachov Sept. 19, 2018, 9:31 a.m. | #14
Hi Christophe,

On 18/09/18 23:00, Christophe Lyon wrote:
> Hi,

>

> You've probably already noticed by now since you fixed the

> combine_bfi_1 issue introduced by this commit, but it add another

> regression:

> FAIL: gcc.target/aarch64/copysign-bsl.c scan-assembler b(sl|it|if)\tv[0-9]


Yeah, that one is a bit more involved as it's an unexpected interaction with the copysign BSL pattern.
Would you be able to file a bugzilla issue to track it?

Thanks,
Kyrill

> Christophe

>

>> Kyrill

>>

>>> Sam
Christophe Lyon Sept. 20, 2018, 4:36 p.m. | #15
On Wed, 19 Sep 2018 at 11:31, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>

> Hi Christophe,

>

> On 18/09/18 23:00, Christophe Lyon wrote:

> > Hi,

> >

> > You've probably already noticed by now since you fixed the

> > combine_bfi_1 issue introduced by this commit, but it add another

> > regression:

> > FAIL: gcc.target/aarch64/copysign-bsl.c scan-assembler b(sl|it|if)\tv[0-9]

>

> Yeah, that one is a bit more involved as it's an unexpected interaction with the copysign BSL pattern.

> Would you be able to file a bugzilla issue to track it?

>


Sure, this is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87369


> Thanks,

> Kyrill

>

> > Christophe

> >

> >> Kyrill

> >>

> >>> Sam

>

Patch

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index af5db9c595385f7586692258f750b6aceb3ed9c8..01d9e1bd634572fcfa60208ba4dc541805af5ccd 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -574,4 +574,6 @@  rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_is_left_consecutive (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1454,6 +1454,14 @@  aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
     return SImode;
 }
 
+/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
    that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e9c16f9697b766a5c56b6269a83b7276654c5668..ff2db4af38e16630daeada79afc604c4696abf82 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5305,6 +5305,31 @@ 
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil<mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		    (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	    (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+		    (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
+  && (aarch64_is_left_consecutive (INTVAL (operands[3]))
+    || aarch64_is_left_consecutive (INTVAL (operands[4])))"
+  {
+    switch (which_alternative)
+    {
+      case 0:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[3])));
+	return "bfxil\\t%<w>0, %<w>1, 0, %3";
+      case 1:
+	operands[3] = GEN_INT (ctz_hwi (~INTVAL (operands[4])));
+	return "bfxil\\t%<w>0, %<w>2, 0, %3";
+      default:
+	gcc_unreachable ();
+    }
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 72cacdabdac52dcb40b480f7a5bfbf4997c742d8..5bae0b70bbd11013a9fb27ec19cf7467eb20135f 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@ 
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+      (match_test "aarch64_is_left_consecutive (ival)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 0000000000000000000000000000000000000000..3bc1dd5b216477efe7494dbcdac7a5bf465af218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,98 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort(void);
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
+}
+
+unsigned long long
+combine_minimal (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffffffffffffe) | (b & 0x0000000000000001);
+}
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffffff000000ll) | (b & 0x0000000000ffffffll);
+}
+
+unsigned int
+combine_balanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffff0000ll) | (b & 0x0000ffffll);
+}
+
+unsigned int
+combine_unbalanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffffff00ll) | (b & 0x000000ffll);
+}
+
+__attribute__ ((noinline)) void
+foo (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_minimal(a, b);
+  *d = combine_minimal(b, a);
+}
+
+__attribute__ ((noinline)) void
+foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_balanced (a, b);
+  *d = combine_balanced (b, a);
+}
+
+__attribute__ ((noinline)) void
+foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_unbalanced (a, b);
+  *d = combine_unbalanced (b, a);
+}
+
+void
+foo4 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_balanced_int(a, b);
+  *d = combine_balanced_int(b, a);
+}
+
+void
+foo5 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_unbalanced_int(a, b);
+  *d = combine_unbalanced_int(b, a);
+}
+
+int
+main(void)
+{
+  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
+  foo3(a, b, &c, &d);
+  if(c != 0x0123456789543210) abort();
+  if(d != 0xfedcba9876abcdef) abort();
+  foo2(a, b, &c, &d);
+  if(c != 0x0123456776543210) abort();
+  if(d != 0xfedcba9889abcdef) abort();
+  foo(a, b, &c, &d);
+  if(c != 0x0123456789abcdee) abort();
+  if(d != 0xfedcba9876543211) abort();
+
+  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98, c2, d2;
+  foo4(a2, b2, &c2, &d2);
+  if(c2 != 0x0123ba98) abort();
+  if(d2 != 0xfedc4567) abort();
+  foo5(a2, b2, &c2, &d2);
+  if(c2 != 0x01234598) abort();
+  if(d2 != 0xfedcba67) abort();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bfxil\\t" 10 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..0fc140443bc67bcf12b93d72b7970e095620021e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_non_consecutive (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffff200f00000ll) | (b & 0x00001000ffffffffll);
+}
+
+void
+foo4 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d) {
+  /* { dg-final { scan-assembler-not "bfxil\\t" } } */
+  *c = combine_non_consecutive (a, b);
+  *d = combine_non_consecutive (b, a);
+}
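
The mask predicate in this v2 patch, (i | (i - 1)) == HOST_WIDE_INT_M1, was later replaced in the thread by exact_log2 (-i) != HOST_WIDE_INT_M1. The sketch below is a rough standalone comparison of the two, not GCC code. The function names are hypothetical, uint64_t stands in for HOST_WIDE_INT so the arithmetic wraps without signed overflow, and it assumes exact_log2 returns -1 for zero and for anything that is not a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* v2 predicate, transcribed to unsigned arithmetic: (i | (i - 1)) == -1.
   OR-ing in the bits below the lowest set bit gives all ones only when
   every bit from the lowest set bit upwards is already set.  */
static bool
left_consecutive_v2 (uint64_t i)
{
  return (i | (i - 1)) == UINT64_MAX;
}

/* Model of the revised predicate exact_log2 (-i) != -1: true when the
   two's-complement negation of I is a nonzero power of two.  */
static bool
high_bits_all_ones_revised (uint64_t i)
{
  uint64_t neg = 0 - i;
  return neg != 0 && (neg & (neg - 1)) == 0;
}
```

Both versions accept 0xffffffff00000000 and reject 0xfffffff200f00000, the mask used in combine_bfxil_2.c. As far as I can tell, the only input on which they disagree is zero: the v2 form accepts it, while the revised form rejects it.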