rl78 anddi3 improvement

Message ID 000701d37033$f3449f00$d9cddd00$@renesas.com
State New
Headers show
Series
  • rl78 anddi3 improvement
Related show

Commit Message

Sebastian Perta Dec. 8, 2017, 2:50 p.m.
Hello,

The following patch improves code size for 64 bit and for RL78:
it emits a library function call instead of emitting code for  the 64 bit
min for every single time.
The and function which was added in libgcc is hand written, so more optimal
than what GCC generates.

The change can easily be seen on the following test case:
unsigned long long my_anddi3(unsigned long long x, unsigned long long y){ 
return x & y;
}
I did not add this to the regression as it very simple and there are test
cases in the regression which test this, for example
gcc.dg/torture/stackalign/alloca-1.c  and
gcc.dg/torture/stackalign/vararg-2.c.
Regression test is OK, tested with the following command:
make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

Please let me know if this is OK, Thank you!
Sebastian

Comments

Jeff Law Dec. 8, 2017, 3:33 p.m. | #1
On 12/08/2017 07:50 AM, Sebastian Perta wrote:
> Hello,

> 

> The following patch improves code size for 64 bit and for RL78:

> it emits a library function call instead of emitting code for  the 64 bit

> min for every single time.

> The and function which was added in libgcc is hand written, so more optimal

> than what GCC generates.

> 

> The change can easily be seen on the following test case:

> unsigned long long my_anddi3(unsigned long long x, unsigned long long y){ 

> return x & y;

> }

> I did not add this to the regression as it very simple and there are test

> cases in the regression which test this, for example

> gcc.dg/torture/stackalign/alloca-1.c  and

> gcc.dg/torture/stackalign/vararg-2.c.

> Regression test is OK, tested with the following command:

> make -k check-gcc RUNTESTFLAGS=--target_board=rl78-sim

> 

> Please let me know if this is OK, Thank you!

> Sebastian

> 

> Index: gcc/ChangeLog

> ===================================================================

> --- gcc/ChangeLog	(revision 255511)

> +++ gcc/ChangeLog	(working copy)

> @@ -1,3 +1,7 @@

> +2017-12-08  Sebastian Perta  <sebastian.perta@renesas.com>

> +

> +	* config/rl78/rl78.md: New define_expand "anddi3".

So I think you're ultimately far better off determining why GCC does not
generate efficient code for 64bit logicals on the rl78 target.

Once you start defining these kind of operations you'll often end up
defining more and more rather than addressing the core underlying issues.

Furthermore, defining multi-word operations will inhibit optimizations
that would have applied had the multi-word operation been broken down
into word-sized operations.  I've seen this time and time again when
32bit targets define DImode operations.  So not only will you end up
with a lot of these patterns, you then end up defining lots of special
cases for them to catch some of hte missed optimizations.

DJ has the final call here, but I think you'd be better off looking
deeper and trying to understand how to get better code for multi-word
operations without defining multi-word libcall expanders like this.

jeff
Sebastian Perta Dec. 11, 2017, 3:35 p.m. | #2
Hello Jeff,

Thank you for your comments.

>>So I think you're ultimately far better off determining why GCC does not

>>generate efficient code for 64bit logicals on the rl78 target.

I totally agree with you, this is why:
1. I have another patch: I define_expand movdi in which I instruct GCC to use 16 bit movw instead of movw, with this patch applied on the latest revision I reduce the code size of this function (my_anddi3) from 323 bytes to 245 bytes. I'm just waiting on the regression to finish I will post this patch.
2. I am working very hard for quite some time to improve/replace the "devirtualization" pass (see pass_rl78_devirt in rl78.c). I am working on a solution which will generate very similar code what I wrote in ___adddi3 and will also allow me to also change the calling convention to be efficient (similar to what the RL78 commercial compilers use) but unfortunately I still a long way from being finished (as it is quite difficult). I think DJ can explain much better why he needed to do things this way (add this pass and the *_virt and *_real variants for the instructions) in the first place.

However if you look closely at the patch you will see the that I put the following condition for the availability of the expand:
+  "optimize_size"
The idea behind this is the following:
Compared to the commercial RL78 compilers GCC is quite far behind (even 2x-3x bigger). When comparing the output code I observed the commercial compilers I saw they use quite extensively code merging techniques.
For GCC I found some work on this on a branch which didn't make to master (https://www.gnu.org/software/gcc/projects/cfo.html). I have ported this a while back to 4.9.2 but I didn't get significant code size improvement (I think I will give this another try after I finish point 2 above)
So I decided then to continue doing things this way which finally gave me some really good results (improved the code size by 30% on average, even more than 50% in some cases).
So even if/when I finish with point 2 above I think I will still like to have things done this way as they improve code size significantly.

I hope this explanation is satisfactory to you  as I have other patches (not only for 64 bit operations) which make use of the same idea.

Best Regards,
Sebastian


[https://www2.renesas.eu/media/email/unicef_2017.jpg]

This Christmas, instead of sending out cards, Renesas Electronics Europe have decided to support Unicef with a donation. For further details click here<https://www.unicef.org/> to find out about the valuable work they do, helping children all over the world.
We would like to take this opportunity to wish you a Merry Christmas and a prosperous New Year.



Renesas Electronics Europe Ltd, Dukes Meadow, Millboard Road, Bourne End, Buckinghamshire, SL8 5FH, UK. Registered in England & Wales under Registered No. 04586709.

Patch

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 255511)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,7 @@ 
+2017-12-08  Sebastian Perta  <sebastian.perta@renesas.com>
+
+	* config/rl78/rl78.md: New define_expand "anddi3".
+	
 2017-12-08  Martin Jambor  <mjambor@suse.cz>
 
 	PR tree-optimization/83141
Index: gcc/config/rl78/rl78.md
===================================================================
--- gcc/config/rl78/rl78.md	(revision 255511)
+++ gcc/config/rl78/rl78.md	(working copy)
@@ -718,3 +718,13 @@ 
   [(set_attr "valloc" "macax")
    (set_attr "is_g13_muldiv_insn" "yes")]
 )
+
+(define_expand "anddi3"
+ [(set (match_operand:DI          0 "nonimmediate_operand" "")
+	(and:DI (match_operand:DI 1 "general_operand"      "")
+		 (match_operand:DI    2 "general_operand"      "")))
+   ]
+  "optimize_size"
+  "rl78_emit_libcall (\"__anddi3\", AND, DImode, DImode, 3, operands);
+   DONE;"
+)
Index: libgcc/ChangeLog
===================================================================
--- libgcc/ChangeLog	(revision 255511)
+++ libgcc/ChangeLog	(working copy)
@@ -1,3 +1,8 @@ 
+2017-12-08  Sebastian Perta  <sebastian.perta@renesas.com>
+
+	* config/rl78/anddi3.S: New assembly file.
+	* config/rl78/t-rl78: Added anddi3.S to LIB2ADD.
+	
 2017-11-30  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
 	* config/rs6000/_mulkc3.c (__mulkc3): Add forward declaration.
Index: libgcc/config/rl78/anddi3.S
===================================================================
--- libgcc/config/rl78/anddi3.S	(nonexistent)
+++ libgcc/config/rl78/anddi3.S	(working copy)
@@ -0,0 +1,66 @@ 
+;   Copyright (C) 2017 Free Software Foundation, Inc.
+;   Contributed by Sebastian Perta.
+; 
+; This file is free software; you can redistribute it and/or modify it
+; under the terms of the GNU General Public License as published by the
+; Free Software Foundation; either version 3, or (at your option) any
+; later version.
+; 
+; This file is distributed in the hope that it will be useful, but
+; WITHOUT ANY WARRANTY; without even the implied warranty of
+; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+; General Public License for more details.
+; 
+; Under Section 7 of GPL version 3, you are granted additional
+; permissions described in the GCC Runtime Library Exception, version
+; 3.1, as published by the Free Software Foundation.
+;
+; You should have received a copy of the GNU General Public License and
+; a copy of the GCC Runtime Library Exception along with this program;
+; see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+; <http://www.gnu.org/licenses/>.
+
+
+#include "vregs.h"
+
+    .text
+
+START_FUNC ___anddi3
+
+    movw  hl, sp
+
+    mov   a, [hl+4]
+    and   a, [hl+12]
+    mov   r8, a
+
+    mov   a, [hl+5]
+    and   a, [hl+13]
+    mov   r9, a
+
+    mov   a, [hl+6]
+    and   a, [hl+14]
+    mov   r10, a
+
+    mov   a, [hl+7]
+    and   a, [hl+15]
+    mov   r11, a
+
+    mov   a, [hl+8]
+    and   a, [hl+16]
+    mov   r12, a
+
+    mov   a, [hl+9]
+    and   a, [hl+17]
+    mov   r13, a
+
+    mov   a, [hl+10]
+    and   a, [hl+18]
+    mov   r14, a
+
+    mov   a, [hl+11]
+    and   a, [hl+19]
+    mov   r15, a
+
+    ret
+
+END_FUNC ___anddi3
Index: libgcc/config/rl78/t-rl78
===================================================================
--- libgcc/config/rl78/t-rl78	(revision 255511)
+++ libgcc/config/rl78/t-rl78	(working copy)
@@ -32,7 +32,8 @@ 
 	$(srcdir)/config/rl78/fpmath-sf.S \
 	$(srcdir)/config/rl78/cmpsi2.S \
 	$(srcdir)/config/rl78/adddi3.S \
-	$(srcdir)/config/rl78/subdi3.S
+	$(srcdir)/config/rl78/subdi3.S \
+	$(srcdir)/config/rl78/anddi3.S
 
 LIB2FUNCS_EXCLUDE = _clzhi2 _clzsi2 _ctzhi2 _ctzsi2 \
   _popcounthi2 _popcountsi2 \