RISC-V: Adjust __global_pointer$ value to reduce code size.

Message ID 20181015230137.7311-1-jimw@sifive.com
State New
Headers show
Series
  • RISC-V: Adjust __global_pointer$ value to reduce code size.
Related show

Commit Message

Jim Wilson Oct. 15, 2018, 11:01 p.m.
For RISC-V, the compiler doesn't emit gp-relative references, they are created
during relaxation.  Before this patch, the gp value is sdata+0x800 which is
guaranteed to cover as much as sdata+sbss as possible, but when they are
smaller than 0x1000 we end up wasting some of the gp-relative range.  This
changes the calculation to shift gp down when possible, while still covering
all of sdata+sbss.  This reduces code size for a trivial hello world program
by about 0.5%.  For a more realistic benchmark set, EEMBC Automotive-1.1, I'm
seeing about 0.1% code size reduction on average.  For large programs, this
has no effect.

This was tested with riscv{32,64}-{elf,linux} cross binutils build and check,
there were no regressions.  I did have to update one linker testcase to disable
relaxation because it assumed that variable accesses in data would not be
relaxed.  It was also tested with riscv32-elf and riscv64-linux cross gcc
build and check, and a linux+buildroot build and boot.

Committed.

Jim

	ld/
	* emulparams/elf32lriscv-defs.sh (DATA_START_SYMBOLS): New.
	(SDATA_START_SYMBOLS): Define __SDATA_BEGIN__.  Don't define
	__global_pointer$.
	(OTHER_END_SYMBOLS): New.  Define __global_pointer$.
	* testsuite/ld-riscv-elf/pcrel-lo-addend-2.d (#ld): Add --no-relax.
---
 ld/emulparams/elf32lriscv-defs.sh             | 13 ++++++++++++-
 ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d |  2 +-
 2 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.17.1

Comments

Palmer Dabbelt Oct. 15, 2018, 11:43 p.m. | #1
On Mon, 15 Oct 2018 16:01:37 PDT (-0700), Jim Wilson wrote:
> For RISC-V, the compiler doesn't emit gp-relative references, they are created

> during relaxation.  Before this patch, the gp value is sdata+0x800 which is

> guaranteed to cover as much as sdata+sbss as possible, but when they are

> smaller than 0x1000 we end up wasting some of the gp-relative range.  This

> changes the calculation to shift gp down when possible, while still covering

> all of sdata+sbss.  This reduces code size for a trivial hello world program

> by about 0.5%.  For a more realistic benchmark set, EEMBC Automotive-1.1, I'm

> seeing about 0.1% code size reduction on average.  For large programs, this

> has no effect.

>

> This was tested with riscv{32,64}-{elf,linux} cross binutils build and check,

> there were no regressions.  I did have to update one linker testcase to disable

> relaxation because it assumed that variable accesses in data would not be

> relaxed.  It was also tested with riscv32-elf and riscv64-linux cross gcc

> build and check, and a linux+buildroot build and boot.

>

> Committed.

>

> Jim

>

> 	ld/

> 	* emulparams/elf32lriscv-defs.sh (DATA_START_SYMBOLS): New.

> 	(SDATA_START_SYMBOLS): Define __SDATA_BEGIN__.  Don't define

> 	__global_pointer$.

> 	(OTHER_END_SYMBOLS): New.  Define __global_pointer$.

> 	* testsuite/ld-riscv-elf/pcrel-lo-addend-2.d (#ld): Add --no-relax.

> ---

>  ld/emulparams/elf32lriscv-defs.sh             | 13 ++++++++++++-

>  ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d |  2 +-

>  2 files changed, 13 insertions(+), 2 deletions(-)

>

> diff --git a/ld/emulparams/elf32lriscv-defs.sh b/ld/emulparams/elf32lriscv-defs.sh

> index 91015d4471..5ac3b6023d 100644

> --- a/ld/emulparams/elf32lriscv-defs.sh

> +++ b/ld/emulparams/elf32lriscv-defs.sh

> @@ -30,8 +30,19 @@ TEXT_START_ADDR=0x10000

>  MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"

>  COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"

>

> -SDATA_START_SYMBOLS="${CREATE_SHLIB-__global_pointer$ = . + 0x800;}

> +DATA_START_SYMBOLS="${CREATE_SHLIB-__DATA_BEGIN__ = .;}"

> +

> +SDATA_START_SYMBOLS="${CREATE_SHLIB-__SDATA_BEGIN__ = .;}

>      *(.srodata.cst16) *(.srodata.cst8) *(.srodata.cst4) *(.srodata.cst2) *(.srodata .srodata.*)"

>

>  INITIAL_READONLY_SECTIONS=".interp         : { *(.interp) } ${CREATE_PIE-${INITIAL_READONLY_SECTIONS}}"

>  INITIAL_READONLY_SECTIONS="${RELOCATING+${CREATE_SHLIB-${INITIAL_READONLY_SECTIONS}}}"

> +

> +# We must cover as much of sdata as possible if it exists.  If sdata+bss is

> +# smaller than 0x1000 then we should start from bss end to cover as much of

> +# the program as possible.  But we can't allow gp to cover any of rodata, as

> +# the address of variables in rodata may change during relaxation, so we start

> +# from data in that case.


This smells like a bug lurking somewhere.  IIRC we avoided relaxing anything 
that points to a mergeable segment due to these sorts of constraints, are we 
just missing something?  I would consider it a bug if users can end up with 
incorrect relaxations, regardless of where they're targeting the GP.

> +OTHER_END_SYMBOLS="${CREATE_SHLIB-__BSS_END__ = .;

> +    __global_pointer$ = MIN(__SDATA_BEGIN__ + 0x800,

> +		            MAX(__DATA_BEGIN__ + 0x800, __BSS_END__ - 0x800));}"


FWIW, when drag racing linker relaxation I always ended up using an offset of 
0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker 
relaxations that eats a few bytes of offset.  IIRC that saves you something 
like 3 instructions in Dhrystone's inner loop because it allows relaxing 
against the first few bytes of .sdata.

I don't know if it matters any more here, as if I understand how this is all 
working we'll end up with GP pointing quite a way before ".sdata+0x800" for 
small programs and for big programs the extra few symbols will just be noise.

> diff --git a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d

> index 9e94c5c399..039de102c3 100644

> --- a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d

> +++ b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d

> @@ -1,5 +1,5 @@

>  #name: %pcrel_lo overflow with an addend

>  #source: pcrel-lo-addend-2.s

>  #as: -march=rv32ic

> -#ld: -melf32lriscv

> +#ld: -melf32lriscv --no-relax

>  #error: .*dangerous relocation: %pcrel_lo overflow with an addend
Jim Wilson Oct. 16, 2018, 12:39 a.m. | #2
On Mon, Oct 15, 2018 at 4:43 PM Palmer Dabbelt <palmer@sifive.com> wrote:
> This smells like a bug lurking somewhere.  IIRC we avoided relaxing anything

> that points to a mergeable segment due to these sorts of constraints, are we

> just missing something?  I would consider it a bug if users can end up with

> incorrect relaxations, regardless of where they're targeting the GP.


We avoid relaxation in mergeable and code sections.  In this case the
problem is rodata which is neither, but immediately follows .text and
hence variables in rodata can change address when .text shrinks.  We
probably never tried to relax references to rodata before.  I didn't
look at trying to handle this in relaxation itself, it was easier to
just fix gp to avoid overlapping rodata.  We should probably look at
the rodata problem later though.

> FWIW, when drag racing linker relaxation I always ended up using an offset of

> 0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker

> relaxations that eats a few bytes of offset.  IIRC that saves you something

> like 3 instructions in Dhrystone's inner loop because it allows relaxing

> against the first few bytes of .sdata.


The current definition is .sdata+0x800 which makes everything .sdata
and later relaxable.  If you used .sdata+0x7c0, then you would get the
last 64 bytes of the .data section as relaxable.  My patch should do
this automatically for you if .sdata+.sbss is smaller than 0x1000.
Though if you have your own linker script you don't get any benefit
from this unless you update your linker script.;

> I don't know if it matters any more here, as if I understand how this is all

> working we'll end up with GP pointing quite a way before ".sdata+0x800" for

> small programs and for big programs the extra few symbols will just be noise.


For big programs there will be no extra symbols, because if
.sdata+.bss is 0x1000 or larger we get the exact same gp value as
before.  The smaller the program the more the benefit.  The larger the
program the less the benefit, with the benefit dropping to zero once
the program gets big enough.

Jim
Palmer Dabbelt Oct. 16, 2018, 1:20 a.m. | #3
On Mon, 15 Oct 2018 17:39:06 PDT (-0700), Jim Wilson wrote:
> On Mon, Oct 15, 2018 at 4:43 PM Palmer Dabbelt <palmer@sifive.com> wrote:

>> This smells like a bug lurking somewhere.  IIRC we avoided relaxing anything

>> that points to a mergeable segment due to these sorts of constraints, are we

>> just missing something?  I would consider it a bug if users can end up with

>> incorrect relaxations, regardless of where they're targeting the GP.

>

> We avoid relaxation in mergeable and code sections.  In this case the

> problem is rodata which is neither, but immediately follows .text and

> hence variables in rodata can change address when .text shrinks.  We

> probably never tried to relax references to rodata before.  I didn't

> look at trying to handle this in relaxation itself, it was easier to

> just fix gp to avoid overlapping rodata.  We should probably look at

> the rodata problem later though.


Ah, OK.  I guess I just assemed .rodata was mergeable because I assumed strings 
ended up in there.  Either way: it seems like a bug, but only one that you just 
found when writing this patch and not one introduced by this patch.  Certainly 
it shouldn't block a merge.

>> FWIW, when drag racing linker relaxation I always ended up using an offset of

>> 0x7C0 instead of 0x800 because of that extra bit of pessimism in the linker

>> relaxations that eats a few bytes of offset.  IIRC that saves you something

>> like 3 instructions in Dhrystone's inner loop because it allows relaxing

>> against the first few bytes of .sdata.

>

> The current definition is .sdata+0x800 which makes everything .sdata

> and later relaxable.  If you used .sdata+0x7c0, then you would get the

> last 64 bytes of the .data section as relaxable.  My patch should do

> this automatically for you if .sdata+.sbss is smaller than 0x1000.

> Though if you have your own linker script you don't get any benefit

> from this unless you update your linker script.;


Ah, yes, I guess that was the point: while ".sdata+0x800" should allow relaxing 
against a symbol at ".sdata+0", in practice we have a bit of pessimism in the 
relaxation code.  I'm actually afraid I don't understand exactly where it comes 
from, but if you look at "max_alignment" and "reserve_size" in 
"_bfd_riscv_relax_section()" you'll see there's a bit -- 0x7C0 seems a bit 
conservative, I'd guess that you can relax against ".sdata+0" down to 
"__global_pointer$=.sdata+0x7F0".

It actually looks like my misunderstanding of "max_alignment" is the source of 
this auipc+addend bug you're run in to: I appear to have just blindly copied 
this alignment pessimism (IIUC, this shrinks the relaxable region by the symbol 
size, which would avoid the overflows for LUI-based sequences) from the LUI 
relaxations into the AUIPC relaxations.  Now that I actually understand the 
code (or at least, I think I understand it) that seems completely bogus for the 
AUIPC case.

Sorry!

>> I don't know if it matters any more here, as if I understand how this is all

>> working we'll end up with GP pointing quite a way before ".sdata+0x800" for

>> small programs and for big programs the extra few symbols will just be noise.

>

> For big programs there will be no extra symbols, because if

> .sdata+.bss is 0x1000 or larger we get the exact same gp value as

> before.  The smaller the program the more the benefit.  The larger the

> program the less the benefit, with the benefit dropping to zero once

> the program gets big enough.


Yes, I agree -- and I think that's why this is really splitting hairs.  This 
really only matter when drag racing Dhrystone, which isn't worth the time -- 
and also certainly shouldn't block a merge :)

Patch

diff --git a/ld/emulparams/elf32lriscv-defs.sh b/ld/emulparams/elf32lriscv-defs.sh
index 91015d4471..5ac3b6023d 100644
--- a/ld/emulparams/elf32lriscv-defs.sh
+++ b/ld/emulparams/elf32lriscv-defs.sh
@@ -30,8 +30,19 @@  TEXT_START_ADDR=0x10000
 MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"
 COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"
 
-SDATA_START_SYMBOLS="${CREATE_SHLIB-__global_pointer$ = . + 0x800;}
+DATA_START_SYMBOLS="${CREATE_SHLIB-__DATA_BEGIN__ = .;}"
+
+SDATA_START_SYMBOLS="${CREATE_SHLIB-__SDATA_BEGIN__ = .;}
     *(.srodata.cst16) *(.srodata.cst8) *(.srodata.cst4) *(.srodata.cst2) *(.srodata .srodata.*)"
 
 INITIAL_READONLY_SECTIONS=".interp         : { *(.interp) } ${CREATE_PIE-${INITIAL_READONLY_SECTIONS}}"
 INITIAL_READONLY_SECTIONS="${RELOCATING+${CREATE_SHLIB-${INITIAL_READONLY_SECTIONS}}}"
+
+# We must cover as much of sdata as possible if it exists.  If sdata+bss is
+# smaller than 0x1000 then we should start from bss end to cover as much of
+# the program as possible.  But we can't allow gp to cover any of rodata, as
+# the address of variables in rodata may change during relaxation, so we start
+# from data in that case.
+OTHER_END_SYMBOLS="${CREATE_SHLIB-__BSS_END__ = .;
+    __global_pointer$ = MIN(__SDATA_BEGIN__ + 0x800,
+		            MAX(__DATA_BEGIN__ + 0x800, __BSS_END__ - 0x800));}"
diff --git a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
index 9e94c5c399..039de102c3 100644
--- a/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
+++ b/ld/testsuite/ld-riscv-elf/pcrel-lo-addend-2.d
@@ -1,5 +1,5 @@ 
 #name: %pcrel_lo overflow with an addend
 #source: pcrel-lo-addend-2.s
 #as: -march=rv32ic
-#ld: -melf32lriscv
+#ld: -melf32lriscv --no-relax
 #error: .*dangerous relocation: %pcrel_lo overflow with an addend