impredictable alignment on ARM

Message ID oreeus8zcn.fsf@livre.home
State New

Commit Message

Alexandre Oliva Feb. 18, 2020, 1:21 a.m.
Consider the following asm input:

.thumb
.text
ldr r1, 0f
0f: .word 0x12345678

In this case, we report that the word is misaligned and fail, even though
the section is only aligned to 2-byte boundaries, so the word *might* be
properly aligned after all, if the previously linked section happened to
make the text section above start 2 bytes after a 4-byte aligned address.
Anyway, we probably don't want to worry about this case.

However, I think we should be concerned about the converse case:

.thumb
.text
ldr r1, 0f
ldr r2, 1f
0f: .word 0x01234567
1f: .word 0x89abcdef
nop

We do NOT report an error here, but if this text segment gets placed at
a 2-byte offset from a 4-byte aligned address (e.g., when the object
file is linked in twice), the pair of words in the second copy will be
misaligned, and the PC-relative offsets will resolve to aligned words
that contain only part of the word to be loaded.
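
To make the failure mode concrete, here is a minimal C sketch of the address arithmetic for the 16-bit Thumb PC-relative load.  The function names are hypothetical, and the second helper assumes the assembler picks the immediate as if the section started at a 4-byte aligned address:

```c
#include <assert.h>
#include <stdint.h>

/* Address read by a 16-bit Thumb "ldr rN, label": the PC reads as the
   instruction address plus 4, is rounded down to a multiple of 4, and
   the encoded 8-bit immediate, scaled by 4, is added.  */
static uint32_t
thumb_ldr_literal_target (uint32_t insn_addr, uint32_t imm8)
{
  assert (imm8 <= 0xff);
  return ((insn_addr + 4) & ~(uint32_t) 3) + imm8 * 4;
}

/* Hypothetical helper: the immediate chosen for a load at section offset
   INSN_OFF whose literal sits at section offset WORD_OFF, ASSUMING the
   section starts at a 4-byte aligned address.  */
static uint32_t
encode_imm8_assuming_aligned (uint32_t insn_off, uint32_t word_off)
{
  uint32_t base = (insn_off + 4) & ~(uint32_t) 3;
  assert (word_off >= base && (word_off - base) % 4 == 0);
  return (word_off - base) / 4;
}
```

With the example above, the two loads sit at section offsets 0 and 2 and the words at offsets 4 and 8.  If the section lands at 0x1000, both loads hit their words; if it lands at 0x1002, the first load reads from 0x1004 while its word is at 0x1006, and the second reads from 0x100c while its word is at 0x100a.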


The following patchlet arranges for us to complain when the target of
such an ldr doesn't ensure the expected alignment.  However, it's not
quite enough to solve the general problem.  Consider:

.thumb
.text
ldr sp, 0f
0f: .word 0x80000000

This extended form of ldr takes 4 bytes, and it neither requires nor
ensures that the target word is aligned to a 4-byte boundary.  It just so
happens that, if it's not aligned, the value loaded into the register is
a rotated version of the word containing the misaligned address.
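
As an illustration of the rotation just described, here is a small C model.  It is a sketch only: the function names are made up, and the rotate-right-by-8-bits-per-byte rule is an assumption borrowed from how older ARM cores treat unaligned word loads; actual behavior depends on the core and its configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate X right by N bits; a rotation by 0 is handled separately to
   avoid an undefined 32-bit shift.  */
static uint32_t
rotate_right32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Hypothetical model: ALIGNED_WORD is the 32-bit word stored at
   ADDR & ~3; the result is what the register would receive when the
   load is issued at ADDR, ASSUMING 8 bits of rotation per byte of
   misalignment.  */
static uint32_t
rotated_load_result (uint32_t aligned_word, uint32_t addr)
{
  return rotate_right32 (aligned_word, 8 * (addr & 3));
}
```

Under this model, a load aimed 2 bytes into a word holding 0x80000000 yields 0x00008000 rather than the intended value.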

I'm not sure it would be appropriate for us to reject potentially
misaligned words: there might be (obfuscated) code intended to detect
and behave differently depending on whether it ends up at an even or odd
half-word.

However, I think it would be nice for us to at least warn that the code
might behave differently depending on the actual alignment it gets.  I'm
thinking of something as simple as tracking the max natural alignment used
in each segment, and warning of potential linker-induced behavior changes
if that alignment is not recorded for the segment.

Tracking symbols with their natural alignments, and maybe even
references to them that expect a certain alignment, might be pushing too
far, on the one hand, and still missing relevant cases of separate
compilation or complex address computations on the other.

Is this something we might want to pursue, so as to warn even for e.g.:

.text
.word 0

but limited to once per segment?
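
For the sake of discussion, the per-segment tracking with a once-only warning could be sketched in C as follows.  Everything here is hypothetical (the struct, its fields, and the function names are not gas data structures); alignments are kept as log2 values, which is what the `< 2` comparison in the patchlet appears to assume:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-segment bookkeeping; not an existing gas structure.
   Alignments are log2 of the byte count.  */
struct seg_align_info
{
  unsigned recorded_align_log2;  /* alignment the section promises */
  unsigned max_natural_log2;     /* largest natural alignment seen */
  bool warned;                   /* set once a warning was issued */
};

/* log2 of a power-of-two datum size (1, 2, 4, 8, ...).  */
static unsigned
natural_align_log2 (unsigned size_bytes)
{
  unsigned log2 = 0;
  assert (size_bytes != 0 && (size_bytes & (size_bytes - 1)) == 0);
  while (size_bytes >>= 1)
    log2++;
  return log2;
}

/* Record that a datum of SIZE_BYTES was emitted into SEG.  */
static void
note_datum (struct seg_align_info *seg, unsigned size_bytes)
{
  unsigned a = natural_align_log2 (size_bytes);
  if (a > seg->max_natural_log2)
    seg->max_natural_log2 = a;
}

/* True the first time SEG is found to promise less alignment than its
   data naturally wants; false on every later call.  */
static bool
should_warn_once (struct seg_align_info *seg)
{
  if (seg->warned || seg->max_natural_log2 <= seg->recorded_align_log2)
    return false;
  seg->warned = true;
  return true;
}
```

A `.word 0` in a section recorded as 2-byte aligned would trigger the warning on the first check and stay quiet afterwards.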

Or should we track and warn about PC-relative addressing requirements,
so as to warn about segments containing PC-relative addressing (in
whatever forms) whose expected alignment exceeds the section's?  (this
could miss e.g. setting a register to PC + offset, and then loading a
word at the address stored in the register)

A combination of these?

Thoughts?


Here's the patchlet that covers only the PCrel-load-to-low-reg case:



-- 
Alexandre Oliva, freedom fighter   he/him   https://FSFLA.org/blogs/lxo
Free Software Evangelist           Stallman was right, but he's left :(
GNU Toolchain Engineer    FSMatrix: It was he who freed the first of us
FSF & FSFLA board member                The Savior shall return (true);

Comments

Richard Earnshaw (lists) March 3, 2020, 3:06 p.m. | #1
On 18/02/2020 01:21, Alexandre Oliva wrote:
> Consider the following asm input:
> 
> .thumb
> .text
> ldr r1, 0f
> 0f: .word 0x12345678
> 
> In this case, we report the word is misaligned and fail, though the
> section is aligned to 2-byte boundaries, so the word *might* be properly
> aligned, after all, if only the previous linked section enabled the text
> section above to start 2 bytes after a 4-byte aligned address.  Anyway,
> we probably don't want to worry about this case.
> 
> However, I think we should be concerned about the converse case:
> 
> .thumb
> .text
> ldr r1, 0f
> ldr r2, 1f
> 0f: .word 0x01234567
> 1f: .word 0x89abcdef
> nop
> 
> We do NOT report an error here, but if this text segment gets placed at
> a 2-byte offset from a 4-byte aligned address (e.g., link the object
> file in twice), the second pair will have misaligned words, and the
> PC-relative offsets will resolve to aligned words that contain only part
> of the word to be loaded.
> 
> 
> The following patchlet arranges for us to complain when the target of
> such an ldr doesn't ensure the expected alignment.  However, it's not
> quite enough to solve the general problem.  Consider:
> 
> .thumb
> .text
> ldr sp, 0f
> 0f: .word 0x80000000
> 
> This extended form of ldr takes 4 bytes, and it doesn't require nor
> ensure the target word to be aligned to a 4-byte boundary.  It just so
> happens that, if it's not aligned, the value loaded into the register is
> a rotated version of the word containing the misaligned address.
> 
> I'm not sure it would be appropriate for us to reject potentially
> misaligned words: there might be (obfuscated) code intended to detect
> and behave differently depending on whether it ends up at an even or odd
> half-word.
> 
> However, I think it would be nice for us to at least warn that the code
> might behave differently depending on the actual alignment it gets.  I'm
> thinking something as simple as tracking the max natural alignment used
> in each segment, and warning of potential linker-induced behavior changes
> if that alignment is not recorded for the segment.
> 
> Tracking symbols with their natural alignments, and maybe even
> references to them that expect a certain alignment, might be pushing too
> far, on the one hand, and still missing relevant cases of separate
> compilation or complex address computations on the other.
> 
> Is this something we might want to pursue, so as to warn even for e.g.:
> 
> .text
> .word 0
> 
> but limited to once per segment?
> 
> Or should we track and warn about PC-relative addressing requirements,
> so as to warn about segments containing PC-relative addressing (in
> whatever forms) whose expected alignment exceeds the section's?  (this
> could miss e.g. setting a register to PC + offset, and then loading a
> word at the address stored in the register)
> 
> A combination of these?
> 
> Thoughts?
> 

Hmm, interesting issue.  Given that we have .word and .4byte for all arm 
targets, where .4byte is intended to support deliberately unspecified 
alignment, I think it would be quite reasonable to warn whenever .word 
appears in a section with less than 4-byte alignment.
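
A rough C sketch of that rule, for illustration only: the function name is hypothetical, and the section alignment is assumed to be available as a log2 value, as the `get_recorded_alignment (seg) < 2` test in the patchlet suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical predicate: warn for .word in a section that promises
   less than 4-byte (log2 == 2) alignment; .4byte, which is meant for
   deliberately unspecified alignment, never warns.  */
static bool
word_directive_needs_warning (const char *directive,
                              unsigned section_align_log2)
{
  assert (directive != NULL);
  return strcmp (directive, ".word") == 0 && section_align_log2 < 2;
}
```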


> 
> Here's the patchlet that covers only the PCrel-load-to-low-reg case:
> 
> --- gas/config/tc-arm.c	2020-01-28 12:50:34.000000000 +0100
> +++ gas/config/tc-arm.c	2020-02-18 00:13:11.486184639 +0100
> @@ -28755,6 +28755,9 @@
>   			  (((unsigned long) fixP->fx_frag->fr_address
>   			    + (unsigned long) fixP->fx_where) & ~3)
>   			  + (unsigned long) value);
> +	  else if (get_recorded_alignment (seg) < 2)
> +	    as_warn_where (fixP->fx_file, fixP->fx_line,
> +			   _("segment does not ensure enough alignment for target word"));
>   
>   	  if (value & ~0x3fc)
>   	    as_bad_where (fixP->fx_file, fixP->fx_line,
> 


This is OK.

R.

Patch

--- gas/config/tc-arm.c	2020-01-28 12:50:34.000000000 +0100
+++ gas/config/tc-arm.c	2020-02-18 00:13:11.486184639 +0100
@@ -28755,6 +28755,9 @@ 
 			  (((unsigned long) fixP->fx_frag->fr_address
 			    + (unsigned long) fixP->fx_where) & ~3)
 			  + (unsigned long) value);
+	  else if (get_recorded_alignment (seg) < 2)
+	    as_warn_where (fixP->fx_file, fixP->fx_line,
+			   _("segment does not ensure enough alignment for target word"));
 
 	  if (value & ~0x3fc)
 	    as_bad_where (fixP->fx_file, fixP->fx_line,