ld: output section alignment and empty section.

Message ID f8f1491e-5693-0db5-5397-e87fe0a6f44e@adacore.com
State New
Headers show
Series
  • ld: output section alignment and empty section.
Related show

Commit Message

KONRAD Frederic March 5, 2020, 1:55 p.m.
Hi,

I just figured out that the following linker script hunk:

   . = ALIGN(0x4);
   .rodata :
   {
     . = ALIGN(0x4);
   }

is creating an empty .rodata output section while I was expecting it not to be
emited.

This is because we flag ". = ALIGN(xxx)" as SEC_KEEP in exp_fold_tree_1.

Dropping that for ". = ALIGN(constant)" seems to fix the behavior:

            else if ((abfd->flags & D_PAGED) != 0
@@ -5784,7 +5786,8 @@ assign_file_positions_for_load_sections (bfd *abfd,

                if (adjust != 0
                    && (s_start < p_end
-                     || p_end < p_start))
+                     || p_end < p_start)
+                  && sec->size)
                  {
                    _bfd_error_handler
                      /* xgettext:c-format */

What do you think?

Best Regards,
Fred

Comments

Alan Modra March 10, 2020, 6:42 a.m. | #1
On Thu, Mar 05, 2020 at 02:55:44PM +0100, KONRAD Frederic wrote:
> Hi,

> 

> I just figured out that the following linker script hunk:

> 

>   . = ALIGN(0x4);

>   .rodata :

>   {

>     . = ALIGN(0x4);

>   }

> 

> is creating an empty .rodata output section while I was expecting it not to be

> emited.

> 

> This is because we flag ". = ALIGN(xxx)" as SEC_KEEP in exp_fold_tree_1.

> 

> Dropping that for ". = ALIGN(constant)" seems to fix the behavior:


Yes it would, but you could also write the align inside .rodata using

  . = ALIGN(. != 0 ? 4 : 1);

That is one of the few forms of assigning to dot that ld recognises as
*not* resulting in "keep this section".  This is documented under node
"Output Section Discarding" in ld.info.

I don't think there is a good reason to change this behaviour as there
may even be scripts that rely on a simpler ALIGN to keep the section.

> Is that because strip_excluded_output_sections happens before the relaxation?

> Could that be a problem in the case of a constant alignment?


Yes and yes.  It is possible to create a script that assigns an
address (before the colon in an output section statement) or alignment
(after the colon) to a section less than the constant alignment in
some ". = ALIGN();" statement in that section, and the addresses of
prior sections conspire to make the alignment do nothing before
relaxation but to add padding when prior sections change.  The patches
that were reverted with e0a3af227e ran into something like that when I
tried to make the single operand ALIGN behave like the two operand
form.

> What do you think?


Fix your script.  :-)

-- 
Alan Modra
Australia Development Lab, IBM
KONRAD Frederic March 11, 2020, 2:14 p.m. | #2
Hi Alan,

Le 3/10/20 à 7:42 AM, Alan Modra a écrit :
> On Thu, Mar 05, 2020 at 02:55:44PM +0100, KONRAD Frederic wrote:

>> Hi,

>>

>> I just figured out that the following linker script hunk:

>>

>>    . = ALIGN(0x4);

>>    .rodata :

>>    {

>>      . = ALIGN(0x4);

>>    }

>>

>> is creating an empty .rodata output section while I was expecting it not to be

>> emited.

>>

>> This is because we flag ". = ALIGN(xxx)" as SEC_KEEP in exp_fold_tree_1.

>>

>> Dropping that for ". = ALIGN(constant)" seems to fix the behavior:

> 

> Yes it would, but you could also write the align inside .rodata using

> 

>    . = ALIGN(. != 0 ? 4 : 1);

> 

> That is one of the few forms of assigning to dot that ld recognises as

> *not* resulting in "keep this section".  This is documented under node

> "Output Section Discarding" in ld.info.


Arg I missed this..

> 

> I don't think there is a good reason to change this behaviour as there

> may even be scripts that rely on a simpler ALIGN to keep the section.

> 

>> Is that because strip_excluded_output_sections happens before the relaxation?

>> Could that be a problem in the case of a constant alignment?

> 

> Yes and yes.  It is possible to create a script that assigns an

> address (before the colon in an output section statement) or alignment

> (after the colon) to a section less than the constant alignment in

> some ". = ALIGN();" statement in that section, and the addresses of

> prior sections conspire to make the alignment do nothing before

> relaxation but to add padding when prior sections change.  The patches

> that were reverted with e0a3af227e ran into something like that when I

> tried to make the single operand ALIGN behave like the two operand

> form.

> 

>> What do you think?

> 

> Fix your script.  :-)

> 


Agreed I tried the above ". = ALIGN(. != 0 ? 4 : 1)" and it seems to do the job.

Thanks you very much for your input!
Fangrui Song March 12, 2020, 2:55 a.m. | #3
On 2020-03-10, Alan Modra wrote:
>On Thu, Mar 05, 2020 at 02:55:44PM +0100, KONRAD Frederic wrote:

>> Hi,

>>

>> I just figured out that the following linker script hunk:

>>

>>   . = ALIGN(0x4);

>>   .rodata :

>>   {

>>     . = ALIGN(0x4);

>>   }

>>

>> is creating an empty .rodata output section while I was expecting it not to be

>> emited.

>>

>> This is because we flag ". = ALIGN(xxx)" as SEC_KEEP in exp_fold_tree_1.

>>

>> Dropping that for ". = ALIGN(constant)" seems to fix the behavior:

>

>Yes it would, but you could also write the align inside .rodata using

>

>  . = ALIGN(. != 0 ? 4 : 1);

>

>That is one of the few forms of assigning to dot that ld recognises as

>*not* resulting in "keep this section".  This is documented under node

>"Output Section Discarding" in ld.info.

>

>I don't think there is a good reason to change this behaviour as there

>may even be scripts that rely on a simpler ALIGN to keep the section.

>

>> Is that because strip_excluded_output_sections happens before the relaxation?

>> Could that be a problem in the case of a constant alignment?

>

>Yes and yes.  It is possible to create a script that assigns an

>address (before the colon in an output section statement) or alignment

>(after the colon) to a section less than the constant alignment in

>some ". = ALIGN();" statement in that section, and the addresses of

>prior sections conspire to make the alignment do nothing before

>relaxation but to add padding when prior sections change.  The patches

>that were reverted with e0a3af227e ran into something like that when I

>tried to make the single operand ALIGN behave like the two operand

>form.

>

>> What do you think?

>

>Fix your script.  :-)


So the rule is:

SECTIONS {
   .not_created : { . = ALIGN(. != 0 ? 4 : 1); }
   .created : { . = ALIGN(4); }
}

I think it'd be nice to simply make the two output section descriptions create empty sections.

SECTIONS {
   .not_created : { PROVIDE(unreferenced = .); }

   .created0 : { . = ALIGN(. != 0 ? 4 : 1); }
   .created1 : { . = ALIGN(4); }
   .created2 : { PROVIDE(referenced = .); }
}

Patch

diff --git a/ld/ldexp.c b/ld/ldexp.c
index 23ee5c5..60a5d67 100644
--- a/ld/ldexp.c
+++ b/ld/ldexp.c
@@ -962,30 +962,43 @@  is_dot_ne_0 (const etree_type *tree)
   }

   /* Return true if TREE is ". = . + 0" or ". = . + sym" where sym is an
      absolute constant with value 0 defined in a linker script.  */

   static bfd_boolean
   is_dot_plus_0 (const etree_type *tree)
   {
     return (tree->type.node_class == etree_binary
   	  && tree->type.node_code == '+'
   	  && is_dot (tree->binary.lhs)
   	  && (is_value (tree->binary.rhs, 0)
   	      || is_sym_value (tree->binary.rhs, 0)));
   }

+/* Return true if TREE is "ALIGN (constant)".  */
+static bfd_boolean
+is_align_constant (const etree_type *tree)
+{
+  if (tree->type.node_class == etree_unary
+      && tree->type.node_code == ALIGN_K)
+    {
+      tree = tree->unary.child;
+      return tree->type.node_class == etree_value;
+    }
+  return FALSE;
+}
+
   /* Return true if TREE is "ALIGN (. != 0 ? some_expression : 1)".  */

   static bfd_boolean
   is_align_conditional (const etree_type *tree)
   {
     if (tree->type.node_class == etree_unary
         && tree->type.node_code == ALIGN_K)
       {
         tree = tree->unary.child;
         return (tree->type.node_class == etree_trinary
   	      && is_dot_ne_0 (tree->trinary.cond)
   	      && is_value (tree->trinary.rhs, 1));
       }
     return FALSE;
   }
@@ -1054,30 +1067,31 @@  exp_fold_tree_1 (etree_type *tree)
   	      exp_fold_tree_1 (tree->assign.src);
   	      expld.assigning_to_dot = FALSE;

   	      /* If we are assigning to dot inside an output section
   		 arrange to keep the section, except for certain
   		 expressions that evaluate to zero.  We ignore . = 0,
   		 . = . + 0, and . = ALIGN (. != 0 ? expr : 1).
   		 We can't ignore all expressions that evaluate to zero
   		 because an otherwise empty section might have padding
   		 added by an alignment expression that changes with
   		 relaxation.  Such a section might have zero size
   		 before relaxation and so be stripped incorrectly.  */
   	      if (expld.phase == lang_mark_phase_enum
   		  && expld.section != bfd_abs_section_ptr
   		  && expld.section != bfd_und_section_ptr
+		  && !is_align_constant (tree->assign.src)
   		  && !(expld.result.valid_p
   		       && expld.result.value == 0
   		       && (is_value (tree->assign.src, 0)
   			   || is_sym_value (tree->assign.src, 0)
   			   || is_dot_plus_0 (tree->assign.src)
   			   || is_align_conditional (tree->assign.src))))
   		expld.section->flags |= SEC_KEEP;

   	      if (!expld.result.valid_p
   		  || expld.section == bfd_und_section_ptr)
   		{
   		  if (expld.phase != lang_mark_phase_enum)
   		    einfo (_("%F%P:%pS invalid assignment to"
   			     " location counter\n"), tree);
   		}

--------------------------------------------------

Only the comment above which comes from:

commit e0a3af227ee0602ae69320fd6f931c363f14975b
Author: Alan Modra <amodra@gmail.com>
Date:   Thu Aug 6 15:56:34 2015 +0930

     Revert ALIGN changes

     Reverts a2c59f28 and e474ab13.  Since the unary form of ALIGN only
     references "dot" implicitly, there isn't really a strong argument for
     making ALIGN use a relative value when inside an output section.

         * ldexp.c (align_dot_val): Delete.
         (fold_unary <ALIGN_K, NEXT>): Revert 2015-07-10 change.
         (is_align_conditional): Revert 2015-07-20 change.
         (exp_fold_tree_1): Likewise, but keep expanded comment.
         * scripttempl/elf.sc (.ldata, .bss): Revert 2015-07-20 change.
         * ld.texinfo (<ALIGN>): Correct description.

kinds of worry me:

    /* If we are assigning to dot inside an output section
       arrange to keep the section, except for certain
       expressions that evaluate to zero.  We ignore . = 0,
       . = . + 0, and . = ALIGN (. != 0 ? expr : 1).
       We can't ignore all expressions that evaluate to zero
       because an otherwise empty section might have padding
       added by an alignment expression that changes with
       relaxation.  Such a section might have zero size
       before relaxation and so be stripped incorrectly.  */

Is that because strip_excluded_output_sections happens before the relaxation?
Could that be a problem in the case of a constant alignment?


The reason why I'm trying to remove those empty sections is that there are some
cases where later in the link process it confuses the program header, ie in my
case: .rodata is considered as overlapping with .srodata so it creates two
PT_LOAD segments in the Program Header:

architecture: riscv:rv64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x0000000080000000

Program Header:
      LOAD off    0x0000000000001000 vaddr 0x0000000080000000 paddr\
0x0000000080000000 align 2**12
           filesz 0x00000000000002b4 memsz 0x00000000000002b4 flags r-x
      LOAD off    0x00000000000022b0 vaddr 0x00000000800002b0 paddr\
   0x00000000800002b0 align 2**12
           filesz 0x0000000000000034 memsz 0x0000000000001040 flags rw-

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
    0 .host_target_interface 00000010  0000000000000000  0000000000000000
000022e4  2**0
                    CONTENTS, READONLY
    1 .text         000002b0  0000000080000000  0000000080000000  00001000  2**1
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    2 .rodata       00000000  00000000800002b0  00000000800002b0  000022b0  2**0
                    ALLOC
    3 .srodata      00000004  00000000800002b0  00000000800002b0  000012b0  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    4 .data         00000000  00000000800002b4  00000000800002b4  000022b4  2**0
                    CONTENTS, ALLOC, LOAD, DATA
    5 .sdata        00000028  00000000800002b8  00000000800002b8  000022b8  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    6 .sdata2       00000004  00000000800002e0  00000000800002e0  000022e0  2**0
                    CONTENTS, ALLOC, LOAD, READONLY, DATA

And then when loading this file .srodata get's overwritten by .rodata..

An other possible fix would be to disregard empty sections in
_bfd_elf_map_sections_to_segments:

diff --git a/bfd/elf.c b/bfd/elf.c
index fcd84d2..c37aa23 100644
--- a/bfd/elf.c
+++ b/bfd/elf.c
@@ -4844,11 +4844,13 @@  _bfd_elf_map_sections_to_segments (bfd *abfd, struct
bfd_link_info *info)
                   segment.  */
                new_segment = TRUE;
              }
-         else if (hdr->lma < last_hdr->lma + last_size
+         else if ((hdr->lma < last_hdr->lma + last_size
                     || last_hdr->lma + last_size < last_hdr->lma)
+                   && hdr->size)
              {
                /* If this section has a load address that makes it overlap
-                the previous section, then we need a new segment.  */
+                the previous section, and this section is not empty then we
+                 need a new segment.  */
                new_segment = TRUE;
              }