[RFC,3/3] add r_debug multiple namespaces support

Message ID 20200626193228.1953-4-danielwa@cisco.com
State New
Headers show
Series
  • implement dlmopen hooks for gdb
Related show

Commit Message

Alejandro Colomar via Libc-alpha June 26, 2020, 7:32 p.m.
From: Conan C Huang <conhuang@cisco.com>


GLIBC Bugzilla: 15971
GDB Bugzilla: 11839

Glibc does not provide an interface for debugger to access libraries loaded in
multiple namespaces via dlmopen.

The current rtld-debugger interface is described in the file
elf/rtld-debugger-interface.txt under the "Standard debugger interface" heading.
This interface only provides access to the first link-map (LM_ID_BASE).

This solution converts r_debug into a linked-list, where each element correlates
to an unique namespace. Debugger (GDB) can traverse r_debug linked-list to
retrieve information for all loaded namespaces.
---
 elf/dl-debug.c | 13 ++++++++++---
 elf/link.h     |  4 ++++
 2 files changed, 14 insertions(+), 3 deletions(-)

-- 
2.17.1

Comments

Alejandro Colomar via Libc-alpha June 26, 2020, 9:05 p.m. | #1
* Daniel Walker via Libc-alpha:

> diff --git a/elf/link.h b/elf/link.h

> index 0048ad5d4d..5a42511636 100644

> --- a/elf/link.h

> +++ b/elf/link.h

> @@ -61,6 +61,10 @@ struct r_debug

>        } r_state;

>  

>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */

> +

> +    /* Link to next r_debug struct. Each r_debug struct represents a

> +       different namespace.  The first r_debug struct is the default.  */

> +    struct r_debug *next;

>    };

>  

>  /* This is the instance of that structure used by the dynamic linker.  */


How has this patch been tested?  I expect that it will cause an abilist
mismatch for the _r_debug symbol in the dynamic linker.

If we go this route to add this capability, I think we have to add a new
symbol version for the _r_debug symbol, and keep the old one at the
previous size.

How is your compatibility experience with the size and version change?
How many tools need updating before they work again?

A different approach would add another symbol (parallel to _r_debug) to
store this data.  This would avoid the need for any immediate tool
updates.

Thanks,
Florian
Alejandro Colomar via Libc-alpha June 26, 2020, 9:19 p.m. | #2
On 6/26/20 5:05 PM, Florian Weimer via Libc-alpha wrote:
> * Daniel Walker via Libc-alpha:

> 

>> diff --git a/elf/link.h b/elf/link.h

>> index 0048ad5d4d..5a42511636 100644

>> --- a/elf/link.h

>> +++ b/elf/link.h

>> @@ -61,6 +61,10 @@ struct r_debug

>>        } r_state;

>>  

>>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */

>> +

>> +    /* Link to next r_debug struct. Each r_debug struct represents a

>> +       different namespace.  The first r_debug struct is the default.  */

>> +    struct r_debug *next;

>>    };

>>  

>>  /* This is the instance of that structure used by the dynamic linker.  */

> 

> How has this patch been tested?  I expect that it will cause an abilist

> mismatch for the _r_debug symbol in the dynamic linker.

> 

> If we go this route to add this capability, I think we have to add a new

> symbol version for the _r_debug symbol, and keep the old one at the

> previous size.

> 

> How is your compatibility experience with the size and version change?

> How many tools need updating before they work again?

> 

> A different approach would add another symbol (parallel to _r_debug) to

> store this data.  This would avoid the need for any immediate tool

> updates.


I mention this in my response to the cover letter in this series.

This patch is probably unacceptable as-is because of application
expectations.

-- 
Cheers,
Carlos.
Alejandro Colomar via Libc-alpha June 26, 2020, 9:24 p.m. | #3
* Carlos O'Donell:

> On 6/26/20 5:05 PM, Florian Weimer via Libc-alpha wrote:

>> * Daniel Walker via Libc-alpha:

>> 

>>> diff --git a/elf/link.h b/elf/link.h

>>> index 0048ad5d4d..5a42511636 100644

>>> --- a/elf/link.h

>>> +++ b/elf/link.h

>>> @@ -61,6 +61,10 @@ struct r_debug

>>>        } r_state;

>>>  

>>>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */

>>> +

>>> +    /* Link to next r_debug struct. Each r_debug struct represents a

>>> +       different namespace.  The first r_debug struct is the default.  */

>>> +    struct r_debug *next;

>>>    };

>>>  

>>>  /* This is the instance of that structure used by the dynamic linker.  */

>> 

>> How has this patch been tested?  I expect that it will cause an abilist

>> mismatch for the _r_debug symbol in the dynamic linker.

>> 

>> If we go this route to add this capability, I think we have to add a new

>> symbol version for the _r_debug symbol, and keep the old one at the

>> previous size.

>> 

>> How is your compatibility experience with the size and version change?

>> How many tools need updating before they work again?

>> 

>> A different approach would add another symbol (parallel to _r_debug) to

>> store this data.  This would avoid the need for any immediate tool

>> updates.

>

> I mention this in my response to the cover letter in this series.


Your explanation there was truncated.

> This patch is probably unacceptable as-is because of application

> expectations.


But perhaps Cisco's experience shows that our worries are unfounded?

Thanks,
Florian
Alejandro Colomar via Libc-alpha June 26, 2020, 9:44 p.m. | #4
On 6/26/20 5:24 PM, Florian Weimer wrote:
> * Carlos O'Donell:

> 

>> On 6/26/20 5:05 PM, Florian Weimer via Libc-alpha wrote:

>>> * Daniel Walker via Libc-alpha:

>>>

>>>> diff --git a/elf/link.h b/elf/link.h

>>>> index 0048ad5d4d..5a42511636 100644

>>>> --- a/elf/link.h

>>>> +++ b/elf/link.h

>>>> @@ -61,6 +61,10 @@ struct r_debug

>>>>        } r_state;

>>>>  

>>>>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */

>>>> +

>>>> +    /* Link to next r_debug struct. Each r_debug struct represents a

>>>> +       different namespace.  The first r_debug struct is the default.  */

>>>> +    struct r_debug *next;

>>>>    };

>>>>  

>>>>  /* This is the instance of that structure used by the dynamic linker.  */

>>>

>>> How has this patch been tested?  I expect that it will cause an abilist

>>> mismatch for the _r_debug symbol in the dynamic linker.

>>>

>>> If we go this route to add this capability, I think we have to add a new

>>> symbol version for the _r_debug symbol, and keep the old one at the

>>> previous size.

>>>

>>> How is your compatibility experience with the size and version change?

>>> How many tools need updating before they work again?

>>>

>>> A different approach would add another symbol (parallel to _r_debug) to

>>> store this data.  This would avoid the need for any immediate tool

>>> updates.

>>

>> I mention this in my response to the cover letter in this series.

> 

> Your explanation there was truncated.


Truncated in which way?

>> This patch is probably unacceptable as-is because of application

>> expectations.

> 

> But perhaps Cisco's experience shows that our worries are unfounded?


That would be great!

I would want to see gdb changed to r_version > 0, and document that
the increasing r_version means changes that only expand the structure
and add new information while keeping old information backwards
compatible.

I'm not sure it would work to version _r_debug, since the debugger
is using DT_DEBUG and we only get to put one value in that
.dynamic entry.

-- 
Cheers,
Carlos.
Alejandro Colomar via Libc-alpha June 27, 2020, 1:21 a.m. | #5
On Fri, Jun 26, 2020 at 11:24:21PM +0200, Florian Weimer wrote:
> * Carlos O'Donell:

> 

> > On 6/26/20 5:05 PM, Florian Weimer via Libc-alpha wrote:

> >> * Daniel Walker via Libc-alpha:

> >> 

> >>> diff --git a/elf/link.h b/elf/link.h

> >>> index 0048ad5d4d..5a42511636 100644

> >>> --- a/elf/link.h

> >>> +++ b/elf/link.h

> >>> @@ -61,6 +61,10 @@ struct r_debug

> >>>        } r_state;

> >>>  

> >>>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */

> >>> +

> >>> +    /* Link to next r_debug struct. Each r_debug struct represents a

> >>> +       different namespace.  The first r_debug struct is the default.  */

> >>> +    struct r_debug *next;

> >>>    };

> >>>  

> >>>  /* This is the instance of that structure used by the dynamic linker.  */

> >> 

> >> How has this patch been tested?  I expect that it will cause an abilist

> >> mismatch for the _r_debug symbol in the dynamic linker.

> >> 

> >> If we go this route to add this capability, I think we have to add a new

> >> symbol version for the _r_debug symbol, and keep the old one at the

> >> previous size.

> >> 

> >> How is your compatibility experience with the size and version change?

> >> How many tools need updating before they work again?

> >> 

> >> A different approach would add another symbol (parallel to _r_debug) to

> >> store this data.  This would avoid the need for any immediate tool

> >> updates.

> >

> > I mention this in my response to the cover letter in this series.

> 

> Your explanation there was truncated.

> 

> > This patch is probably unacceptable as-is because of application

> > expectations.

> 

> But perhaps Cisco's experience shows that our worries are unfounded?


I don't know that we can confirm this. Per my understanding these changes have
been maintained as one off changes which haven't entered our product yet. We're
now currently trying to add this to our product. We use OpenEmbedded to build
an SDK which is then used in our product. The tooling in typical rebuilt
against any new glibc which we have, and we don't reuse binaries from the prior
builds which may have a different glibc (even different patch level).

Daniel
Florian Weimer June 27, 2020, 9:34 a.m. | #6
* Carlos O'Donell via Libc-alpha:

> Truncated in which way?


This part:

| Your proposed solution of bumping the version is unacceptable,
| and was last rejected by Roland McGrath. The problem is that
| when you bump the version the current 

> I'm not sure it would work to version _r_debug, since the debugger

> is using DT_DEBUG and we only get to put one value in that

> .dynamic entry.


The symbol version is needed to avoid problems due to copy relocations
if the symbol is referenced directly from the main program.  Without
that, the object could be truncated.  It's not a debugger
compatibility feature.
Alejandro Colomar via Libc-alpha June 28, 2020, 12:34 p.m. | #7
On 6/27/20 5:34 AM, Florian Weimer wrote:
> * Carlos O'Donell via Libc-alpha:

> 

>> Truncated in which way?

> 

> This part:

> 

> | Your proposed solution of bumping the version is unacceptable,

> | and was last rejected by Roland McGrath. The problem is that

> | when you bump the version the current 


Thanks. "It's Friday" is my only excuse.

I did provide some of the original links to the discussion.

Roland, as a steward at the time, was worried about exactly what
we see in gdb, which is "r_version != 1" may have made it into tooling.

We can test this. We can try to deploy a similar solution in Fedora Rawhide
and declare the semantics as we expect them to be.

That is to say that r_version == 1, is the entire structure as we have it,
and r_version == 2 *adds* but does not remove from the structure. Since
the data is maintained by the implementation and the caller is only
inspecting the data it should work.

>> I'm not sure it would work to version _r_debug, since the debugger

>> is using DT_DEBUG and we only get to put one value in that

>> .dynamic entry.

> 

> The symbol version is needed to avoid problems due to copy relocations

> if the symbol is referenced directly from the main program.  Without

> that, the object could be truncated.  It's not a debugger

> compatibility feature.


Correct, but this violates *how* you're supposed to use _r_debug.

If you have a static executable you can get away with referencing
_r_debug directly, but in that case symbol versions don't matter, and
you have whatever version you have at the time.

In the dynamic case it is different. The symbol should be looked up
via DT_DEBUG only which always points to the library-local address
of the data object (and the most recent version). In effect this
bypasses the COPY relocation?

If an application uses _r_debug, the symbol, directly, then they
should get a static copy via the COPY relocation, and it will not
be updated after that. Perhaps we can arrange for such an initial
_r_debug to indicate it's not active or initialized?

IMO the library should use a local-only reference to the _r_debug to
avoid going through the global reference.

I'm not keen to admit that a COPY reloc of _r_debug should work.

-- 
Cheers,
Carlos.
Alejandro Colomar via Libc-alpha June 29, 2020, 8:51 a.m. | #8
* Carlos O'Donell via Libc-alpha:

>>> I'm not sure it would work to version _r_debug, since the debugger

>>> is using DT_DEBUG and we only get to put one value in that

>>> .dynamic entry.

>> 

>> The symbol version is needed to avoid problems due to copy relocations

>> if the symbol is referenced directly from the main program.  Without

>> that, the object could be truncated.  It's not a debugger

>> compatibility feature.

>

> Correct, but this violates *how* you're supposed to use _r_debug.


If it is possible to link against it, we need to add the new symbol
version, in my opinion.

> In the dynamic case it is different. The symbol should be looked up

> via DT_DEBUG only which always points to the library-local address

> of the data object (and the most recent version). In effect this

> bypasses the COPY relocation?


How is this supposed to work if the dynamic linker does contain
DT_DEBUG?

I only observe DT_DEBUG in PIE binaries, but since the dynamic loader is
mapped at a random address even for ET_EXEC main programs, there must be
some other mechanism to locate it.

Thanks,
Florian

Patch

diff --git a/elf/dl-debug.c b/elf/dl-debug.c
index 4b3d3ad6ba..7eba477fd3 100644
--- a/elf/dl-debug.c
+++ b/elf/dl-debug.c
@@ -44,20 +44,27 @@  struct r_debug _r_debug;
 struct r_debug *
 _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
 {
-  struct r_debug *r;
+  struct r_debug *r, *rp;
 
   if (ns == LM_ID_BASE)
     r = &_r_debug;
   else
-    r = &GL(dl_ns)[ns]._ns_debug;
+    {
+      r = &GL(dl_ns)[ns]._ns_debug;
+      rp = &GL(dl_ns)[ns - 1]._ns_debug;
+      rp->next = r;
+      if (ns - 1 == LM_ID_BASE)
+        _r_debug.next = r;
+    }
 
   if (r->r_map == NULL || ldbase != 0)
     {
       /* Tell the debugger where to find the map of loaded objects.  */
-      r->r_version = 1	/* R_DEBUG_VERSION XXX */;
+      r->r_version = 3  /* R_DEBUG_VERSION XXX */;
       r->r_ldbase = ldbase ?: _r_debug.r_ldbase;
       r->r_map = (void *) GL(dl_ns)[ns]._ns_loaded;
       r->r_brk = (ElfW(Addr)) &_dl_debug_state;
+      r->next = NULL;
     }
 
   return r;
diff --git a/elf/link.h b/elf/link.h
index 0048ad5d4d..5a42511636 100644
--- a/elf/link.h
+++ b/elf/link.h
@@ -61,6 +61,10 @@  struct r_debug
       } r_state;
 
     ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */
+
+    /* Link to next r_debug struct. Each r_debug struct represents a
+       different namespace.  The first r_debug struct is the default.  */
+    struct r_debug *next;
   };
 
 /* This is the instance of that structure used by the dynamic linker.  */